Eric Gorelik created PARQUET-1882:
-------------------------------------

             Summary: Writing an all-null column and then reading it with 
buffered_stream aborts the process
                 Key: PARQUET-1882
                 URL: https://issues.apache.org/jira/browse/PARQUET-1882
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
    Affects Versions: cpp-1.5.0
         Environment: Windows 10 64-bit, MSVC
            Reporter: Eric Gorelik


When a column that contains only nulls is written unbuffered, a 0-byte 
dictionary page gets written. When the resulting file is then read with 
buffered_stream enabled, the column reader gets the length of the page (which 
is 0) and then tries to read that many bytes from the underlying input stream.

parquet/column_reader.cc, SerializedPageReader::NextPage

 
{code:java}
int compressed_len = current_page_header_.compressed_page_size;
int uncompressed_len = current_page_header_.uncompressed_page_size;

// Read the compressed data page.
std::shared_ptr<Buffer> page_buffer;
PARQUET_THROW_NOT_OK(stream_->Read(compressed_len, &page_buffer));
{code}
 

BufferedInputStream::Read, however, asserts that the number of bytes to read is 
strictly positive, so the 0-byte read fails the assertion and aborts the process.

arrow/io/buffered.cc, BufferedInputStream::Impl

 
{code:java}
Status Read(int64_t nbytes, int64_t* bytes_read, void* out) {
  // Fails and aborts the process when nbytes == 0.
  ARROW_CHECK_GT(nbytes, 0);
{code}
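
The interaction between the two snippets above can be sketched in isolation. This is only an illustration under stated assumptions, not the actual parquet-cpp or Arrow code and not necessarily the fix that should be adopted: StrictStream and ReadPageBody are hypothetical names, and the stand-in stream throws instead of aborting so that the failure can be observed. It shows one possible caller-side guard, where a 0-byte page yields an empty buffer without calling Read at all.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered input stream whose Read() rejects
// non-positive sizes. The code quoted in this report uses ARROW_CHECK_GT,
// which aborts; this sketch throws instead so the behaviour is observable.
class StrictStream {
 public:
  explicit StrictStream(std::string data) : data_(std::move(data)) {}

  std::vector<uint8_t> Read(int64_t nbytes) {
    if (nbytes <= 0) {
      throw std::invalid_argument("nbytes must be strictly positive");
    }
    const int64_t remaining = static_cast<int64_t>(data_.size()) - pos_;
    const int64_t n = nbytes < remaining ? nbytes : remaining;
    std::vector<uint8_t> out(data_.begin() + pos_, data_.begin() + pos_ + n);
    pos_ += n;
    return out;
  }

 private:
  std::string data_;
  int64_t pos_ = 0;
};

// Hypothetical caller-side guard: a zero-length page returns an empty buffer
// without touching the stream, so the strict check is never reached.
std::vector<uint8_t> ReadPageBody(StrictStream& stream, int64_t compressed_len) {
  assert(compressed_len >= 0);
  if (compressed_len == 0) {
    return {};
  }
  return stream.Read(compressed_len);
}
```

With this guard, ReadPageBody(stream, 0) returns an empty buffer, while calling stream.Read(0) directly still trips the check. Relaxing the check in the stream itself to accept 0 would be the other obvious option; which side should change is left open here.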
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
