Hi, I've been trying for a while to read data from a Parquet file using the parquet::StreamReader class. The first column of my data holds int64 values, so I have been streaming the data as follows:

    shared_ptr<arrow::io::ReadableFile> infile;
    PARQUET_ASSIGN_OR_THROW(infile, arrow::io::ReadableFile::Open(datapath));

    parquet::StreamReader stream{ parquet::ParquetFileReader::Open(infile) };

    int64_t c1;
    while (!stream.eof()) {
        stream >> c1;               // read the first column
        stream.SkipColumns(100);    // skip the remaining columns in the row
        stream >> parquet::EndRow;
        cout << c1 << endl;
    }

My code throws a ParquetException in the CheckColumn() function when comparing length and node->type_length() [stream_reader.cc, Line 543]:

    if (length != node->type_length()) {
        throw ParquetException("Column length mismatch. Column '" + node->name() +
                               "' has length " + std::to_string(node->type_length()) +
                               "] not " + std::to_string(length));
    }

I figured out that this happens because there are empty data fields in my Parquet file, meaning length is 0 but node->type_length() is 64.

I've looked all over the internet for a way to properly handle empty values in Parquet files using Arrow, but have had no luck. Is there a way to check whether a data field is empty with a parquet::StreamReader object, or some other way to handle empty fields?

Any help would be appreciated.