On 10/07/2014 05:32 AM, Mickaël Lacour wrote:
Hi,

I can implement an addNull method in the RecordConsumer (public void addNull()), but if I do this, I have an issue when I read the value back. This is expected, because I'm trying to read an INT where there is an EOF (I had no way to say: skip it, it's null).

Caused by: parquet.io.ParquetDecodingException: Can't read value in column [lstint, bag, array_element] INT32 at value 2 out of 2, 2 out of 2 in currentPage. repetition level: 1, definition level: 3
    at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:466)
    at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:368)
    at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:400)
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:173)
    ... 23 more
Caused by: parquet.io.ParquetDecodingException: could not read int
[...]
Caused by: java.io.EOFException
    at parquet.bytes.LittleEndianDataInputStream.readInt(LittleEndianDataInputStream.java:352)

The thing is, how am I supposed to read a nonexistent value? Do you think we could add this feature (having null values inside an array)?

--
Mickaël Lacour
Senior Software Engineer
Analytics Infrastructure team @Scalability
I think the problem is that the definition level says there should be a value, but there isn't one. Isn't the definition level what encodes that a value is null? So if I have a required array with nullable elements, the definition level is either 0 (null) or 1 (not null). The presence of a definition level indicates that there is an entry, but a value is only read when the definition level equals the maximum definition level for the field. (Someone correct me if this is wrong!)
Then the repetition level, in this example also 0 or 1, indicates whether the value goes in a new list or an existing list.
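To make the repetition level part concrete, here is a rough sketch in plain Java. It is not Parquet's reader code: the class and method names are made up for illustration, and it assumes the simplified case above where the level is only ever 0 or 1 and the first entry starts a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Parquet API.
public class RepetitionLevelSketch {

    // Rebuild lists from a flat column of values.
    // Repetition level 0 starts a new list; level 1 appends to the current list.
    // Assumes the first entry has level 0 and that every entry carries a value.
    static List<List<Integer>> assemble(int[] repetitionLevels, int[] values) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (repetitionLevels[i] == 0) {
                lists.add(new ArrayList<>());
            }
            lists.get(lists.size() - 1).add(values[i]);
        }
        return lists;
    }

    public static void main(String[] args) {
        int[] repetitionLevels = {0, 1, 1, 0, 1};
        int[] values = {1, 2, 3, 4, 5};
        // prints [[1, 2, 3], [4, 5]]
        System.out.println(assemble(repetitionLevels, values));
    }
}
```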
So I think the problem you're hitting is caused by addNull not being implemented quite correctly. It should write a definition level that is less than the maximum, so the reader does not expect a value.
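Here is a rough sketch of what I mean, again in plain Java rather than the real Parquet classes. Everything in it is an assumption for illustration: the class, the two addNull variants, and MAX_DEF = 1, which follows my simplified example (your trace reports definition level 3 for that column, so the real maximum is presumably different). The broken variant records the maximum level without storing a value, so the reader runs out of values, much like the EOFException in your trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Parquet API.
public class DefinitionLevelSketch {
    // Assumed maximum definition level for the element column.
    static final int MAX_DEF = 1;

    final List<Integer> definitionLevels = new ArrayList<>();
    final List<Integer> values = new ArrayList<>();

    // A non-null element: maximum definition level plus a stored value.
    void addInteger(int value) {
        definitionLevels.add(MAX_DEF);
        values.add(value);
    }

    // A null element: a lower definition level and no stored value.
    void addNull() {
        definitionLevels.add(MAX_DEF - 1);
    }

    // Broken variant: claims a value is present but never stores one.
    void addNullBroken() {
        definitionLevels.add(MAX_DEF);
    }

    // Only consume a stored value when the level equals the maximum.
    List<Integer> read() {
        List<Integer> result = new ArrayList<>();
        int next = 0;
        for (int level : definitionLevels) {
            if (level == MAX_DEF) {
                result.add(values.get(next++)); // throws if no value was stored
            } else {
                result.add(null);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DefinitionLevelSketch good = new DefinitionLevelSketch();
        good.addInteger(1);
        good.addNull();
        good.addInteger(3);
        System.out.println(good.read()); // prints [1, null, 3]

        DefinitionLevelSketch bad = new DefinitionLevelSketch();
        bad.addInteger(1);
        bad.addNullBroken();
        try {
            bad.read();
        } catch (IndexOutOfBoundsException e) {
            System.out.println("ran out of values");
        }
    }
}
```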
rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
