On 10/07/2014 05:32 AM, Mickaël Lacour wrote:
Hi,

I can implement an addNull method in the RecordConsumer (public void 
addNull()), but if I do this, I have an issue when I read the value back. This 
is expected, because I'm trying to read an INT where there is only an EOF (I 
had no way to say: skip it, it's null).

Caused by: parquet.io.ParquetDecodingException: Can't read value in column [lstint, bag, array_element] INT32 at value 2 out of 2, 2 out of 2 in currentPage. repetition level: 1, definition level: 3
         at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:466)
         at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:368)
         at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:400)
         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:173)
         ... 23 more
Caused by: parquet.io.ParquetDecodingException: could not read int
[...]
Caused by: java.io.EOFException
         at parquet.bytes.LittleEndianDataInputStream.readInt(LittleEndianDataInputStream.java:352)

The thing is: how am I supposed to read a nonexistent value? Do you think we 
could add this feature (allowing null values inside an array)?
--
Mickaël Lacour
Senior Software Engineer
Analytics Infrastructure team @Scalability

I think the problem is that the definition level indicates that there should be a value, but there isn't one. Isn't the definition level used to encode that a value is null? So if I have a required array with nullable elements, the definition level is either 0 (null) or 1 (not null). The definition level being present indicates that there is an entry, but a value is only read when the definition level equals the maximum definition level for the field. (Someone correct me if this is wrong!)

Then the repetition level, in this example also 0 or 1, indicates whether the value goes in a new list or an existing list.
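To make the two levels concrete, here is a minimal sketch of how a reader could rebuild lists from them. It assumes the simplified model described above (definition level 0 or 1, repetition level 0 or 1) rather than the three-level column in the stack trace, and the class and method names are hypothetical, not Parquet API.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelDecodingSketch {
    // Simplified model from this message: 1 means "a value is stored", 0 means null.
    static final int MAX_DEFINITION_LEVEL = 1;

    // Rebuilds lists from parallel level arrays. The values array holds only
    // the non-null values, in order, with no placeholder for nulls.
    static List<List<Integer>> decode(int[] repetitionLevels,
                                      int[] definitionLevels,
                                      int[] values) {
        List<List<Integer>> records = new ArrayList<>();
        List<Integer> current = null;
        int valueIndex = 0;
        for (int i = 0; i < definitionLevels.length; i++) {
            // Repetition level 0 starts a new list; 1 continues the current one.
            if (repetitionLevels[i] == 0 || current == null) {
                current = new ArrayList<>();
                records.add(current);
            }
            if (definitionLevels[i] == MAX_DEFINITION_LEVEL) {
                // Only at the maximum definition level is a value consumed.
                current.add(values[valueIndex++]);
            } else {
                // Below the maximum: the element is null and nothing is read.
                current.add(null);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        int[] rep = {0, 1, 1, 0};
        int[] def = {1, 0, 1, 1};
        int[] vals = {10, 30, 40};
        System.out.println(decode(rep, def, vals)); // prints [[10, null, 30], [40]]
    }
}
```

Note that only three values are stored for four entries; the null costs a definition level but no value.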

So I think that the problem you're hitting is caused by addNull not being implemented quite correctly. It should produce a record with a definition level less than the maximum.
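As a rough illustration of the difference, the sketch below contrasts an addNull that records a lower definition level with one that records the maximum level and no value. Everything here is a hypothetical toy writer under the simplified 0/1 model, not the actual RecordConsumer or column writer code.

```java
import java.util.ArrayList;
import java.util.List;

public class AddNullSketch {
    // Simplified model: maximum definition level 1 means "a value is stored".
    static final int MAX_DEFINITION_LEVEL = 1;

    final List<Integer> definitionLevels = new ArrayList<>();
    final List<Integer> values = new ArrayList<>();

    // A real value: maximum definition level plus the value itself.
    void addInteger(int value) {
        definitionLevels.add(MAX_DEFINITION_LEVEL);
        values.add(value);
    }

    // A null: definition level below the maximum, and no value written.
    void addNull() {
        definitionLevels.add(MAX_DEFINITION_LEVEL - 1);
    }

    // The suspected mistake: maximum definition level but no value written.
    void addNullIncorrectly() {
        definitionLevels.add(MAX_DEFINITION_LEVEL);
    }

    // A reader consumes one value per entry at the maximum definition level.
    int valuesExpectedByReader() {
        int expected = 0;
        for (int level : definitionLevels) {
            if (level == MAX_DEFINITION_LEVEL) {
                expected++;
            }
        }
        return expected;
    }

    // False means the reader would run past the end of the stored values.
    boolean isConsistent() {
        return valuesExpectedByReader() == values.size();
    }

    public static void main(String[] args) {
        AddNullSketch good = new AddNullSketch();
        good.addInteger(1);
        good.addNull();
        System.out.println(good.isConsistent()); // prints true

        AddNullSketch bad = new AddNullSketch();
        bad.addInteger(1);
        bad.addNullIncorrectly();
        System.out.println(bad.isConsistent()); // prints false
    }
}
```

In the inconsistent case the reader expects two values but only one was stored, which would match the symptom of hitting EOF while reading an int.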

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.
