[ https://issues.apache.org/jira/browse/DRILL-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581620#comment-16581620 ]
salim achouche commented on DRILL-6685:
---------------------------------------

Fixed a regression introduced while addressing DRILL-6570:
* When fixing DRILL-6570, we unified a bulk entry's max-values so that a false positive (from fixed length to variable length) could be handled smoothly
* The regression was that the fixed-length algorithm was relying on the previous bulk-entry max-value constraint

Fix -
* I have re-introduced the constraint within the fixed-length reader
* Added a test suite using Robert Hou's Parquet data to prevent such regressions

> Error in parquet record reader
> ------------------------------
>
> Key: DRILL-6685
> URL: https://issues.apache.org/jira/browse/DRILL-6685
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.14.0
> Reporter: Robert Hou
> Assignee: salim achouche
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
> Attachments: drillbit.log.6685
>
>
> This is the query:
> select VarbinaryValue1 from dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet` limit 36;
> It appears to be caused by this commit:
> DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader
> aee899c1b26ebb9a5781d280d5a73b42c273d4d5
> This is the stack trace:
> {noformat}
> Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message:
> Hadoop path:
> /drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB.parquet/0_0_0.parquet
> Total records read: 0
> Row group index: 0
> Records in row group: 1250
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
> optional int64 Index;
> optional binary VarbinaryValue1;
> optional int64 BigIntValue;
> optional boolean BooleanValue;
> optional int32 DateValue (DATE);
> optional float FloatValue;
> optional binary VarcharValue1 (UTF8);
> optional double DoubleValue;
> optional int32 IntegerValue;
> optional int32 TimeValue (TIME_MILLIS);
> optional int64 TimestampValue (TIMESTAMP_MILLIS);
> optional binary VarbinaryValue2;
> optional fixed_len_byte_array(12) IntervalYearValue (INTERVAL);
> optional fixed_len_byte_array(12) IntervalDayValue (INTERVAL);
> optional fixed_len_byte_array(12) IntervalSecondValue (INTERVAL);
> optional binary VarcharValue2 (UTF8);
> }
> , metadata: {drill-writer.version=2, drill.version=1.14.0-SNAPSHOT}}, blocks:
> [BlockMetaData{1250, 23750308 [ColumnMetaData{UNCOMPRESSED [Index] optional
> int64 Index [PLAIN, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED
> [VarbinaryValue1] optional binary VarbinaryValue1 [PLAIN, RLE, BIT_PACKED],
> 10057}, ColumnMetaData{UNCOMPRESSED [BigIntValue] optional int64 BigIntValue
> [PLAIN, RLE, BIT_PACKED], 8174655}, ColumnMetaData{UNCOMPRESSED
> [BooleanValue] optional boolean BooleanValue [PLAIN, RLE, BIT_PACKED],
> 8179722}, ColumnMetaData{UNCOMPRESSED [DateValue] optional int32 DateValue
> (DATE) [PLAIN, RLE, BIT_PACKED], 8179916}, ColumnMetaData{UNCOMPRESSED
> [FloatValue] optional float FloatValue [PLAIN, RLE, BIT_PACKED], 8184959},
> ColumnMetaData{UNCOMPRESSED [VarcharValue1] optional binary VarcharValue1
> (UTF8) [PLAIN, RLE, BIT_PACKED], 8190002}, ColumnMetaData{UNCOMPRESSED
> [DoubleValue] optional double DoubleValue [PLAIN, RLE, BIT_PACKED],
> 10230058}, ColumnMetaData{UNCOMPRESSED [IntegerValue] optional int32
> IntegerValue [PLAIN, RLE, BIT_PACKED], 10240111},
> ColumnMetaData{UNCOMPRESSED [TimeValue] optional int32 TimeValue
> (TIME_MILLIS) [PLAIN, RLE, BIT_PACKED], 10245154},
> ColumnMetaData{UNCOMPRESSED [TimestampValue] optional int64 TimestampValue
> (TIMESTAMP_MILLIS) [PLAIN, RLE, BIT_PACKED], 10250197},
> ColumnMetaData{UNCOMPRESSED [VarbinaryValue2] optional binary VarbinaryValue2
> [PLAIN, RLE, BIT_PACKED], 10260250}, ColumnMetaData{UNCOMPRESSED
> [IntervalYearValue] optional fixed_len_byte_array(12) IntervalYearValue
> (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19632385}, ColumnMetaData{UNCOMPRESSED
> [IntervalDayValue] optional fixed_len_byte_array(12) IntervalDayValue
> (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19647446}, ColumnMetaData{UNCOMPRESSED
> [IntervalSecondValue] optional fixed_len_byte_array(12) IntervalSecondValue
> (INTERVAL) [PLAIN, RLE, BIT_PACKED], 19662507}, ColumnMetaData{UNCOMPRESSED
> [VarcharValue2] optional binary VarcharValue2 (UTF8) [PLAIN, RLE,
> BIT_PACKED], 19677568}]}]}
> Fragment 0:0
> [Error Id: 25852cdb-3217-4041-9743-66e9f3a2fbe4 on qa-node186.qa.lab:31010]
> (state=,code=0)
> {noformat}
> Table can be found in 10.10.100.186:/tmp/fourvarchar_asc_nulls_16MB.parquet
> sys.version is:
> 1.15.0-SNAPSHOT a05f17d6fcd80f0d21260d3b1074ab895f457bac Changed
> PROJECT_OUTPUT_BATCH_SIZE to System + Session 30.07.2018 @ 17:12:53 PDT
> r...@mapr.com 30.07.2018 @ 17:25:21 PDT
> fourvarchar_asc_nulls70.q

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)