[ https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181795#comment-15181795 ]
ASF GitHub Bot commented on DRILL-4184:
---------------------------------------

Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/372#discussion_r55125267

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableVarLengthValuesColumn.java ---
    @@ -69,11 +73,16 @@ protected boolean readAndStoreValueSizeInformation() throws IOException {
         if ( currDefLevel == -1 ) {
           currDefLevel = pageReader.definitionLevels.readInteger();
         }
    -    if ( columnDescriptor.getMaxDefinitionLevel() > currDefLevel) {
    +
    +    if (columnDescriptor.getMaxDefinitionLevel() > currDefLevel) {
           nullsRead++;
    -      // set length of zero, each index in the vector defaults to null so no need to set the nullability
    -      variableWidthVector.getMutator().setValueLengthSafe(
    -          valuesReadInCurrentPass + pageReader.valuesReadyToRead, 0);
    +      // set length of zero, each index in the vector defaults to null so no
    +      // need to set the nullability
    +      if (variableWidthVector == null) {
    --- End diff --

    I know this class hierarchy is a bit messy as is (I am the original author of most of it, I should go back to clean it up). But we shouldn't be using the presence or absence of this field as a flag to know which class we are in. I'm


> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4184
>                 URL: https://issues.apache.org/jira/browse/DRILL-4184
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.4.0
>         Environment: Windows 7 Professional, Java 1.8.0_66
>            Reporter: Dave Oshinsky
>
> Encoding a DECIMAL logical type in Parquet using the variable length BINARY
> primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0.
> The problem first surfaces with the ClassCastException shown below, but fixing
> the immediate cause of the exception is not sufficient to support this
> combination (DECIMAL, BINARY) in a Parquet file.
> In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or
> FIXED_LEN_BYTE_ARRAY. Are there any plans to support DECIMAL with variable
> length BINARY? Avro definitely supports encoding DECIMAL in variable length
> bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this
> support in Parquet is less clear.
> Selecting on a BINARY DECIMAL field in a parquet file throws an exception as
> shown below (java.lang.ClassCastException:
> org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to
> org.apache.drill.exec.vector.VariableWidthVector). The successful query at
> bottom selected on a string field in the same file.
> 0: jdbc:drill:zk=local> select count(*) from dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=70000020;
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
> Message: Failure in setting up reader > Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr { > required binary ACCT_NO (DECIMAL(20,0)); > optional binary SF_NO (UTF8); > optional binary LF_NO (UTF8); > optional binary BRANCH_NO (DECIMAL(20,0)); > optional binary INTRO_CUST_NO (DECIMAL(20,0)); > optional binary INTRO_ACCT_NO (DECIMAL(20,0)); > optional binary INTRO_SIGN (UTF8); > optional binary TYPE (UTF8); > optional binary OPR_MODE (UTF8); > optional binary CUR_ACCT_TYPE (UTF8); > optional binary TITLE (UTF8); > optional binary CORP_CUST_NO (DECIMAL(20,0)); > optional binary APLNDT (UTF8); > optional binary OPNDT (UTF8); > optional binary VERI_EMP_NO (DECIMAL(20,0)); > optional binary VERI_SIGN (UTF8); > optional binary MANAGER_SIGN (UTF8); > optional binary CURBAL (DECIMAL(8,2)); > optional binary STATUS (UTF8); > } > , metadata: > {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace" > :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal > ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co > lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec > tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision": > 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s > ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_ > NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru > e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal > se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc > hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA > R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au > to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv > 
_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r > ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript > ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_ > NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale > ":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math. > BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_preci > sion":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true > ,"cv_subscript":4,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},{"nam > e":"INTRO_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","preci > sion":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_cla > ss":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullab > le":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true," > cv_signed":true,"cv_subscript":5,"cv_type":2,"cv_typename":"NUMBER","cv_writable > ":true}]},{"name":"INTRO_ACCT_NO","type":["null",{"type":"bytes","logicalType":" > decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false > ,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":f > alse,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_sea > rchable":true,"cv_signed":true,"cv_subscript":6,"cv_type":2,"cv_typename":"NUMBE > R","cv_writable":true}]},{"name":"INTRO_SIGN","type":["null",{"type":"string","c > v_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String" > ,"cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"c > v_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscr > ipt":7,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"TYPE > ","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true, > 
"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false > ,"cv_nullable":1,"cv_precision":2,"cv_read_only":false,"cv_scale":0,"cv_searchab > le":true,"cv_signed":true,"cv_subscript":8,"cv_type":12,"cv_typename":"VARCHAR2" > ,"cv_writable":true}]},{"name":"OPR_MODE","type":["null",{"type":"string","cv_au > to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv > _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":2,"cv_re > ad_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript" > :9,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"CUR_ACCT > _TYPE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive": > true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable": > false,"cv_nullable":1,"cv_precision":4,"cv_read_only":false,"cv_scale":0,"cv_sea > rchable":true,"cv_signed":true,"cv_subscript":10,"cv_type":12,"cv_typename":"VAR > CHAR2","cv_writable":true}]},{"name":"TITLE","type":["null",{"type":"string","cv > _auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String", > "cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":30,"c > v_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscr > ipt":11,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"COR > P_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20 > ,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"jav > a.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"c > v_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signe > d":true,"cv_subscript":12,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true} > ]},{"name":"APLNDT","type":["null",{"type":"string","cv_auto_incr":false,"cv_cas > e_sensitive":false,"cv_column_class":"java.sql.Timestamp","cv_currency":false,"c > 
v_def_writable":false,"cv_nullable":1,"cv_precision":0,"cv_read_only":false,"cv_ > scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":13,"cv_type":93,"c > v_typename":"DATE","cv_writable":true}]},{"name":"OPNDT","type":["null",{"type": > "string","cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java. > sql.Timestamp","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_p > recision":0,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":t > rue,"cv_subscript":14,"cv_type":93,"cv_typename":"DATE","cv_writable":true}]},{" > name":"VERI_EMP_NO","type":["null",{"type":"bytes","logicalType":"decimal","prec > ision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_cl > ass":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nulla > ble":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true, > "cv_signed":true,"cv_subscript":15,"cv_type":2,"cv_typename":"NUMBER","cv_writab > le":true}]},{"name":"VERI_SIGN","type":["null",{"type":"string","cv_auto_incr":f > alse,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency" > :false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":f > alse,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":16,"cv_ty > pe":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"MANAGER_SIGN","ty > pe":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_c > olumn_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_ > nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":t > rue,"cv_signed":true,"cv_subscript":17,"cv_type":12,"cv_typename":"VARCHAR2","cv > _writable":true}]},{"name":"CURBAL","type":["null",{"type":"bytes","logicalType" > :"decimal","precision":8,"scale":2,"cv_auto_incr":false,"cv_case_sensitive":fals > e,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable": > 
false,"cv_nullable":1,"cv_precision":8,"cv_read_only":false,"cv_scale":2,"cv_sea > rchable":true,"cv_signed":true,"cv_subscript":18,"cv_type":2,"cv_typename":"NUMB > ER","cv_writable":true}]},{"name":"STATUS","type":["null",{"type":"string","cv_a > uto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","c > v_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_r > ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript > ":19,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]}]}}}, blocks: > [B > lockMetaData{10, 1281 [ColumnMetaData{SNAPPY [ACCT_NO] BINARY [BIT_PACKED, > PLAI > N], 4}, ColumnMetaData{SNAPPY [SF_NO] BINARY [RLE, BIT_PACKED, > PLAIN_DICTIONARY > ], 88}, ColumnMetaData{SNAPPY [LF_NO] BINARY [RLE, BIT_PACKED, > PLAIN_DICTIONARY > ], 163}, ColumnMetaData{SNAPPY [BRANCH_NO] BINARY [RLE, BIT_PACKED, > PLAIN_DICTI > ONARY], 241}, ColumnMetaData{SNAPPY [INTRO_CUST_NO] BINARY [RLE, BIT_PACKED, > PL > AIN_DICTIONARY], 298}, ColumnMetaData{SNAPPY [INTRO_ACCT_NO] BINARY [RLE, > BIT_P > ACKED, PLAIN_DICTIONARY], 364}, ColumnMetaData{SNAPPY [INTRO_SIGN] BINARY > [RLE, > BIT_PACKED, PLAIN_DICTIONARY], 421}, ColumnMetaData{SNAPPY [TYPE] BINARY > [RLE, > BIT_PACKED, PLAIN_DICTIONARY], 478}, ColumnMetaData{SNAPPY [OPR_MODE] BINARY > [ > RLE, BIT_PACKED, PLAIN_DICTIONARY], 538}, ColumnMetaData{SNAPPY > [CUR_ACCT_TYPE] > BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 598}, ColumnMetaData{SNAPPY > [TITLE] > BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 658}, ColumnMetaData{SNAPPY > [CORP_ > CUST_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 736}, > ColumnMetaData{SNAPP > Y [APLNDT] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 802}, > ColumnMetaData{SNA > PPY [OPNDT] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 919}, > ColumnMetaData{SN > APPY [VERI_EMP_NO] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1036}, > ColumnMet > aData{SNAPPY [VERI_SIGN] BINARY [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1093}, 
> Col > umnMetaData{SNAPPY [MANAGER_SIGN] BINARY [RLE, BIT_PACKED, > PLAIN_DICTIONARY], 1 > 150}, ColumnMetaData{SNAPPY [CURBAL] BINARY [RLE, BIT_PACKED, > PLAIN_DICTIONARY] > , 1207}, ColumnMetaData{SNAPPY [STATUS] BINARY [RLE, BIT_PACKED, > PLAIN_DICTIONA > RY], 1270}]}]} > at > org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader > .handleAndRaise(ParquetRecordReader.java:346) > at > org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader > .setup(ParquetRecordReader.java:339) > at > org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:1 > 01) > at > org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch( > ParquetScanBatchCreator.java:168) > at > org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch( > ParquetScanBatchCreator.java:56) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCr > eator.java:151) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreat > or.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCr > eator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreat > or.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCr > eator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreat > or.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCr > eator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreat > or.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCr > eator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreat > or.java:174) > at > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCr > eator.java:131) > at > org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreat > or.java:174) > at > 
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230)
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
>         at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.<init>(VarLengthValuesColumn.java:44)
>         at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$Decimal28Column.<init>(VarLengthColumnReaders.java:52)
>         at org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory.getReader(ColumnReaderFactory.java:178)
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:319)
>         ... 22 more
> Error: SYSTEM ERROR: ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
> Fragment 0:0
> [Error Id: 22bfa8dd-1129-4300-9449-409e96d6c800 on DaveOshinsky-PC.gp.cv.commvault.com:31010] (state=,code=0)
> 0: jdbc:drill:zk=local> select count(*) from dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where opr_mode='JO';
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> 1 row selected (0.406 seconds)
> 0: jdbc:drill:zk=local>
> The immediate cause of this exception is that Drill, in
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader,
> assumes that all BINARY values are encoded in VariableWidthVectors.
> For BINARY DECIMAL, this is not true: for example, Decimal28SparseVector is a
> FixedWidthVector, not a VariableWidthVector. The assumption that DECIMAL is
> not encoded in variable length BINARY is found in a number of other places in
> the Drill code, including:
> org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory only
> contains logic to handle DECIMAL with INT32, INT64, INT96, or
> FIXED_LEN_BYTE_ARRAY. BINARY is not supported with DECIMAL.
> org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders
> does not support a nullable reader for BINARY in its getNullableColumnReader
> method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
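
For readers unfamiliar with the (DECIMAL, BINARY) combination discussed in the issue, here is a minimal sketch of how such a value can be decoded in plain Java. It assumes the layout described in the Parquet format documentation for DECIMAL on a binary column: the unscaled value stored as big-endian two's-complement bytes, with the scale taken from the schema. The class BinaryDecimalSketch and its decode helper are hypothetical names used only for illustration. They are not part of Drill and do not reflect how the pull request implements the reader.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration class; not Drill code.
class BinaryDecimalSketch {

    /**
     * Decodes a DECIMAL whose unscaled value is stored as big-endian
     * two's-complement bytes (assumed layout), applying the schema's scale.
     */
    public static BigDecimal decode(byte[] unscaledBytes, int scale) {
        // BigInteger(byte[]) interprets its argument as big-endian two's complement.
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    public static void main(String[] args) {
        // A DECIMAL(20,0) value like ACCT_NO in the reported schema.
        byte[] acctNo = BigInteger.valueOf(70000020L).toByteArray();
        System.out.println(decode(acctNo, 0));

        // A negative DECIMAL(8,2) value like CURBAL: unscaled -12345, scale 2.
        byte[] curbal = new byte[] {(byte) 0xCF, (byte) 0xC7};
        System.out.println(decode(curbal, 2));
    }
}
```

The byte array length varies with the magnitude of the value, which is why a reader for this encoding needs per-value length information even when the destination vector holds fixed-width decimals.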