[ 
https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182339#comment-15182339
 ] 

ASF GitHub Bot commented on DRILL-4184:
---------------------------------------

Github user daveoshinsky commented on the pull request:

    https://github.com/apache/drill/pull/372#issuecomment-192999951
  
    Regarding the overall intent of the fix: as the "TODO" comment on
    decimalLengths implies, it is intended only as a short-term fix.  Longer
    term, I would suggest storing decimal values in a VariableWidthVector
    (which is what VarLengthValuesColumn assumed, hence the class cast
    exception).  This would use memory more efficiently when most values are
    far smaller than full precision, as is often the case (think
    java.math.BigDecimal, which operates this way).  Moreover, there would be
    no need for a whole set of separate (generated) classes for different
    decimal precisions: a single variable-width class could handle any
    precision.

    I also suggest that some other "special cases" could be combined.  Fixed
    width is a special case of variable width in which there is no need to
    store a separate length for each value.  Non-nullable is a special case of
    nullable in which there is no need to store a null flag (a boolean or
    equivalent) for each value.

    One last bit of feedback: the code would be much easier to maintain if it
    did not rely on code generation (FreeMarker).  An old-fashioned class
    hierarchy, with no generated code, would probably work just fine for the
    vectoring mechanisms.
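
As a rough illustration of the variable-width idea (a minimal sketch, not Drill code): the class below keeps each decimal as the minimal two's-complement bytes of its unscaled value in one shared buffer, with an offsets list marking where each value ends. The class name VarWidthDecimalSketch and all of its methods are hypothetical and exist only for this example; nothing here is Drill's vector API.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; this is not part of Drill's vector API. */
final class VarWidthDecimalSketch {
    // All values share one byte buffer; offsets.get(i) is where value i starts.
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final List<Integer> offsets = new ArrayList<>();
    private final int scale;

    VarWidthDecimalSketch(int scale) {
        this.scale = scale;
        offsets.add(0);
    }

    /** Appends a value; throws ArithmeticException if it does not fit the scale exactly. */
    void add(BigDecimal value) {
        // toByteArray() returns the minimal big-endian two's-complement form.
        byte[] bytes = value.setScale(scale).unscaledValue().toByteArray();
        data.write(bytes, 0, bytes.length);
        offsets.add(data.size());
    }

    /** Reads value i back; copies the buffer, which is fine for a sketch but not for real use. */
    BigDecimal get(int index) {
        byte[] slice = Arrays.copyOfRange(
                data.toByteArray(), offsets.get(index), offsets.get(index + 1));
        return new BigDecimal(new BigInteger(slice), scale);
    }

    /** Total bytes used by the stored values (excluding the offsets). */
    int byteSize() {
        return data.size();
    }

    public static void main(String[] args) {
        VarWidthDecimalSketch v = new VarWidthDecimalSketch(2);
        v.add(new BigDecimal("12.34"));
        v.add(new BigDecimal("-1.50"));
        System.out.println(v.get(0) + " " + v.get(1) + " bytes=" + v.byteSize());
    }
}
```

One class covers every precision here because the width is carried per value rather than baked into the type. A fixed-width variant could drop the offsets list and compute positions from the index instead.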


> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4184
>                 URL: https://issues.apache.org/jira/browse/DRILL-4184
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.4.0
>         Environment: Windows 7 Professional, Java 1.8.0_66
>            Reporter: Dave Oshinsky
>
> Encoding a DECIMAL logical type in Parquet using the variable length BINARY 
> primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0.  The 
> problem first surfaces with the ClassCastException shown below, but fixing 
> the immediate cause of the exception is not sufficient to support this 
> combination (DECIMAL, BINARY) in a Parquet file.
> In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or 
> FIXED_LEN_BYTE_ARRAY.  Are there any plans to support DECIMAL with variable 
> length BINARY?  Avro definitely supports encoding DECIMAL in variable length 
> bytes (see https://avro.apache.org/docs/current/spec.html#Decimal), but this 
> support in Parquet is less clear.
> Selecting on a BINARY DECIMAL field in a parquet file throws an exception as 
> shown below (java.lang.ClassCastException: 
> org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to 
> org.apache.drill.exec.vector.VariableWidthVector).  The successful query at 
> bottom selected on a string field in the same file.
> 0: jdbc:drill:zk=local> select count(*) from 
> dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=70000020;
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr {
>   required binary ACCT_NO (DECIMAL(20,0));
>   optional binary SF_NO (UTF8);
>   optional binary LF_NO (UTF8);
>   optional binary BRANCH_NO (DECIMAL(20,0));
>   optional binary INTRO_CUST_NO (DECIMAL(20,0));
>   optional binary INTRO_ACCT_NO (DECIMAL(20,0));
>   optional binary INTRO_SIGN (UTF8);
>   optional binary TYPE (UTF8);
>   optional binary OPR_MODE (UTF8);
>   optional binary CUR_ACCT_TYPE (UTF8);
>   optional binary TITLE (UTF8);
>   optional binary CORP_CUST_NO (DECIMAL(20,0));
>   optional binary APLNDT (UTF8);
>   optional binary OPNDT (UTF8);
>   optional binary VERI_EMP_NO (DECIMAL(20,0));
>   optional binary VERI_SIGN (UTF8);
>   optional binary MANAGER_SIGN (UTF8);
>   optional binary CURBAL (DECIMAL(8,2));
>   optional binary STATUS (UTF8);
> }
> , metadata: 
> {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace"
> :"sbi","fields":[{"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal
> ","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_co
> lumn_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnec
> tion","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision":
> 20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_s
> ubscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},{"name":"SF_
> NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":tru
> e,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":fal
> se,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searc
> hable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHA
> R2","cv_writable":true}]},{"name":"LF_NO","type":["null",{"type":"string","cv_au
> to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv
> _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_r
> ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript
> ":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"BRANCH_
> NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale
> ":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.
> BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_preci
> sion":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true
> ,"cv_subscript":4,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},{"nam
> e":"INTRO_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","preci
> sion":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_cla
> ss":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullab
> le":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"
> cv_signed":true,"cv_subscript":5,"cv_type":2,"cv_typename":"NUMBER","cv_writable
> ":true}]},{"name":"INTRO_ACCT_NO","type":["null",{"type":"bytes","logicalType":"
> decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false
> ,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":f
> alse,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_sea
> rchable":true,"cv_signed":true,"cv_subscript":6,"cv_type":2,"cv_typename":"NUMBE
> R","cv_writable":true}]},{"name":"INTRO_SIGN","type":["null",{"type":"string","c
> v_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String"
> ,"cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"c
> v_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscr
> ipt":7,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"TYPE
> ","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,
> "cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false
> ,"cv_nullable":1,"cv_precision":2,"cv_read_only":false,"cv_scale":0,"cv_searchab
> le":true,"cv_signed":true,"cv_subscript":8,"cv_type":12,"cv_typename":"VARCHAR2"
> ,"cv_writable":true}]},{"name":"OPR_MODE","type":["null",{"type":"string","cv_au
> to_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv
> _currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":2,"cv_re
> ad_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript"
> :9,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"CUR_ACCT
> _TYPE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":
> true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":
> false,"cv_nullable":1,"cv_precision":4,"cv_read_only":false,"cv_scale":0,"cv_sea
> rchable":true,"cv_signed":true,"cv_subscript":10,"cv_type":12,"cv_typename":"VAR
> CHAR2","cv_writable":true}]},{"name":"TITLE","type":["null",{"type":"string","cv
> _auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String",
> "cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":30,"c
> v_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscr
> ipt":11,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"COR
> P_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20
> ,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"jav
> a.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"c
> v_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signe
> d":true,"cv_subscript":12,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}
> ]},{"name":"APLNDT","type":["null",{"type":"string","cv_auto_incr":false,"cv_cas
> e_sensitive":false,"cv_column_class":"java.sql.Timestamp","cv_currency":false,"c
> v_def_writable":false,"cv_nullable":1,"cv_precision":0,"cv_read_only":false,"cv_
> scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":13,"cv_type":93,"c
> v_typename":"DATE","cv_writable":true}]},{"name":"OPNDT","type":["null",{"type":
> "string","cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.
> sql.Timestamp","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_p
> recision":0,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":t
> rue,"cv_subscript":14,"cv_type":93,"cv_typename":"DATE","cv_writable":true}]},{"
> name":"VERI_EMP_NO","type":["null",{"type":"bytes","logicalType":"decimal","prec
> ision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_cl
> ass":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nulla
> ble":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,
> "cv_signed":true,"cv_subscript":15,"cv_type":2,"cv_typename":"NUMBER","cv_writab
> le":true}]},{"name":"VERI_SIGN","type":["null",{"type":"string","cv_auto_incr":f
> alse,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency"
> :false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":f
> alse,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":16,"cv_ty
> pe":12,"cv_typename":"VARCHAR2","cv_writable":true}]},{"name":"MANAGER_SIGN","ty
> pe":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_c
> olumn_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_
> nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":t
> rue,"cv_signed":true,"cv_subscript":17,"cv_type":12,"cv_typename":"VARCHAR2","cv
> _writable":true}]},{"name":"CURBAL","type":["null",{"type":"bytes","logicalType"
> :"decimal","precision":8,"scale":2,"cv_auto_incr":false,"cv_case_sensitive":fals
> e,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":
> false,"cv_nullable":1,"cv_precision":8,"cv_read_only":false,"cv_scale":2,"cv_sea
> rchable":true,"cv_signed":true,"cv_subscript":18,"cv_type":2,"cv_typename":"NUMB
> ER","cv_writable":true}]},{"name":"STATUS","type":["null",{"type":"string","cv_a
> uto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","c
> v_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_r
> ead_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript
> ":19,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]}]}}}, blocks: 
> [BlockMetaData{10, 1281 [ColumnMetaData{SNAPPY [ACCT_NO] BINARY  [BIT_PACKED, PLAIN], 4},
> ColumnMetaData{SNAPPY [SF_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 88},
> ColumnMetaData{SNAPPY [LF_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 163},
> ColumnMetaData{SNAPPY [BRANCH_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 241},
> ColumnMetaData{SNAPPY [INTRO_CUST_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 298},
> ColumnMetaData{SNAPPY [INTRO_ACCT_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 364},
> ColumnMetaData{SNAPPY [INTRO_SIGN] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 421},
> ColumnMetaData{SNAPPY [TYPE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 478},
> ColumnMetaData{SNAPPY [OPR_MODE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 538},
> ColumnMetaData{SNAPPY [CUR_ACCT_TYPE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 598},
> ColumnMetaData{SNAPPY [TITLE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 658},
> ColumnMetaData{SNAPPY [CORP_CUST_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 736},
> ColumnMetaData{SNAPPY [APLNDT] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 802},
> ColumnMetaData{SNAPPY [OPNDT] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 919},
> ColumnMetaData{SNAPPY [VERI_EMP_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1036},
> ColumnMetaData{SNAPPY [VERI_SIGN] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1093},
> ColumnMetaData{SNAPPY [MANAGER_SIGN] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1150},
> ColumnMetaData{SNAPPY [CURBAL] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1207},
> ColumnMetaData{SNAPPY [STATUS] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1270}]}]}
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346)
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:339)
>         at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:101)
>         at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:168)
>         at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:56)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:151)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230)
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
>         at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.<init>(VarLengthValuesColumn.java:44)
>         at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$Decimal28Column.<init>(VarLengthColumnReaders.java:52)
>         at org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory.getReader(ColumnReaderFactory.java:178)
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:319)
>         ... 22 more
> Error: SYSTEM ERROR: ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
> Fragment 0:0
> [Error Id: 22bfa8dd-1129-4300-9449-409e96d6c800 on DaveOshinsky-PC.gp.cv.commvault.com:31010] (state=,code=0)
> 0: jdbc:drill:zk=local> select count(*) from 
> dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where opr_mode='JO';
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> 1 row selected (0.406 seconds)
> 0: jdbc:drill:zk=local>
> The immediate cause of this exception is that Drill, in 
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader, 
> assumes that all BINARY values are stored in VariableWidthVectors.  For 
> BINARY DECIMAL this is not true: Decimal28SparseVector, for example, is a 
> FixedWidthVector, not a VariableWidthVector.  The assumption that DECIMAL is 
> never encoded in variable length BINARY appears in a number of other places 
> in the Drill code, including:
> - org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory, 
>   which only contains logic to handle DECIMAL with INT32, INT64, INT96, or 
>   FIXED_LEN_BYTE_ARRAY.  BINARY is not supported with DECIMAL.
> - org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders, 
>   whose getNullableColumnReader method does not support a nullable reader 
>   for BINARY.
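
To make the (DECIMAL, BINARY) combination described above more concrete, here is a minimal sketch of how such a value could be decoded. It assumes, as I understand the Parquet and Avro specifications, that the bytes hold the unscaled value in big-endian two's-complement form. The class BinaryDecimalSketch and its methods are hypothetical names for this example; this is not how Drill's column readers are implemented, and it is not the fix in the pull request.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

/** Hypothetical helper for illustration; not Drill or Parquet library code. */
final class BinaryDecimalSketch {

    /**
     * Decodes a variable-length DECIMAL: the bytes are assumed to be the
     * big-endian two's-complement unscaled value; the scale comes from the
     * column's DECIMAL(precision, scale) annotation. Empty input is rejected
     * by BigInteger with a NumberFormatException.
     */
    static BigDecimal decode(byte[] unscaled, int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    /**
     * Sign-extends a variable-length value to a fixed width, which is one
     * way a fixed-width container could hold values read from BINARY.
     */
    static byte[] toFixedWidth(byte[] unscaled, int width) {
        if (unscaled.length > width) {
            throw new IllegalArgumentException("value needs more than " + width + " bytes");
        }
        byte[] out = new byte[width];
        // Negative values are padded with 0xFF, non-negative values with 0x00.
        byte pad = (unscaled.length > 0 && unscaled[0] < 0) ? (byte) 0xFF : 0;
        int shift = width - unscaled.length;
        Arrays.fill(out, 0, shift, pad);
        System.arraycopy(unscaled, 0, out, shift, unscaled.length);
        return out;
    }

    public static void main(String[] args) {
        // 0x042C1D94 is the unscaled value 70000020 used in the failing query.
        byte[] acctNo = {0x04, 0x2C, 0x1D, (byte) 0x94};
        System.out.println(decode(acctNo, 0));
        System.out.println(decode(toFixedWidth(acctNo, 12), 0));
    }
}
```

Sign extension leaves the numeric value unchanged, so decoding the padded bytes gives the same result as decoding the original ones.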



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
