[ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408748#comment-15408748 ]
ASF GitHub Bot commented on DRILL-4704: --------------------------------------- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/517 The plot thickens. I tried the fix of setting the precision to a constant of 10. This uncovered a larger issue. The template in question generates cast functions for (INTEGER, BIGINT) x (DECIMAL9, DECIMAL18, DECIMAL28) and perhaps others. The constant of 10 does not, of course, work for BIGINT (long) values. The trick is that precision=10 won't work for DECIMAL9 either. Dave's solution has a similar problem. Dave sets the precision to whatever is right for the input value, which seems great. But, that value could be too large for the output DECIMAL type. What we need is to set the precision to the min( max int precision, max decimal precision ). Or, if we use Dave's proposed solution, max( input arg precision, max decimal precision ). In either case, the code must handle overflow. Passing a Long.MAX_VALUE or even Integer.MAX_VALUE to CastBigIntDecimal9( ) should cause an overflow error or data truncation. I'll research how that worked previously to see if we've uncovered a new issue, or if a solution already exists. > select statement behavior is inconsistent for decimal values in parquet > ----------------------------------------------------------------------- > > Key: DRILL-4704 > URL: https://issues.apache.org/jira/browse/DRILL-4704 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill > Affects Versions: 1.6.0 > Environment: Windows 7 Pro, Java 1.8.0_91 > Reporter: Dave Oshinsky > Fix For: Future > > > A select statement that searches a parquet file for a decimal value matching > a specific value behaves inconsistently. The query expressed most simply > finds nothing: > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id = 100; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIRE_DATE | > +--------------+-------------+------------+--------+---------------+-----------+ > +--------------+-------------+------------+--------+---------------+-----------+ > No rows selected (0.348 seconds) > The query can be modified to find the matching row in a few ways, such as the > following (using between instead of '=', changing 100 to 100.0, or casting as > decimal: > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id between 100 and 100; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--------------+-------------+------------+--------+---------------+-----------+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--------------+-------------+------------+--------+---------------+-----------+ > 1 row selected (0.226 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id = 100.0; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--------------+-------------+------------+--------+---------------+-----------+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--------------+-------------+------------+--------+---------------+-----------+ > 1 row selected (0.259 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > cast(employee_id AS DECIMAL) = 100; > +--------------+-------------+------------+--------+---------------+-----------+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--------------+-------------+------------+--------+---------------+-----------+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--------------+-------------+------------+--------+---------------+-----------+ > 1 row selected (0.232 seconds) > 0: jdbc:drill:zk=local> > The schema of the parquet data that is being searched is as follows: > $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet > file: file:/c:/archiveHR/HR.EMPLOYEES/1.parquet > creator: parquet-mr version 1.8.1 (build > 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf) > ..... > file schema: HR.EMPLOYEES > -------------------------------------------------------------------------------- > EMPLOYEE_ID: REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0 > FIRST_NAME: OPTIONAL BINARY O:UTF8 R:0 D:1 > LAST_NAME: REQUIRED BINARY O:UTF8 R:0 D:0 > EMAIL: REQUIRED BINARY O:UTF8 R:0 D:0 > PHONE_NUMBER: OPTIONAL BINARY O:UTF8 R:0 D:1 > HIRE_DATE: REQUIRED BINARY O:UTF8 R:0 D:0 > JOB_ID: REQUIRED BINARY O:UTF8 R:0 D:0 > SALARY: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > MANAGER_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > DEPARTMENT_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > row group 1: RC:107 TS:9943 OFFSET:4 > -------------------------------------------------------------------------------- > EMPLOYEE_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 > VC:107 ENC:PLAIN,BIT_PACKED > FIRST_NAME: BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 > ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED > LAST_NAME: BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 > ENC:PLAIN,BIT_PACKED > EMAIL: BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 > ENC:PLAIN,BIT_PACKED > PHONE_NUMBER: BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 > ENC:PLAIN,RLE,BIT_PACKED > HIRE_DATE: BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 > ENC:PLAIN_DICTIONARY,BIT_PACKED > JOB_ID: BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 > ENC:PLAIN_DICTIONARY,BIT_PACKED > SALARY: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 > VC:107 ENC:PLAIN,RLE,BIT_PACKED > COMMISSION_PCT: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 > VC:107 ENC:PLAIN,RLE,BIT_PACKED > MANAGER_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 > VC:107 ENC:PLAIN,RLE,BIT_PACKED > DEPARTMENT_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 > VC:107 ENC:PLAIN,RLE,BIT_PACKED -- This message was sent by Atlassian JIRA (v6.3.4#6332)