[ 
https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408689#comment-15408689
 ] 

ASF GitHub Bot commented on DRILL-4704:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/517
  
    The proximate cause of this particular issue is the mechanism by which the 
cast function is being generated. As others have noted, the code is passing 
through lines 316-316. My guess is that the materializer is working out how to 
call this function:
    
    Decimal28SparseFunctions.Decimal28SparseEqual( Decimal28Sparse left, 
Decimal28Sparse right )
    The materializer sees: D28SEqual( Decimal28Sparse, Integer )
    
    The code wants to convert the Integer to a Decimal28Sparse. It asks the 
argument for its precision and scale. But, since the argument just specifies 
the "minor type" (SQL type), it does not have the extended attributes of 
precision and scale. Thus, the annotations in the function definition are the 
ultimate cause of the precision being 0:
    
    @Param Decimal28SparseHolder left;
    @Param Decimal28SparseHolder right;
    
    Now, we might argue that the code should be using, as the target (p, s) the 
(p,s) of the left argument (the one that is already a decimal). But, the 
materializer does not know that this is an operator; it treats all functions 
the same (operators and otherwise.) So, the materializer does not know that we 
should treat arguments in this way. All it knows is that it has an Integer and 
wants a D28S (of, presumably, any p & s).
    
    Next, we can ask how DS28Equal handles the (p,s) of the two arguments: must 
they be the same or can they be different? Looking at the code (as best as I 
can tell), the code will align values with different (p,s) values. Thus, say, 
values of Decimal28( 10, 2 ) and Decimal28( 8, 0 ) will compare properly.
    
    So, back to the cast from Integer to D28S. As Dave suggests, we can 
interpret (precision == 0) to mean "you figure it out": we can set (p,s) way we 
want.
    
    Dave's fix computes the precision based on the actual integer value. But, 
since the comparison can handle any valid (p,s), we might as well set them to a 
constant. Since the maximum Integer value is 2 billion, we can just use a 
constant (10,0).
    
    If the selected precision is constant, then the cost is low and the 
solution is fine for both for constant integers (as in the original case), and 
for per-row conversions (as in Aman's concern.)


> select statement behavior is inconsistent for decimal values in parquet
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4704
>                 URL: https://issues.apache.org/jira/browse/DRILL-4704
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.6.0
>         Environment: Windows 7 Pro, Java 1.8.0_91
>            Reporter: Dave Oshinsky
>             Fix For: Future
>
>
> A select statement that searches a parquet file for a decimal value matching 
> a specific value behaves inconsistently.  The query expressed most simply 
> finds nothing:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  | 
> HIRE_DATE |
> +--------------+-------------+------------+--------+---------------+-----------+
> +--------------+-------------+------------+--------+---------------+-----------+
> No rows selected (0.348 seconds)
> The query can be modified to find the matching row in a few ways, such as the 
> following (using between instead of '=', changing 100 to 100.0, or casting as 
> decimal:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id between 100 and 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       
> HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.226 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id = 100.0;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       
> HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.259 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> cast(employee_id AS DECIMAL) = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       
> HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.232 seconds)
> 0: jdbc:drill:zk=local>
> The schema of the parquet data that is being searched is as follows:
> $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
> file:           file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
> creator:        parquet-mr version 1.8.1 (build 
> 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
> .....
> file schema:    HR.EMPLOYEES
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:    REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
> FIRST_NAME:     OPTIONAL BINARY O:UTF8 R:0 D:1
> LAST_NAME:      REQUIRED BINARY O:UTF8 R:0 D:0
> EMAIL:          REQUIRED BINARY O:UTF8 R:0 D:0
> PHONE_NUMBER:   OPTIONAL BINARY O:UTF8 R:0 D:1
> HIRE_DATE:      REQUIRED BINARY O:UTF8 R:0 D:0
> JOB_ID:         REQUIRED BINARY O:UTF8 R:0 D:0
> SALARY:         OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> MANAGER_ID:     OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> DEPARTMENT_ID:  OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1:    RC:107 TS:9943 OFFSET:4
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 
> VC:107 ENC:PLAIN,BIT_PACKED
> FIRST_NAME:      BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 
> ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> LAST_NAME:       BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 
> ENC:PLAIN,BIT_PACKED
> EMAIL:           BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 
> ENC:PLAIN,BIT_PACKED
> PHONE_NUMBER:    BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 
> ENC:PLAIN,RLE,BIT_PACKED
> HIRE_DATE:       BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 
> ENC:PLAIN_DICTIONARY,BIT_PACKED
> JOB_ID:          BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 
> ENC:PLAIN_DICTIONARY,BIT_PACKED
> SALARY:          FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED
> COMMISSION_PCT:  FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED
> MANAGER_ID:      FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED
> DEPARTMENT_ID:   FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 
> VC:107 ENC:PLAIN,RLE,BIT_PACKED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to