[I] Join operator lost data due to decimal join key. [gluten]

via GitHub Thu, 23 Apr 2026 03:03:29 -0700


beliefer opened a new issue, #11980:
URL: https://github.com/apache/gluten/issues/11980


   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   There are two tables: A and B. 
   
   ```
   -- The main schema of A.
   CREATE TABLE A (
     invoiceid DECIMAL(20,0),
     time TIMESTAMP,
     update_time TIMESTAMP,
     dt STRING,
     hour STRING)
   USING orc
   PARTITIONED BY (dt, hour)
   LOCATION 'hdfs://test/A'
   TBLPROPERTIES (
     'transient_lastDdlTime'='1718612895')
   ```
   
   ```
   -- The main schema of B.
   CREATE TABLE B (
     `invoiceid` DECIMAL(20,0),
     `status` INT,
     `update_time` TIMESTAMP,
     `dt` STRING,
     `hour` STRING)
   USING orc
   PARTITIONED BY (dt, hour)
   LOCATION 'hdfs://test/B'
   TBLPROPERTIES (
     'transient_lastDdlTime'='1718612913')
   ```
   
   ```
   -- The query.
   SELECT count(*)
   FROM (
       SELECT c.invoiceid AS invoice_id,
              c.orderid AS order_id,
              c.time AS order_time,
              c.update_time AS update_time
       FROM
           (SELECT *
            FROM
                (SELECT *,
                        ROW_NUMBER() OVER (PARTITION BY invoiceid
                                           ORDER BY update_time DESC) AS rn
                 FROM A
                 WHERE dt >= '20260421') b
            WHERE rn = 1) c
       JOIN
           (SELECT *
            FROM B
            WHERE dt >= '20260421'
              AND status = 7) d ON c.invoiceid = d.invoiceid
   );
   ```
   
   The scan operator of A is Spark's `Scan hive` because the `time` and
   `update_time` columns are TIMESTAMP.
   The scan operator of B is Gluten's `NativeScan hive`.
   The join on the DECIMAL(20,0) key `invoiceid` then loses data.
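
A side note on the join key type, offered as background rather than as a diagnosis: `DECIMAL(20,0)` exceeds 18 digits of precision, which is the limit up to which both Spark and Velox store a decimal's unscaled value in a 64-bit integer. Wider decimals use a separate representation (a 128-bit integer in Velox). The sketch below, in Python with hypothetical helper names, only checks that arithmetic. Whether this wide-decimal path is related to the lost rows is an assumption that has not been verified here.

```python
# Illustrative arithmetic only; the helper names are hypothetical and are not
# part of Spark, Gluten, or Velox.

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def max_unscaled(precision: int) -> int:
    """Largest unscaled value a DECIMAL(precision, s) column can contain."""
    return 10**precision - 1


def fits_in_int64(precision: int) -> bool:
    """True if every unscaled value of that precision fits in 64 bits."""
    return max_unscaled(precision) <= INT64_MAX


print(fits_in_int64(18))  # True: DECIMAL(18, s) fits in a 64-bit integer
print(fits_in_int64(20))  # False: DECIMAL(20, 0) needs a wider representation
```

If the wide-decimal path is suspected, one possible check (again an assumption, not something the report confirms) is to rerun the same query with the key cast to a type that both scans handle identically and compare the counts.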
   
   ### Gluten version
   
   Gluten-1.5
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


