jessiedanwang opened a new issue, #6188:
URL: https://github.com/apache/iceberg/issues/6188

   ### Query engine
   
   Spark
   
   ### Question
   
   Is it possible to use both the effective date and the expiration date as 
partition columns for SCD type 2 dimension data? Our dimension dataset is 
huge, and we would like to partition it by both dates so that queries can 
prune irrelevant data. Here is an example:
   
   -- pseudocode: id is assigned automatically
   create table mytable (id bigint autoincrement, name text, city text, 
effective date);
   
   insert into mytable (name, city, effective) values ('Jen', 'Austin', '2017-01-01');
   insert into mytable (name, city, effective) values ('Mike', 'Austin', '2017-07-01');
   upsert into mytable (name, city, effective) values ('Jen', 'Tokyo', '2018-01-01');
   
   Contents of mytable after these statements:
   id  name  city    effective
   1   Jen   Tokyo   2018-01-01
   2   Mike  Austin  2017-07-01
   
   Traditional SCD type 2 state based on the same event stream:
   create table mytable_scd2 (id bigint autoincrement, dimid bigint, name text, 
city text)
   partitioned by (effective date, expiration date)
   
   Contents of mytable_scd2:
   id  dimid  name  city    effective     expiration
   1   1      Jen   Austin  '2017-01-01'  '2018-01-01'  <--- this row would change 
partition when expiration goes from null to a value
   2   2      Mike  Austin  '2017-07-01'  null
   3   1      Jen   Tokyo   '2018-01-01'  null
   
   Given the above example, my question is: will the row (1 1 Jen Austin 
'2017-01-01' '2018-01-01') move to a different partition once its 
expiration date is updated from null to '2018-01-01'? 
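
To make the example concrete, here is a minimal sketch in plain Python (not Spark or Iceberg code) that derives the SCD type 2 rows from the same three events and compares the partition tuple of the first row before and after it is closed. The names `build_scd2` and `partition_key` are hypothetical helpers, and identity partitioning on `(effective, expiration)` is an assumption; the sketch only shows that the tuple itself changes, not how any engine would handle the rewrite.

```python
from datetime import date

# Event stream from the example: (name, city, effective).
events = [
    ("Jen", "Austin", date(2017, 1, 1)),
    ("Mike", "Austin", date(2017, 7, 1)),
    ("Jen", "Tokyo", date(2018, 1, 1)),
]


def build_scd2(events):
    """Derive SCD type 2 rows; expiration stays None while a version is current."""
    rows = []     # one dict per version of a dimension member
    current = {}  # name -> index in rows of the open (current) version
    dimids = {}   # name -> dimension id, assigned on first appearance
    for name, city, effective in sorted(events, key=lambda e: e[2]):
        dimid = dimids.setdefault(name, len(dimids) + 1)
        if name in current:
            # Close the open version: expiration goes from None to a date.
            rows[current[name]]["expiration"] = effective
        rows.append({
            "id": len(rows) + 1,
            "dimid": dimid,
            "name": name,
            "city": city,
            "effective": effective,
            "expiration": None,
        })
        current[name] = len(rows) - 1
    return rows


def partition_key(row):
    # Assumption: identity partitioning on both date columns.
    return (row["effective"], row["expiration"])


# State after the first two events, then after all three.
before = partition_key(build_scd2(events[:2])[0])
rows = build_scd2(events)
after = partition_key(rows[0])

print("before:", before)
print("after: ", after)
print("partition tuple changed:", before != after)
```

Under these assumptions, the first row's partition tuple differs before and after the update, because the tuple is computed from the row's own column values.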
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

