bithw1 opened a new issue, #11933:
URL: https://github.com/apache/hudi/issues/11933
Hi,
I have the following SQL snippet, which joins two streaming-read tables.
I am not able to run the experiment myself right now, so I am asking here
how it behaves.
1. How does the join work, i.e. which data participates in the join
computation?
1.1. When new data comes from A, is it joined against the full data of Z?
1.2. When new data comes from Z, is it joined against the full data of A?
1.3. When new data comes from A and Z at the same time, is the new data from
A joined against the full Z, and the new data from Z joined against the full
A?
1.4. I am not sure whether only the new data from the two tables is joined.
If so, would very few rows join successfully, because matching rows arrive
at different times?
2. Does `new data comes from A` mean that there are new commits in A? Will
the query not see data that has not been committed yet?
```
create table A(
a string,
b string,
c string,
d string
) WITH (
'connector' = 'hudi',
'path' = 'hdfs://tmp/hudi_table_a',
'table.type' = 'MERGE_ON_READ',
'read.streaming.enabled' = 'true',
'read.streaming.start-commit' = 'earliest',
'write.precombine.field' = 'a',
'hoodie.datasource.write.keygenerator.type' = 'complex',
'hoodie.datasource.write.recordkey.field' = 'b,c',
'read.streaming.check-interval' = '5',
'read.tasks' = '4',
'read.rate.limit' = '10000'
);
create table Z(
a string,
x string,
y string,
z string
) WITH (
'connector' = 'hudi',
'path' = 'hdfs://tmp/hudi_table_z',
'table.type' = 'MERGE_ON_READ',
'read.streaming.enabled' = 'true',
'read.streaming.start-commit' = 'earliest',
'write.precombine.field' = 'a',
'hoodie.datasource.write.keygenerator.type' = 'complex',
'hoodie.datasource.write.recordkey.field' = 'x,y',
'read.streaming.check-interval' = '5',
'read.tasks' = '4',
'read.rate.limit' = '10000'
);
select A.a, b, c, d, x, y, z
from A join Z
on A.a = Z.a;
```
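To make questions 1.1 to 1.3 concrete, here is a minimal Python sketch of the behavior I would expect if the answer to all three is "yes": each side keeps every row it has seen so far, and a new row on one side is matched against the stored rows of the other side. This is only a toy model of my assumption. The class `SymmetricHashJoin` and its methods are hypothetical names and are not Hudi or Flink APIs.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy inner equi-join that remembers every row seen on both sides."""

    def __init__(self):
        self.left_state = defaultdict(list)   # join key -> rows seen from A
        self.right_state = defaultdict(list)  # join key -> rows seen from Z

    def on_left(self, key, row):
        """A new row arrives from A: store it, then match it against all of Z seen so far."""
        self.left_state[key].append(row)
        return [(row, other) for other in self.right_state[key]]

    def on_right(self, key, row):
        """A new row arrives from Z: store it, then match it against all of A seen so far."""
        self.right_state[key].append(row)
        return [(other, row) for other in self.left_state[key]]


join = SymmetricHashJoin()
print(join.on_left("k1", ("k1", "b1")))   # nothing from Z has been seen yet
print(join.on_right("k1", ("k1", "x1")))  # matches the earlier row from A
print(join.on_left("k1", ("k1", "b2")))   # matches the earlier row from Z
```

If instead only the newly arrived rows of each side were joined with each other (question 1.4), the second and third calls would produce no output, because the matching rows arrive at different times.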