[jira] [Comment Edited] (KAFKA-3705) Support non-key joining in KTable

Jan Filipiak (JIRA) Wed, 27 Jul 2016 04:25:36 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395478#comment-15395478
 ]


Jan Filipiak edited comment on KAFKA-3705 at 7/27/16 11:24 AM:
---------------------------------------------------------------

Something that has started happening to us: for low-cardinality columns on 
the join, the prefix scan on RocksDB can return a large number of values. That 
leads to too much time spent between poll() calls, and to us losing group 
membership. One could perhaps check whether the consumer needs a poll() during 
context.forward(), since we call context.forward() for every row that comes 
from the prefix scan. The fix we are currently using, setting the session 
timeout very high, is not that good IMO.
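
To make the failure mode concrete, here is a minimal plain-Java sketch. It is 
not Kafka Streams code. It assumes the store holds composite keys of the form 
foreignKey|primaryKey in sorted order (a TreeMap stands in for RocksDB), and 
it uses a hypothetical onBudgetExceeded callback as a stand-in for whatever 
would let the consumer poll between forwarded rows. All class, method, and 
parameter names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

class PrefixScanSketch {

    // Returns every entry whose key starts with the given prefix. Because the
    // keys are sorted, the matches form one contiguous range that begins at
    // the first key >= prefix.
    static List<Map.Entry<String, String>> prefixScan(
            NavigableMap<String, String> store, String prefix) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> e : store.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the prefix range
            }
            out.add(e);
        }
        return out;
    }

    // Forwards each row, then invokes onBudgetExceeded (hypothetical hook)
    // whenever at least budgetMs has elapsed since the last check-in.
    // Returns how many times the hook ran.
    static int forwardWithCheckIns(
            List<Map.Entry<String, String>> rows,
            Consumer<Map.Entry<String, String>> forward,
            LongSupplier clockMs,
            long budgetMs,
            Runnable onBudgetExceeded) {
        long lastCheckIn = clockMs.getAsLong();
        int checkIns = 0;
        for (Map.Entry<String, String> row : rows) {
            forward.accept(row);
            long now = clockMs.getAsLong();
            if (now - lastCheckIn >= budgetMs) {
                onBudgetExceeded.run();
                checkIns++;
                lastCheckIn = now;
            }
        }
        return checkIns;
    }
}
```

With a low-cardinality foreign key, prefixScan returns a long list, so the 
loop in forwardWithCheckIns is where the time between poll() calls adds up. 
The clock is passed in so the behaviour can be exercised without real delays.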


was (Author: jfilipiak):
Something that has started happening to us: for low-cardinality columns on 
the join, the prefix scan on RocksDB can return a large number of values. That 
leads to too much time spent between poll() calls, and to us losing group 
membership. One could perhaps check whether the consumer needs a poll() during 
context.forward(). The fix we are currently using, setting the session 
timeout very high, is not that good IMO.

> Support non-key joining in KTable
> ---------------------------------
>
>                 Key: KAFKA-3705
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3705
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Liquan Pei
>              Labels: api
>             Fix For: 0.10.1.0
>
>
> Today in the Kafka Streams DSL, KTable joins are based only on keys. If users 
> want to join a KTable A by key {{a}} with another KTable B by key {{b}} but 
> with a "foreign key" {{a}}, and assuming they are read from two topics which 
> are partitioned on {{a}} and {{b}} respectively, they need to use the 
> following pattern:
> {code}
> tableB' = tableB.groupBy(/* select on field "a" */).agg(...); // now tableB' is partitioned on "a"
> tableA.join(tableB', joiner);
> {code}
> Even if these two tables are read from two topics which are already 
> partitioned on {{a}}, users still need to do the pre-aggregation in order to 
> put the two joining streams on the same key. This is a drawback in terms of 
> programmability and we should fix it.
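
The re-key-then-join pattern in the description can be illustrated with plain 
Java collections. This is a sketch of the semantics only and does not use the 
Kafka Streams API: the BRow type, the list-valued aggregation, and the string 
joiner are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RekeyJoinSketch {

    // A row of table B: primary key b, foreign key a (pointing at table A),
    // and some payload. Hypothetical type for illustration.
    static final class BRow {
        final String b;
        final String a;
        final String payload;

        BRow(String b, String a, String payload) {
            this.b = b;
            this.a = a;
            this.payload = payload;
        }
    }

    // Step 1: re-key table B by its foreign key "a", collecting all rows
    // that share the same "a" into one list (the "pre-aggregation").
    static Map<String, List<BRow>> groupByForeignKey(Map<String, BRow> tableB) {
        Map<String, List<BRow>> byA = new TreeMap<>();
        for (BRow row : tableB.values()) {
            byA.computeIfAbsent(row.a, k -> new ArrayList<>()).add(row);
        }
        return byA;
    }

    // Step 2: inner join table A with the re-keyed table on key "a".
    // The joiner here simply concatenates A's value with B's payloads.
    static Map<String, String> join(Map<String, String> tableA,
                                    Map<String, List<BRow>> byA) {
        Map<String, String> joined = new TreeMap<>();
        for (Map.Entry<String, String> e : tableA.entrySet()) {
            List<BRow> matches = byA.get(e.getKey());
            if (matches == null) {
                continue; // no B rows reference this A key
            }
            List<String> payloads = new ArrayList<>();
            for (BRow r : matches) {
                payloads.add(r.payload);
            }
            joined.put(e.getKey(), e.getValue() + ":" + payloads);
        }
        return joined;
    }
}
```

The intermediate map built in step 1 is the extra state users must maintain 
today, even when both inputs are already partitioned on the same field.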



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
