[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832325#comment-16832325
 ] 

Andrew commented on KAFKA-8315:
-------------------------------

The fudge factor is windowstore.changelog.additional.retention.ms, which 
is set to 24h by default, so that seems to add up to 5 days (120 hours).

You understand my use case well. The choice of limiting the grace was to reduce 
the performance overhead during the large historical processing.

I definitely have lots of data beyond this that should join, and the joins and 
aggregation within the latest 120 hours seem to be correct and complete. This 
seems to be related only to retention. I will experiment with increasing 
windowstore.changelog.additional.retention.ms to see if it brings in more 
data. 
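
As a rough sketch of that experiment, the property can be set on the 
Properties object that is normally handed to the KafkaStreams constructor. 
The class name, helper name and the 48h value below are purely illustrative 
assumptions, not values taken from this issue; only the property key and the 
24h default come from the discussion above.

```java
import java.util.Properties;

class RetentionConfigSketch {
    // Property key as quoted in the comment above.
    static final String ADDITIONAL_RETENTION_KEY =
            "windowstore.changelog.additional.retention.ms";

    // Hypothetical helper: builds a Properties object carrying only this
    // setting. A real application would add it to its existing configuration.
    static Properties withAdditionalRetention(long additionalRetentionMs) {
        Properties props = new Properties();
        props.setProperty(ADDITIONAL_RETENTION_KEY, Long.toString(additionalRetentionMs));
        return props;
    }

    public static void main(String[] args) {
        long hourMs = 60L * 60L * 1000L;
        // Assumption for illustration: double the 24h default to 48h.
        Properties props = withAdditionalRetention(48 * hourMs);
        System.out.println(props.getProperty(ADDITIONAL_RETENTION_KEY));
    }
}
```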

One oddity of our left stream is that it contains records in batches from 
different devices. Each batch is about 1000 records and contiguous within the 
stream. Within a batch the records are in increasing timestamp order. 
Subsequent batches from different devices will be within 1-2 days of the 
previous batch (we don't have, say, a batch for 1/1/2019 followed by a batch 
for 4/1/2019 or a batch for 24/12/2019, for example). The right stream is 
single records, not batches, with similar date ordering (i.e. subsequent 
records should be within 1-2 days of each other).

My understanding is that the windows are closed when streamtime - windowstart > 
windowsize + grace. So as stream time increases as newer batches arrive, 
joins/aggregations should continue, until a batch arrives after windowsize + 
grace. However, we are not seeing this.
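
To make that condition concrete, here is a minimal sketch of the check as I 
have described it. It only encodes my understanding stated above, not the 
actual Kafka Streams implementation; the class name, method name and the 
48h/24h values are hypothetical.

```java
class WindowCloseSketch {
    // Closed when streamTime - windowStart > windowSize + grace (all in ms),
    // as stated in the paragraph above.
    static boolean isClosed(long streamTimeMs, long windowStartMs,
                            long windowSizeMs, long graceMs) {
        return streamTimeMs - windowStartMs > windowSizeMs + graceMs;
    }

    public static void main(String[] args) {
        long hourMs = 60L * 60L * 1000L;
        long windowSizeMs = 48 * hourMs; // hypothetical window size
        long graceMs = 24 * hourMs;      // hypothetical grace period
        long windowStartMs = 0L;

        // Stream time exactly at windowSize + grace: still open under this rule.
        System.out.println(isClosed(72 * hourMs, windowStartMs, windowSizeMs, graceMs));
        // One millisecond later: closed.
        System.out.println(isClosed(72 * hourMs + 1, windowStartMs, windowSizeMs, graceMs));
    }
}
```

Under this rule, a record arriving while the condition is false should still 
be joined or aggregated, which is the behaviour we expected to see.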

> Cannot pass Materialized into a join operation - hence cant set retention 
> period independent of grace
> -----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8315
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8315
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Andrew
>            Assignee: John Roesler
>            Priority: Major
>
> The documentation says to use `Materialized`, not `JoinWindows.until()` 
> ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
>  but there is nowhere to pass a `Materialized` instance to the join 
> operation; it seems only the group operation supports it.
>  
> Slack conversation here : 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [Additional]
> From what I understand, the retention period should be independent of the 
> grace period, so I think this is more than a documentation fix (see comments 
> below)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
