[
https://issues.apache.org/jira/browse/SAMZA-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110856#comment-14110856
]
Yan Fang commented on SAMZA-388:
--------------------------------
+1 for the temporary solution. I posted a few nits in RB. Feel free to commit.
Maybe we want to revisit this ticket after KAFKA-1374 is resolved, or open
another ticket to track it in case we forget to update. Thank you.
> Log compaction on checkpoint topics fails with compression
> ----------------------------------------------------------
>
> Key: SAMZA-388
> URL: https://issues.apache.org/jira/browse/SAMZA-388
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Assignee: Chris Riccomini
> Attachments: SAMZA-388-0.patch
>
>
> I have a job that has 10,000+ partitions that it's consuming from. After
> SAMZA-123, it's been switched to use the GroupBySystemStreamPartition
> strategy, which means it's got 10,000+ tasks, and thus, 10,000+ checkpoint
> messages being sent every minute.
> To keep the checkpoint topic from getting too large, we enabled log
> compaction on the Kafka topic, but we discovered that the topic then grew to
> be very large. This behavior was triggered because we were sending compressed
> messages to the Kafka checkpoint topic.
> Based on KAFKA-1374, it appears that we can't use compressed checkpoint
> topics with log compaction.
> I'm mostly opening this ticket as a placeholder for KAFKA-1374. Once that
> ticket is resolved, we can update the Samza code to default the checkpoint
> topics to be log compacted (with a small segment size), and not worry about
> the compression anymore.
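
For illustration only, the settings described above could be sketched roughly as follows. This is not the code in SAMZA-388-0.patch. The helper names and the 25 MB segment size are hypothetical; `cleanup.policy` and `segment.bytes` are Kafka topic-level configs, and `compression.codec` is assumed here to be the producer setting in use (the 0.8-era producer name).

```python
# Hypothetical sketch: settings for a log-compacted, uncompressed checkpoint topic.
# The function names and the default segment size are assumptions for illustration.

def checkpoint_topic_settings(segment_bytes=25 * 1024 * 1024):
    """Return (topic_config, producer_config) as plain dicts of strings."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")
    topic_config = {
        # Keep only the latest message per key (one checkpoint per task).
        "cleanup.policy": "compact",
        # A small segment size lets segments roll, and become eligible
        # for compaction, sooner.
        "segment.bytes": str(segment_bytes),
    }
    producer_config = {
        # Until KAFKA-1374 is resolved, send checkpoints uncompressed.
        "compression.codec": "none",
    }
    return topic_config, producer_config


def as_config_string(config):
    """Render a config dict as a sorted 'key=value,key=value' string."""
    return ",".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    topic, producer = checkpoint_topic_settings()
    print(as_config_string(topic))
    print(as_config_string(producer))
```

The segment size is a tuning knob: smaller segments are compacted sooner but produce more files on the brokers.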
--
This message was sent by Atlassian JIRA
(v6.2#6252)