[
https://issues.apache.org/jira/browse/KAFKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750549#comment-13750549
]
Tejas Patil commented on KAFKA-1012:
------------------------------------
Re transactionality:
In the current patch (with the embedded producer), a commit request (i.e., a
produce request for the offsets topic) gets written to 2 places: the logs of the
offsets topic and the offset manager's backend storage. If there is an error
writing any offset message to the logs, it is indicated in the response to the
produce request. The embedded producer internally retries the request (with the
failed messages only) after checking the error status in the response. Only
those messages which made it to the logs are passed to the 2nd part (the offset
manager's backend storage). Since the backend would basically be a hash table or
ZK, it is assumed that the offset manager won't fail to write data to the
backend. To sum up, there is no notion of transactions. Brokers would "greedily"
try to commit as many messages as they can; if some offset messages fail, the
embedded producer would re-send a request just for the failed ones.
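The greedy commit-and-retry behavior described above could be sketched roughly
as follows. This is only an illustration; all class and method names here are
hypothetical, not the actual embedded producer code:

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of "greedily" committing offset messages and re-sending only
// the failed ones. Names are illustrative, not from the Kafka codebase.
class OffsetCommitSketch {

    // Simulated per-message append to the offsets topic log; true on success.
    interface Log {
        boolean append(String offsetMessage);
    }

    // Commit all messages, retrying only the failed subset up to maxRetries.
    // Returns the messages that made it to the log (and hence to the backend).
    static List<String> commitWithRetries(List<String> messages, Log log, int maxRetries) {
        List<String> pending = new ArrayList<>(messages);
        List<String> committed = new ArrayList<>();
        for (int attempt = 0; attempt <= maxRetries && !pending.isEmpty(); attempt++) {
            List<String> failed = new ArrayList<>();
            for (String msg : pending) {
                if (log.append(msg)) {
                    committed.add(msg);   // made it to the log -> passed to backend storage
                } else {
                    failed.add(msg);      // re-sent in the next attempt
                }
            }
            pending = failed;
        }
        return committed;
    }
}
```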
Re "per-topic max message size":
Does Kafka support a per-topic max message size? I could not find such a config.
What does "server impact" include: the volume of offsets data stored in the
logs, or holding large metadata in memory? The log cleaner would dedupe the logs
of this topic frequently, so the size of the logs would be pruned from time to
time. As for holding this metadata in an in-memory hash table, I think it's a
nice thing to have a cap on message size to prevent the in-memory table from
consuming too much memory. I will include that in the coming patch. It would be
helpful even for the next phase, when we move off the embedded producer and
start using the offset commit request.
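The cap on offset metadata size could look roughly like this. The config name,
limit, and error handling below are hypothetical illustrations, not the actual
patch:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch of capping offset-metadata size before it enters the in-memory table,
// so large metadata cannot bloat the offset manager's memory footprint.
class OffsetMetadataCap {
    static final int MAX_METADATA_SIZE = 1024; // hypothetical cap, in bytes

    // group-topic-partition key -> offset metadata
    private final Map<String, String> offsets = new HashMap<>();

    // Reject commits whose metadata exceeds the cap; accept the rest.
    boolean commit(String groupTopicPartition, String metadata) {
        if (metadata.getBytes(StandardCharsets.UTF_8).length > MAX_METADATA_SIZE) {
            return false; // caller would see an "offset metadata too large" style error
        }
        offsets.put(groupTopicPartition, metadata);
        return true;
    }
}
```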
Re "partitioning the offset topic":
It's recommended to have #(partitions of the offsets topic) >= #(brokers) so
that all brokers get a somewhat similar[*] amount of offset commit traffic. The
replication factor of the offsets topic should be higher than that of any normal
Kafka topic to achieve high availability of the offset information.
[*]: There can be an imbalance in server load if some consumer groups have a lot
of consumers, or if some consumers have a shorter offset commit interval than
others. So we cannot guarantee "equal" load across all brokers; we only expect
the load to be "similar" across all brokers.
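One simple way to spread commit traffic, sketched below, is to hash each
consumer group to a partition of the offsets topic; with #partitions >=
#brokers, each broker can then lead at least one offsets partition. The class
and method names are illustrative:

```java
// Sketch of assigning a consumer group to one partition of the offsets topic.
// Hashing by group keeps all commits for a group on one partition leader,
// while different groups spread across the partitions (and hence brokers).
class OffsetsTopicPartitioner {
    static int partitionFor(String consumerGroup, int numPartitions) {
        // Mask the sign bit rather than using Math.abs, which can overflow
        // for Integer.MIN_VALUE.
        return (consumerGroup.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```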
Thanks for all your comments [~criccomini] !!!
> Implement an Offset Manager and hook offset requests to it
> ----------------------------------------------------------
>
> Key: KAFKA-1012
> URL: https://issues.apache.org/jira/browse/KAFKA-1012
> Project: Kafka
> Issue Type: Sub-task
> Components: consumer
> Reporter: Tejas Patil
> Assignee: Tejas Patil
> Priority: Minor
> Attachments: KAFKA-1012.patch, KAFKA-1012-v2.patch
>
>
> After KAFKA-657, we have a protocol for consumers to commit and fetch offsets
> from brokers. Currently, consumers are not using this API and are talking
> directly to ZooKeeper.
> This Jira will involve following:
> 1. Add a special topic in kafka for storing offsets
> 2. Add an OffsetManager interface which would handle storing, accessing,
> loading and maintaining consumer offsets
> 3. Implement offset managers for both of these choices: the existing ZK-based
> storage and the inbuilt storage for offsets.
> 4. Leader brokers would now maintain an additional hash table of offsets for
> the group-topic-partitions that they lead
> 5. Consumers should now use the OffsetCommit and OffsetFetch API