[ https://issues.apache.org/jira/browse/KAFKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Kreps updated KAFKA-642:
----------------------------
Attachment: KAFKA-642-v1.patch
This patch implements the changes described above with the following exceptions:
1. I punted on fixing OffsetRequest. That change depends on the log
refactoring and is larger than I expected. It would be nice to fix, but I
plan to do it as a separate patch, and maybe not for 0.8.
2. I also changed the places where we were using shorts for array lengths.
There were a few of these, and they complicate the protocol definition because
you can't have one general definition of an array.
3. I changed ClientUtils to not require Broker instances, since that is crazy.
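To illustrate point 2, here is a minimal Python sketch of what a single general array definition could look like once every array uses a 4-byte count. It is not code from the patch. The helper names are hypothetical, and big-endian byte order and the 2-byte string length prefix are assumptions.

```python
import struct


def encode_array(items, encode_item):
    """Encode a sequence as a 4-byte signed count followed by each element."""
    out = [struct.pack(">i", len(items))]
    out.extend(encode_item(item) for item in items)
    return b"".join(out)


def encode_int32(value):
    """Encode one 4-byte signed integer (big-endian is an assumption)."""
    return struct.pack(">i", value)


def encode_string(text):
    """Encode a string with a 2-byte length prefix (an assumption here)."""
    data = text.encode("utf-8")
    return struct.pack(">h", len(data)) + data


# The same array routine serves any element type.
replica_ids = encode_array([1, 2, 3], encode_int32)
topics = encode_array(["a", "bc"], encode_string)
```

With one array rule, the protocol description only needs to say "array of X" and never has to state the width of the count per field.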
OffsetRequest, TopicMetadataRequest
- Add correlation id. It is not yet set everywhere, but the point is just to
get it into the protocol
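For context, the issue description quoted below notes that responses on one socket arrive in request order, so a client can match them with a simple queue and the correlation id mostly helps with tracing. A minimal Python sketch of that idea follows. The class and method names are hypothetical and this is not how any Kafka client is known to implement it.

```python
from collections import deque


class InFlightRequests:
    """Tracks requests sent on one connection, oldest first."""

    def __init__(self):
        self._pending = deque()
        self._next_id = 0

    def sent(self, request):
        """Record a request as sent and return the correlation id assigned to it."""
        correlation_id = self._next_id
        self._next_id += 1
        self._pending.append((correlation_id, request))
        return correlation_id

    def received(self, response_correlation_id):
        """Match a response to the oldest pending request and check the id agrees."""
        expected_id, request = self._pending.popleft()
        if response_correlation_id != expected_id:
            raise RuntimeError(
                f"expected correlation id {expected_id}, got {response_correlation_id}"
            )
        return request
```

The queue does the matching. The id check is only a sanity test, which fits the view that the field is mainly a debugging aid.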
TopicMetadata
- Change the serialization format so that we store only broker ids, not full
brokers
- "no leader" is encoded as leader_id=-1
- The object itself doesn't change
- Change sizes to all be 4 bytes to be consistent with all other arrays
TopicMetadataResponse
- Add broker list to response. This is guaranteed to have all "relevant"
brokers--i.e. all leaders and replicas for topics included in the request
- Add correlation id
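As a rough illustration of the normalized layout, the Python sketch below keeps one broker list per response and has each partition refer to brokers by id, with -1 meaning "no leader". The class names, field names, and host names are hypothetical and are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

NO_LEADER = -1  # "no leader" is encoded as leader_id=-1


@dataclass
class Broker:
    id: int
    host: str
    port: int


@dataclass
class PartitionMetadata:
    partition_id: int
    leader_id: int          # broker id, or NO_LEADER
    replica_ids: List[int]  # broker ids only, not full broker entries
    isr_ids: List[int]


def resolve_leader(partition: PartitionMetadata,
                   brokers_by_id: Dict[int, Broker]) -> Optional[Broker]:
    """Look up the leader in the response-level broker list, or None if there is none."""
    if partition.leader_id == NO_LEADER:
        return None
    return brokers_by_id[partition.leader_id]


# Broker details appear once per response; partitions carry ids only.
brokers = [Broker(0, "host-a", 9092), Broker(1, "host-b", 9092)]
brokers_by_id = {b.id: b for b in brokers}
with_leader = PartitionMetadata(0, 1, [0, 1], [1])
without_leader = PartitionMetadata(1, NO_LEADER, [0, 1], [])
```

Because the response is guaranteed to list every relevant broker, the lookup in `resolve_leader` should not miss for any leader or replica id in the same response.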
ClientUtils
- fetchTopicMetadata should take a list of addresses not a list of brokers
Broker
- remove creatorid
Other files
- carry through the above changes (i.e. pass in the new argument)
> Protocol tweaks for 0.8
> -----------------------
>
> Key: KAFKA-642
> URL: https://issues.apache.org/jira/browse/KAFKA-642
> Project: Kafka
> Issue Type: Bug
> Reporter: Jay Kreps
> Attachments: KAFKA-642-v1.patch
>
>
> There are a couple of things in the protocol that are not ideal. It would be
> good to tweak these for 0.8 so we start clean.
> Here is a set of problems and proposals:
> Problems:
> 1. Correlation id is not used across all the requests. I don't think it can
> work as intended because of this.
> 2. On reflection I am not sure that we need a correlation id field. I think
> that since we need to guarantee that processing is sequential on any
> particular socket we can correlate with a simple queue. (e.g. as the client
> sends messages it adds them to a queue and as it receives responses it just
> correlates to whatever is at the head of the queue).
> 3. The metadata response seems to have a number of problems. Among them is
> that it weirdly repeats all the broker information many times. The response
> includes the ISR, leader (maybe), and the replicas. Each of these repeats all
> the broker information. This is super weird. I think what we should be doing
> here is including all broker information for all brokers and then just having
> the appropriate ids for the isr, leader, and replicas.
> 4. For topic discovery I think we need to support the case where no topics
> are specified in the metadata request and for this return information about
> all topics. I don't think we do this now.
> 5. I don't understand what the creator id is.
> 6. The offset request and response are not fully thought through and should be
> generalized.
> Proposals:
> 1, 2. Correlation id. This is not strictly speaking needed, but it is maybe
> useful for debugging to be able to trace a particular request from client to
> server. So we will extend this across all the requests.
> 3. For the metadata response I will try to fix this up by normalizing out the
> broker list and having the isr, replicas, and leader fields hold just the
> node ids.
> 4. This should be uncontroversial and easy to add.
> 5. Let's remove creator id, it isn't used.
> 6. Let's generalize offset request. My proposal is below:
> Rename TopicMetadata API to ClusterMetadata, as this will contain all the
> data that is known cluster-wide. Then let's generalize the offset request to
> be PartitionMetadata--namely stuff about a particular partition on a
> particular server.
> The format of PartitionMetadata would be the following:
> PartitionMetadataRequest => [TopicName [PartitionId MinSegmentTime MaxSegmentInfos]]
> TopicName => string
> PartitionId => uint32
> MinSegmentTime => uint64
> MaxSegmentInfos => int32
> PartitionMetadataResponse => [TopicName [PartitionMetadata]]
> TopicName => string
> PartitionMetadata => PartitionId LogSize NumberOfSegments LogEndOffset HighwaterMark [SegmentData]
> SegmentData => StartOffset LastModifiedTime
> LogSize => uint64
> NumberOfSegments => int32
> LogEndOffset => int64
> HighwaterMark => int64
> This would be general enough that we could continue to add to it for any new
> pieces of data we need.
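The proposed PartitionMetadataRequest is not part of the v1 patch, but its grammar can be read as the following Python sketch. Big-endian byte order, 4-byte array counts, and a 2-byte string length prefix are assumptions, and the function names and the topic name are hypothetical.

```python
import struct


def encode_string(text):
    """Encode a string with a 2-byte length prefix (an assumption here)."""
    data = text.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_partition_metadata_request(topics):
    """Encode [TopicName [PartitionId MinSegmentTime MaxSegmentInfos]].

    topics: list of (topic_name, partitions), where partitions is a list of
    (partition_id, min_segment_time, max_segment_infos) tuples.
    """
    out = [struct.pack(">i", len(topics))]
    for topic_name, partitions in topics:
        out.append(encode_string(topic_name))
        out.append(struct.pack(">i", len(partitions)))
        for partition_id, min_segment_time, max_segment_infos in partitions:
            # PartitionId => uint32, MinSegmentTime => uint64, MaxSegmentInfos => int32
            out.append(struct.pack(">IQi", partition_id, min_segment_time,
                                   max_segment_infos))
    return b"".join(out)


request = encode_partition_metadata_request([("events", [(0, 0, 10), (1, 0, 10)])])
```

Each partition entry is a fixed 16 bytes under these assumptions, so new per-partition fields could be appended without changing how the nested arrays are framed.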