Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-02 Thread Jun Rao
Hi, Henry, Thanks for the reply. 3. Regarding EOS, supporting read_committed makes sense. 1. Regarding re-implementing 30 methods. Could we structure the code such that the implementation of most of the methods could be shared? Thanks, Jun On Thu, Apr 1, 2021 at 11:18 PM Henry Cai wrote: >

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-02 Thread Henry Cai
Tom, I don't think we need to refactor the Consumer/Producer class hierarchy for this KIP. If we introduce a separate class for BatchConsumer, there are more than 30 methods in Consumer interface this new class needs to implement and the implementation would be exactly the same as KafkaConsumer.

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-01 Thread Henry Cai
Jun, On the EOS support, I looked at KIP-98 and it seems to me all the control is specified in the RecordBatch header fields: 1. Whether the batch is a control batch which contains commit or abort marker 2. Whether the batch is transactional which contains PID In read_committed isolation level

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-01 Thread Ryanne Dolan
Henry, a suggestion: instead of introducing a new configuration property which enables the proposed behavior in the existing clients, we could expose this behavior only thru package-private classes (RecordBatchConsumer/Producer or something) which wrap or extend the existing clients. In other

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-01 Thread Colin McCabe
On Thu, Apr 1, 2021, at 09:45, Jun Rao wrote: > Hi, Henry, > > Thanks for the response. > > 1. I agree with Tom that it's worth thinking about a separate class for > shallow iteration instead of trying to add more complexity into the > existing producer/consumer API. We could potentially make

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-01 Thread Jun Rao
Hi, Henry, Thanks for the response. 1. I agree with Tom that it's worth thinking about a separate class for shallow iteration instead of trying to add more complexity into the existing producer/consumer API. We could potentially make the new class an internal API if it's only useful for MM. 3.

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-01 Thread Tom Bentley
Hi Henry, Jun and Ismael, A few things make me wonder if building this into the existing Producer and Consumer APIs is really the right thing to do: 1. Type safety. The existing Producer and Consumer are both generic in K and V, but those type parameters are meaningless in the batch case. For

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-04-01 Thread Henry Cai
Jun, Thanks for your insight looking into this KIP, we do believe the shallow iteration will give quite a significant performance boost. On your concerns: 1. Cleaner API. One alternative is to create new batch APIs. On consumer, it would become Consumer.pollBatch returns a ConsumerBatch

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-31 Thread Jun Rao
Hi, Henry, Thanks for the KIP. Sorry for the late reply. A few comments below. 1. The 'shallow' feature is potentially useful. I do agree with Tom that the proposed API changes seem unclean. Quite a few existing stuff don't really work together with this (e.g., generics, serializer,

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-30 Thread Henry Cai
Tom, Thanks for your comments. Yes it's a bit clumsy to use the existing consumer and producer API to carry the underlying record batch, but creating a new set of API would also mean other use cases (e.g. MM2) wouldn't be able to use that feature easily. We can throw exceptions if we see

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-30 Thread Henry Cai
Ismael, On the network performance side, the issue is on the throughput. For networking purposes, you gain throughout by combining data into bigger batch otherwise you pay higher overhead on network handshaking/roudtrip-delay and wastage on the underlying network packet buffer. On mirrormaker,

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-30 Thread Tom Bentley
Hi Henry and Ryanne, Related to Ismael's point about the producer & consumer configs being dangerous, I can see two parts to this: 2a. Both the proposed configs seem to be fundamentally incompatible with the Producer's existing key.serializer, value.serializer and compression.type configs,

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-30 Thread Ismael Juma
Hi Henry, Can you clarify why this "network performance" issue is only related to shallow mirroring? Generally, we want the protocol to be generic and not have a number of special cases. The more special cases you have, the tougher it becomes to test all the edge cases. Ismael On Mon, Mar 29,

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-29 Thread Henry Cai
It's interesting this VOTE thread finally becomes a DISCUSS thread. For MM2 concern, I will take a look to see whether I can add the support for MM2. For Ismael's concern on multiple batches in the ProduceRequest (conflicting with KIP-98), here is my take: 1. We do need to group multiple

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-29 Thread Ryanne Dolan
Ah, I see, thanks Ismael. Now I understand your concern. >From KIP-98, re this change in v3: "This allows us to remove the message set size since each message set already contains a field for the size. More importantly, since there is only one message set to be written to the log, partial

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-29 Thread Ismael Juma
Ryanne, You misunderstood the referenced comment. It is about the produce request change to have multiple batches: "Up to ProduceRequest V2, a ProduceRequest can contain multiple batches of messages stored in the record_set field, but this was disabled in V3. We are proposing to bring the

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-28 Thread Ryanne Dolan
Ismael, I don't think KIP-98 is related. Shallow iteration was removed in KAFKA-732, which predates KIP-98 by a few years. Ryanne On Sun, Mar 28, 2021, 11:25 PM Ismael Juma wrote: > Thanks for the KIP. I have a few high level comments: > > 1. Like Tom, I'm not convinced about the proposal to

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-28 Thread Ismael Juma
Thanks for the KIP. I have a few high level comments: 1. Like Tom, I'm not convinced about the proposal to make this change to MirrorMaker 1 if we intend to deprecate it and remove it. I would rather us focus our efforts on the implementation we intend to support going forward. 2. The

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-26 Thread Ryanne Dolan
> so as to avoid creating a perverse incentive for users to not adopt MM2? Tom, I know there are several of us chomping at the bit to enable this in MM2, so I don't think that's a problem. AFAICT there won't be a separate KIP required -- we just need to enable this new option in MM2 by default.

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-26 Thread Tom Bentley
Hi, Thanks for the KIP. I've not yet read it in detail, so have no technical points to make at this point. I'm having trouble reconciling KIP-720's deprecation of MM1 with this proposal to add a new feature to MM1 and not to MM2. I think this would create a confusing impression to users. Given

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-03-22 Thread Henry Cai
If there are no additional comments you’d start a vote in a couple of days. On Sat, Feb 27, 2021 at 9:26 AM A S wrote: > +1 to adding latency metrics. > > To add context on why CPU, memory and GC has a bigger impact than network > in a Mirror for compressed topics without KIP-712 is: *a failing

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-02-27 Thread A S
+1 to adding latency metrics. To add context on why CPU, memory and GC has a bigger impact than network in a Mirror for compressed topics without KIP-712 is: *a failing / unstable mirror cluster will have lag perpetually spiking having much larger impact on e2e latencies*. To explain a bit more:

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-02-22 Thread Vahid Hashemian
As Henry mentions in the KIP, we are seeing a great deal of improvements and efficiency by using the mirroring enhancement proposed in this KIP, and believe it would be equally beneficial to everyone that runs Kafka and Kafka Mirror at scale. I'm bumping up this thread in case there are

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-02-13 Thread Ryanne Dolan
Glad to hear that latency and thruput aren't negatively affected somehow. I would love to see this KIP move forward. Ryanne On Sat, Feb 13, 2021, 3:00 PM Henry Cai wrote: > Ryanne, > > Yes, network performance is also important. In our deployment, we are > bottlenecked on the CPU/memory on

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-02-13 Thread Henry Cai
Ryanne, Yes, network performance is also important. In our deployment, we are bottlenecked on the CPU/memory on the mirror hosts. We are using c5.2x and m5.2x nodes in AWS, before the deployment, CPU would peak to 80% but there is enough network bandwidth left on those hosts. Having said that,

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-02-12 Thread Ryanne Dolan
Hey Henry, great KIP. The performance improvements are impressive! However, often cpu, ram, gc are not the metrics most important to a replication pipeline -- often the network is mostly saturated anyway. Do you know how this change affects latency or thruput? I suspect less GC pressure means

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-02-10 Thread Henry Cai
On the question "whether shallow mirror is only applied on mirror maker v1", the code change is mostly on consumer and producer code path, the change to mirrormaker v1 is very trivial. We chose to modify the consumer/producer path (instead of creating a new mirror product) so other use cases can

Re: [DISCUSS] KIP-712: Shallow Mirroring

2021-02-10 Thread Vahid Hashemian
Retitled the thread to conform to the common format. On Fri, Feb 5, 2021 at 4:00 PM Ning Zhang wrote: > Hello Henry, > > This is a very interesting proposal. > https://issues.apache.org/jira/browse/KAFKA-10728 reflects the similar > concern of re-compressing data in mirror maker. > > Probably