> On Feb. 1, 2015, 6:34 a.m., Gwen Shapira wrote:
> > I like the idea of a message handler in MirrorMaker, but I think we can do 
> > better. Let me know if you think I'm taking it far beyond your original 
> > scope... I can add it as a follow up jira.
> > 
> > 1. I think we need to let users pass parameters to the handlers. We need a 
> > "configure" or "init" method in the handler, which MirrorMaker will call 
> > once with the right properties and the handler can use them for basic 
> > setup. For example, imagine a "regexp filter" handler - I get a regexp from 
> > the commandline and filter messages that don't match. My "init" method will 
> > set up the regexp so it will be available for all handle() calls.
> > 2. I think the handle() method should take List<Record> as input, not just 
> > a Record. MirrorMaker will be able to consume until it fills a batch (or 
> > until we waited too long), "handle" a batch - which will be able to use 
> > Scala's Sequence operators - Filter, Map, etc, and then produce an entire 
> > list of records. This sounds more efficient to me. 
> > 
> > Another comment - I think the addition of "destination partition" is 
> > unrelated to this change and Jira?

I think it's a great idea to extend message handler to take commandline 
arguments. I'll add that to the this patch.
I'm not sure I understand why it will improve performance by passing in a list 
of record. Shouldn't the message handler decide whether it wants to buffer some 
records or not? Some worries about buffering messages outside message handler 
are:
1. It is usually very difficult to decide how many messages to buffer because 
it is likely depending on the business logic and could vary from time to time.
2. It might further increase the delay.

The addition of destination partition is actually mainly to serve partition to 
partition copy, which is something related to message handler. I feel it could 
be put into either KAFKA-1840 or KAFKA-1839. I put it here as it seems to be 
more related to the message handler because the handler has to assigne 
partition explicitly. 

Maybe as a follow up patch, I'm thinking that we can add a commandline option 
of --mirror-cluster so it does the following:
1. Automatically create same number of partitions in target cluster when new 
topic is created in source cluster. (Based on KAFKA-1839)
2. Produce the messages to the destination partition same as the source 
partition. (Follow-up patch for KAFKA-1840)


- Jiangjie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30063/#review70523
-----------------------------------------------------------


On Jan. 31, 2015, 2:25 a.m., Jiangjie Qin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30063/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2015, 2:25 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1840
>     https://issues.apache.org/jira/browse/KAFKA-1840
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into 
> KAFKA-1840
> 
> 
> Addressed Joel's comments
> 
> 
> Allow message handler to specify partitions for produce
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 
> 81ae205ef7b2050d0152f29f8da7dd91b17b8b00 
> 
> Diff: https://reviews.apache.org/r/30063/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>

Reply via email to