[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-09-25 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148735#comment-14148735
 ] 

Neha Narkhede commented on KAFKA-1586:
--

+1 on not supporting a sticky partitioning strategy in the new producer. It 
already gives users the flexibility to use any partitioning strategy. If 
there are no objections, I'm leaning towards closing this JIRA.

> support sticky partitioning in the new producer
> ---
>
> Key: KAFKA-1586
> URL: https://issues.apache.org/jira/browse/KAFKA-1586
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2
>Reporter: Jun Rao
>Assignee: Jun Rao
> Attachments: KAFKA-1586.patch
>
>
> If a message doesn't specify a key or a partition, the new producer selects a 
> partition for each message in a round-robin way. As a result, in a window of 
> linger.ms, messages are spread around in all partitions of a topic. Compared 
> with another strategy that assigns all messages to a single partition in the 
> same time window, this strategy may not compress the message set as well 
> since the batch is smaller. Another potential problem with this strategy is 
> that the compression ratio could be sensitive to the change of # partitions 
> in a topic. If # partitions are increased in a topic, the produced data may 
> not be compressed as well as before. 
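
To make the batching argument in the description concrete, here is a minimal sketch that compresses the same set of messages two ways: spread round-robin over all partitions, and kept together in one partition. The payloads, the partition count of 8, and the use of zlib are assumptions chosen for illustration; they are not taken from the producer code or the attached patch.

```python
import zlib


def batches_round_robin(messages, num_partitions):
    """Assign message i to partition i % num_partitions, one batch per partition."""
    batches = [[] for _ in range(num_partitions)]
    for i, msg in enumerate(messages):
        batches[i % num_partitions].append(msg)
    return batches


def batches_sticky(messages, num_partitions, partition=0):
    """Assign every message in the window to a single partition."""
    batches = [[] for _ in range(num_partitions)]
    batches[partition] = list(messages)
    return batches


def compressed_size(batches):
    """Total bytes after compressing each non-empty batch independently."""
    return sum(len(zlib.compress(b"".join(batch))) for batch in batches if batch)


# Hypothetical payloads: 64 similar messages produced within one linger.ms window.
messages = [
    f"user={i % 7} action=click page=/home ts={1000 + i}\n".encode()
    for i in range(64)
]

round_robin_bytes = compressed_size(batches_round_robin(messages, 8))
sticky_bytes = compressed_size(batches_sticky(messages, 8))
```

Comparing `round_robin_bytes` with `sticky_bytes` shows the effect: each small batch pays its own compression overhead and cannot reuse patterns seen in the other batches. Raising the partition count in `batches_round_robin` makes each batch smaller still, which is the sensitivity to # partitions that the description mentions.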



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-09-25 Thread Jim Hoagland (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148771#comment-14148771
 ] 

Jim Hoagland commented on KAFKA-1586:
-

+1, agreeing with Jay and Neha. I found the sticky behavior confusing, and it 
made me wonder what the heck was happening (I thought I was doing something wrong).



[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-08-10 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092237#comment-14092237
 ] 

Jun Rao commented on KAFKA-1586:


One way to address this issue is to introduce a new config, 
"partition.sticky.time.ms", in the new producer. The producer would then stick 
to a partition for the configured amount of time before switching to another. 
"partition.sticky.time.ms" can default to 0, which means the producer switches 
to a new partition on every message.
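
The proposed behavior can be sketched roughly as follows. This is not the code in the attached patch; the class name `StickyPartitionChooser`, the injectable clock, and the choice to advance round-robin when the sticky window expires are all assumptions made for illustration.

```python
import time


class StickyPartitionChooser:
    """Keeps returning one partition until sticky_time_ms has elapsed, then moves on."""

    def __init__(self, num_partitions, sticky_time_ms=0, clock_ms=None):
        if num_partitions < 1:
            raise ValueError("num_partitions must be >= 1")
        self._num_partitions = num_partitions
        self._sticky_time_ms = sticky_time_ms
        # Hypothetical hook: the clock is injectable so the behavior can be tested.
        self._clock_ms = clock_ms or (lambda: time.monotonic() * 1000.0)
        self._current = None
        self._since_ms = 0.0

    def choose(self):
        now = self._clock_ms()
        if self._current is None:
            # First message: start at partition 0.
            self._current = 0
            self._since_ms = now
        elif now - self._since_ms >= self._sticky_time_ms:
            # Sticky window expired: advance to the next partition.
            self._current = (self._current + 1) % self._num_partitions
            self._since_ms = now
        return self._current
```

With `sticky_time_ms=0` the expiry check is true on every call, so the chooser moves to the next partition on each message, which matches the default described above.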



[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-08-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093047#comment-14093047
 ] 

Jun Rao commented on KAFKA-1586:


Created reviewboard https://reviews.apache.org/r/24565/ against branch origin/trunk



[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-08-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093110#comment-14093110
 ] 

Guozhang Wang commented on KAFKA-1586:
--

I think that when we designed the new producer, we decided to use a fixed 
partitioner and let application logic specify the partition id if it wants 
some sort of stickiness. I am wondering whether we should just implement this 
sticky logic at the application level, such as in MirrorMaker, instead of 
building it into the new producer.
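
An application-level version of this idea might look like the sketch below: the application wraps whatever send call it already uses and passes an explicit partition id with each record. The `StickySender` name, the `send(topic, partition, value)` callable, and the count-based switching rule are hypothetical; they stand in for the application's own producer call and are not part of any Kafka API.

```python
class StickySender:
    """Application-side wrapper that reuses one explicit partition for a run of records."""

    def __init__(self, send, num_partitions, records_per_partition):
        if num_partitions < 1 or records_per_partition < 1:
            raise ValueError("num_partitions and records_per_partition must be >= 1")
        # `send` is a hypothetical callable taking (topic, partition, value).
        self._send = send
        self._num_partitions = num_partitions
        self._records_per_partition = records_per_partition
        self._partition = -1
        self._remaining = 0

    def send(self, topic, value):
        if self._remaining == 0:
            # Run finished: move to the next partition and start a new run.
            self._partition = (self._partition + 1) % self._num_partitions
            self._remaining = self._records_per_partition
        self._remaining -= 1
        return self._send(topic, self._partition, value)
```

Because the partition is chosen outside the producer, the producer itself stays with its fixed default and only applications that want stickiness carry this logic.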



[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-08-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093140#comment-14093140
 ] 

Jun Rao commented on KAFKA-1586:


The question is whether sticky partitioning only helps MirrorMaker or all 
producers. If it's the former, it may make sense to do it just in MirrorMaker. 
Otherwise, doing this in the producer itself is more convenient. To me, this 
feature can be useful for any producer that cares about compression ratio 
and/or # of concurrent socket connections.



[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-08-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093185#comment-14093185
 ] 

Jay Kreps commented on KAFKA-1586:
--

We made partitioning pluggable so you could plug in the partitioning strategy 
of your choice. I think this is the right route rather than trying to implement 
every possible partitioning strategy in the producer. In my experience ~100% of 
people who have experienced the sticky partitioning feature think it is a bug 
and don't understand how to turn it off. ~0% of people want this feature 
outside LinkedIn, which is trying to reduce the connection count. So I think it 
makes sense to have LinkedIn just implement their own partitioning strategy.
