[jira] [Resolved] (KAFKA-1933) Fine-grained locking in log append

2016-11-15 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov resolved KAFKA-1933.
-
Resolution: Won't Fix

This patch is no longer relevant, as Kafka has finally matured enough not to
recompress incoming batches just to set offsets.

> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Maxim Ivanov
>Priority: Minor
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch
>
>
> This patch adds finer-grained locking when appending to the log. It breaks
> the global append lock into 2 sequential phases and 1 parallel phase.
> The basic idea is to allow every thread to "reserve" offsets in
> non-overlapping ranges, then do compression in parallel, and then "commit"
> the write to the log in the same order the offsets were reserved.
> Results on a server with 16 CPU cores available:
> gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
> LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)
> Kafka was configured to run 16 IO threads; data was pushed using 32 netcat
> instances pushing parallel batches of 200 messages of 6.2 KB each (3264 MB
> in total)
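The reservation phase described above can be sketched as follows. This is a hypothetical illustration under invented names (OffsetReserver, reserve), not the actual patch code: each appender atomically claims a non-overlapping offset range sized to its message count, so compression can then run outside any lock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the "reserve" phase (names invented, not from the
// patch): an atomic counter hands out disjoint offset ranges, so concurrent
// appenders can compress their batches in parallel without overlapping.
class OffsetReserver {
    private final AtomicLong nextOffset;

    OffsetReserver(long logEndOffset) {
        this.nextOffset = new AtomicLong(logEndOffset);
    }

    // Reserve `count` consecutive offsets; returns the first offset of the
    // reserved range [first, first + count).
    long reserve(int count) {
        return nextOffset.getAndAdd(count);
    }
}
```

Two threads each reserving 200 offsets get the disjoint ranges [0, 200) and [200, 400), in whichever order the atomic increment serializes them.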



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-10 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Fix Version/s: (was: 0.8.2.0)
   0.8.3

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 Results on a server with 16 CPU cores available:
 gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
 LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)
 Kafka was configured to run 16 IO threads; data was pushed using 32 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (3264 MB
 in total)





[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311977#comment-14311977
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Hi, thanks for looking into this.

Today I realised that my assumptions were wrong, and I proceeded with writing 
a patch without checking them. The whole idea rests on the number of messages 
being known in advance, and I used shallowCount to get that number. I don't 
know what got into my head, because the shallow count turned out to always be 
1, so the whole offset allocation logic is wrong here. I need to redo it to 
decompress in the critical path just to reserve the correct range of offsets, 
which in turn will make the performance improvement of this patch smaller, but 
still substantial. I'll redo it in the near future and resubmit version 2.
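The shallowCount pitfall above can be illustrated with a toy model (gzip standing in for the real message-set format, which this sketch deliberately does not reproduce; all names here are invented): a compressed set arrives as a single wrapper message, so counting without decompressing always yields 1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Toy model (not Kafka's wire format): n length-prefixed records compressed
// into one gzip "wrapper". The shallow view sees a single message; only deep
// iteration, i.e. decompression, reveals the real record count.
class BatchCounting {
    static byte[] wrap(int n) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf))) {
            for (int i = 0; i < n; i++) {
                byte[] rec = ("msg-" + i).getBytes("UTF-8");
                out.writeInt(rec.length); // length prefix
                out.write(rec);           // record payload
            }
        }
        return buf.toByteArray();
    }

    // A compressed set is one wrapper message at the shallow level.
    static int shallowCount(byte[] wrapper) {
        return 1;
    }

    // Counting the records requires decompressing the wrapper.
    static int deepCount(byte[] wrapper) throws IOException {
        int n = 0;
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(wrapper)))) {
            while (true) {
                int len;
                try {
                    len = in.readInt();
                } catch (EOFException eof) {
                    break; // no more records
                }
                in.readFully(new byte[len]); // skip the payload
                n++;
            }
        }
        return n;
    }
}
```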

As for your concerns, the locking scheme isn't very sophisticated; with a bit 
of (much needed) cleanup in log.append it will be easy to follow.

1. It will be present in one way or another, because we have to synchronize 
access in 2 phases with the constraint that the second synchronization is done 
in the same order as the first. I am not very familiar with JVM/Scala 
concurrency primitives, and I couldn't find anything in java.util.concurrent 
that achieves this out of the box. If you prefer, I can abstract it into a 
separate class; from log.Log's point of view it would then be a sequence of 
actions: 1) register in the queue and obtain a ticket, 2) wait in the queue 
presenting the ticket, where a ticket is a pair of semaphores. But that would 
be such a thin shim that I decided to just do it all in place. If you have 
other ideas on how to synchronize and resynchronize while keeping the order, 
I'd be happy to use your approach if it makes merging the patch easier.
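A minimal sketch of that ticket idea, under invented names (CommitTurnstile, Ticket) and not taken from the patch: register() is called in the first, serialized phase; each ticket is a pair of semaphores linking an appender to its predecessor, so the final commit phase runs in exactly the registration order.

```java
import java.util.concurrent.Semaphore;

// Hypothetical "pair of semaphores" ticket: the first semaphore is released
// by the previous ticket (our turn to commit), the second is released by us
// once our write is in the log (the next ticket's turn).
class CommitTurnstile {
    // Semaphore the next registrant will wait on; open for the first ticket.
    private Semaphore tail = new Semaphore(1);

    // Phase 1: called while holding the reservation lock.
    public synchronized Ticket register() {
        Semaphore myTurn = tail;
        tail = new Semaphore(0); // next ticket waits until we are done
        return new Ticket(myTurn, tail);
    }

    public static final class Ticket {
        private final Semaphore turn; // released by the previous ticket
        private final Semaphore next; // we release it after committing

        Ticket(Semaphore turn, Semaphore next) {
            this.turn = turn;
            this.next = next;
        }

        // Phase 3: block until it is our turn to commit.
        public void awaitTurn() throws InterruptedException {
            turn.acquire();
        }

        // Signal the next reservation once our write is committed.
        public void commitDone() {
            next.release();
        }
    }
}
```

An appender would call `Ticket t = turnstile.register()` under the reservation lock, compress with no lock held, then `t.awaitTurn(); /* write to log */ t.commitDone();`, which forces commits into reservation order.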

2. The non-compressed case, as well as the assignOffset = false mode, should 
be refactored into separate code paths. My thinking was that once the idea got 
approval, I'd do it. In the end there should be no impact on the 
non-compressed case.

3. There must be more to it than that. A raw gzip compressor can process 
~40 MB/sec, while Kafka with a single topic, single partition, no replication 
and 5 netcat clients pushing prerecorded messages into it (== infinite 
pipelining) manages 8.18 MB/sec, so the overhead of the Kafka system as a 
whole is massive, especially given that network handling and parsing are done 
in separate threads. Parallelizing compression seemed to bring the most value 
for the time spent on the patch, and it wouldn't prevent any other 
optimisations from taking place.

4. That was my thinking the moment I discovered why our Kafka servers were 
choking on CPU, not network or disks, when a massive push from Hadoop was 
happening. Kafka had a chance to implement that when the protocol was changing 
in 0.8, but now it would be very intrusive, and I certainly would not propose 
it as my first patch :) The changes to Log.append are self-contained, least 
intrusive, and very local, and most importantly they relieve our immediate 
problem without requiring a migration to a new log format or changes to the 
client protocol. When there is another breaking change, please do so, but 
seeing how the Kafka 0.7 -> 0.8 migration is going, you won't make many 
friends.


 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 3.9 sec
 Gzip: 62.3 sec -> 24.8 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312349#comment-14312349
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Got results on a beefier machine with 16 CPU cores available, Kafka configured 
with 16 IO threads, 32 netcat clients pushing messages:

gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)


 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 4.2 sec
 Gzip: 62.3 sec -> 26.9 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Description: 
This patch adds finer-grained locking when appending to the log. It breaks
the global append lock into 2 sequential phases and 1 parallel phase.

The basic idea is to allow every thread to reserve offsets in non-overlapping
ranges, then do compression in parallel, and then commit the write to the log
in the same order the offsets were reserved.

Results on a server with 16 CPU cores available:
gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)

Kafka was configured to run 16 IO threads; data was pushed using 32 netcat
instances pushing parallel batches of 200 messages of 6.2 KB each (3264 MB in total)

  was:
This patch adds finer-grained locking when appending to the log. It breaks
the global append lock into 2 sequential phases and 1 parallel phase.

The basic idea is to allow every thread to reserve offsets in non-overlapping
ranges, then do compression in parallel, and then commit the write to the log
in the same order the offsets were reserved.

On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
performance boost:

LZ4: 7.2 sec -> 4.2 sec
Gzip: 62.3 sec -> 26.9 sec

Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
82180 messages in total)



 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 Results on a server with 16 CPU cores available:
 gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
 LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)
 Kafka was configured to run 16 IO threads; data was pushed using 32 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (3264 MB
 in total)





[jira] [Comment Edited] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311977#comment-14311977
 ] 

Maxim Ivanov edited comment on KAFKA-1933 at 2/9/15 10:38 AM:
--

Hi, thanks for looking into this.

Today I realised that my assumptions were wrong, and I proceeded with writing 
a patch without checking them. The whole idea rests on the number of messages 
being known in advance, and I used shallowCount to get that number. I don't 
know what got into my head, because the shallow count turned out to always be 
1, so the whole offset allocation logic is wrong here. I need to redo it to 
decompress before reserving the correct range of offsets. I'll redo it in the 
near future and resubmit version 2.

As for your concerns, the locking scheme isn't very sophisticated; with a bit 
of (much needed) cleanup in log.append it will be easy to follow.

1. It will be present in one way or another, because we have to synchronize 
access in 2 phases with the constraint that the second synchronization is done 
in the same order as the first. I am not very familiar with JVM/Scala 
concurrency primitives, and I couldn't find anything in java.util.concurrent 
that achieves this out of the box. If you prefer, I can abstract it into a 
separate class; from log.Log's point of view it would then be a sequence of 
actions: 1) register in the queue and obtain a ticket, 2) wait in the queue 
presenting the ticket, where a ticket is a pair of semaphores. But that would 
be such a thin shim that I decided to just do it all in place. If you have 
other ideas on how to synchronize and resynchronize while keeping the order, 
I'd be happy to use your approach if it makes merging the patch easier.

2. The non-compressed case, as well as the assignOffset = false mode, should 
be refactored into separate code paths. My thinking was that once the idea got 
approval, I'd do it. In the end there should be no impact on the 
non-compressed case.

3. There must be more to it than that. A raw gzip compressor can process 
~40 MB/sec, while Kafka with a single topic, single partition, no replication 
and 5 netcat clients pushing prerecorded messages into it (== infinite 
pipelining) manages 8.18 MB/sec, so the overhead of the Kafka system as a 
whole is massive, especially given that network handling and parsing are done 
in separate threads. Parallelizing compression seemed to bring the most value 
for the time spent on the patch, and it wouldn't prevent any other 
optimisations from taking place.

4. That was my thinking the moment I discovered why our Kafka servers were 
choking on CPU, not network or disks, when a massive push from Hadoop was 
happening. Kafka had a chance to implement that when the protocol was changing 
in 0.8, but now it would be very intrusive, and I certainly would not propose 
it as my first patch :) The changes to Log.append are self-contained, least 
intrusive, and very local, and most importantly they relieve our immediate 
problem without requiring a migration to a new log format or changes to the 
client protocol. When there is another breaking change, please do so, but 
seeing how the Kafka 0.7 -> 0.8 migration is going, you won't make many 
friends.




[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312160#comment-14312160
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Updated reviewboard https://reviews.apache.org/r/30775/diff/
 against branch origin/0.8.2

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:15:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 3.9 sec
 Gzip: 62.3 sec -> 24.8 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Description: 
This patch adds finer-grained locking when appending to the log. It breaks
the global append lock into 2 sequential phases and 1 parallel phase.

The basic idea is to allow every thread to reserve offsets in non-overlapping
ranges, then do compression in parallel, and then commit the write to the log
in the same order the offsets were reserved.

On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
performance boost:

LZ4: 7.2 sec -> 4.2 sec
Gzip: 62.3 sec -> 26.9 sec

Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
82180 messages in total)


  was:
This patch adds finer-grained locking when appending to the log. It breaks
the global append lock into 2 sequential phases and 1 parallel phase.

The basic idea is to allow every thread to reserve offsets in non-overlapping
ranges, then do compression in parallel, and then commit the write to the log
in the same order the offsets were reserved.

On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
performance boost:

LZ4: 7.2 sec -> 3.9 sec
Gzip: 62.3 sec -> 24.8 sec

Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
82180 messages in total)



 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:15:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 4.2 sec
 Gzip: 62.3 sec -> 26.9 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312171#comment-14312171
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Updated reviewboard https://reviews.apache.org/r/30775/diff/
 against branch origin/0.8.2

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 4.2 sec
 Gzip: 62.3 sec -> 26.9 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Attachment: (was: KAFKA-1933_2015-02-09_12:15:06.patch)

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 4.2 sec
 Gzip: 62.3 sec -> 26.9 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Attachment: KAFKA-1933_2015-02-09_12:15:06.patch

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:15:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 3.9 sec
 Gzip: 62.3 sec -> 24.8 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Attachment: KAFKA-1933_2015-02-09_12:27:06.patch

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 4.2 sec
 Gzip: 62.3 sec -> 26.9 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312181#comment-14312181
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Updated the patch; it now correctly calculates offset ranges. In the next 
iteration I'll clean up the spaghetti code in Log.append so it is clearer what 
is happening.

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 4.2 sec
 Gzip: 62.3 sec -> 26.9 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Created] (KAFKA-1933) Fine-grained locking in log append

2015-02-08 Thread Maxim Ivanov (JIRA)
Maxim Ivanov created KAFKA-1933:
---

 Summary: Fine-grained locking in log append
 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2


This patch adds finer-grained locking when appending to the log. It breaks
the global append lock into 2 sequential phases and 1 parallel phase.

The basic idea is to allow every thread to reserve offsets in non-overlapping
ranges, then do compression in parallel, and then commit the write to the log
in the same order the offsets were reserved.

On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
performance boost:

LZ4: 7.2 sec -> 3.9 sec
Gzip: 62.3 sec -> 24.8 sec

Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
82180 messages in total)






[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-08 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311527#comment-14311527
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Created reviewboard https://reviews.apache.org/r/30775/diff/
 against branch origin/0.8.2

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 3.9 sec
 Gzip: 62.3 sec -> 24.8 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-08 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Attachment: KAFKA-1933.patch

 Fine-grained locking in log append
 --

 Key: KAFKA-1933
 URL: https://issues.apache.org/jira/browse/KAFKA-1933
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Maxim Ivanov
Assignee: Jay Kreps
Priority: Minor
 Fix For: 0.8.2

 Attachments: KAFKA-1933.patch


 This patch adds finer-grained locking when appending to the log. It breaks
 the global append lock into 2 sequential phases and 1 parallel phase.
 The basic idea is to allow every thread to reserve offsets in non-overlapping
 ranges, then do compression in parallel, and then commit the write to the log
 in the same order the offsets were reserved.
 On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
 performance boost:
 LZ4: 7.2 sec -> 3.9 sec
 Gzip: 62.3 sec -> 24.8 sec
 Kafka was configured to run 4 IO threads; data was pushed using 5 netcat
 instances pushing parallel batches of 200 messages of 6.2 KB each (510 MB,
 82180 messages in total)


