[jira] [Commented] (KAFKA-79) Introduce the compression feature in Kafka
[ https://issues.apache.org/jira/browse/KAFKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119804#comment-13119804 ]

Chris Burroughs commented on KAFKA-79:
--------------------------------------

- I think we should have a clear convention for ids. For example: core < 1, contrib < 2, HERE-BE-DRAGONS > 2.
- I think there is room for gzip, and something else in the LZF/Snappy area, in the default Kafka install.
- I'm mildly uncomfortable with native code dependencies, but the Hadoop guys seem to have gotten something working.

> Introduce the compression feature in Kafka
> ------------------------------------------
>
>          Key: KAFKA-79
>          URL: https://issues.apache.org/jira/browse/KAFKA-79
>      Project: Kafka
>   Issue Type: New Feature
> Affects Versions: 0.6
>     Reporter: Neha Narkhede
>      Fix For: 0.7
>
> With this feature, we can enable end-to-end block compression in Kafka. The idea is to enable compression on the producer for some or all topics, write the data in compressed format on the server, and make the consumers compression aware. The data will be decompressed only on the consumer side. Ideally, there should be a choice of compression codecs to be used by the producer. That means a change to the message header as well as the network byte format. On the consumer side, the state maintenance behavior of the zookeeper consumer changes. For compressed data, the consumed offset will be advanced one compressed message at a time. For uncompressed data, the consumed offset will be advanced one message at a time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (KAFKA-79) Introduce the compression feature in Kafka
[ https://issues.apache.org/jira/browse/KAFKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120043#comment-13120043 ]

Chris Burroughs commented on KAFKA-79:
--------------------------------------

What Jay called "compression ids" (i.e., 1 == gzip).

> Introduce the compression feature in Kafka
> ------------------------------------------
>
>          Key: KAFKA-79
>          URL: https://issues.apache.org/jira/browse/KAFKA-79
[jira] [Commented] (KAFKA-92) upgrade to latest stable 0.7.x sbt
[ https://issues.apache.org/jira/browse/KAFKA-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121701#comment-13121701 ]

Chris Burroughs commented on KAFKA-92:
--------------------------------------

FWIW I can't reproduce KAFKA-134, but if you have confirmed that reverting fixes it, I'm cool with going back to 0.7.5. (Sorry for the crazy bug hunt!)

> upgrade to latest stable 0.7.x sbt
> ----------------------------------
>
>          Key: KAFKA-92
>          URL: https://issues.apache.org/jira/browse/KAFKA-92
>      Project: Kafka
>   Issue Type: Improvement
>     Reporter: Chris Burroughs
>     Assignee: Chris Burroughs
>     Priority: Minor
>      Fix For: 0.7
>
>  Attachments: k92-v1.txt
>
> None of the changes look particularly important, but there isn't a reason to stay with an older version.
> http://code.google.com/p/simple-build-tool/wiki/Changes
> Old google group thread: http://groups.google.com/group/kafka-dev/browse_thread/thread/1c7b06a8c550fe23/08f3b2f9cd2059b0?lnk=gst&q=sbt#08f3b2f9cd2059b0
[jira] [Commented] (KAFKA-149) Current perf directory has buggy perf tests
[ https://issues.apache.org/jira/browse/KAFKA-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122486#comment-13122486 ]

Chris Burroughs commented on KAFKA-149:
---------------------------------------

I'm not following what was wrong with the perf/ tools, or what changed to make them unreliable.

> Current perf directory has buggy perf tests
> -------------------------------------------
>
>          Key: KAFKA-149
>          URL: https://issues.apache.org/jira/browse/KAFKA-149
>      Project: Kafka
>   Issue Type: Bug
>     Reporter: Neha Narkhede
>      Fix For: 0.7
>
> The scripts in the current perf directory are buggy and not useful to run any reliable Kafka performance tests. The performance tools that work correctly are:
> ProducerPerformance.scala
> SimpleConsumerPerformance.scala
> ConsumerPerformance.scala
> Currently, the above are in the tools directory. Ideally, a Kafka performance suite should repackage these tools with some sample performance load and output data in csv format that can be graphed.
> I suggest deleting the perf directory and redoing this cleanly.
[jira] [Commented] (KAFKA-142) Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute
[ https://issues.apache.org/jira/browse/KAFKA-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123760#comment-13123760 ]

Chris Burroughs commented on KAFKA-142:
---------------------------------------

jopt-simple is MIT, but that's still Category A. Alan, don't we have to clear everything under contrib/ and clients/ as well?

> Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute
> ---------------------------------------------------------------------------------
>
>          Key: KAFKA-142
>          URL: https://issues.apache.org/jira/browse/KAFKA-142
>      Project: Kafka
>   Issue Type: Task
>     Reporter: Alan Cabrera
>     Priority: Blocker
>      Fix For: 0.7
>
> Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute
[jira] [Commented] (KAFKA-149) Current perf directory has buggy perf tests
[ https://issues.apache.org/jira/browse/KAFKA-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123788#comment-13123788 ]

Chris Burroughs commented on KAFKA-149:
---------------------------------------

Thanks, that makes sense. Is there a ticket yet for the New and Improved perf scripts?

> Current perf directory has buggy perf tests
> -------------------------------------------
>
>          Key: KAFKA-149
>          URL: https://issues.apache.org/jira/browse/KAFKA-149
[jira] [Commented] (KAFKA-143) Check and make sure that all source code distributed by the project is covered by one or more approved licenses
[ https://issues.apache.org/jira/browse/KAFKA-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124099#comment-13124099 ]

Chris Burroughs commented on KAFKA-143:
---------------------------------------

- LICENSE files need to have the "Copyright [] [name of copyright owner]" line filled in.
- We can't remove the FSF copyright notice that says we can redistribute "as long as this notice is preserved."

I'm not sure how m4/autoconf-type files are supposed to be handled.

> Check and make sure that all source code distributed by the project is covered by one or more approved licenses
> ---------------------------------------------------------------------------------
>
>          Key: KAFKA-143
>          URL: https://issues.apache.org/jira/browse/KAFKA-143
>      Project: Kafka
>   Issue Type: Task
>     Reporter: Alan Cabrera
>     Priority: Blocker
>      Fix For: 0.7
>
>  Attachments: KAFKA-143.patch
>
> Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms
[jira] [Commented] (KAFKA-142) Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute
[ https://issues.apache.org/jira/browse/KAFKA-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124100#comment-13124100 ]

Chris Burroughs commented on KAFKA-142:
---------------------------------------

Sorry, I meant the *dependencies* of clients/ and contrib/.

> Check and make sure that for all code included with the distribution that is not under the Apache license, we have the right to combine with Apache-licensed code and redistribute
> ---------------------------------------------------------------------------------
>
>          Key: KAFKA-142
>          URL: https://issues.apache.org/jira/browse/KAFKA-142
[jira] [Commented] (KAFKA-156) Messages should not be dropped when brokers are unavailable
[ https://issues.apache.org/jira/browse/KAFKA-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129018#comment-13129018 ]

Chris Burroughs commented on KAFKA-156:
---------------------------------------

This is more or less what we do with SyncProducer. But I'm not sure there is one right choice for durability, shedding load, exponential vs. linear backoff, etc.

> Messages should not be dropped when brokers are unavailable
> -----------------------------------------------------------
>
>          Key: KAFKA-156
>          URL: https://issues.apache.org/jira/browse/KAFKA-156
>      Project: Kafka
>   Issue Type: Improvement
>     Reporter: Sharad Agarwal
>      Fix For: 0.8
>
> When none of the brokers is available, the producer should spool the messages to disk and keep retrying until brokers come back.
> This will also enable broker upgrades/maintenance without message loss.
[jira] [Commented] (KAFKA-156) Messages should not be dropped when brokers are unavailable
[ https://issues.apache.org/jira/browse/KAFKA-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129024#comment-13129024 ]

Chris Burroughs commented on KAFKA-156:
---------------------------------------

Sorry, wrong "we". I meant this is what the company I work for does when calling SyncProducer.

> Messages should not be dropped when brokers are unavailable
> -----------------------------------------------------------
>
>          Key: KAFKA-156
>          URL: https://issues.apache.org/jira/browse/KAFKA-156
[jira] [Commented] (KAFKA-156) Messages should not be dropped when brokers are unavailable
[ https://issues.apache.org/jira/browse/KAFKA-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129301#comment-13129301 ]

Chris Burroughs commented on KAFKA-156:
---------------------------------------

Perhaps the producers themselves should just take some sort of MessageSendErrorHandler, and implementations could queue, retry with backoff, write to disk, drop but log, etc.

FWIW Clearspring has a pipeline with: ConcurrentQueue --> spill-to-disk queue with max size (then drops messages) --> SyncProducer with retry/backoff.

> Messages should not be dropped when brokers are unavailable
> -----------------------------------------------------------
>
>          Key: KAFKA-156
>          URL: https://issues.apache.org/jira/browse/KAFKA-156
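The pipeline described above can be sketched roughly as follows. This is a hedged illustration with hypothetical names — it does not use Kafka's SyncProducer API; the points of interest are the bounded spool that drops (but counts) overflow and an exponential backoff policy for retries:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a bounded queue that drops on overflow, feeding a
// sender that would retry with exponential backoff while brokers are down.
class SpoolingSender {
    private final BlockingQueue<String> queue;
    private long dropped = 0;

    SpoolingSender(int maxSize) {
        queue = new ArrayBlockingQueue<>(maxSize);
    }

    // Enqueue a message; when the spool is full, drop it but keep a count.
    boolean enqueue(String msg) {
        boolean ok = queue.offer(msg);
        if (!ok) dropped++;
        return ok;
    }

    long droppedCount() {
        return dropped;
    }

    // Exponential backoff: base * 2^attempt, capped so retries never
    // wait longer than capMs between attempts.
    static long backoffMs(long baseMs, int attempt, long capMs) {
        long delay = baseMs << Math.min(attempt, 20);
        return Math.min(delay, capMs);
    }

    public static void main(String[] args) {
        SpoolingSender s = new SpoolingSender(2);
        s.enqueue("m1");
        s.enqueue("m2");
        s.enqueue("m3"); // spool full: dropped
        System.out.println("dropped=" + s.droppedCount()
                + " nextBackoffMs=" + backoffMs(100, 3, 10000));
    }
}
```

Whether overflow should drop, block, or spill to disk is exactly the policy question raised above; a pluggable handler would let each deployment choose.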
[jira] [Commented] (KAFKA-50) kafka intra-cluster replication support
[ https://issues.apache.org/jira/browse/KAFKA-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129830#comment-13129830 ]

Chris Burroughs commented on KAFKA-50:
--------------------------------------

> I would even argue that if possible Kafka might consider removing Zookeeper as a dependency - or at least make it optional.

It's already optional (enable.zookeeper=false), but you lose a lot if you disable it. Taylor, maybe you could elaborate on the mailing list or in another ticket about what subset of functionality you would be willing to give up to not use ZK?

> kafka intra-cluster replication support
> ---------------------------------------
>
>          Key: KAFKA-50
>          URL: https://issues.apache.org/jira/browse/KAFKA-50
>      Project: Kafka
>   Issue Type: New Feature
>     Reporter: Jun Rao
>     Assignee: Jun Rao
>      Fix For: 0.8
>
>  Attachments: kafka_replication_highlevel_design.pdf, kafka_replication_lowlevel_design.pdf
>
> Currently, Kafka doesn't have replication. Each log segment is stored in a single broker. This limits both the availability and the durability of Kafka. If a broker goes down, all log segments stored on that broker become unavailable to consumers. If a broker dies permanently (e.g., disk failure), all unconsumed data on that node is lost forever. Our goal is to replicate every log segment to multiple broker nodes to improve both the availability and the durability.
> We'd like to support the following in Kafka replication:
> 1. Configurable synchronous and asynchronous replication
> 2. Small unavailable window (e.g., less than 5 seconds) during broker failures
> 3. Auto recovery when a failed broker rejoins
> 4. Balanced load when a broker fails (i.e., the load on the failed broker is evenly spread among multiple surviving brokers)
[jira] [Commented] (KAFKA-153) Add compression to C# client
[ https://issues.apache.org/jira/browse/KAFKA-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129985#comment-13129985 ]

Chris Burroughs commented on KAFKA-153:
---------------------------------------

Is this something we want to re-spin the release on, or start work on a 0.7.1?

> Add compression to C# client
> ----------------------------
>
>          Key: KAFKA-153
>          URL: https://issues.apache.org/jira/browse/KAFKA-153
>      Project: Kafka
>   Issue Type: Improvement
>   Components: clients
> Affects Versions: 0.7
>     Reporter: Eric Hauser
>      Fix For: 0.7
>
>  Attachments: KAFKA-153-take2.patch, KAFKA-153.patch
>
> Compression support for the C# client. Configuration was also refactored to support changes, along with various performance fixes.
[jira] [Commented] (KAFKA-164) Config should default to a higher-throughput configuration for log.flush.interval
[ https://issues.apache.org/jira/browse/KAFKA-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131854#comment-13131854 ]

Chris Burroughs commented on KAFKA-164:
---------------------------------------

There are good things to be said about having the default configuration be the "overly paranoid, do not lose your data" one, even though, once they think about what that means, no one actually wants that. I'd stick with round numbers and a note elaborating on the trade-offs.

> Config should default to a higher-throughput configuration for log.flush.interval
> ---------------------------------------------------------------------------------
>
>          Key: KAFKA-164
>          URL: https://issues.apache.org/jira/browse/KAFKA-164
>      Project: Kafka
>   Issue Type: Improvement
>   Components: config
> Affects Versions: 0.7
>     Reporter: Jay Kreps
>     Assignee: Jay Kreps
>      Fix For: 0.7
>
>  Attachments: KAFKA-164.patch
>
> Currently we default the flush interval to log.flush.interval=1. This is very slow as it immediately flushes each message. I recommend we change this to 2 and drop the time-based flush to 1 second. This should be a good default trade-off between latency and throughput.
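For concreteness, the two knobs being traded off look roughly like this in a broker config. The values below are illustrative, not the patch's actual numbers, and the time-based property name is from memory and may differ by Kafka version:

```properties
# Flush to disk after this many messages have been appended to a log.
# 1 (the old default) flushes every message: safe but very slow.
log.flush.interval=10000

# Upper bound on how long a message can sit unflushed, in milliseconds.
log.default.flush.interval.ms=1000
```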
[jira] [Commented] (KAFKA-166) Create a tool to dump JMX data to a csv file to help build out performance tests
[ https://issues.apache.org/jira/browse/KAFKA-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134056#comment-13134056 ]

Chris Burroughs commented on KAFKA-166:
---------------------------------------

Coda Hale's metrics package has built-in support for export: https://github.com/codahale/metrics/commit/fce41d75046129a72e3582ae24ac698812d0fe53

Since we will inevitably want more than just average throughput (such as latency percentiles to judge jitter), we might want to consider switching to that.

> Create a tool to dump JMX data to a csv file to help build out performance tests
> --------------------------------------------------------------------------------
>
>          Key: KAFKA-166
>          URL: https://issues.apache.org/jira/browse/KAFKA-166
>      Project: Kafka
>   Issue Type: New Feature
>   Components: core
> Affects Versions: 0.8
>     Reporter: Jay Kreps
>     Assignee: Jay Kreps
>
>  Attachments: KAFKA-166.patch
>
> In order to get sane performance stats we need to be able to integrate the values we keep in JMX. To enable this it would be nice to have a generic tool that dumped JMX stats to a csv file. We could use this against the producer, consumer, and broker to collect kafka metrics while the tests were running.
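The generic "dump JMX attributes to a csv file" tool described above can be sketched in a few lines of standard Java. The MBean polled here is a JVM built-in, not a Kafka bean; a real tool would take bean/attribute names and a poll interval as arguments and append one row per interval:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of a generic JMX-to-csv dumper. Bean and attribute names below
// are JVM built-ins used for self-containment, not Kafka's MBeans.
class JmxCsvDump {

    // Join cells into one csv row (no quoting; fine for numeric stats).
    static String csvRow(String... cells) {
        return String.join(",", cells);
    }

    // Read one attribute from the platform MBean server.
    static Object readAttribute(String bean, String attr) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(bean), attr);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // One csv row per poll; a real tool would loop on an interval
        // and write to a file instead of stdout.
        Object threads = readAttribute("java.lang:type=Threading", "ThreadCount");
        System.out.println(csvRow("timestamp", "ThreadCount"));
        System.out.println(csvRow(Long.toString(System.currentTimeMillis()),
                String.valueOf(threads)));
    }
}
```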
[jira] [Commented] (KAFKA-150) Confusing NodeExistsException failing kafka broker startup
[ https://issues.apache.org/jira/browse/KAFKA-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134365#comment-13134365 ]

Chris Burroughs commented on KAFKA-150:
---------------------------------------

trunk is still 0.7, right?

> Confusing NodeExistsException failing kafka broker startup
> ----------------------------------------------------------
>
>          Key: KAFKA-150
>          URL: https://issues.apache.org/jira/browse/KAFKA-150
>      Project: Kafka
>   Issue Type: Improvement
>   Components: core
> Affects Versions: 0.7
>     Reporter: Neha Narkhede
>     Assignee: Jay Kreps
>      Fix For: 0.8
>
>  Attachments: KAFKA-150.patch
>
> Sometimes, broker startup fails with the following exception:
> [2011-10-03 15:33:22,193] INFO Awaiting connections on port 9092 (kafka.network.Acceptor)
> [2011-10-03 15:33:22,193] INFO Registering broker /brokers/ids/0 (kafka.server.KafkaZooKeeper)
> [2011-10-03 15:33:22,229] INFO conflict in /brokers/ids/0 data: 10.98.20.109-1317681202194:10.98.20.109:9092 stored data: 10.98.20.109-1317268078266:10.98.20.109:9092 (kafka.utils.ZkUtils$)
> [2011-10-03 15:33:22,230] FATAL org.I0Itec.zkclient.exception.ZkNodeExistsException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /brokers/ids/0 (kafka.server.KafkaServer)
> [2011-10-03 15:33:22,231] FATAL org.I0Itec.zkclient.exception.ZkNodeExistsException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /brokers/ids/0
>     at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:55)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>     at org.I0Itec.zkclient.ZkClient.create(ZkClient.java:304)
>     at org.I0Itec.zkclient.ZkClient.createEphemeral(ZkClient.java:328)
>     at kafka.utils.ZkUtils$.createEphemeralPath(ZkUtils.scala:55)
>     at kafka.utils.ZkUtils$.createEphemeralPathExpectConflict(ZkUtils.scala:71)
>     at kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:54)
>     at kafka.log.LogManager.startup(LogManager.scala:122)
>     at kafka.server.KafkaServer.startup(KafkaServer.scala:77)
>     at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:40)
>     at kafka.Kafka$.main(Kafka.scala:56)
>     at kafka.Kafka.main(Kafka.scala)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /brokers/ids/0
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>     at org.I0Itec.zkclient.ZkConnection.create(ZkConnection.java:87)
>     at org.I0Itec.zkclient.ZkClient$1.call(ZkClient.java:308)
>     at org.I0Itec.zkclient.ZkClient$1.call(ZkClient.java:304)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>     ... 10 more
> (kafka.server.KafkaServer)
> [2011-10-03 15:33:22,231] INFO Shutting down... (kafka.server.KafkaServer)
> [2011-10-03 15:33:22,232] INFO shutdown scheduler kafka-logcleaner- (kafka.utils.KafkaScheduler)
> [2011-10-03 15:33:22,239] INFO shutdown scheduler kafka-logflusher- (kafka.utils.KafkaScheduler)
> [2011-10-03 15:33:22,481] INFO zkActor stopped (kafka.log.LogManager)
> [2011-10-03 15:33:22,482] INFO Closing zookeeper client... (kafka.server.KafkaZooKeeper)
> [2011-10-03 15:33:22,482] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
> There could be 3 things that might have happened:
> (1) you restarted kafka within the zk timeout, in which case as far as zk is concerned your old broker still exists... this is weird but actually correct behavior,
> (2) you have two brokers with the same id,
> (3) zk has a bug and is not deleting ephemeral nodes.
> Instead of just throwing the ZK NodeExistsException, we should include the above information in a well-named Kafka exception, for clarity.
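The "well-named Kafka exception" the KAFKA-150 description asks for could be as simple as the following sketch. The class name is hypothetical; the point is that the message itself enumerates the three likely causes instead of surfacing a raw ZooKeeper error:

```java
// Hypothetical sketch: wrap the raw ZooKeeper NodeExists error in an
// exception whose message lists the likely causes from the description.
class BrokerIdConflictException extends RuntimeException {
    BrokerIdConflictException(String path, Throwable cause) {
        super("Ephemeral node " + path + " already exists. Likely causes: "
                + "(1) the broker was restarted within the zk session timeout, "
                + "(2) two brokers are configured with the same broker id, "
                + "(3) zk failed to delete a stale ephemeral node.", cause);
    }

    public static void main(String[] args) {
        // Registration code would catch ZkNodeExistsException and rethrow:
        System.out.println(
                new BrokerIdConflictException("/brokers/ids/0", null).getMessage());
    }
}
```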
[jira] [Commented] (KAFKA-91) zkclient does not show up in pom
[ https://issues.apache.org/jira/browse/KAFKA-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134364#comment-13134364 ]

Chris Burroughs commented on KAFKA-91:
--------------------------------------

Could I get a review?

> zkclient does not show up in pom
> --------------------------------
>
>          Key: KAFKA-91
>          URL: https://issues.apache.org/jira/browse/KAFKA-91
>      Project: Kafka
>   Issue Type: Bug
>   Components: packaging
>     Reporter: Chris Burroughs
>     Assignee: Chris Burroughs
>     Priority: Minor
>      Fix For: 0.8
>
>  Attachments: k91-v1.txt
>
> The pom created by `make-pom` does not include zkclient, which is of course a key dependency. Not sure yet how to pull in zkclient while excluding sbt itself.
> $ cat core/target/scala_2.8.0/kafka-0.7.pom | grep -i zkclient | wc -l
> 0
[jira] [Commented] (KAFKA-91) zkclient does not show up in pom
[ https://issues.apache.org/jira/browse/KAFKA-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134381#comment-13134381 ]

Chris Burroughs commented on KAFKA-91:
--------------------------------------

Our custom build of zkclient isn't in any repo and does not have a pom, so it cannot pull in any dependencies.

> zkclient does not show up in pom
> --------------------------------
>
>          Key: KAFKA-91
>          URL: https://issues.apache.org/jira/browse/KAFKA-91
[jira] [Commented] (KAFKA-171) Kafka producer should do a single write to send message sets
[ https://issues.apache.org/jira/browse/KAFKA-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136622#comment-13136622 ]

Chris Burroughs commented on KAFKA-171:
---------------------------------------

Even if this doesn't measurably improve node-to-node performance (and I'm not sure we should expect it to, since we don't have to wait for an ACK to send the next packet), isn't it definitely making life better for network engineers?

> Kafka producer should do a single write to send message sets
> -------------------------------------------------------------
>
>          Key: KAFKA-171
>          URL: https://issues.apache.org/jira/browse/KAFKA-171
>      Project: Kafka
>   Issue Type: Bug
>   Components: core
> Affects Versions: 0.7, 0.8
>     Reporter: Jay Kreps
>     Assignee: Jay Kreps
>      Fix For: 0.8
>
>  Attachments: KAFKA-171-draft.patch
>
> From email thread: http://mail-archives.apache.org/mod_mbox/incubator-kafka-dev/201110.mbox/%3ccafbh0q1pyuj32thbayq29e6j4wt_mrg5suusfdegwj6rmex...@mail.gmail.com%3e
> > Before sending an actual message, kafka producer do send a (control) message of 4 bytes to the server. Kafka producer always does this action before send some message to the server.
> I think this is because in BoundedByteBufferSend.scala we do essentially
> channel.write(sizeBuffer)
> channel.write(dataBuffer)
> The correct solution is to use vectored I/O and instead do
> channel.write(Array(sizeBuffer, dataBuffer))
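The fix suggested in the description — one gathering write instead of two separate writes — looks like this in Java NIO terms. This sketch writes to a FileChannel so it is self-contained; the real code path would use a SocketChannel, where the single vectored write also avoids emitting the 4-byte size prefix as its own packet:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch of the vectored-I/O fix: send the 4-byte size prefix and the
// payload in one gathering write instead of two channel.write() calls.
class VectoredSend {

    static long sendFramed(FileChannel ch, byte[] payload) throws Exception {
        ByteBuffer size = ByteBuffer.allocate(4).putInt(payload.length);
        size.flip();
        ByteBuffer data = ByteBuffer.wrap(payload);
        // The Java equivalent of channel.write(Array(sizeBuffer, dataBuffer)).
        return ch.write(new ByteBuffer[] { size, data });
    }

    // Write one framed message to a temp file and return the byte count.
    static long demo() {
        try {
            File f = File.createTempFile("framed", ".bin");
            f.deleteOnExit();
            try (FileChannel ch = new FileOutputStream(f).getChannel()) {
                return sendFramed(ch, "hello".getBytes("UTF-8"));
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // 4-byte size prefix + 5 payload bytes
        System.out.println(demo());
    }
}
```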
[jira] [Commented] (KAFKA-170) Support for non-blocking polling on multiple streams
[ https://issues.apache.org/jira/browse/KAFKA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136630#comment-13136630 ]

Chris Burroughs commented on KAFKA-170:
---------------------------------------

> Do users care about the topic/partition

I think yes. If we allow users to provide an arbitrary partitioner, the partitioning may be meaningful through their pipeline.

> Support for non-blocking polling on multiple streams
> -----------------------------------------------------
>
>          Key: KAFKA-170
>          URL: https://issues.apache.org/jira/browse/KAFKA-170
>      Project: Kafka
>   Issue Type: New Feature
>   Components: core
> Affects Versions: 0.8
>     Reporter: Jay Kreps
>
> Currently we provide a blocking iterator in the consumer. This is a good mechanism for consuming data from a single topic, but is limited as a mechanism for polling multiple streams.
> For example, if one wants to implement a non-blocking union across multiple streams this is hard to do because calls may block indefinitely. A similar situation arises if trying to implement a streaming join between two streams.
> I would propose two changes:
> 1. Implement a next(timeout) interface on KafkaMessageStream. This will easily handle some simple cases with minimal change. This handles certain limited cases nicely and is easy to implement, but doesn't actually cover the two cases above.
> 2. Add an interface to poll streams.
> I don't know the best approach for the latter api, but it is important to get it right. One option would be to add a ConsumerConnector.drainTopics("topic1", "topic2", ...) which blocks until there is at least one message and then returns a list of triples (topic, partition, message).
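The union-poll idea from the description can be sketched as below. All names are hypothetical and the per-topic streams are plain in-memory queues, not Kafka streams; the point is the shape of the API — check every stream and return the first available (topic, message) pair instead of blocking forever on any single stream:

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of a drainTopics-style union poll over several
// per-topic streams.
class StreamUnion {
    private final Map<String, Queue<String>> streams = new LinkedHashMap<>();

    void addStream(String topic) {
        streams.put(topic, new ArrayDeque<String>());
    }

    void deliver(String topic, String message) {
        streams.get(topic).add(message);
    }

    // Non-blocking poll across all streams, in insertion order; a real
    // implementation would also wait up to a caller-supplied timeout,
    // and would carry the partition alongside topic and message.
    String[] pollAny() {
        for (Map.Entry<String, Queue<String>> e : streams.entrySet()) {
            String msg = e.getValue().poll();
            if (msg != null) return new String[] { e.getKey(), msg };
        }
        return null; // nothing available anywhere
    }

    public static void main(String[] args) {
        StreamUnion u = new StreamUnion();
        u.addStream("topic1");
        u.addStream("topic2");
        u.deliver("topic2", "m1");
        String[] r = u.pollAny();
        System.out.println(r[0] + " -> " + r[1]);
    }
}
```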
[jira] [Commented] (KAFKA-174) Add performance suite for Kafka
[ https://issues.apache.org/jira/browse/KAFKA-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136635#comment-13136635 ]

Chris Burroughs commented on KAFKA-174:
---------------------------------------

- If you are willing to ask OS tools, iostat has some relevant stuff (such as await).
- Coda Hale's metrics have some nice tools for latency; I'd be happy to take that as a sub-task.

> Add performance suite for Kafka
> -------------------------------
>
>          Key: KAFKA-174
>          URL: https://issues.apache.org/jira/browse/KAFKA-174
>      Project: Kafka
>   Issue Type: New Feature
>     Reporter: Neha Narkhede
>      Fix For: 0.8
>
> This is a placeholder JIRA for adding a perf suite to Kafka. The high level proposal is here - https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
> There will be more JIRAs covering smaller tasks to fully implement this. They will be linked to this JIRA.
[jira] [Commented] (KAFKA-139) cross-compile multiple Scala versions
[ https://issues.apache.org/jira/browse/KAFKA-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136639#comment-13136639 ] Chris Burroughs commented on KAFKA-139: --- Agreed that the need for project_scalaVersion suffixes for everything is obnoxious, but it is the least bad option and would be appreciated by everyone. Internally at Clearspring we wrapped kafka with a pom with a dep on 2.8.1, since that was the version we were using. But several projects will soon want to use 2.9.x *and* kafka, and it would be nice to have the right fix upstream. So as far as input goes, +1! > cross-compile multiple Scala versions > - > > Key: KAFKA-139 > URL: https://issues.apache.org/jira/browse/KAFKA-139 > Project: Kafka > Issue Type: Improvement > Components: packaging >Reporter: Chris Burroughs > Fix For: 0.8 > > > Since scala does not maintain binary compatibility between versions, > organizations tend to have to move all of their code at the same time. It > would thus be very helpful if we could cross-build multiple scala versions. > http://code.google.com/p/simple-build-tool/wiki/CrossBuild > Unclear if this would require KAFKA-134 or just work.
[jira] [Commented] (KAFKA-186) no clean way to getCompressionCodec from Java-the-language
[ https://issues.apache.org/jira/browse/KAFKA-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144248#comment-13144248 ] Chris Burroughs commented on KAFKA-186: --- That requires everyone to write their own switch statement though. > no clean way to getCompressionCodec from Java-the-language > -- > > Key: KAFKA-186 > URL: https://issues.apache.org/jira/browse/KAFKA-186 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.7 >Reporter: Chris Burroughs > > The obvious thing fails: > CompressionCodec.getCompressionCodec(1) results in cannot find symbol > symbol : method getCompressionCodec(int) > location: interface kafka.message.CompressionCodec > Writing a switch statement with kafka.message.NoCompressionCodec$.MODULE$ > and duplicating the logic in CompressionCodec.getCompressionCodec is no fun, > nor is creating a Hashtable just to call Utils.getCompressionCodec. I'm not > sure if there is a magic keyword to make it easy for javac to understand > which CompressionCodec I'm referring to.
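To make the pain concrete, this is roughly the id-to-codec switch the comment says every Java client ends up duplicating. The enum and ids here (0 = none, 1 = gzip, following the convention discussed in KAFKA-79) are illustrative only, not Kafka's actual classes.

```java
// Hypothetical stand-in for the hand-rolled mapping a Java client has to
// write, since the Scala companion-object factory isn't cleanly callable
// from javac.
public enum CodecId {
    NONE(0), GZIP(1);

    public final int id;

    CodecId(int id) { this.id = id; }

    // The switch statement everyone has to duplicate.
    public static CodecId fromId(int id) {
        switch (id) {
            case 0:  return NONE;
            case 1:  return GZIP;
            default: throw new IllegalArgumentException("unknown codec id " + id);
        }
    }
}
```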
[jira] [Commented] (KAFKA-196) Topic creation fails on large values
[ https://issues.apache.org/jira/browse/KAFKA-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148024#comment-13148024 ] Chris Burroughs commented on KAFKA-196: --- '/' is particularly complicated since we want to eventually have support for hierarchical topics, in which case '/' (or whatever we choose) will have special meaning to us, ZK, and the local filesystem. I'd also prefer to have one way to represent topics as strings and not have separate ZK and local fs escaping schemes. That said, unless Pierre-Yves feels like biting off a big patch, let's keep this one for a configurable max topic length so that the problem users are running into now is fixed. > Topic creation fails on large values > > > Key: KAFKA-196 > URL: https://issues.apache.org/jira/browse/KAFKA-196 > Project: Kafka > Issue Type: Bug > Components: core >Reporter: Pierre-Yves Ritschard > Attachments: > 0001-Set-a-hard-limit-on-topic-width-this-fixes-KAFKA-196.patch > > > Since topic logs are stored in a directory holding the topic's name, creation > of the directory might fail for large strings. > This is not a problem per se, but the exception thrown is rather cryptic and > hard to figure out for operations. > I propose fixing this temporarily with a hard limit of 200 chars for topic > names; it would also be possible to hash the topic name. > Another concern is that the exception raised stops the broker, effectively > creating a simple DoS vector. I'm concerned about how tests or wrong client > library usage can take down the whole broker.
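A minimal sketch of the configurable hard limit this comment settles on. The class name and the '/' check are hypothetical (not Kafka's actual validation code); the 200-character default comes from the attached patch's proposal.

```java
// Sketch of a configurable topic-name check, assuming a 200-char default
// limit. Rejecting '/' up front avoids giving it conflicting meanings to
// ZooKeeper paths, the local filesystem, and any future hierarchical-topic
// scheme.
public class TopicNameValidator {
    private final int maxLength;

    public TopicNameValidator(int maxLength) { this.maxLength = maxLength; }

    public void validate(String topic) {
        if (topic == null || topic.isEmpty())
            throw new IllegalArgumentException("topic name must be non-empty");
        if (topic.length() > maxLength)
            throw new IllegalArgumentException(
                "topic name length " + topic.length() + " exceeds limit " + maxLength);
        if (topic.contains("/"))
            throw new IllegalArgumentException("'/' is not allowed in topic names");
    }
}
```

Failing fast with a clear message, instead of letting directory creation blow up, also addresses the cryptic-exception and broker-DoS concerns from the issue description.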
[jira] [Commented] (KAFKA-204) BoundedByteBufferReceive hides OutOfMemoryError
[ https://issues.apache.org/jira/browse/KAFKA-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151220#comment-13151220 ] Chris Burroughs commented on KAFKA-204: --- Tiny patch for this one case, created KAFKA-205 to follow up. > BoundedByteBufferReceive hides OutOfMemoryError > --- > > Key: KAFKA-204 > URL: https://issues.apache.org/jira/browse/KAFKA-204 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.7 >Reporter: Chris Burroughs >Assignee: Chris Burroughs >Priority: Critical > Attachments: k204-v1.txt > >
> private def byteBufferAllocate(size: Int): ByteBuffer = {
>   var buffer: ByteBuffer = null
>   try {
>     buffer = ByteBuffer.allocate(size)
>   }
>   catch {
>     case e: OutOfMemoryError =>
>       throw new RuntimeException("OOME with size " + size, e)
>     case e2 =>
>       throw e2
>   }
>   buffer
> }
> This hides the fact that an Error occurred, and will likely result in some > log handler printing a message, instead of exiting with non-zero status. > Knowing how large the allocation was that caused an OOM is really nice, so > I'd suggest logging in byteBufferAllocate and then re-throwing > OutOfMemoryError
[jira] [Commented] (KAFKA-204) BoundedByteBufferReceive hides OutOfMemoryError
[ https://issues.apache.org/jira/browse/KAFKA-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155857#comment-13155857 ] Chris Burroughs commented on KAFKA-204: --- > I mean once you hit that all bets are off, you need to restart your > process...basically I think we shouldn't be messing with that. Yeah, the most important thing to do is get out of the way and let the process exit with a non-zero status code. So the options as I see them are: (1) Do something ugly (like pass the original fetch request to byteBufferAllocate) for the purposes of a valiant but possibly futile logging attempt (there is no guarantee we will be able to allocate the logging Strings we are already asking for; everything we add makes that less likely). (2) Just rethrow e after a logging attempt in byteBufferAllocate. My preference is (2), but if someone prefers (1) that's a reasonable trade-off. > BoundedByteBufferReceive hides OutOfMemoryError > --- > > Key: KAFKA-204 > URL: https://issues.apache.org/jira/browse/KAFKA-204 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.7 >Reporter: Chris Burroughs >Assignee: Chris Burroughs >Priority: Critical > Attachments: k204-v1.txt > >
> private def byteBufferAllocate(size: Int): ByteBuffer = {
>   var buffer: ByteBuffer = null
>   try {
>     buffer = ByteBuffer.allocate(size)
>   }
>   catch {
>     case e: OutOfMemoryError =>
>       throw new RuntimeException("OOME with size " + size, e)
>     case e2 =>
>       throw e2
>   }
>   buffer
> }
> This hides the fact that an Error occurred, and will likely result in some > log handler printing a message, instead of exiting with non-zero status. > Knowing how large the allocation was that caused an OOM is really nice, so > I'd suggest logging in byteBufferAllocate and then re-throwing > OutOfMemoryError
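Option (2) transliterated to Java as a minimal sketch (the actual method is Scala in BoundedByteBufferReceive): make a best-effort log of the requested size, then rethrow the original Error rather than wrapping it, so the process can still exit with a non-zero status.

```java
import java.nio.ByteBuffer;

// Sketch of the "log then rethrow" approach from option (2). Wrapping the
// OutOfMemoryError in a RuntimeException (as the original code does) lets a
// generic log handler swallow it; rethrowing preserves Error semantics.
public class Alloc {
    static ByteBuffer byteBufferAllocate(int size) {
        try {
            return ByteBuffer.allocate(size);
        } catch (OutOfMemoryError e) {
            // Best-effort only: building this String may itself fail under OOM.
            System.err.println("OOME allocating buffer of size " + size);
            throw e; // rethrow the original Error, do not wrap it
        }
    }
}
```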
[jira] [Commented] (KAFKA-134) Upgrade Kafka to sbt 0.10.1
[ https://issues.apache.org/jira/browse/KAFKA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155962#comment-13155962 ] Chris Burroughs commented on KAFKA-134: --- Is 0.10 still the most recent version, or would it be easier to just jump to 0.11? > Upgrade Kafka to sbt 0.10.1 > --- > > Key: KAFKA-134 > URL: https://issues.apache.org/jira/browse/KAFKA-134 > Project: Kafka > Issue Type: Improvement > Components: packaging >Reporter: Joshua Hartman > Attachments: kafka_patch.txt > > > Upgrading to sbt 0.10.1 is a nice to have as sbt moves forward. Plus, it's a > requirement for me to help publish Kafka to maven :)
[jira] [Commented] (KAFKA-203) Improve Kafka internal metrics
[ https://issues.apache.org/jira/browse/KAFKA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155970#comment-13155970 ] Chris Burroughs commented on KAFKA-203: --- I *think* metrics is source compatible with 2.8, but I would need to investigate more. > Improve Kafka internal metrics > -- > > Key: KAFKA-203 > URL: https://issues.apache.org/jira/browse/KAFKA-203 > Project: Kafka > Issue Type: New Feature > Components: core >Reporter: Jay Kreps >Assignee: Jay Kreps > > Currently metrics in kafka are using old-school JMX directly. This makes > adding metrics a pain. It would be good to do one of the following: > 1. Convert to Coda Hale's metrics package > (https://github.com/codahale/metrics) > 2. Write a simple metrics package > The new metrics package should make metrics easier to add and work with and > package up the common logic of keeping windowed gauges, histograms, counters, > etc. JMX should be just one output of this. > The advantage of the Coda Hale package is that it exists so we don't need to > write it. The downsides are (1) introduces another client dependency which > causes conflicts, and (2) seems a bit heavy on design. The good news is that > the metrics-core package doesn't seem to bring in a lot of dependencies which > is nice, though the scala wrapper seems to want scala 2.9. I am also a little > skeptical of the approach for histograms--it does sampling instead of > bucketing though that may be okay.
[jira] [Commented] (KAFKA-216) Add nunit license to the NOTICE file
[ https://issues.apache.org/jira/browse/KAFKA-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159503#comment-13159503 ] Chris Burroughs commented on KAFKA-216: --- Could you add something like "For NUnit used in clients/foo/bar" to the LICENSE text so this is easier to make sense of? Otherwise looks good to me. > Add nunit license to the NOTICE file > > > Key: KAFKA-216 > URL: https://issues.apache.org/jira/browse/KAFKA-216 > Project: Kafka > Issue Type: Bug > Components: packaging >Reporter: Neha Narkhede >Assignee: Jakob Homan >Priority: Blocker > Fix For: 0.7 > > Attachments: KAFKA-216.patch > > > According to yet some more feedback from general@, we need to add NUnit > (http://www.nunit.org/) to the NOTICE file.
[jira] [Commented] (KAFKA-230) Update website documentation to include changes in 0.7.0 release
[ https://issues.apache.org/jira/browse/KAFKA-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179002#comment-13179002 ] Chris Burroughs commented on KAFKA-230: --- See http://incubator.apache.org/guides/releasemanagement.html#understanding-upload for mirrors > Update website documentation to include changes in 0.7.0 release > > > Key: KAFKA-230 > URL: https://issues.apache.org/jira/browse/KAFKA-230 > Project: Kafka > Issue Type: Improvement > Components: website >Reporter: Neha Narkhede >Assignee: Neha Narkhede > Attachments: KAFKA-230.patch, KAFKA-230.patch > > > We need to update the website to reflect the changes to the code in the 0.7.0 > release.
[jira] [Commented] (KAFKA-230) Update website documentation to include changes in 0.7.0 release
[ https://issues.apache.org/jira/browse/KAFKA-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179555#comment-13179555 ] Chris Burroughs commented on KAFKA-230: --- The download link goes directly to people.a.o instead of the mirror system. > Update website documentation to include changes in 0.7.0 release > > > Key: KAFKA-230 > URL: https://issues.apache.org/jira/browse/KAFKA-230 > Project: Kafka > Issue Type: Improvement > Components: website >Reporter: Neha Narkhede >Assignee: Neha Narkhede > Attachments: KAFKA-230-Neha-v2.patch, KAFKA-230-design.patch, > KAFKA-230-v2.patch, KAFKA-230.patch, KAFKA-230.patch > > > We need to update the website to reflect the changes to the code in the 0.7.0 > release.