Re: What to do when file.rename fails?
I think you are right, good catch. It could be that this user deleted the files manually, but I wonder if there isn't some way that this is a Kafka bug--e.g. if multiple types of retention policies kick in at the same time, do we synchronize that properly?

-Jay

On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai jai.forums2...@gmail.com wrote:

Hi Jay,

I spent some more time on this today and went back to the original thread which brought up the issue with file leaks [1]. I think the lsof output in those logs has a very important hint:

java 8446 root 725u REG 253,2 536910838 26087364 /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/68818668.log (deleted)
java 8446 root 726u REG 253,2 536917902 26087368 /home/work/data/soft/kafka-0.8/data/_oakbay_v2_search_topic_ypgsearch_yellowpageV2-0/69457098.log (deleted)

Notice the "(deleted)" text in that output. The last time I looked at that output, I thought it was the user who had added that "deleted" text to help us understand the problem. But today I read up on the output format of lsof, and it turns out that it's lsof itself which adds that hint whenever a file has already been deleted (possibly by a different process) but some other process is still holding open resources of that deleted file [2]. So in the context of the issue we are discussing, and the way Kafka deals with async deletes (i.e. first by attempting a rename of the log/index files), I think this all makes sense now.

So what I think is happening is: some (other?) process (not sure what/why) has already deleted the log file that Kafka is using for the LogSegment. The LogSegment, however, still has an open FileChannel resource on that deleted file (which is why the open file descriptor is held on to and shows up in that output). Now Kafka, at some point in time, triggers an async delete of the LogSegment, which involves a file rename of that (already deleted) log file. The rename fails because the original file path isn't there anymore. As a result, we end up throwing that "failed to rename" KafkaStorageException and thus leave behind the open FileChannel, which stays open forever (till the Kafka process exits).

So I think we should:

1) Find out what/why deletes the underlying log file(s). I'll add a reply to that original mail discussion asking the user if he can provide more details.

2) Handle this case and close the FileChannel. The patch that's been uploaded to review board https://reviews.apache.org/r/29755/ does that. The immediate delete on failure to rename involves (safely) closing the open FileChannel and (safely) deleting the (possibly non-existent) file.

By the way, this entire thing can be easily reproduced by running the following program, which first creates a file and opens a FileChannel to that file, then waits for the user to delete that file externally (I used the rm command), and then tries to rename that deleted file, which fails.
In between each of these steps, you can run the lsof command externally to see the open file resources (I used 'lsof | grep test.log'):

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class DeletedFileRenameTest {

    public static void main(String[] args) throws Exception {
        // Open a file and a file channel for read/write
        final File originalLogFile = new File("/home/jaikiran/deleteme/test.log"); // change this path if you plan to run it
        final FileChannel fileChannel = new RandomAccessFile(originalLogFile, "rw").getChannel();
        System.out.println("Opened file channel to " + originalLogFile);

        // wait for the underlying file to be deleted externally
        System.out.println("Waiting for " + originalLogFile + " to be deleted externally. Press any key after the file is deleted");
        System.in.read();

        // wait for the user to check the lsof output
        System.out.println(originalLogFile + " seems to have been deleted externally, check lsof command output to see open file resources.");
        System.out.println("Press any key to try renaming this already deleted file, from the program");
        System.in.read();

        // try the rename
        final File fileToRenameTo = new File(originalLogFile.getPath() + ".deleted");
        System.out.println("Trying to rename " + originalLogFile + " to " + fileToRenameTo);
        final boolean renamedSucceeded = originalLogFile.renameTo(fileToRenameTo);
        if (renamedSucceeded) {
            System.out.println("Rename SUCCEEDED. Renamed file exists? " + fileToRenameTo.exists());
        } else {
            System.out.println("FAILED to rename file " + originalLogFile + " to " + fileToRenameTo);
        }

        // wait for the user to check the lsof output, after our rename failed
        System.out.println("Check the lsof output and press any key to close the open file channel to a deleted file");
        System.in.read();

        // close the file channel
        fileChannel.close();
    }
}
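For reference, a minimal sketch of the recovery path the review-board patch is aiming at (an illustration of the idea with a hypothetical helper, not the actual patch code): when the rename fails, close the channel to release the file descriptor, then delete whatever is left of the file.

import java.io.File;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class SegmentDeleteSketch {

    // Sketch only: rename the segment file for async delete; if the rename
    // fails (e.g. the file was already deleted externally), close the channel
    // so the file descriptor is released, then attempt a direct delete.
    static void renameOrDelete(File logFile, FileChannel channel) {
        final File renamed = new File(logFile.getPath() + ".deleted");
        if (!logFile.renameTo(renamed)) {
            try {
                channel.close(); // releases the fd even though the file is gone
            } catch (IOException e) {
                // ignore: we still want to attempt the delete below
            }
            logFile.delete(); // returns false for an already-missing file, which is fine here
        }
    }
}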
0.9 security features and release schedule
Hi,

We're assessing Kafka for use at Acquia. We run tens of thousands of customer websites and many internal applications on ~9k AWS EC2 instances. We're currently weighing the pros and cons of starting with 0.8.2 plus custom security, or waiting until the security features land in 0.9.

How likely is Kafka 0.9 to ship in April 2015? (As seen here - https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan.)

How stable is the 0.9 branch? Is it crazy to consider running a 0.9 beta in production?

Are there any existing patch sets against 0.8.x that implement security features?

thanks,
Justin
Re: 0.9 security features and release schedule
Justin,

I'd be interested to learn specifically what security features you are interested in. There is a doc on the wiki here: https://cwiki.apache.org/confluence/display/KAFKA/Security as well as the umbrella JIRA here: https://issues.apache.org/jira/browse/KAFKA-1682 Most security-related JIRAs have a component listed as "security": https://issues.apache.org/jira/browse/KAFKA-1882?jql=project%20%3D%20KAFKA%20AND%20component%20%3D%20security

There is quite a bit of plumbing work to be done in order for all of the security features to come together. There technically isn't a 0.9 branch at the moment; all of the development is being applied to trunk, and since there isn't as of yet even an 0.8.3 branch, I don't think you'd be in a good spot trying to pull in a bunch of stuff that isn't complete.

Just my 2 cents.

Thanks

Jeff

On Sun, Jan 25, 2015 at 10:33 AM, Justin Randell justin.rand...@acquia.com wrote:
Hi, We're assessing Kafka for use at Acquia. ...

--
Jeff Holoman
Systems Engineer
Re: What to do when file.rename fails?
This feels like another type of symptom from people using /tmp/ for their logs. Personally, I would rather use /mnt/data or something, and if that doesn't exist on their machine we can raise an exception, or have no default and force it to be set.

/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
***/

On Jan 25, 2015 11:37 AM, Jay Kreps jay.kr...@gmail.com wrote:
I think you are right, good catch. It could be that this user deleted the files manually, but I wonder if there isn't some way that this is a Kafka bug--e.g. if multiple types of retention policies kick in at the same time do we synchronize that properly? -Jay

On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai jai.forums2...@gmail.com wrote:
Hi Jay, I spent some more time on this today and went back to the original thread which brought up the issue with file leaks [1]. ...
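For anyone hitting this in a default setup: the data directory in question is the broker's log.dirs setting in config/server.properties, which the quickstart config points at a path under /tmp. Moving it to a dedicated mount is a one-line change (the path below is just an example):

# server.properties -- store Kafka data on a dedicated volume instead of /tmp
log.dirs=/mnt/data/kafka-logs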
[jira] [Updated] (KAFKA-1810) Add IP Filtering / Whitelists-Blacklists
[ https://issues.apache.org/jira/browse/KAFKA-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Holoman updated KAFKA-1810:
--------------------------------
Component/s: security

Add IP Filtering / Whitelists-Blacklists
----------------------------------------
Key: KAFKA-1810
URL: https://issues.apache.org/jira/browse/KAFKA-1810
Project: Kafka
Issue Type: New Feature
Components: core, network, security
Reporter: Jeff Holoman
Assignee: Jeff Holoman
Priority: Minor
Fix For: 0.8.3
Attachments: KAFKA-1810.patch, KAFKA-1810_2015-01-15_19:47:14.patch

While longer-term goals of security in Kafka are on the roadmap, there is some value in the ability to restrict connections to Kafka brokers based on IP address. This is not intended as a replacement for security, but more as a precaution against misconfiguration and a way to give Kafka administrators some control over who is reading/writing to their cluster.

1) In some organizations, software administration vs. O/S systems administration and network administration is disjointed and not well choreographed. Providing software administrators the ability to configure their platform relatively independently (after initial configuration) from systems administrators is desirable.
2) Configuration and deployment is sometimes error-prone, and there are situations where test environments could erroneously read/write to production environments.
3) An additional precaution against reading sensitive data is typically welcomed in most large enterprise deployments.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
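To make the feature concrete: an allow/deny decision of this kind usually reduces to matching the client's address against configured CIDR ranges at connection-accept time, before any requests are served. The following is a sketch of that general idea only, not the KAFKA-1810 patch or its configuration:

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public final class IpFilterSketch {

    // True if the client address falls inside a CIDR range such as "10.0.0.0/8".
    // A broker-side filter would run this against each configured range when a
    // new connection is accepted.
    static boolean inCidr(InetAddress client, String cidr) throws UnknownHostException {
        String[] parts = cidr.split("/");
        byte[] ip = client.getAddress();
        byte[] net = InetAddress.getByName(parts[0]).getAddress();
        int prefix = Integer.parseInt(parts[1]);
        if (ip.length != net.length) return false; // IPv4 vs IPv6 mismatch
        int fullBytes = prefix / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (ip[i] != net[i]) return false;
        }
        int remainder = prefix % 8;
        if (remainder == 0) return true;
        int mask = (0xFF << (8 - remainder)) & 0xFF; // high 'remainder' bits of the next byte
        return (ip[fullBytes] & mask) == (net[fullBytes] & mask);
    }
}
{code}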
[jira] [Updated] (KAFKA-1885) Allow test methods in core to be individually run from outside of the IDE
[ https://issues.apache.org/jira/browse/KAFKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1885:
---------------------------------
Fix Version/s: 0.8.3

Allow test methods in core to be individually run from outside of the IDE
--------------------------------------------------------------------------
Key: KAFKA-1885
URL: https://issues.apache.org/jira/browse/KAFKA-1885
Project: Kafka
Issue Type: Improvement
Components: system tests
Reporter: jaikiran pai
Assignee: jaikiran pai
Fix For: 0.8.3
Attachments: KAFKA-1885.patch, KAFKA-1885_2015-01-24_10:42:46.patch, KAFKA-1885_2015-01-24_17:21:59.patch

Gradle in combination with the Java plugin allows test filtering, which lets users run select test classes or even select test methods from the command line. See the "Test filtering" section here, which has examples of the commands: http://www.gradle.org/docs/2.0/userguide/java_plugin.html#sec:java_test Currently we have this working in the clients and I can run something like:

{code}
./gradlew clients:test --tests org.apache.kafka.clients.producer.MetadataTest.testMetadataUpdateWaitTime
{code}

and that command then only runs that specific test method (testMetadataUpdateWaitTime) from the MetadataTest class:

{code}
To honour the JVM settings for this build a new JVM will be forked. Please consider using the daemon: http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.4
:clients:compileJava UP-TO-DATE
:clients:processResources UP-TO-DATE
:clients:classes UP-TO-DATE
:clients:compileTestJava UP-TO-DATE
:clients:processTestResources UP-TO-DATE
:clients:testClasses UP-TO-DATE
:clients:test

org.apache.kafka.clients.producer.MetadataTest testMetadataUpdateWaitTime PASSED

BUILD SUCCESSFUL

Total time: 12.714 secs
{code}

I've found this useful when I need to do some quick tests and also to reproduce issues that sometimes aren't noticed if the whole test class is run. This currently only works for the clients and not for core --because the core doesn't have the Java plugin applied to it in the gradle build--. I've a patch which does that (and one other related thing) which then allowed me to run individual test methods even for the core tests. I will create a review request for it.

Edit: I was wrong about the Java plugin not being applied to core. It is indeed already applied, but my attempt to get test methods running individually for core was failing for a different reason, related to the JUnit version dependency. I'll be addressing that in the patch and uploading it for review.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
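With the patch applied, the same filtering should work for the core module too; for example, something like the following would run a single core test method (the test name here is only illustrative, borrowed from OffsetCommitTest mentioned elsewhere in this digest):

{code}
./gradlew core:test --tests kafka.server.OffsetCommitTest.testOffsetExpiration
{code}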
[jira] [Updated] (KAFKA-1885) Allow test methods in core to be individually run from outside of the IDE
[ https://issues.apache.org/jira/browse/KAFKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1885:
---------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks for the patch. Pushed to trunk.

Allow test methods in core to be individually run from outside of the IDE
--------------------------------------------------------------------------
Key: KAFKA-1885
URL: https://issues.apache.org/jira/browse/KAFKA-1885
Project: Kafka
Issue Type: Improvement
Components: system tests
Reporter: jaikiran pai
Assignee: jaikiran pai
Attachments: KAFKA-1885.patch, KAFKA-1885_2015-01-24_10:42:46.patch, KAFKA-1885_2015-01-24_17:21:59.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1867) liveBroker list not updated on a cluster with no topics
[ https://issues.apache.org/jira/browse/KAFKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291468#comment-14291468 ]

Sriharsha Chintalapani commented on KAFKA-1867:
-----------------------------------------------
Thanks [~nehanarkhede]. Please check the latest patch.

liveBroker list not updated on a cluster with no topics
--------------------------------------------------------
Key: KAFKA-1867
URL: https://issues.apache.org/jira/browse/KAFKA-1867
Project: Kafka
Issue Type: Bug
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Fix For: 0.8.3
Attachments: KAFKA-1867.patch, KAFKA-1867.patch, KAFKA-1867_2015-01-25_21:07:47.patch

Currently, when there is no topic in a cluster, the controller doesn't send any UpdateMetadataRequest to the broker when it starts up. As a result, the liveBroker list in metadataCache is empty. This means that we will return an incorrect broker list in TopicMetadataResponse.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30007: Patch for KAFKA-1867
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30007/
-----------------------------------------------------------

(Updated Jan. 26, 2015, 5:07 a.m.)

Review request for kafka.

Bugs: KAFKA-1867
    https://issues.apache.org/jira/browse/KAFKA-1867

Repository: kafka

Description (updated)
---
KAFKA-1867. liveBroker list not updated on a cluster with no topics.

Diffs (updated)
---
core/src/main/scala/kafka/controller/ControllerChannelManager.scala fbef34cad16afd0f77ba5e8d0bc63ea2b8e860b6
core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 90c0b7a19c7af8e5416e4bdba62b9824f1abd5ab
core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 5b93239cdc26b5be7696f4e7863adb9fbe5f0ed5

Diff: https://reviews.apache.org/r/30007/diff/

Testing
---

Thanks,
Sriharsha Chintalapani
[jira] [Updated] (KAFKA-1867) liveBroker list not updated on a cluster with no topics
[ https://issues.apache.org/jira/browse/KAFKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriharsha Chintalapani updated KAFKA-1867:
------------------------------------------
Attachment: KAFKA-1867_2015-01-25_21:07:47.patch

liveBroker list not updated on a cluster with no topics
--------------------------------------------------------
Key: KAFKA-1867
URL: https://issues.apache.org/jira/browse/KAFKA-1867
Project: Kafka
Issue Type: Bug
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
Fix For: 0.8.3
Attachments: KAFKA-1867.patch, KAFKA-1867.patch, KAFKA-1867_2015-01-25_21:07:47.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1693) Issue sending more messages to single Kafka server (Load testing for Kafka transport)
[ https://issues.apache.org/jira/browse/KAFKA-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291487#comment-14291487 ]

rajendram kathees commented on KAFKA-1693:
------------------------------------------
Thanks Rasmeet. The issue is resolved.
Thanks, Kathees

Issue sending more messages to single Kafka server (Load testing for Kafka transport)
--------------------------------------------------------------------------------------
Key: KAFKA-1693
URL: https://issues.apache.org/jira/browse/KAFKA-1693
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Environment: Ubuntu 14, Java 6
Reporter: rajendram kathees
Original Estimate: 24h
Remaining Estimate: 24h

I tried to send 5 messages to a single Kafka server. I sent the messages to ESB using JMeter, and ESB sent them to the Kafka server. After 28000 messages I am getting the following exception. Do I need to change any parameter value in the Kafka server? Please give me the solution.

{code}
[2014-10-06 11:41:05,182] ERROR - Utils$ fetching topic metadata for topics [Set(test1)] from broker [ArrayBuffer(id:0,host:localhost,port:9092)] failed
kafka.common.KafkaException: fetching topic metadata for topics [Set(test1)] from broker [ArrayBuffer(id:0,host:localhost,port:9092)] failed
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:67)
    at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
    at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:67)
    at kafka.utils.Utils$.swallow(Utils.scala:167)
    at kafka.utils.Logging$class.swallowError(Logging.scala:106)
    at kafka.utils.Utils$.swallowError(Utils.scala:46)
    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:67)
    at kafka.producer.Producer.send(Producer.scala:76)
    at kafka.javaapi.producer.Producer.send(Producer.scala:33)
    at org.wso2.carbon.connector.KafkaProduce.send(KafkaProduce.java:71)
    at org.wso2.carbon.connector.KafkaProduce.connect(KafkaProduce.java:27)
    at org.wso2.carbon.connector.core.AbstractConnector.mediate(AbstractConnector.java:32)
    at org.apache.synapse.mediators.ext.ClassMediator.mediate(ClassMediator.java:78)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:77)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:47)
    at org.apache.synapse.mediators.template.TemplateMediator.mediate(TemplateMediator.java:77)
    at org.apache.synapse.mediators.template.InvokeMediator.mediate(InvokeMediator.java:129)
    at org.apache.synapse.mediators.template.InvokeMediator.mediate(InvokeMediator.java:78)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:77)
    at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:47)
    at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:131)
    at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:166)
    at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
    at org.apache.synapse.transport.passthru.ServerWorker.processNonEntityEnclosingRESTHandler(ServerWorker.java:344)
    at org.apache.synapse.transport.passthru.ServerWorker.processEntityEnclosingRequest(ServerWorker.java:385)
    at org.apache.synapse.transport.passthru.ServerWorker.run(ServerWorker.java:183)
    at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:465)
    at sun.nio.ch.Net.connect(Net.java:457)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
    at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
    at kafka.producer.SyncProducer.connect(SyncProducer.scala:141)
    at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:156)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
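The thread doesn't spell out the fix, but "java.net.BindException: Cannot assign requested address" after tens of thousands of sends is the classic symptom of ephemeral-port exhaustion on the client host, typically caused by opening a new producer/connection per message instead of reusing one. A sketch of the reuse pattern with the 0.8-era producer API (broker address, topic, and message count are illustrative):

{code}
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class ReusedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        // Create ONE producer and reuse it for every message; creating a new
        // producer per send leaks connections and exhausts ephemeral ports.
        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 50000; i++) {
            producer.send(new KeyedMessage<String, String>("test1", "message-" + i));
        }
        producer.close(); // release the connection when done
    }
}
{code}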
[jira] [Resolved] (KAFKA-1693) Issue sending more messages to single Kafka server (Load testing for Kafka transport)
[ https://issues.apache.org/jira/browse/KAFKA-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rajendram kathees resolved KAFKA-1693.
--------------------------------------
Resolution: Fixed

Issue sending more messages to single Kafka server (Load testing for Kafka transport)
--------------------------------------------------------------------------------------
Key: KAFKA-1693
URL: https://issues.apache.org/jira/browse/KAFKA-1693
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Environment: Ubuntu 14, Java 6
Reporter: rajendram kathees
Original Estimate: 24h
Remaining Estimate: 24h

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Build failed in Jenkins: Kafka-trunk #381
See https://builds.apache.org/job/Kafka-trunk/381/changes

Changes:

[neha.narkhede] KAFKA-1885 Upgrade junit dependency in core to 4.6 version to allow running individual test methods via gradle command line; reviewed by Neha Narkhede

[neha.narkhede] KAFKA-1818 clean up code to more idiomatic scala usage; reviewed by Neha Narkhede and Gwen Shapira

------------------------------------------
[...truncated 1310 lines...]
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
    at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
    at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
    at kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
    at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:36)
    at kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
    at kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33)
    at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:39)

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
        at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:36)
        at kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:39)
        at kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33)
        at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:39)

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
        at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:36)
        at kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.init(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
        at kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:36)
        at kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at
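These failures are all "Address already in use" from the embedded ZooKeeper in the test harness, i.e. a fixed port still held by an earlier test or a concurrent build on the shared Jenkins host. The usual remedy is to bind test servers to an OS-assigned free port; a minimal sketch of that pattern (Kafka's test utilities use a similar choose-a-port helper):

import java.io.IOException;
import java.net.ServerSocket;

public class PortChooser {
    // Ask the OS for a currently-free port by binding to port 0, then release
    // it so the test server can bind to the returned port. There is a small
    // race (the port could be taken before the server binds), but in practice
    // this avoids most collisions between concurrent test runs.
    public static int choosePort() throws IOException {
        ServerSocket socket = new ServerSocket(0);
        try {
            return socket.getLocalPort();
        } finally {
            socket.close();
        }
    }
}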
Re: What to do when file.rename fails?
Hmm, but I don't think tmp gets cleaned while the server is running... The reason for using tmp was that we don't know which directory they will use, and we don't want them to have to edit configuration for the simple out-of-the-box getting-started tutorial. I actually do think that is important. Maybe an intermediate step we could do is just call out this setting in the quickstart so people know where data is going and know they need to configure it later...

-Jay

On Sun, Jan 25, 2015 at 9:32 AM, Joe Stein joe.st...@stealth.ly wrote:
This feels like another type of symptom from people using /tmp/ for their logs. Personally, I would rather use /mnt/data or something, and if that doesn't exist on their machine we can raise an exception, or have no default and force it to be set. ...

On Jan 25, 2015 11:37 AM, Jay Kreps jay.kr...@gmail.com wrote:
I think you are right, good catch. It could be that this user deleted the files manually, but I wonder if there isn't some way that this is a Kafka bug--e.g. if multiple types of retention policies kick in at the same time do we synchronize that properly? -Jay

On Sat, Jan 24, 2015 at 9:26 PM, Jaikiran Pai jai.forums2...@gmail.com wrote:
Hi Jay, I spent some more time on this today and went back to the original thread which brought up the issue with file leaks [1]. ...
[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291482#comment-14291482 ]

Ashish Kumar Singh commented on KAFKA-1888:
-------------------------------------------
[~gwenshap] this is indeed an important step towards easing upgrade pains. [~joestein] gauntlet looks really nice. However, I agree with [~nehanarkhede] that we should try to improve upon the existing framework and save ourselves from YAFS and the maintenance cost. Just my 2 cents. However, if you guys are not working on this, then I can take this one.

Add a rolling upgrade system test
---------------------------------
Key: KAFKA-1888
URL: https://issues.apache.org/jira/browse/KAFKA-1888
Project: Kafka
Issue Type: Improvement
Components: system tests
Reporter: Gwen Shapira
Assignee: Gwen Shapira
Fix For: 0.9.0

To help test upgrades and compatibility between versions, it will be cool to add a rolling-upgrade test to system tests: given two versions (just a path to the jars?), check that you can do a rolling upgrade of the brokers from one version to another (using clients from the old version) without losing data.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1885) Allow test methods in core to be individually run from outside of the IDE
[ https://issues.apache.org/jira/browse/KAFKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1885:
---------------------------------
Reviewer: Neha Narkhede

Allow test methods in core to be individually run from outside of the IDE
--------------------------------------------------------------------------
Key: KAFKA-1885
URL: https://issues.apache.org/jira/browse/KAFKA-1885
Project: Kafka
Issue Type: Improvement
Components: system tests
Reporter: jaikiran pai
Assignee: jaikiran pai
Attachments: KAFKA-1885.patch, KAFKA-1885_2015-01-24_10:42:46.patch, KAFKA-1885_2015-01-24_17:21:59.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1818) Code cleanup in ReplicationUtils including unit test
[ https://issues.apache.org/jira/browse/KAFKA-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1818:
---------------------------------
Fix Version/s: 0.8.3

Code cleanup in ReplicationUtils including unit test
----------------------------------------------------
Key: KAFKA-1818
URL: https://issues.apache.org/jira/browse/KAFKA-1818
Project: Kafka
Issue Type: Improvement
Components: replication
Affects Versions: 0.8.1.1
Reporter: Eric Olander
Assignee: Neha Narkhede
Priority: Trivial
Fix For: 0.8.3
Attachments: 0001-KAFKA-1818-clean-up-code-to-more-idiomatic-scala-usa.patch

Code in getLeaderIsrAndEpochForPartition() and parseLeaderAndIsr() was essentially reimplementing the flatMap function on the Option type. The attached patch refactors that code to more idiomatic Scala and provides a unit test over the affected code.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-1901) Move Kafka version to be generated in code by build (instead of in manifest)
Jason Rosenberg created KAFKA-1901:
-----------------------------------
Summary: Move Kafka version to be generated in code by build (instead of in manifest)
Key: KAFKA-1901
URL: https://issues.apache.org/jira/browse/KAFKA-1901
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jason Rosenberg

With 0.8.2 (rc2), I've started seeing this warning in the logs of apps deployed to our staging environment (both server and client):

{code}
2015-01-23 00:55:25,273 WARN [async-message-sender-0] common.AppInfo$ - Can't read Kafka version from MANIFEST.MF. Possible cause: java.lang.NullPointerException
{code}

The issue is that in our deployment, apps are deployed as single 'shaded' jars (e.g. using the maven shade plugin). This means the MANIFEST.MF file won't have a Kafka version. Instead, I suggest the Kafka build generate the proper version in code, as part of the build.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
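For background on the failure mode: a manifest-based lookup amounts to something like the first line below, which returns null when a shaded über-jar replaces the original MANIFEST.MF, while the proposed alternative is a build-generated constants class that survives shading because it is ordinary compiled code. Both snippets are illustrative sketches (the class names are hypothetical, and the version string is whatever the build injects):

{code}
// Manifest lookup: returns the Implementation-Version from the jar's
// MANIFEST.MF, or null when the (shaded) jar's manifest doesn't carry one.
String version = KafkaServer.class.getPackage().getImplementationVersion();

// Build-generated alternative: the build writes this file with the real version.
public final class KafkaVersion {
    public static final String VERSION = "0.8.2"; // injected at build time
    private KafkaVersion() {}
}
{code}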
Re: Cannot stop Kafka server if zookeeper is shutdown first
For a clean shutdown, the broker tries to talk to the controller and also issues reads to ZooKeeper. Possibly that is where it tries to reconnect to ZK. It will help to look at the thread dump.

Thanks,
Neha

On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai jai.forums2...@gmail.com wrote:

I was just playing around with RC2 of 0.8.2 and noticed that if I shut down ZooKeeper first, I can't shut down the Kafka server at all, since it goes into a never-ending attempt to reconnect with ZooKeeper. I had to kill the Kafka process to stop it. I tried it against trunk too, and there too I see the same issue. Should I file a JIRA for this and see if I can come up with a patch?

FWIW, here are the unending (and IMO too frequent) attempts at reconnecting. I have a thread dump too, which shows that the other thread, which is trying to complete a controlled shutdown of Kafka, is blocked forever waiting for ZooKeeper to be up. I can attach it to the JIRA.

[2015-01-24 10:15:46,278] WARN Session 0x14b1a413680 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:47,438] WARN Session 0x14b1a413680 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:49,057] WARN Session 0x14b1a413680 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:50,802] WARN Session 0x14b1a413680 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

-Jaikiran

--
Thanks,
Neha
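The never-ending reconnect loop above comes from the ZooKeeper client's own retry behavior. For the parts of the shutdown path that go through zkclient, the connect wait can at least be bounded, since the ZkClient constructor takes a connection timeout and fails with ZkTimeoutException instead of retrying forever. A small illustration (server address and timeout values are just examples):

import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.exception.ZkTimeoutException;

public class BoundedZkConnect {
    public static void main(String[] args) {
        try {
            // sessionTimeout=6000ms, connectionTimeout=10000ms: give up after
            // 10s if ZooKeeper is unreachable rather than blocking forever.
            ZkClient zkClient = new ZkClient("localhost:2181", 6000, 10000);
            zkClient.close();
        } catch (ZkTimeoutException e) {
            // ZooKeeper unreachable within the timeout; a shutdown path could
            // proceed (uncleanly) here instead of hanging.
            System.err.println("ZooKeeper unreachable: " + e.getMessage());
        }
    }
}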
[jira] [Updated] (KAFKA-1862) Pass in the Time object into OffsetManager
[ https://issues.apache.org/jira/browse/KAFKA-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1862:
---------------------------------
Assignee: Aditya Auradkar

Pass in the Time object into OffsetManager
------------------------------------------
Key: KAFKA-1862
URL: https://issues.apache.org/jira/browse/KAFKA-1862
Project: Kafka
Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Aditya Auradkar
Labels: newbie++
Fix For: 0.9.0

We should improve OffsetManager to take in a Time instance, as we do for LogManager and ReplicaManager. That way we can advance time with MockTime in test cases. Then we can move the testOffsetExpiration case from OffsetCommitTest to OffsetManagerTest.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30196: Patch for KAFKA-1886
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30196/#review69579
-----------------------------------------------------------

core/src/test/scala/unit/kafka/integration/PrimitiveApiTest.scala
https://reviews.apache.org/r/30196/#comment114287

    what is the purpose of this sleep?

- Neha Narkhede

On Jan. 22, 2015, 10:35 p.m., Aditya Auradkar wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30196/
-----------------------------------------------------------

(Updated Jan. 22, 2015, 10:35 p.m.)

Review request for kafka and Joel Koshy.

Bugs: KAFKA-1886
    https://issues.apache.org/jira/browse/KAFKA-1886

Repository: kafka

Description
---
Fixing KAFKA-1886. Forcing the SimpleConsumer to throw a ClosedByInterruptException if thrown and not retry

Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into trunk

Diffs
---
core/src/main/scala/kafka/consumer/SimpleConsumer.scala cbef84ac76e62768981f74e71d451f2bda995275
core/src/test/scala/unit/kafka/integration/PrimitiveApiTest.scala a5386a03b62956bc440b40783247c8cdf7432315

Diff: https://reviews.apache.org/r/30196/diff/

Testing
---
Added an integration test to PrimitiveAPITest.scala.

Thanks,
Aditya Auradkar
[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291340#comment-14291340 ]

Neha Narkhede commented on KAFKA-1888:
--------------------------------------
This is a very useful test. However, I'd recommend at least looking into the system test framework before deciding to add yet another way to write a Kafka test. It will be helpful to see a summary of why this cannot be done using the existing system test framework. Arguably, there may be another way to design the system test framework which is easier to use, but given that we have the current one, we should know what it doesn't support so we can address that in the next redesign of the test framework :)

Add a rolling upgrade system test
---------------------------------
Key: KAFKA-1888
URL: https://issues.apache.org/jira/browse/KAFKA-1888
Project: Kafka
Issue Type: Improvement
Components: system tests
Reporter: Gwen Shapira
Assignee: Gwen Shapira
Fix For: 0.9.0

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1886) SimpleConsumer swallowing ClosedByInterruptException
[ https://issues.apache.org/jira/browse/KAFKA-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284963#comment-14284963 ]

Neha Narkhede edited comment on KAFKA-1886 at 1/26/15 1:25 AM:
---------------------------------------------------------------
If interested, I hacked an existing test for this.

{code}
def testConsumerEmptyTopic() {
  val newTopic = "new-topic"
  TestUtils.createTopic(zkClient, newTopic, numPartitions = 1, replicationFactor = 1, servers = servers)
  val thread = new Thread {
    override def run {
      System.out.println("Starting the fetch")
      val start = System.currentTimeMillis()
      try {
        val fetchResponse = consumer.fetch(new FetchRequestBuilder().minBytes(10).maxWait(3000).addFetch(newTopic, 0, 0, 1).build())
      } catch {
        case e: Throwable => {
          val end = System.currentTimeMillis()
          System.out.println("Caught exception " + e + ". Took " + (end - start))
          System.out.println("Fetch interrupted " + Thread.currentThread().isInterrupted)
        }
      }
    }
  }
  thread.start()
  Thread.sleep(1000)
  thread.interrupt()
  thread.join()
  System.out.println("Ending test")
}
{code}

was (Author: auradkar): If interested, I hacked an existing test for this. ...

SimpleConsumer swallowing ClosedByInterruptException
----------------------------------------------------
Key: KAFKA-1886
URL: https://issues.apache.org/jira/browse/KAFKA-1886
Project: Kafka
Issue Type: Bug
Components: producer
Reporter: Aditya A Auradkar
Assignee: Jun Rao
Attachments: KAFKA-1886.patch

This issue was originally reported by a Samza developer. I've included an exchange of mine with Chris Riccomini. I'm trying to reproduce the problem on my dev setup.

From: criccomi
Hey all,
Samza's BrokerProxy [1] threads appear to be wedging randomly when we try to interrupt its fetcher thread. I noticed that SimpleConsumer.scala catches Throwable in its sendRequest method [2]. I'm wondering: if blockingChannel.send/receive throws a ClosedByInterruptException when the thread is interrupted, what happens? It looks like sendRequest will catch the exception (which I think clears the thread's interrupted flag) and then retries the send. If the send succeeds on the retry, I think that the ClosedByInterruptException is effectively swallowed, and the BrokerProxy will continue fetching messages as though its thread was never interrupted. Am I misunderstanding how things work?

Cheers,
Chris

[1] https://github.com/apache/incubator-samza/blob/master/samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala#L126
[2] https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L75

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
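The fix direction discussed here is the general "don't swallow interrupts" pattern: treat an interrupt-induced channel close differently from an ordinary I/O error, keep the thread's interrupt status set, and propagate instead of reconnecting and retrying. A sketch of the pattern (doSend and reconnect are hypothetical helpers standing in for the real channel plumbing; the actual patch is against SimpleConsumer.scala):

{code}
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;

public class InterruptAwareSender {
    byte[] sendRequest(byte[] request) throws IOException {
        try {
            return doSend(request);
        } catch (ClosedByInterruptException e) {
            // An interrupt closed the channel: keep the interrupt status set
            // and propagate, instead of silently reconnecting and retrying.
            // (ClosedByInterruptException extends IOException, so this catch
            // must come before the generic IOException catch below.)
            Thread.currentThread().interrupt();
            throw e;
        } catch (IOException e) {
            // Ordinary I/O failures can still take the reconnect-and-retry path.
            reconnect();
            return doSend(request);
        }
    }

    // Hypothetical helpers standing in for the real blocking-channel code.
    byte[] doSend(byte[] request) throws IOException { throw new IOException("stub"); }
    void reconnect() { }
}
{code}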
[jira] [Updated] (KAFKA-1886) SimpleConsumer swallowing ClosedByInterruptException
[ https://issues.apache.org/jira/browse/KAFKA-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1886:
---------------------------------
Assignee: Aditya Auradkar (was: Jun Rao)

SimpleConsumer swallowing ClosedByInterruptException
----------------------------------------------------
Key: KAFKA-1886
URL: https://issues.apache.org/jira/browse/KAFKA-1886
Project: Kafka
Issue Type: Bug
Components: producer
Reporter: Aditya A Auradkar
Assignee: Aditya Auradkar
Attachments: KAFKA-1886.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1886) SimpleConsumer swallowing ClosedByInterruptException
[ https://issues.apache.org/jira/browse/KAFKA-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1886: - Reviewer: Neha Narkhede -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29840: Patch for KAFKA-1818
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29840/#review69580 --- core/src/main/scala/kafka/utils/ReplicationUtils.scala https://reviews.apache.org/r/29840/#comment114288 minor formatting nit: Need to include a space after flatMap - Neha Narkhede On Jan. 17, 2015, 3:32 p.m., Eric Olander wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29840/ --- (Updated Jan. 17, 2015, 3:32 p.m.) Review request for kafka. Bugs: KAFKA-1818 https://issues.apache.org/jira/browse/KAFKA-1818 Repository: kafka Description --- KAFKA-1818 clean up code to more idiomatic scala usage Diffs - core/src/main/scala/kafka/utils/ReplicationUtils.scala 715767380f7c284148689fd34d4bfba51abd96a0 core/src/test/scala/unit/kafka/utils/ReplicationUtilsTest.scala 84e08557de5acdcf0a98b192feac72836ea359b8 Diff: https://reviews.apache.org/r/29840/diff/ Testing --- This patch replaces some pattern matching on Option with calls to flatMap. Unit test added to cover both possible return values from ReplicationUtils.getLeaderIsrAndEpochForPartition. Thanks, Eric Olander
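As an illustration of the cleanup style this patch applies (the names below are invented for the example; the real change is in ReplicationUtils.scala), explicit pattern matching on Option can usually be collapsed into flatMap:
{code}
import scala.util.Try

object OptionRefactorSketch {
  // Hypothetical parser standing in for ReplicationUtils' real
  // leader/epoch parsing of a ZK data string like "1:42".
  def parse(data: String): Option[(Int, Int)] =
    data.split(":") match {
      case Array(leader, epoch) => Try((leader.toInt, epoch.toInt)).toOption
      case _                    => None
    }

  // Before: explicit pattern match on Option.
  def leaderAndEpochBefore(zkData: Option[String]): Option[(Int, Int)] =
    zkData match {
      case Some(data) => parse(data)
      case None       => None
    }

  // After: the same logic expressed with flatMap.
  def leaderAndEpochAfter(zkData: Option[String]): Option[(Int, Int)] =
    zkData.flatMap(parse)
}
{code}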
Re: 0.9 security features and release schedule
hi,

On Sun, Jan 25, 2015 at 11:06 AM, Jeff Holoman jholo...@cloudera.com wrote: Justin, I'd be interested to learn specifically what security features you are interested in.

we are interested in both simple authentication and per-topic authorization. we can get started with simple authentication, but will need per-topic authorization or encrypted message bodies before we can send some event data over Kafka.

There is a doc on the wiki here https://cwiki.apache.org/confluence/display/KAFKA/Security As well as the Umbrella JIRA Here: https://issues.apache.org/jira/browse/KAFKA-1682

thanks, i've read the security page in the wiki, and i'm following the jira issues.

Most security-related JIRAs have a component listed as security https://issues.apache.org/jira/browse/KAFKA-1882?jql=project%20%3D%20KAFKA%20AND%20component%20%3D%20security There is quite a bit of plumbing work to be done in order for all of the security features to come together. There technically isn't a 0.9 branch at the moment, all of the development is being applied to trunk and since there isn't as of yet even a 0.8.3 branch, I don't think you'd be in a good spot trying to pull in a bunch of stuff that isn't complete.

right, our initial PoC work and planning for 0.8.2 and security has focused on simple wrappers around Kafka 0.8.2, for example adding an iptables/security group whitelist. this only gets us part of the way, as by necessity we run customer code on a large minority of our production machines (we host a large number of high-traffic Drupal sites), and that code can legitimately talk to the internet. those instances should be significant producers connected to Kafka, but we can't simply trust traffic from those instances.

we tested haproxy in front of Kafka, requiring a client SSL certificate for any code that talks to haproxy. then, we used advertised.port and advertised.hostname, and monkey-patched the poseidon ruby client to speak SSL. this works up to a point, but has its own problems. if we use one Kafka cluster, we have to teach all clients to speak SSL and offer a valid certificate. or, we teach some clients to ignore advertised.hostname, and some to honour it, depending on which method we use to trust the traffic. another option is to mirror traffic from the SSL-wrapped Kafka to the iptables-wrapped Kafka.

i also wrote a simple websocket-to-Kafka bridge using the Shopify sarama client, and that works nicely but would take some effort to bring to production readiness, and doesn't have any per-topic authorization.

we think we can use some combination of the above to start using Kafka for a large chunk of our log pipeline needs, but it's far from ideal compared to simply using built-in security features within Kafka. how do others solve these sorts of problems with Kafka?

cheers Justin

Just my 2 cents. Thanks Jeff

On Sun, Jan 25, 2015 at 10:33 AM, Justin Randell justin.rand...@acquia.com wrote: Hi, We're assessing Kafka for use at Acquia. We run tens of thousands of customer websites and many internal applications on ~9k AWS EC2 instances. We're currently weighing the pros and cons of starting with 0.8.2 plus custom security, or waiting until the security features land in 0.9. How likely is Kafka 0.9 to ship in April 2015? (As seen here - https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan.) How stable is the 0.9 branch? Is it crazy to consider running a 0.9 beta in production? Are there any existing patch sets against 0.8.x that implement security features?
thanks, Justin
Review Request 30259: Add static code coverage reporting capability
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30259/ --- Review request for kafka. Bugs: KAFKA-1722 https://issues.apache.org/jira/browse/KAFKA-1722 Repository: kafka Description --- KAFKA-1722: Add static code coverage capability Diffs - build.gradle 1cbab29ce83e20dae0561b51eed6fdb86d522f28 core/src/main/scala/kafka/utils/ZkUtils.scala c14bd455b6642f5e6eb254670bef9f57ae41d6cb Diff: https://reviews.apache.org/r/30259/diff/ Testing --- Thanks, Ashish Singh
Re: Review Request 30259: Add static code coverage reporting capability
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30259/ --- (Updated Jan. 25, 2015, 8:47 p.m.) Review request for kafka. Changes --- Add some notes. Bugs: KAFKA-1722 https://issues.apache.org/jira/browse/KAFKA-1722 Repository: kafka Description --- KAFKA-1722: Add static code coverage capability Diffs - build.gradle 1cbab29ce83e20dae0561b51eed6fdb86d522f28 core/src/main/scala/kafka/utils/ZkUtils.scala c14bd455b6642f5e6eb254670bef9f57ae41d6cb Diff: https://reviews.apache.org/r/30259/diff/ Testing (updated) --- How to run: ./gradlew sonarRunner -PscalaVersion=2.11 Note that if you do not have SonarQube running on your system, the sonarRunner task will fail, but it will still have generated coverage reports for core and clients at core/build/reports/scoverage/ and clients/build/reports/jacocoHtml respectively. Open index.html in either of those dirs to see the coverage. Once gradle-scoverage starts publishing the scoverage report, a single report generated from Sonar will be available. Thanks, Ashish Singh
[jira] [Created] (KAFKA-1900) Add documentation on usage of code coverage in the project and how it works
Ashish Kumar Singh created KAFKA-1900: - Summary: Add documentation on usage of code coverage in the project and how it works Key: KAFKA-1900 URL: https://issues.apache.org/jira/browse/KAFKA-1900 Project: Kafka Issue Type: Sub-task Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-1899) Update code coverage report generation once gradle-scoverage starts publishing scoverage report
Ashish Kumar Singh created KAFKA-1899: - Summary: Update code coverage report generation once gradle-scoverage starts publishing scoverage report Key: KAFKA-1899 URL: https://issues.apache.org/jira/browse/KAFKA-1899 Project: Kafka Issue Type: Sub-task Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1722) static analysis code coverage for pci audit needs
[ https://issues.apache.org/jira/browse/KAFKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated KAFKA-1722: -- Attachment: core coverage.png, clients coverage.png Attaching screenshots of the top-level coverage pages for clients and core. Users can browse from these pages down to the actual code to find out which lines/branches are not covered. static analysis code coverage for pci audit needs - Key: KAFKA-1722 URL: https://issues.apache.org/jira/browse/KAFKA-1722 Project: Kafka Issue Type: Bug Components: security Reporter: Joe Stein Assignee: Ashish Kumar Singh Fix For: 0.9.0 Attachments: KAFKA-1722.patch, clients coverage.png, core coverage.png Code coverage is a measure used to describe the degree to which the source code of a product is tested. A product with high code coverage has been more thoroughly tested and has a lower chance of containing software bugs than a product with low code coverage. Apart from PCI audit needs, the increasing user base of Kafka makes it important to increase code coverage of Kafka. Something just cannot be improved without being measured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1722) static analysis code coverage for pci audit needs
[ https://issues.apache.org/jira/browse/KAFKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated KAFKA-1722: -- Attachment: Sonar's summary report.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1722) static analysis code coverage for pci audit needs
[ https://issues.apache.org/jira/browse/KAFKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291266#comment-14291266 ] Ashish Kumar Singh commented on KAFKA-1722: --- How to run: ./gradlew sonarRunner -PscalaVersion=2.11 Note that if you do not have SonarQube running on your system, the sonarRunner task will fail, but it will still have generated coverage reports for core and clients at core/build/reports/scoverage/ and clients/build/reports/jacocoHtml respectively. Open index.html in either of those dirs to see the coverage. Once gradle-scoverage starts publishing the scoverage report, a single report generated from Sonar will be available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1722) static analysis code coverage for pci audit needs
[ https://issues.apache.org/jira/browse/KAFKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291268#comment-14291268 ] Ashish Kumar Singh edited comment on KAFKA-1722 at 1/25/15 9:01 PM: [~joestein] could you take a look at the patch and approach? I have created a couple of sub-tasks, which I believe will be done once we finalize on this. Also, do let me know if this requires a KIP; I will create one if required. was (Author: singhashish): [~joestein] could you take a look at the patch and approach? I have created a couple of sub-tasks, which I believe will be done once we finalize on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30084: Patch for KAFKA-1866
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30084/#review69584 --- Can we add a test for this? If we add more metrics in the future, hopefully that test would fail if we forget to add the new metrics to this removeMetrics() API - Neha Narkhede On Jan. 20, 2015, 7:55 p.m., Sriharsha Chintalapani wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30084/ --- (Updated Jan. 20, 2015, 7:55 p.m.) Review request for kafka. Bugs: KAFKA-1866 https://issues.apache.org/jira/browse/KAFKA-1866 Repository: kafka Description --- KAFKA-1866. LogStartOffset gauge throws exceptions after log.delete(). Diffs - core/src/main/scala/kafka/log/Log.scala 846023bb98d0fa0603016466360c97071ac935ea Diff: https://reviews.apache.org/r/30084/diff/ Testing --- Thanks, Sriharsha Chintalapani
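A sketch of the kind of guard test being suggested, assuming the yammer-metrics registry that Kafka core uses; the name prefix is illustrative, and a real test would create a Log, call delete() (which invokes removeMetrics()), then assert that nothing the Log registered is left behind. Filtering by prefix means a gauge added later but forgotten in removeMetrics() would fail the check:
{code}
import com.yammer.metrics.Metrics
import scala.collection.JavaConverters._

object MetricsRemovalSketch {
  // After log.delete() has run, nothing the Log registered should remain
  // in the registry. `metricPrefix` is illustrative; a real test would
  // cover the exact gauges Log registers (LogStartOffset, LogEndOffset,
  // NumLogSegments, Size, ...).
  def assertNoLeftoverMetrics(metricPrefix: String): Unit = {
    val leftover = Metrics.defaultRegistry().allMetrics().asScala.keys
      .filter(_.getName.startsWith(metricPrefix))
    assert(leftover.isEmpty, s"metrics not removed: ${leftover.mkString(", ")}")
  }
}
{code}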
[jira] [Closed] (KAFKA-1045) producer zk.connect config
[ https://issues.apache.org/jira/browse/KAFKA-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-1045. producer zk.connect config -- Key: KAFKA-1045 URL: https://issues.apache.org/jira/browse/KAFKA-1045 Project: Kafka Issue Type: Bug Reporter: sjk Fix For: 0.8.2
java.lang.IllegalArgumentException: requirement failed: Missing required property 'metadata.broker.list'
props.put("zk.connect", KafkaConfig.getZooAddress());
When I configure zk, why does the above error appear? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
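For reference, a minimal sketch of how the 0.8 producer expects to be configured (broker hostnames below are illustrative): it bootstraps from metadata.broker.list rather than ZooKeeper, which is why supplying only zk.connect triggers the error above.
{code}
import java.util.Properties
import kafka.producer.ProducerConfig

object ProducerConfigSketch {
  def buildConfig(): ProducerConfig = {
    val props = new Properties()
    // The 0.8 producer requires a broker list; a ZooKeeper address is
    // not a substitute (hostnames here are illustrative).
    props.put("metadata.broker.list", "broker1:9092,broker2:9092")
    new ProducerConfig(props)
  }
}
{code}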
Re: Review Request 30073: Patch for KAFKA-1109
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30073/#review69585 --- Ship it! Ship It! - Neha Narkhede On Jan. 20, 2015, 12:07 p.m., Manikumar Reddy O wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30073/ --- (Updated Jan. 20, 2015, 12:07 p.m.) Review request for kafka. Bugs: KAFKA-1109 https://issues.apache.org/jira/browse/KAFKA-1109 Repository: kafka Description --- Corrected kafka-run-class.sh script to override KAFKA_GC_LOG_OPTS environment property Diffs - bin/kafka-run-class.sh 22a9865b5939450a9d7f4ea2eee5eba2c1ec758c Diff: https://reviews.apache.org/r/30073/diff/ Testing --- Thanks, Manikumar Reddy O
[jira] [Updated] (KAFKA-1109) Need to fix GC log configuration code, not able to override KAFKA_GC_LOG_OPTS
[ https://issues.apache.org/jira/browse/KAFKA-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1109: - Fix Version/s: 0.8.3 Need to fix GC log configuration code, not able to override KAFKA_GC_LOG_OPTS - Key: KAFKA-1109 URL: https://issues.apache.org/jira/browse/KAFKA-1109 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Environment: *nix Reporter: Viktor Kolodrevskiy Assignee: Manikumar Reddy Fix For: 0.8.3 Attachments: KAFKA-1109.patch, KAFKA_1109_fix.patch kafka-run-class.sh contains GC log code:

# GC options
GC_FILE_SUFFIX='-gc.log'
GC_LOG_FILE_NAME=''
if [ "$1" = "daemon" ] && [ -z "$KAFKA_GC_LOG_OPTS" ] ; then
  shift
  GC_LOG_FILE_NAME=$1$GC_FILE_SUFFIX
  shift
  KAFKA_GC_LOG_OPTS="-Xloggc:$LOG_DIR/$GC_LOG_FILE_NAME -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"
fi

So when in my scripts I start Kafka and want to override KAFKA_GC_LOG_OPTS by exporting new values I get: Exception in thread "main" java.lang.NoClassDefFoundError: daemon Caused by: java.lang.ClassNotFoundException: daemon That's because shift is not done when KAFKA_GC_LOG_OPTS is set and daemon is passed as main class. I suggest to replace it with this code:

# GC options
GC_FILE_SUFFIX='-gc.log'
GC_LOG_FILE_NAME=''
if [ "$1" = "daemon" ] && [ -z "$KAFKA_GC_LOG_OPTS" ] ; then
  shift
  GC_LOG_FILE_NAME=$1$GC_FILE_SUFFIX
  shift
  KAFKA_GC_LOG_OPTS="-Xloggc:$LOG_DIR/$GC_LOG_FILE_NAME -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"
else
  if [ "$1" = "daemon" ] && [ "$KAFKA_GC_LOG_OPTS" != "" ] ; then
    shift 2
  fi
fi

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1109) Need to fix GC log configuration code, not able to override KAFKA_GC_LOG_OPTS
[ https://issues.apache.org/jira/browse/KAFKA-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1109: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch. Pushed to trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30062: Patch for KAFKA-1883
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30062/#review69586 --- Ship it! Ship It! - Neha Narkhede On Jan. 20, 2015, 3:40 a.m., Jaikiran Pai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30062/ --- (Updated Jan. 20, 2015, 3:40 a.m.) Review request for kafka. Bugs: KAFKA-1883 https://issues.apache.org/jira/browse/KAFKA-1883 Repository: kafka Description --- KAFKA-1883 Fix NullPointerException in RequestSendThread Diffs - core/src/main/scala/kafka/controller/ControllerChannelManager.scala eb492f00449744bc8d63f55b393e2a1659d38454 Diff: https://reviews.apache.org/r/30062/diff/ Testing --- Thanks, Jaikiran Pai
[jira] [Commented] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic
[ https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291407#comment-14291407 ] Neha Narkhede commented on KAFKA-1852: -- [~jjkoshy] would you like to take a look at this patch? OffsetCommitRequest can commit offset on unknown topic -- Key: KAFKA-1852 URL: https://issues.apache.org/jira/browse/KAFKA-1852 Project: Kafka Issue Type: Bug Affects Versions: 0.8.3 Reporter: Jun Rao Assignee: Sriharsha Chintalapani Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch Currently, we allow an offset to be committed to Kafka, even when the topic/partition for the offset doesn't exist. We probably should disallow that and send an error back in that case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
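A rough sketch of the validation the issue calls for; the names and the metadata lookup are invented for illustration, and error code 3 is the protocol's UnknownTopicOrPartition:
{code}
object OffsetCommitValidationSketch {
  // Protocol error codes: 0 = NoError, 3 = UnknownTopicOrPartition.
  val NoError: Short = 0
  val UnknownTopicOrPartition: Short = 3

  // `partitionExists` stands in for a broker metadata lookup; `commits`
  // maps (topic, partition) to the offset the client wants to commit.
  def validate(partitionExists: (String, Int) => Boolean,
               commits: Map[(String, Int), Long]): Map[(String, Int), Short] =
    commits.map { case ((topic, partition), _) =>
      val error = if (partitionExists(topic, partition)) NoError else UnknownTopicOrPartition
      ((topic, partition), error)
    }
}
{code}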
[jira] [Updated] (KAFKA-1883) NullPointerException in RequestSendThread
[ https://issues.apache.org/jira/browse/KAFKA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1883: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch. Pushed to trunk NullPointerException in RequestSendThread - Key: KAFKA-1883 URL: https://issues.apache.org/jira/browse/KAFKA-1883 Project: Kafka Issue Type: Bug Components: core Reporter: jaikiran pai Assignee: jaikiran pai Fix For: 0.8.3 Attachments: KAFKA-1883.patch I often see the following exception while running some tests (ProducerFailureHandlingTest.testNoResponse is one such instance):
{code}
[2015-01-19 22:30:24,257] ERROR [Controller-0-to-broker-1-send-thread], Controller 0 fails to send a request to broker id:1,host:localhost,port:56729 (kafka.controller.RequestSendThread:103)
java.lang.NullPointerException
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:150)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
{code}
Looking at the code in question, I can see that the NPE can be triggered when the receive is null, which can happen if isRunning is false (i.e. a shutdown has been requested). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
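To make the race concrete, a minimal sketch (names invented for the example; the real fix lives in ControllerChannelManager.doWork) of guarding the possibly-null receive during shutdown:
{code}
object RequestSendSketch {
  // `blockingReceive` stands in for the controller channel receive that
  // can return null once shutdown has been requested; `isRunning` mirrors
  // the thread's running flag.
  def doWork(blockingReceive: () => AnyRef, isRunning: () => Boolean): Unit = {
    val response = blockingReceive()
    // Guard: during shutdown the receive can come back null, so only
    // dereference the response when it is actually present.
    Option(response).filter(_ => isRunning()).foreach { r =>
      println(s"processing response $r")
    }
  }
}
{code}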