Re: Possibly very minor typo in documentation.html

2016-06-22 Thread Guozhang Wang
Thanks for reporting this, Tyler. We will fix the docs.

Guozhang

On Wed, Jun 22, 2016 at 10:56 AM, Tyler  wrote:

> Note: Please CC me if needed, I am not subscribed to this list.
>
> 0.9 changes say: “Java 1.6 is no longer supported”
>
> However, the CWD in this example at
> http://kafka.apache.org/documentation.html#quickstart_multibroker <
> http://kafka.apache.org/documentation.html#quickstart_multibroker> says:
>
> "Now let's test out fault-tolerance. Broker 1 was acting as the leader so
> let's kill it:
> > ps | grep server-1.properties
> 7564
>  ttys0020:15.91
> /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java...
> >
> kill -9 7564”
>
> Seems to say you’re in a Java 1.6 world — see bold text.
>
> Does that ps paste need some minor updating? ^F for 1.6 finds it nowhere
> else, which is how I originally found it.
>
> Thanks for all the great work,
> Tyler
> Seattle, WA




-- 
-- Guozhang


Re: Kafka HDFS Connector

2016-06-22 Thread Pariksheet Barapatre
Many thanks, Dave and Dustin, for your inputs. I will check the code and try to
implement the proposed solution.

Cheers
Pari
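
For reference, a minimal sketch of the worker settings implied by the
StringConverter suggestion quoted below (whether these belong in the worker
config or as per-connector overrides depends on the deployment):

  key.converter=org.apache.kafka.connect.storage.StringConverter
  value.converter=org.apache.kafka.connect.storage.StringConverter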

On 22 June 2016 at 23:25, Dustin Cote  wrote:

> Yes, I believe what you're looking for is what Dave described.  Here's the
> source of that interface
>
> https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/Format.java
>  There
> already exists a StringConverter that should handle the conversion in and
> out of the connect data format in your case
>
> https://kafka.apache.org/0100/javadoc/org/apache/kafka/connect/storage/StringConverter.html
> .
> I think that's what you are looking for in terms of a Converter.  It looks
> like your bigger need is the output format for HDFS.
>
> FYI -- you are most welcome to add your request at the GitHub issues page
> for the HDFS Connector
> https://github.com/confluentinc/kafka-connect-hdfs/issues
>
> On Wed, Jun 22, 2016 at 1:26 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com
> > wrote:
>
> > I don't see any built-in support for this but I think that you can write
> a
> > class that implements io.confluent.connect.hdfs.Format
> >
> > public interface Format {
> >   RecordWriterProvider getRecordWriterProvider();
> >   SchemaFileReader getSchemaFileReader(AvroData avroData);
> >   HiveUtil getHiveUtil(HdfsSinkConnectorConfig config, AvroData avroData,
> > HiveMetaStore hiveMetaStore);
> > }
> >
> > You would still have to register a schema in the Schema Registry and the
> > "SchemaFileReader" that you return would have to return the same Schema.
> >
> > -Dave
> >
> > Dave Tauzell | Senior Software Engineer | Surescripts
> > O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
> > Connect with us: Twitter I LinkedIn I Facebook I YouTube
> >
> >
> > -Original Message-
> > From: Pariksheet Barapatre [mailto:pari.data...@gmail.com]
> > Sent: Wednesday, June 22, 2016 11:49 AM
> > To: us...@kafka.apache.org
> > Cc: dev@kafka.apache.org
> > Subject: Re: Kafka HDFS Connector
> >
> > Hi Dustin,
> >
> > I am looking for option 1.
> >
> > Looking at the Kafka Connect code, I guess we need to write converter code
> > if one is not available.
> >
> >
> > Thanks in advance.
> >
> > Regards
> > Pari
> >
> >
> > On 22 June 2016 at 18:50, Dustin Cote  wrote:
> >
> > > Hi Pari,
> > >
> > > Can you clarify which scenario you are looking to implement?
> > > 1) plaintext Kafka data --> plaintext HDFS data readable by hive
> > > 2) plaintext Kafka data --> avro/parquet HDFS data readable by hive
> > >
> > > Regards,
> > >
> > >
> > >
> > > On Wed, Jun 22, 2016 at 6:02 AM, Pariksheet Barapatre <
> > > pari.data...@gmail.com> wrote:
> > >
> > > > Thanks for your suggestions. I think if Kafka Connect provides the
> > > > same functionality as Flume and Storm, why should we go for another
> > > > infrastructure investment?
> > > >
> > > > Kafka Connect effectively copies data from a Kafka topic to HDFS
> > > > through a connector. It supports Avro as well as Parquet; I am looking
> > > > at whether we can use it to load plain text data.
> > > >
> > > > Cheers
> > > > Pari
> > > >
> > > >
> > > >
> > > > On 22 June 2016 at 12:34, Lohith Samaga M
> > > > 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > You can use Storm also. Here you have the option of rotating the
> > > > > file. You can also write to Hive directly.
> > > > >
> > > > > Best regards / Mit freundlichen Grüßen / Sincères salutations M.
> > > > > Lohith Samaga
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > -Original Message-
> > > > > From: Mudit Kumar [mailto:mudit.ku...@askme.in]
> > > > > Sent: Wednesday, June 22, 2016 12.32
> > > > > To: us...@kafka.apache.org; dev@kafka.apache.org
> > > > > Subject: Re: Kafka HDFS Connector
> > > > >
> > > > > I think you can use flume also.
> > > > >
> > > > > Thanks,
> > > > > Mudit
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 6/22/16, 12:29 PM, "Pariksheet Barapatre"
> > > > > 
> > > > > wrote:
> > > > >
> > > > > >Anybody have any idea on this?
> > > > > >
> > > > > >Thanks
> > > > > >Pari
> > > > > >
> > > > > >On 20 June 2016 at 14:36, Pariksheet Barapatre <
> > > pari.data...@gmail.com>
> > > > > >wrote:
> > > > > >
> > > > > >> Hello All,
> > > > > >>
> > > > > >> I have data coming from sensors into a Kafka cluster in text
> > > > > >> format delimited by commas.
> > > > > >>
> > > > > >> How do I offload this data to Hive periodically from Kafka? I
> > > > > >> guess Kafka Connect should solve my problem, but when I checked the
> > > > > >> documentation, the examples have only Avro-formatted data. Can you
> > > > > >> please provide some knowledge on this?
> > > > > >>
> > > > > >> Many Thanks
> > > > > >> Pari
> > > > > >>
> > > > >

[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-06-22 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345618#comment-15345618
 ] 

Eric Wasserman commented on KAFKA-1981:
---

[~ijuma] do you think you'll have a chance to check out the new PR any time soon?

> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
> Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Other than this we don't give you fine-grained control over when compaction 
> occurs. In addition we never compact the active segment (since it is still 
> being written to).
> The result is that you can't really guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some other overrides to add 
> in when calculating that.
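
As a concrete illustration, a time bound of the sort described above might
surface as broker configuration along these lines (a hedged sketch; the
property name follows the attached proposal and could still change):

  # messages newer than 1 hour are never compacted, so any consumer
  # no more than 1 hour behind sees every update
  log.cleaner.min.compaction.lag.ms=3600000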



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3879) KafkaConsumer with auto commit enabled gets stuck when killed after broker is dead

2016-06-22 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345609#comment-15345609
 ] 

Ashish K Singh commented on KAFKA-3879:
---

[~hachikuji] yeah, KAFKA-3822 is pointing to the same issue. I am trying to 
think of a case where someone would want to wait on the broker to come up in 
close, but no such case comes to mind. One can always set {{max.block.ms}} to 
{{Long.MAX_VALUE}} to achieve that. I am more inclined towards having something 
like {{max.block.ms}}. It will probably require a KIP. Are you working on 
KAFKA-3822? If not, I can post a short (hopefully :)) KIP and a patch this week.

> KafkaConsumer with auto commit enabled gets stuck when killed after broker is 
> dead
> --
>
> Key: KAFKA-3879
> URL: https://issues.apache.org/jira/browse/KAFKA-3879
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
> Fix For: 0.10.0.1
>
>
> KafkaConsumer with auto commit enabled gets stuck when killed after broker is 
> dead.
> * KafkaConsumer on close tries to close coordinator.
> * Coordinator, if auto commit is enabled, tries to commit offsets 
> synchronously before closing.
> * While trying to synchronously commit offsets, coordinator checks if 
> coordinator is alive by sending {{GroupCoordinatorRequest}}. As brokers are 
> dead, this returns {{NoAvailableBrokersException}}, which is a retriable 
> exception.
> * Coordinator ready check enters into an infinite loop as it keeps retrying 
> to discover group coordinator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-0.10.0-jdk7 #133

2016-06-22 Thread Apache Jenkins Server
See 



Build failed in Jenkins: kafka-trunk-jdk7 #1387

2016-06-22 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-3863: System tests covering connector/task failure and restart

--
[...truncated 6639 lines...]

kafka.network.SocketServerTest > testSessionPrincipal PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides PASSED

kafka.network.SocketServerTest > testSocketsCloseOnShutdown STARTED

kafka.network.SocketServerTest > testSocketsCloseOnShutdown PASSED

kafka.network.SocketServerTest > testSslSocketServer STARTED

kafka.network.SocketServerTest > testSslSocketServer PASSED

kafka.network.SocketServerTest > tooBigRequestIsRejected STARTED

kafka.network.SocketServerTest > tooBigRequestIsRejected PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslSslTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.SaslSslTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.SaslSslTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslSslTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.SaslSslTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslSslTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.SaslSslTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
STARTED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
PASSED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
STARTED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
PASSED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride PASSED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride PASSED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig STARTED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest 

Possibly very minor typo in documentation.html

2016-06-22 Thread Tyler
Note: Please CC me if needed, I am not subscribed to this list.

0.9 changes say: “Java 1.6 is no longer supported”

However, the CWD in this example at 
http://kafka.apache.org/documentation.html#quickstart_multibroker 
 says:

"Now let's test out fault-tolerance. Broker 1 was acting as the leader so let's 
kill it:
> ps | grep server-1.properties
7564
 ttys0020:15.91 
/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java...
> 
kill -9 7564”

Seems to say you’re in a Java 1.6 world — see bold text.

Does that ps paste need some minor updating? ^F for 1.6 finds it nowhere else, 
which is how I originally found it.

Thanks for all the great work,
Tyler
Seattle, WA

RE: Kafka HDFS Connector

2016-06-22 Thread Tauzell, Dave
I don't see any built-in support for this but I think that you can write a 
class that implements io.confluent.connect.hdfs.Format

public interface Format {
  RecordWriterProvider getRecordWriterProvider();
  SchemaFileReader getSchemaFileReader(AvroData avroData);
  HiveUtil getHiveUtil(HdfsSinkConnectorConfig config, AvroData avroData, 
HiveMetaStore hiveMetaStore);
}

You would still have to register a schema in the Schema Registry and the 
"SchemaFileReader" that you return would have to return the same Schema.

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Pariksheet Barapatre [mailto:pari.data...@gmail.com]
Sent: Wednesday, June 22, 2016 11:49 AM
To: us...@kafka.apache.org
Cc: dev@kafka.apache.org
Subject: Re: Kafka HDFS Connector

Hi Dustin,

I am looking for option 1.

Looking at the Kafka Connect code, I guess we need to write converter code if 
one is not available.


Thanks in advance.

Regards
Pari


On 22 June 2016 at 18:50, Dustin Cote  wrote:

> Hi Pari,
>
> Can you clarify which scenario you are looking to implement?
> 1) plaintext Kafka data --> plaintext HDFS data readable by hive
> 2) plaintext Kafka data --> avro/parquet HDFS data readable by hive
>
> Regards,
>
>
>
> On Wed, Jun 22, 2016 at 6:02 AM, Pariksheet Barapatre <
> pari.data...@gmail.com> wrote:
>
> > Thanks for your suggestions. I think if Kafka Connect provides the
> > same functionality as Flume and Storm, why should we go for another
> > infrastructure investment?
> >
> > Kafka Connect effectively copies data from a Kafka topic to HDFS
> > through a connector. It supports Avro as well as Parquet; I am looking
> > at whether we can use it to load plain text data.
> >
> > Cheers
> > Pari
> >
> >
> >
> > On 22 June 2016 at 12:34, Lohith Samaga M
> > 
> > wrote:
> >
> > > Hi,
> > > You can use Storm also. Here you have the option of rotating the
> > > file. You can also write to Hive directly.
> > >
> > > Best regards / Mit freundlichen Grüßen / Sincères salutations M.
> > > Lohith Samaga
> > >
> > >
> > >
> > >
> > > -Original Message-
> > > From: Mudit Kumar [mailto:mudit.ku...@askme.in]
> > > Sent: Wednesday, June 22, 2016 12.32
> > > To: us...@kafka.apache.org; dev@kafka.apache.org
> > > Subject: Re: Kafka HDFS Connector
> > >
> > > I think you can use flume also.
> > >
> > > Thanks,
> > > Mudit
> > >
> > >
> > >
> > >
> > > On 6/22/16, 12:29 PM, "Pariksheet Barapatre"
> > > 
> > > wrote:
> > >
> > > >Anybody have any idea on this?
> > > >
> > > >Thanks
> > > >Pari
> > > >
> > > >On 20 June 2016 at 14:36, Pariksheet Barapatre <
> pari.data...@gmail.com>
> > > >wrote:
> > > >
> > > >> Hello All,
> > > >>
> > > >> I have data coming from sensors into a Kafka cluster in text
> > > >> format delimited by commas.
> > > >>
> > > >> How do I offload this data to Hive periodically from Kafka? I
> > > >> guess Kafka Connect should solve my problem, but when I checked the
> > > >> documentation, the examples have only Avro-formatted data. Can you
> > > >> please provide some knowledge on this?
> > > >>
> > > >> Many Thanks
> > > >> Pari
> > > >>
> > >
> > > Information transmitted by this e-mail is proprietary to Mphasis,
> > > its associated companies and/ or its customers and is intended for
> > > use only by the individual or entity to which it is addressed, and
> > may
> > > contain information that is privileged, confidential or exempt
> > > from disclosure under applicable law. If you are not the
> intended
> > > recipient or it appears that this mail has been forwarded to you
> > > without proper authority, you are notified that any use or
> > > dissemination of this information in any manner is strictly
> > > prohibited. In such cases, please notify us immediately at
> > > mailmas...@mphasis.com and delete this mail from your records.
> > >
> >
>
>
>
> --
> Dustin Cote
> confluent.io
>
This e-mail and any files transmitted with it are confidential, may contain 
sensitive information, and are intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this e-mail in error, 
please notify the sender by reply e-mail immediately and destroy all copies of 
the e-mail and any attachments.


[jira] [Commented] (KAFKA-3879) KafkaConsumer with auto commit enabled gets stuck when killed after broker is dead

2016-06-22 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345485#comment-15345485
 ] 

Jason Gustafson commented on KAFKA-3879:


[~singhashish] This is a duplicate of KAFKA-3822, right? Yeah, we knew about 
it. I think we should either solve it by adding a {{max.block.ms}} 
configuration option or an overloaded close() with a timeout. 
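
From the caller's side the two options would look roughly like this (a hedged
sketch; neither API exists in 0.10.0):

// Option 1: a config that caps every blocking call, including the
// synchronous offset commit attempted inside close():
//   props.put("max.block.ms", "5000");
// Option 2: an overloaded close() that bounds only the shutdown path:
//   consumer.close(5, TimeUnit.SECONDS);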

> KafkaConsumer with auto commit enabled gets stuck when killed after broker is 
> dead
> --
>
> Key: KAFKA-3879
> URL: https://issues.apache.org/jira/browse/KAFKA-3879
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
> Fix For: 0.10.0.1
>
>
> KafkaConsumer with auto commit enabled gets stuck when killed after broker is 
> dead.
> * KafkaConsumer on close tries to close coordinator.
> * Coordinator, if auto commit is enabled, tries to commit offsets 
> synchronously before closing.
> * While trying to synchronously commit offsets, coordinator checks if 
> coordinator is alive by sending {{GroupCoordinatorRequest}}. As brokers are 
> dead, this returns {{NoAvailableBrokersException}}, which is a retriable 
> exception.
> * Coordinator ready check enters into an infinite loop as it keeps retrying 
> to discover group coordinator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #718

2016-06-22 Thread Apache Jenkins Server
See 

Changes:

[me] KAFKA-3863: System tests covering connector/task failure and restart

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-6 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 36cab7dbdff6981d0df4b355dadee3fac35508a6 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 36cab7dbdff6981d0df4b355dadee3fac35508a6
 > git rev-list 8bf18df1b6584d95a850a846b26046c2b79531b7 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson2606703693232774530.sh
+ rm -rf 
+ /home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# An error report file with more information is saved as:
# 
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 1 day 17 hr old

Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66


[jira] [Resolved] (KAFKA-3863) Add system test for connector failure/restart

2016-06-22 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-3863.
--
   Resolution: Fixed
Fix Version/s: 0.10.0.1
   0.10.1.0

Issue resolved by pull request 1519
[https://github.com/apache/kafka/pull/1519]

> Add system test for connector failure/restart
> -
>
> Key: KAFKA-3863
> URL: https://issues.apache.org/jira/browse/KAFKA-3863
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> We should have system tests covering connector/task failure and the ability 
> to restart through the REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3863) Add system test for connector failure/restart

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345452#comment-15345452
 ] 

ASF GitHub Bot commented on KAFKA-3863:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1519


> Add system test for connector failure/restart
> -
>
> Key: KAFKA-3863
> URL: https://issues.apache.org/jira/browse/KAFKA-3863
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> We should have system tests covering connector/task failure and the ability 
> to restart through the REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1519: KAFKA-3863: System tests covering connector/task f...

2016-06-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1519


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345444#comment-15345444
 ] 

Jason Gustafson commented on KAFKA-3892:


[~iamnoah] Can you reliably reproduce the problem? It would help to dig into 
the details a little bit since the only known situation where this can happen 
is the one Ismael mentioned.

> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy

2016-06-22 Thread Jason Gustafson
Hey Vahid,

Thanks for the updates. I think the lack of comments on this KIP suggests
that the motivation might need a little work. Here are the two main
benefits of this assignor as I see them:

1. It can give a more balanced assignment when subscriptions do not match
in a group (this is the same problem solved by KIP-49).
2. It potentially allows applications to avoid the need to clean up partition
state when rebalancing, since partitions are more likely to stay assigned to
the same consumer.

Does that seem right to you?

I think it's unclear how serious the first problem is. Providing better
balance when subscriptions differ is nice, but are rolling updates the only
scenario where this is encountered? Or are there more general use cases
where differing subscriptions could persist for a longer duration? I'm also
wondering if this assignor addresses the problem found in KAFKA-2019. It
would be useful to confirm whether this problem still exists with the new
consumer's round robin strategy and how (whether?) it is addressed by this
assignor.

The major selling point seems to be the second point. This is definitely
nice to have, but would you expect a lot of value in practice since
consumer groups are usually assumed to be stable? It might help to describe
some specific use cases to help motivate the proposal. One of the downsides
is that it requires users to restructure their code to get any benefit from
it. In particular, they need to move partition cleanup out of the
onPartitionsRevoked() callback and into onPartitionsAssigned(). This is a
little awkward and will probably make explaining the consumer more
difficult. It's probably worth including a discussion of this point in the
proposal with an example.
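
For instance, a sketch of that restructuring (assuming an application that
keeps per-partition local state; cleanUpLocalState() is a hypothetical
application hook):

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public class StickyCleanupListener implements ConsumerRebalanceListener {
  private final Set<TopicPartition> owned = new HashSet<>();

  public void onPartitionsRevoked(Collection<TopicPartition> revoked) {
    // with a sticky assignor, defer cleanup: most revoked partitions
    // are expected to come straight back after the rebalance
  }

  public void onPartitionsAssigned(Collection<TopicPartition> assigned) {
    owned.removeAll(assigned);  // whatever remains was truly lost
    for (TopicPartition tp : owned) {
      cleanUpLocalState(tp);  // hypothetical: drop caches, local files, etc.
    }
    owned.clear();
    owned.addAll(assigned);
  }

  private void cleanUpLocalState(TopicPartition tp) { /* app-specific */ }
}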

Thanks,
Jason



On Tue, Jun 7, 2016 at 4:05 PM, Vahid S Hashemian  wrote:

> Hi Jason,
>
> I updated the KIP and added some details about the user data, the
> assignment algorithm, and the alternative strategies to consider.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-54+-+Sticky+Partition+Assignment+Strategy
>
> Please let me know if I missed adding something. Thank you.
>
> Regards,
> --Vahid
>
>
>


[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread Noah Sloan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345400#comment-15345400
 ] 

Noah Sloan commented on KAFKA-3892:
---

I can say that all producers and consumers ended up with metadata for all 
topics (according to the heap dump), not just ones that might not have had any 
subscriptions yet. So there is something pathological about it, since the 
condition never corrects itself. Also, when I was debugging, it was never the 
first metadata response that contained all topics. 

> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3890) Kafka Streams: task assignment is not maintained on cluster restart or rolling restart

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345367#comment-15345367
 ] 

ASF GitHub Bot commented on KAFKA-3890:
---

GitHub user HenryCaiHaiying opened a pull request:

https://github.com/apache/kafka/pull/1543

KAFKA-3890 Kafka Streams: task assignment is not maintained on cluster 
restart or rolling restart

Current task assignment in TaskAssignor is not deterministic.

During cluster restart or rolling restart, we have the same set of 
participating worker nodes.  But the current TaskAssignor is not able to 
maintain a deterministic mapping, so about 20% partitions will be reassigned 
which would cause state repopulation.
When the topology of worker nodes (# of worker nodes, the TaskIds they are 
carrying) is not changed, we really just want to keep the old task 
assignment.

Add the code to check whether the node topology is changing or not:
- when the prevAssignedTasks from the old clientStates is the same as the 
new task list
- when there is no new node joining (its prevAssignedTasks would be either 
empty or conflict with some other nodes)
- when there is no node dropping out (the total of prevAssignedTasks from 
other nodes would not be equal to the new task list)

When the topology is not changing, we would just use the old mapping.

I also added the code to check whether the previous assignment is balanced 
(whether each node's task list is within [1/2 average, 2 * average]); if it's 
not balanced, we will still start a new task assignment.
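
In rough code terms, the reuse check described above amounts to something
like this (names and types are illustrative, not the actual TaskAssignor
code):

static boolean canReusePreviousAssignment(
    Map<String, Set<String>> prevAssignedTasks, Set<String> newTasks) {
  Set<String> union = new HashSet<>();
  int total = 0;
  for (Set<String> tasks : prevAssignedTasks.values()) {
    total += tasks.size();  // a joining node contributes nothing here
    union.addAll(tasks);    // overlapping claims shrink the union
  }
  // topology unchanged: same tasks overall, none duplicated, none lost
  if (total != newTasks.size() || !union.equals(newTasks))
    return false;
  // the previous assignment must also be roughly balanced
  double avg = (double) total / prevAssignedTasks.size();
  for (Set<String> tasks : prevAssignedTasks.values()) {
    if (tasks.size() < avg / 2 || tasks.size() > 2 * avg)
      return false;  // outside [1/2 average, 2 * average]
  }
  return true;
}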

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HenryCaiHaiying/kafka upstream

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1543






> Kafka Streams: task assignment is not maintained on cluster restart or 
> rolling restart
> --
>
> Key: KAFKA-3890
> URL: https://issues.apache.org/jira/browse/KAFKA-3890
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Henry Cai
>Assignee: Henry Cai
>  Labels: api, newbie
>
> Currently the task assignment in TaskAssignor is not deterministic.  During 
> cluster restart or rolling restart, even though the participating worker 
> nodes are the same, the TaskAssignor is not able to maintain a 
> deterministic mapping, so about 20% of partitions will be reassigned, which 
> causes state repopulation at cluster restart time.
> When the participating worker nodes are not changed, we really just want to 
> keep the old task assignment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1543: KAFKA-3890 Kafka Streams: task assignment is not m...

2016-06-22 Thread HenryCaiHaiying
GitHub user HenryCaiHaiying opened a pull request:

https://github.com/apache/kafka/pull/1543

KAFKA-3890 Kafka Streams: task assignment is not maintained on cluster 
restart or rolling restart

Current task assignment in TaskAssignor is not deterministic.

During cluster restart or rolling restart, we have the same set of 
participating worker nodes.  But the current TaskAssignor is not able to 
maintain a deterministic mapping, so about 20% partitions will be reassigned 
which would cause state repopulation.
When the topology of worker nodes (# of worker nodes, the TaskIds they are 
carrying) is not changed, we really just want to keep the old task 
assignment.

Add the code to check whether the node topology is changing or not:
- when the prevAssignedTasks from the old clientStates is the same as the 
new task list
- when there is no new node joining (its prevAssignedTasks would be either 
empty or conflict with some other nodes)
- when there is no node dropping out (the total of prevAssignedTasks from 
other nodes would not be equal to the new task list)

When the topology is not changing, we would just use the old mapping.

I also added the code to check whether the previous assignment is balanced 
(whether each node's task list is within [1/2 average, 2 * average]); if it's 
not balanced, we will still start a new task assignment.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HenryCaiHaiying/kafka upstream

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1543






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3890) Kafka Streams: task assignment is not maintained on cluster restart or rolling restart

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345365#comment-15345365
 ] 

ASF GitHub Bot commented on KAFKA-3890:
---

Github user HenryCaiHaiying closed the pull request at:

https://github.com/apache/kafka/pull/1538


> Kafka Streams: task assignment is not maintained on cluster restart or 
> rolling restart
> --
>
> Key: KAFKA-3890
> URL: https://issues.apache.org/jira/browse/KAFKA-3890
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Henry Cai
>Assignee: Henry Cai
>  Labels: api, newbie
>
> Currently the task assignment in TaskAssignor is not deterministic.  During 
> cluster restart or rolling restart, even though the participating worker 
> nodes are the same, the TaskAssignor is not able to maintain a 
> deterministic mapping, so about 20% of partitions will be reassigned, which 
> causes state repopulation at cluster restart time.
> When the participating worker nodes are not changed, we really just want to 
> keep the old task assignment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3890) Kafka Streams: task assignment is not maintained on cluster restart or rolling restart

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345362#comment-15345362
 ] 

ASF GitHub Bot commented on KAFKA-3890:
---

GitHub user HenryCaiHaiying reopened a pull request:

https://github.com/apache/kafka/pull/1538

KAFKA-3890 Kafka Streams: task assignment is not maintained on cluster 
restart or rolling restart

Current task assignment in TaskAssignor is not deterministic.

During cluster restart or rolling restart, we have the same set of 
participating worker nodes.  But the current TaskAssignor is not able to 
maintain a deterministic mapping, so about 20% partitions will be reassigned 
which would cause state repopulation.
When the topology of worker nodes (# of worker nodes, the TaskIds they are 
carrying) is not changed, we really just want to keep the old task 
assignment.

Add the code to check whether the node topology is changing or not:
- when the prevAssignedTasks from the old clientStates is the same as the 
new task list
- when there is no new node joining (its prevAssignedTasks would be either 
empty or conflict with some other nodes)
- when there is no node dropping out (the total of prevAssignedTasks from 
other nodes would not be equal to the new task list)

When the topology is not changing, we would just use the old mapping.

I also added the code to check whether the previous assignment is balanced 
(whether each node's task list is within [1/2 average, 2 * average]); if it's 
not balanced, we will still start a new task assignment.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HenryCaiHaiying/kafka upstream

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1538.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1538






> Kafka Streams: task assignment is not maintained on cluster restart or 
> rolling restart
> --
>
> Key: KAFKA-3890
> URL: https://issues.apache.org/jira/browse/KAFKA-3890
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Henry Cai
>Assignee: Henry Cai
>  Labels: api, newbie
>
> Currently the task assignment in TaskAssignor is not deterministic.  During 
> cluster restart or rolling restart, even though the participating worker 
> nodes are the same, the TaskAssignor is not able to maintain a 
> deterministic mapping, so about 20% of partitions will be reassigned, which 
> causes state repopulation at cluster restart time.
> When the participating worker nodes are not changed, we really just want to 
> keep the old task assignment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1538: KAFKA-3890 Kafka Streams: task assignment is not m...

2016-06-22 Thread HenryCaiHaiying
Github user HenryCaiHaiying closed the pull request at:

https://github.com/apache/kafka/pull/1538


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk8 #717

2016-06-22 Thread Apache Jenkins Server
See 

Changes:

[harsha] MINOR: Verify acls for group resource on all servers of test cluster.

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-6 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 8bf18df1b6584d95a850a846b26046c2b79531b7 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8bf18df1b6584d95a850a846b26046c2b79531b7
 > git rev-list 10bbffd75439e10fe9db6cf0aa48a7da7e386ef3 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson2787321272493645798.sh
+ rm -rf 
+ /home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 1 day 16 hr old

Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66


[jira] [Commented] (KAFKA-3890) Kafka Streams: task assignment is not maintained on cluster restart or rolling restart

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345361#comment-15345361
 ] 

ASF GitHub Bot commented on KAFKA-3890:
---

Github user HenryCaiHaiying closed the pull request at:

https://github.com/apache/kafka/pull/1538


> Kafka Streams: task assignment is not maintained on cluster restart or 
> rolling restart
> --
>
> Key: KAFKA-3890
> URL: https://issues.apache.org/jira/browse/KAFKA-3890
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Henry Cai
>Assignee: Henry Cai
>  Labels: api, newbie
>
> Currently the task assignment in TaskAssignor is not deterministic.  During 
> cluster restart or rolling restart, even though the participating worker 
> nodes are the same, the TaskAssignor is not able to maintain a 
> deterministic mapping, so about 20% of partitions will be reassigned, which 
> causes state repopulation at cluster restart time.
> When the participating worker nodes are not changed, we really just want to 
> keep the old task assignment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1538: KAFKA-3890 Kafka Streams: task assignment is not m...

2016-06-22 Thread HenryCaiHaiying
Github user HenryCaiHaiying closed the pull request at:

https://github.com/apache/kafka/pull/1538


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1538: KAFKA-3890 Kafka Streams: task assignment is not m...

2016-06-22 Thread HenryCaiHaiying
GitHub user HenryCaiHaiying reopened a pull request:

https://github.com/apache/kafka/pull/1538

KAFKA-3890 Kafka Streams: task assignment is not maintained on cluster 
restart or rolling restart

Current task assignment in TaskAssignor is not deterministic.

During cluster restart or rolling restart, we have the same set of 
participating worker nodes.  But the current TaskAssignor is not able to 
maintain a deterministic mapping, so about 20% partitions will be reassigned 
which would cause state repopulation.
When the topology of worker nodes (# of worker nodes, the TaskIds they are 
carrying) is not changed, we really just want to keep the old task 
assignment.

Add the code to check whether the node topology is changing or not:
- when the prevAssignedTasks from the old clientStates is the same as the 
new task list
- when there is no new node joining (its prevAssignedTasks would be either 
empty or conflict with some other nodes)
- when there is no node dropping out (the total of prevAssignedTasks from 
other nodes would not be equal to the new task list)

When the topology is not changing, we would just use the old mapping.

I also added the code to check whether the previous assignment is balanced 
(whether each node's task list is within [1/2 average, 2 * average]); if it's 
not balanced, we will still start a new task assignment.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HenryCaiHaiying/kafka upstream

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1538.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1538






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-trunk-jdk7 #1386

2016-06-22 Thread Apache Jenkins Server
See 

Changes:

[harsha] MINOR: Verify acls for group resource on all servers of test cluster.

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-6 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 8bf18df1b6584d95a850a846b26046c2b79531b7 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8bf18df1b6584d95a850a846b26046c2b79531b7
 > git rev-list 10bbffd75439e10fe9db6cf0aa48a7da7e386ef3 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson5982264550042000980.sh
+ rm -rf 
+ /home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.

FAILURE: Build failed with an exception.

* What went wrong:
Unable to start the daemon process.
This problem might be caused by incorrect configuration of the daemon.
For example, an unrecognized jvm option is used.
Please refer to the user guide chapter on the daemon at 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html
Please read the following process output to find out more:
---
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# An error report file with more information is saved as:
# /home/jenkins/.gradle/daemon/2.4-rc-2/hs_err_pid12554.log


* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 22 days old

Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


[jira] [Created] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-06-22 Thread Tim Carey-Smith (JIRA)
Tim Carey-Smith created KAFKA-3894:
--

 Summary: Log Cleaner thread crashes and never restarts
 Key: KAFKA-3894
 URL: https://issues.apache.org/jira/browse/KAFKA-3894
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0.1, 0.8.2.2
 Environment: Oracle JDK 8
Ubuntu Precise
Reporter: Tim Carey-Smith


The log-cleaner thread can crash if the number of keys in a topic grows to be 
too large to fit into the dedupe buffer. 

The result of this is a log line: 
{quote}
broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
\[kafka-log-cleaner-thread-0\], Error due to  
java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
log.cleaner.threads
{quote}

As a result, the broker is left in a potentially dangerous situation where 
cleaning of compacted topics is not running. 

It is unclear if the broader strategy for the {{LogCleaner}} is the reason for 
this upper bound, or if this is a value which must be tuned for each specific 
use-case. 

Of more immediate concern is the fact that the thread crash is not visible via 
JMX or exposed as some form of service degradation. 

Some short-term remediations we have made are:
* increasing the size of the dedupe buffer
* monitoring the log-cleaner threads inside the JVM
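
For scale, the figures in the log line above are consistent with the default 
128 MiB dedupe buffer: assuming the offset map spends 24 bytes per key (a 
16-byte hash plus an 8-byte offset) at the default 0.9 load factor, 
134217728 * 0.9 / 24 = 5033164 keys, exactly the capacity reported. A hedged 
sizing sketch for the first remediation (target value illustrative):

# fit roughly 10M keys per segment: 10000000 * 24 / 0.9 is about 267 MB
log.cleaner.dedupe.buffer.size=268435456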



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3893) Kafka Borker ID disappears from /borkers/ids

2016-06-22 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345288#comment-15345288
 ] 

Sriharsha Chintalapani edited comment on KAFKA-3893 at 6/22/16 10:30 PM:
-

[~chaithrar...@gmail.com] it looks like brokers are losing their connection to 
zookeeper and hence the ephemeral node under /broker/ids will disappear. I 
advise you to set the connection timeout to 3 ms. 
I advise you to post your questions to the kafka mailing lists before opening a 
JIRA, as it doesn't look like a bug in kafka.


was (Author: sriharsha):
[~chaithrar...@gmail.com] it looks like brokers are loosing connection to 
zookeeper and hence the ephemeral node under /broker/ids will disappear. I 
advise you to set the connection timeout 3 ms . 
I advise you post your questions in kafka mailing lists before opening a JIRA 
as it doesn't look like a bug in kafka

> Kafka Borker ID disappears from /borkers/ids
> 
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used : 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka's broker in the zookeeper path /brokers/ids just 
> disappears.
> We see the zookeeper connection active and no network issue.
> The zookeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka is not participating in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3893) Kafka Borker ID disappears from /borkers/ids

2016-06-22 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345288#comment-15345288
 ] 

Sriharsha Chintalapani commented on KAFKA-3893:
---

[~chaithrar...@gmail.com] it looks like brokers are losing their connection to 
zookeeper and hence the ephemeral node under /broker/ids will disappear. I 
advise you to set the connection timeout to 3 ms. 
I advise you to post your questions to the kafka mailing lists before opening a 
JIRA, as it doesn't look like a bug in kafka.

> Kafka Borker ID disappears from /borkers/ids
> 
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used : 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka's broker in the zookeeper path /brokers/ids just 
> disappears.
> We see the zookeeper connection active and no network issue.
> The zookeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka is not participating in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1540: MINOR: Verify acls for group resource on all serve...

2016-06-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1540


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3893) Kafka Borker ID disappears from /borkers/ids

2016-06-22 Thread chaitra (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaitra updated KAFKA-3893:
---
Description: 
Kafka version used : 0.8.2.1 
Zookeeper version: 3.4.6

We have a scenario where kafka's broker in the zookeeper path /brokers/ids just 
disappears.
We see the zookeeper connection active and no network issue.

The zookeeper connection timeout is set to 6000ms in server.properties.

Hence Kafka is not participating in the cluster.

  was:
Kafka version used: 0.8.2.1 
Zookeeper version: 3.4.6

We have a scenario where kafka is not trying to register back in zookeeper, 
because of which we don't see the broker id in the zookeeper path /brokers/ids.

In our scenario it has been almost 24 hours with an active connection 
established to zookeeper, but I don't see any "connection exception after 
timeout in zookeeper"

The zookeeper connection timeout is set to 6000ms in server.properties

Hence Kafka is not participating in the cluster


> Kafka Broker ID disappears from /brokers/ids
> 
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used: 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka's broker in the zookeeper path /brokers/ids just 
> disappears.
> We see the zookeeper connection active and no network issue.
> The zookeeper connection timeout is set to 6000ms in server.properties
> Hence Kafka is not participating in the cluster





[jira] [Updated] (KAFKA-3893) Kafka Broker ID disappears from /brokers/ids

2016-06-22 Thread chaitra (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaitra updated KAFKA-3893:
---
Summary: Kafka Broker ID disappears from /brokers/ids  (was: Kafka not 
trying to re-register back in Zookeeper)

> Kafka Broker ID disappears from /brokers/ids
> 
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used: 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka is not trying to register back in zookeeper, 
> because of which we don't see the broker id in the zookeeper path 
> /brokers/ids.
> In our scenario it has been almost 24 hours with an active connection 
> established to zookeeper, but I don't see any "connection exception after 
> timeout in zookeeper"
> The zookeeper connection timeout is set to 6000ms in server.properties
> Hence Kafka is not participating in the cluster





[jira] [Commented] (KAFKA-3663) Proposal for a kafka broker command - kafka-brokers.sh

2016-06-22 Thread Jayesh Thakrar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345240#comment-15345240
 ] 

Jayesh Thakrar commented on KAFKA-3663:
---

I think I need some pointers/help here.
Apparently the pull request shows a Jenkins error, although I verified that the 
build works.
Here's the link to the pull request - https://github.com/apache/kafka/pull/1539

> Proposal for a kafka broker command - kafka-brokers.sh
> --
>
> Key: KAFKA-3663
> URL: https://issues.apache.org/jira/browse/KAFKA-3663
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Jayesh Thakrar
>
> This is a proposal for an admin tool - say, kafka-brokers.sh - to provide 
> broker-related useful information. See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-59%3A+Proposal+for+a+kafka+broker+command
>  for details.
> The kafka-brokers.sh command mimics the kafka-topics.sh command, but provides 
> details by broker rather than by topic.





[jira] [Commented] (KAFKA-3893) Kafka not trying to re-register back in Zookeeper

2016-06-22 Thread chaitra (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345201#comment-15345201
 ] 

chaitra commented on KAFKA-3893:


Kafka is busy spitting out the logs below:
{noformat}
[2016-06-21 21:59:58,863] INFO Partition [applog,19] on broker 3: Shrinking ISR 
for partition [applog,19] from 3,5 to 3 (kafka.cluster.Partition)
[2016-06-21 21:59:58,865] INFO Partition [applog,19] on broker 3: Cached 
zkVersion [475] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,866] INFO Partition [vztetappstdout,14] on broker 3: 
Shrinking ISR for partition [vztetappstdout,14] from 3,2 to 3 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,868] INFO Partition [vztetappstdout,14] on broker 3: 
Cached zkVersion [265] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,868] INFO Partition [appstderr,14] on broker 3: Shrinking 
ISR for partition [appstderr,14] from 3,5 to 3 (kafka.cluster.Partition)
[2016-06-21 21:59:58,870] INFO Partition [appstderr,14] on broker 3: Cached 
zkVersion [473] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,870] INFO Partition [vztetappstdout,2] on broker 3: 
Shrinking ISR for partition [vztetappstdout,2] from
{noformat}

> Kafka not trying to re-register back in Zookeeper
> --
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used: 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka is not trying to register back in zookeeper, 
> because of which we don't see the broker id in the zookeeper path 
> /brokers/ids.
> In our scenario it has been almost 24 hours with an active connection 
> established to zookeeper, but I don't see any "connection exception after 
> timeout in zookeeper"
> The zookeeper connection timeout is set to 6000ms in server.properties
> Hence Kafka is not participating in the cluster
> Hence Kafka not participating in cluster





[jira] [Updated] (KAFKA-3893) Kafka not trying to re-register back in Zookeeper

2016-06-22 Thread chaitra (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaitra updated KAFKA-3893:
---
Description: 
Kafka version used: 0.8.2.1 
Zookeeper version: 3.4.6

We have a scenario where kafka is not trying to register back in zookeeper, 
because of which we don't see the broker id in the zookeeper path /brokers/ids.

In our scenario it has been almost 24 hours with an active connection 
established to zookeeper, but I don't see any "connection exception after 
timeout in zookeeper"

The zookeeper connection timeout is set to 6000ms in server.properties

Hence Kafka is not participating in the cluster

  was:
Kafka version used: 0.8.2.1 
Zookeeper version: 3.4.6

We have a scenario where kafka 


> Kafka not trying to re-register back in Zookeeper
> --
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>
> Kafka version used: 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka is not trying to register back in zookeeper, 
> because of which we don't see the broker id in the zookeeper path 
> /brokers/ids.
> In our scenario it has been almost 24 hours with an active connection 
> established to zookeeper, but I don't see any "connection exception after 
> timeout in zookeeper"
> The zookeeper connection timeout is set to 6000ms in server.properties
> Hence Kafka is not participating in the cluster





[jira] [Updated] (KAFKA-3893) Kafka not trying to re-register back in Zookeeper

2016-06-22 Thread chaitra (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaitra updated KAFKA-3893:
---
Priority: Critical  (was: Major)

> Kafka not trying to re-register back in Zookeeper
> --
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used: 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka is not trying to register back in zookeeper, 
> because of which we don't see the broker id in the zookeeper path 
> /brokers/ids.
> In our scenario it has been almost 24 hours with an active connection 
> established to zookeeper, but I don't see any "connection exception after 
> timeout in zookeeper"
> The zookeeper connection timeout is set to 6000ms in server.properties
> Hence Kafka is not participating in the cluster





[jira] [Updated] (KAFKA-3893) Kafka not trying to re-register back in Zookeeper

2016-06-22 Thread chaitra (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaitra updated KAFKA-3893:
---
Summary: Kafka not trying to re-register back in Zookeeper  (was: Kafka not 
disconnecting the zookeeper connection even after the configured timeout has 
passed)

> Kafka not trying to re-register back in Zookeeper
> --
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>
> Kafka version used: 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka 





[jira] [Updated] (KAFKA-3893) Kafka not disconnecting the zookeeper connection even after the configured timeout has passed

2016-06-22 Thread chaitra (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaitra updated KAFKA-3893:
---
Description: 
Kafka version used: 0.8.2.1 
Zookeeper version: 3.4.6

We have a scenario where kafka 

> Kafka not disconnecting the zookeeper connection even after the configured 
> timeout has passed
> ---
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>
> Kafka version used: 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where kafka 





[jira] [Created] (KAFKA-3893) Kafka not disconnecting the zookeeper connection even after the configured timeout has passed

2016-06-22 Thread chaitra (JIRA)
chaitra created KAFKA-3893:
--

 Summary: Kafka not disconnecting the zookeeper connection even after 
the configured timeout has passed
 Key: KAFKA-3893
 URL: https://issues.apache.org/jira/browse/KAFKA-3893
 Project: Kafka
  Issue Type: Bug
Reporter: chaitra








[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345164#comment-15345164
 ] 

ASF GitHub Bot commented on KAFKA-3892:
---

GitHub user iamnoah opened a pull request:

https://github.com/apache/kafka/pull/1542

KAFKA-3892 prune metadata response to subscribed topics

Rebased from PR #1541

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/spredfast/kafka-1 remove-extra-metadata-trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1542.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1542


commit b1af18d06a18080f8f8fd1535ea99807c55cbf50
Author: Noah Sloan 
Date:   2016-06-22T20:10:35Z

KAFKA-3892 prune metadata response to subscribed topics




> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.





[GitHub] kafka pull request #1542: KAFKA-3892 prune metadata response to subscribed t...

2016-06-22 Thread iamnoah
GitHub user iamnoah opened a pull request:

https://github.com/apache/kafka/pull/1542

KAFKA-3892 prune metadata response to subscribed topics

Rebased from PR #1541

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/spredfast/kafka-1 remove-extra-metadata-trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1542.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1542


commit b1af18d06a18080f8f8fd1535ea99807c55cbf50
Author: Noah Sloan 
Date:   2016-06-22T20:10:35Z

KAFKA-3892 prune metadata response to subscribed topics






[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345140#comment-15345140
 ] 

ASF GitHub Bot commented on KAFKA-3892:
---

Github user iamnoah closed the pull request at:

https://github.com/apache/kafka/pull/1541


> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.





[GitHub] kafka pull request #1541: KAFKA-3892 prune metadata response to subscribed t...

2016-06-22 Thread iamnoah
Github user iamnoah closed the pull request at:

https://github.com/apache/kafka/pull/1541




[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345120#comment-15345120
 ] 

Ismael Juma commented on KAFKA-3892:


I was explaining a known cause of it happening. :) The protocol did not support 
a metadata request with 0 topics, so if you had no topics you would have to ask 
for _all_ topics. This was fixed in 0.10.0.0, as I said.

> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.





[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread Noah Sloan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345103#comment-15345103
 ] 

Noah Sloan commented on KAFKA-3892:
---

I believe the ultimate cause is that either a request is being made for all 
topic metadata, or a broker is mistakenly responding with all topic metadata. I 
was not able to figure out why that would happen.

I think the best immediate fix is to have the client defensively prune the 
metadata response and only retain topics that are subscribed. My PR does not 
affect the case where a topic pattern is used for subscription, so it seems 
like a fairly safe change to me.
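
A hedged sketch of that pruning idea (illustrative only; the PR changes the client's internal Metadata handling, and the MetadataPruner helper below is hypothetical, written against the public Cluster API):

{noformat}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

public final class MetadataPruner {
    // Retain partition metadata only for the topics the client subscribes to;
    // everything else in the metadata response is dropped.
    public static Map<String, List<PartitionInfo>> prune(Cluster cluster,
                                                         Set<String> subscribed) {
        Map<String, List<PartitionInfo>> kept = new HashMap<>();
        for (String topic : subscribed) {
            List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
            if (partitions != null && !partitions.isEmpty())
                kept.put(topic, partitions);
        }
        return kept;
    }
}
{noformat}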

> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.





[GitHub] kafka pull request #1541: KAFKA-3892 prune metadata response to subscribed t...

2016-06-22 Thread iamnoah
GitHub user iamnoah opened a pull request:

https://github.com/apache/kafka/pull/1541

KAFKA-3892 prune metadata response to subscribed topics

I believe this will cause clients to defensively prune their cluster 
metadata in all cases. It doesn't address why a client without a Pattern 
subscription would receive a response containing all topics and partitions for 
the cluster (which is still undesirable, but I am guessing it would require a fix 
on the broker side).

In my own testing, this restored the amount of heap required to 0.8 
consumer levels.

I am concerned that I do not 100% understand all the uses of this class. My 
assumption is that only topics that have been added are expected in the 
response and that the two unit test modifications I needed to make were 
oversights.

I am also assuming that this behavior was only applied to the pattern 
matching case to avoid a small amount of (presumed) unnecessary work and not 
for correctness reasons.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/spredfast/kafka-1 remove-extra-metadata

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1541


commit cb19feac9c1473e8406fd10a895a41468373ddae
Author: Noah Sloan 
Date:   2016-06-22T20:10:35Z

KAFKA-3892 prune metadata response to subscribed topics






[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345095#comment-15345095
 ] 

ASF GitHub Bot commented on KAFKA-3892:
---

GitHub user iamnoah opened a pull request:

https://github.com/apache/kafka/pull/1541

KAFKA-3892 prune metadata response to subscribed topics

I believe this will cause clients to defensively prune their cluster 
metadata in all cases. It doesn't address why a client without a Pattern 
subscription would receive a response containing all topics and partitions for 
the cluster (which is still undesirable, but I am guessing it would require a fix 
on the broker side).

In my own testing, this restored the amount of heap required to 0.8 
consumer levels.

I am concerned that I do not 100% understand all the uses of this class. My 
assumption is that only topics that have been added are expected in the 
response and that the two unit test modifications I needed to make were 
oversights.

I am also assuming that this behavior was only applied to the pattern 
matching case to avoid a small amount of (presumed) unnecessary work and not 
for correctness reasons.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/spredfast/kafka-1 remove-extra-metadata

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1541


commit cb19feac9c1473e8406fd10a895a41468373ddae
Author: Noah Sloan 
Date:   2016-06-22T20:10:35Z

KAFKA-3892 prune metadata response to subscribed topics




> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.





[GitHub] kafka pull request #1540: MINOR: Verify acls for group resource on all serve...

2016-06-22 Thread SinghAsDev
GitHub user SinghAsDev opened a pull request:

https://github.com/apache/kafka/pull/1540

MINOR: Verify acls for group resource on all servers of test cluster.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SinghAsDev/kafka MinorAclTestCheck

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1540.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1540


commit c8d0b88fa66ba5bde52feacd70d025e073362f05
Author: Ashish Singh 
Date:   2016-06-22T20:19:08Z

MINOR: Verify acls for group resource on all servers of test cluster.






[jira] [Commented] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345082#comment-15345082
 ] 

Ismael Juma commented on KAFKA-3892:


One of the issues is that if your producer or consumer did a metadata request 
with 0 topics, it would get all topics. This was fixed in 0.10.0.0. Is there 
any chance you could test 0.10.0.0 and see if your problem goes away?

> Clients retain metadata for non-subscribed topics
> -
>
> Key: KAFKA-3892
> URL: https://issues.apache.org/jira/browse/KAFKA-3892
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Noah Sloan
>
> After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
> producer classes), we noticed services with small heaps crashing due to 
> OutOfMemoryErrors. These services contained many producers and consumers (~20 
> total) and were connected to brokers with >2000 topics and over 10k 
> partitions. Heap dumps revealed that each client had 3.3MB of Metadata 
> retained in their Cluster, with references to topics that were not being 
> produced or subscribed to. While the services were running with 128MB of heap 
> prior to the upgrade, we had to increase max heap to 200MB to accommodate 
> all the extra data. 
> While this is not technically a memory leak, it does impose a significant 
> overhead on clients when connected to a large cluster.





[jira] [Created] (KAFKA-3892) Clients retain metadata for non-subscribed topics

2016-06-22 Thread Noah Sloan (JIRA)
Noah Sloan created KAFKA-3892:
-

 Summary: Clients retain metadata for non-subscribed topics
 Key: KAFKA-3892
 URL: https://issues.apache.org/jira/browse/KAFKA-3892
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.9.0.1
Reporter: Noah Sloan


After upgrading to 0.9.0.1 from 0.8.2 (and adopting the new consumer and 
producer classes), we noticed services with small heaps crashing due to 
OutOfMemoryErrors. These services contained many producers and consumers (~20 
total) and were connected to brokers with >2000 topics and over 10k partitions. 
Heap dumps revealed that each client had 3.3MB of Metadata retained in their 
Cluster, with references to topics that were not being produced or subscribed 
to. While the services were running with 128MB of heap prior to the upgrade, we 
had to increase max heap to 200MB to accommodate all the extra data. 

While this is not technically a memory leak, it does impose a significant 
overhead on clients when connected to a large cluster.





Re: Kafka HDFS Connector

2016-06-22 Thread Dustin Cote
Yes, I believe what you're looking for is what Dave described.  Here's the
source of that interface
https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/Format.java
 There
already exists a StringConverter that should handle the conversion in and
out of the connect data format in your case
https://kafka.apache.org/0100/javadoc/org/apache/kafka/connect/storage/StringConverter.html.
I think that's what you are looking for in terms of a Converter.  It looks
like your bigger need is the output format for HDFS.

FYI -- you are most welcome to add your request at the GitHub issues page
for the HDFS Connector
https://github.com/confluentinc/kafka-connect-hdfs/issues
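
To make that concrete, here is a rough sketch of a plain-text Format (hedged: the class and package names are made up, and the RecordWriterProvider/RecordWriter signatures and import paths below are assumptions that should be checked against the Format.java linked above for the connector version in use):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.connect.sink.SinkRecord;
import io.confluent.connect.avro.AvroData;
import io.confluent.connect.hdfs.Format;
import io.confluent.connect.hdfs.HdfsSinkConnectorConfig;
import io.confluent.connect.hdfs.RecordWriter;
import io.confluent.connect.hdfs.RecordWriterProvider;
import io.confluent.connect.hdfs.SchemaFileReader;
import io.confluent.connect.hdfs.hive.HiveMetaStore;
import io.confluent.connect.hdfs.hive.HiveUtil;

public class DelimitedTextFormat implements Format {
  @Override
  public RecordWriterProvider getRecordWriterProvider() {
    return new RecordWriterProvider() {
      @Override
      public String getExtension() { return ".txt"; }

      @Override
      public RecordWriter<SinkRecord> getRecordWriter(final Configuration conf,
          final String fileName, SinkRecord record, AvroData avroData) throws IOException {
        Path path = new Path(fileName);
        final FSDataOutputStream out = path.getFileSystem(conf).create(path);
        return new RecordWriter<SinkRecord>() {
          @Override
          public void write(SinkRecord r) throws IOException {
            // With StringConverter, r.value() is the raw comma-delimited line.
            out.write((r.value() + "\n").getBytes("UTF-8"));
          }
          @Override
          public void close() throws IOException { out.close(); }
        };
      }
    };
  }

  @Override
  public SchemaFileReader getSchemaFileReader(AvroData avroData) {
    // Sketch only: a real implementation must return the schema registered
    // for the topic (needed for recovery and hive integration).
    throw new UnsupportedOperationException("not implemented in this sketch");
  }

  @Override
  public HiveUtil getHiveUtil(HdfsSinkConnectorConfig config, AvroData avroData,
      HiveMetaStore hiveMetaStore) {
    throw new UnsupportedOperationException("not implemented in this sketch");
  }
}

You would then point format.class at DelimitedTextFormat (again, assuming that config key in your connector version) and keep value.converter set to the StringConverter linked above.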

On Wed, Jun 22, 2016 at 1:26 PM, Tauzell, Dave  wrote:

> I don't see any built-in support for this but I think that you can write a
> class that implements io.confluent.connect.hdfs.Format
>
> public interface Format {
>   RecordWriterProvider getRecordWriterProvider();
>   SchemaFileReader getSchemaFileReader(AvroData avroData);
>   HiveUtil getHiveUtil(HdfsSinkConnectorConfig config, AvroData avroData,
> HiveMetaStore hiveMetaStore);
> }
>
> You would still have to register a schema in the Schema Registry and the
> "SchemaFileReader" that you return would have to return the same Schema.
>
> -Dave
>
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
> Connect with us: Twitter I LinkedIn I Facebook I YouTube
>
>
> -Original Message-
> From: Pariksheet Barapatre [mailto:pari.data...@gmail.com]
> Sent: Wednesday, June 22, 2016 11:49 AM
> To: us...@kafka.apache.org
> Cc: dev@kafka.apache.org
> Subject: Re: Kafka HDFS Connector
>
> Hi Dustin,
>
> I am looking for option 1.
>
> Looking at Kafka Connect code, I guess we need to write converter code if
> not available.
>
>
> Thanks in advance.
>
> Regards
> Pari
>
>
> On 22 June 2016 at 18:50, Dustin Cote  wrote:
>
> > Hi Pari,
> >
> > Can you clarify which scenario you are looking to implement?
> > 1) plaintext Kafka data --> plaintext HDFS data readable by hive
> > 2) plaintext Kafka data --> avro/parquet HDFS data readable by hive
> >
> > Regards,
> >
> >
> >
> > On Wed, Jun 22, 2016 at 6:02 AM, Pariksheet Barapatre <
> > pari.data...@gmail.com> wrote:
> >
> > > Thanks for your suggestions. I think if kafka connect provides the
> > > same functionality as flume and storm, why should we go for another
> > > infrastructure investment.
> > >
> > > Kafka Connect effectively copies data from a Kafka topic to HDFS
> > > through a connector. It supports avro as well as parquet; I am looking
> > > at whether we can use it to load plain-text data.
> > >
> > > Cheers
> > > Pari
> > >
> > >
> > >
> > > On 22 June 2016 at 12:34, Lohith Samaga M
> > > 
> > > wrote:
> > >
> > > > Hi,
> > > > You can use Storm also. Here you have the option of rotating the
> > > > file. You can also write to Hive directly.
> > > >
> > > > Best regards / Mit freundlichen Grüßen / Sincères salutations M.
> > > > Lohith Samaga
> > > >
> > > >
> > > >
> > > >
> > > > -Original Message-
> > > > From: Mudit Kumar [mailto:mudit.ku...@askme.in]
> > > > Sent: Wednesday, June 22, 2016 12.32
> > > > To: us...@kafka.apache.org; dev@kafka.apache.org
> > > > Subject: Re: Kafka HDFS Connector
> > > >
> > > > I think you can use flume also.
> > > >
> > > > Thanks,
> > > > Mudit
> > > >
> > > >
> > > >
> > > >
> > > > On 6/22/16, 12:29 PM, "Pariksheet Barapatre"
> > > > 
> > > > wrote:
> > > >
> > > > >Anybody have any idea on this?
> > > > >
> > > > >Thanks
> > > > >Pari
> > > > >
> > > > >On 20 June 2016 at 14:36, Pariksheet Barapatre <
> > pari.data...@gmail.com>
> > > > >wrote:
> > > > >
> > > > >> Hello All,
> > > > >>
> > > > >> I have data coming from sensors into the kafka cluster in text
> > > > >> format, delimited by commas.
> > > > >>
> > > > >> How do I offload this data from Kafka to Hive periodically? I
> > > > >> guess Kafka Connect should solve my problem, but when I checked the
> > > > >> documentation, the examples have only avro-formatted data. Can you
> > > > >> please provide some knowledge on this.
> > > > >>
> > > > >> Many Thanks
> > > > >> Pari
> > > > >>
> > > >

Re: Kafka HDFS Connector

2016-06-22 Thread Pariksheet Barapatre
Hi Dustin,

I am looking for option 1.

Looking at Kafka Connect code, I guess we need to write converter code if
not available.


Thanks in advance.

Regards
Pari


On 22 June 2016 at 18:50, Dustin Cote  wrote:

> Hi Pari,
>
> Can you clarify which scenario you are looking to implement?
> 1) plaintext Kafka data --> plaintext HDFS data readable by hive
> 2) plaintext Kafka data --> avro/parquet HDFS data readable by hive
>
> Regards,
>
>
>
> On Wed, Jun 22, 2016 at 6:02 AM, Pariksheet Barapatre <
> pari.data...@gmail.com> wrote:
>
> > Thanks for your suggestions. I think if kafka connect provides the same
> > functionality as flume and storm, why should we go for another
> > infrastructure investment.
> >
> > Kafka Connect effectively copies data from a Kafka topic to HDFS through
> > a connector. It supports avro as well as parquet; I am looking at whether
> > we can use it to load plain-text data.
> >
> > Cheers
> > Pari
> >
> >
> >
> > On 22 June 2016 at 12:34, Lohith Samaga M 
> > wrote:
> >
> > > Hi,
> > > You can use Storm also. Here you have the option of rotating the
> > > file. You can also write to Hive directly.
> > >
> > > Best regards / Mit freundlichen Grüßen / Sincères salutations
> > > M. Lohith Samaga
> > >
> > >
> > >
> > >
> > > -Original Message-
> > > From: Mudit Kumar [mailto:mudit.ku...@askme.in]
> > > Sent: Wednesday, June 22, 2016 12.32
> > > To: us...@kafka.apache.org; dev@kafka.apache.org
> > > Subject: Re: Kafka HDFS Connector
> > >
> > > I think you can use flume also.
> > >
> > > Thanks,
> > > Mudit
> > >
> > >
> > >
> > >
> > > On 6/22/16, 12:29 PM, "Pariksheet Barapatre" 
> > > wrote:
> > >
> > > >Anybody have any idea on this?
> > > >
> > > >Thanks
> > > >Pari
> > > >
> > > >On 20 June 2016 at 14:36, Pariksheet Barapatre <
> pari.data...@gmail.com>
> > > >wrote:
> > > >
> > > >> Hello All,
> > > >>
> > > >> I have data coming from sensors into the kafka cluster in text format,
> > > >> delimited by commas.
> > > >>
> > > >> How do I offload this data from Kafka to Hive periodically? I guess
> > > >> Kafka Connect should solve my problem, but when I checked the
> > > >> documentation, the examples have only avro-formatted data. Can you
> > > >> please provide some knowledge on this.
> > > >>
> > > >> Many Thanks
> > > >> Pari
> > > >>
> > >
> > >
> >
>
>
>
> --
> Dustin Cote
> confluent.io
>


[jira] [Comment Edited] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-06-22 Thread kambiz shahri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344676#comment-15344676
 ] 

kambiz shahri edited comment on KAFKA-3824 at 6/22/16 4:30 PM:
---

Fair enough.
Thanks for taking the time, and apologies if I was coming off as belligerent.


was (Author: beez):
Fair enough.

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks at least 
> once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html





[jira] [Commented] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-06-22 Thread kambiz shahri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344676#comment-15344676
 ] 

kambiz shahri commented on KAFKA-3824:
--

Fair enough.

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks at least 
> once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html





[jira] [Commented] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-06-22 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344548#comment-15344548
 ] 

Jay Kreps commented on KAFKA-3824:
--

[~beez] Sorry for not following up sooner. I think the reason for re-assigning 
was probably that writing a complete description of the behavior in JIRA is as 
difficult as just editing the javadoc directly.

I do think the description in the issue is fairly complete: the docs indicate 
that we violate at-least-once delivery when auto commit is enabled, but as of 
0.10 we actually don't (i.e. our consumer delivery guarantees are stronger than 
indicated in the javadoc). I'm not sure there is anything else to add to the 
description.

In any case, sorry, we could probably have handled it more gracefully!
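
To make the current behavior concrete, a minimal consumer loop (hedged sketch; the broker, group, and topic names are placeholders): in the 0.10 consumer, auto-commit runs inside poll() (and on close), so offsets are committed only up to positions whose records have already been handed to the application. Processing each batch before the next poll() therefore preserves at-least-once delivery.

{noformat}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceAutoCommit {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        props.put("group.id", "demo-group");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "5000");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records)
                    process(record); // a crash here re-delivers the batch, it is not lost
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // application logic
    }
}
{noformat}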

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks at least 
> once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html





[jira] [Commented] (KAFKA-3663) Proposal for a kafka broker command - kafka-brokers.sh

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344469#comment-15344469
 ] 

ASF GitHub Bot commented on KAFKA-3663:
---

GitHub user JThakrar opened a pull request:

https://github.com/apache/kafka/pull/1539

KAFKA-3663

This implements KIP-59: Proposal for a kafka broker command.
See 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-59%3A+Proposal+for+a+kafka+broker+command
 for details and sample output.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JThakrar/kafka KAFKA-3663

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1539


commit 1e78dd685235a1e46accf27e559d216fda82f397
Author: Thakrar, Jayesh 
Date:   2016-06-20T02:30:42Z

Initial checkin for KAFKA-3663 - kafka-brokers.sh command

commit dd01cd0ea2445472b186ea4ebe437ac526151cae
Author: Jayesh Thakrar 
Date:   2016-06-22T06:07:59Z

Made kafka-broker.sh executable

commit 0bb443add5e6cf085d6a95efa609c3fc3e0b2c19
Author: Jayesh Thakrar 
Date:   2016-06-22T06:10:03Z

Corrected applyBrokerFilter and some cosmetic updates




> Proposal for a kafka broker command - kafka-brokers.sh
> --
>
> Key: KAFKA-3663
> URL: https://issues.apache.org/jira/browse/KAFKA-3663
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Jayesh Thakrar
>
> This is a proposal for an admin tool - say, kafka-brokers.sh - to provide 
> broker-related useful information. See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-59%3A+Proposal+for+a+kafka+broker+command
>  for details.
> The kafka-brokers.sh command mimics the kafka-topics.sh command, but provides 
> details by broker rather than by topic.





[GitHub] kafka pull request #1539: KAFKA-3663

2016-06-22 Thread JThakrar
GitHub user JThakrar opened a pull request:

https://github.com/apache/kafka/pull/1539

KAFKA-3663

This implements KIP-59: Proposal for a kafka broker command.
See 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-59%3A+Proposal+for+a+kafka+broker+command
 for details and sample output.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JThakrar/kafka KAFKA-3663

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1539


commit 1e78dd685235a1e46accf27e559d216fda82f397
Author: Thakrar, Jayesh 
Date:   2016-06-20T02:30:42Z

Initial checkin for KAFKA-3663 - kafka-brokers.sh command

commit dd01cd0ea2445472b186ea4ebe437ac526151cae
Author: Jayesh Thakrar 
Date:   2016-06-22T06:07:59Z

Made kafka-broker.sh executable

commit 0bb443add5e6cf085d6a95efa609c3fc3e0b2c19
Author: Jayesh Thakrar 
Date:   2016-06-22T06:10:03Z

Corrected applyBrokerFilter and some cosmetic updates






RE: Kafka HDFS Connector

2016-06-22 Thread Lohith Samaga M
Hi,
You can use Storm also. Here you have the option of rotating the file. 
You can also write to Hive directly.

Best regards / Mit freundlichen Grüßen / Sincères salutations
M. Lohith Samaga




-Original Message-
From: Mudit Kumar [mailto:mudit.ku...@askme.in] 
Sent: Wednesday, June 22, 2016 12.32
To: us...@kafka.apache.org; dev@kafka.apache.org
Subject: Re: Kafka HDFS Connector

I think you can use flume also.

Thanks,
Mudit




On 6/22/16, 12:29 PM, "Pariksheet Barapatre"  wrote:

>Anybody have any idea on this?
>
>Thanks
>Pari
>
>On 20 June 2016 at 14:36, Pariksheet Barapatre 
>wrote:
>
>> Hello All,
>>
>> I have data coming from sensors into the kafka cluster in text format, 
>> delimited by commas.
>>
>> How do I offload this data from Kafka to Hive periodically? I guess 
>> Kafka Connect should solve my problem, but when I checked the 
>> documentation, the examples have only avro-formatted data. Can you please 
>> provide some knowledge on this.
>>
>> Many Thanks
>> Pari
>>



Re: Kafka HDFS Connector

2016-06-22 Thread Mudit Kumar
I think you can use flume also.

Thanks,
Mudit




On 6/22/16, 12:29 PM, "Pariksheet Barapatre"  wrote:

>Anybody have any idea on this?
>
>Thanks
>Pari
>
>On 20 June 2016 at 14:36, Pariksheet Barapatre 
>wrote:
>
>> Hello All,
>>
>> I have data coming from sensors into the kafka cluster in text format,
>> delimited by commas.
>>
>> How do I offload this data from Kafka to Hive periodically? I guess Kafka
>> Connect should solve my problem, but when I checked the documentation, the
>> examples have only avro-formatted data. Can you please provide some
>> knowledge on this.
>>
>> Many Thanks
>> Pari
>>



Re: Kafka HDFS Connector

2016-06-22 Thread Dustin Cote
Hi Pari,

Can you clarify which scenario you are looking to implement?
1) plaintext Kafka data --> plaintext HDFS data readable by hive
2) plaintext Kafka data --> avro/parquet HDFS data readable by hive

Regards,



On Wed, Jun 22, 2016 at 6:02 AM, Pariksheet Barapatre <
pari.data...@gmail.com> wrote:

> Thanks for your suggestions. I think if kafka connect provides the same
> functionality as flume and storm, why should we go for another
> infrastructure investment.
>
> Kafka Connect effectively copies data from a Kafka topic to HDFS through a
> connector. It supports avro as well as parquet; I am looking at whether we
> can use it to load plain-text data.
>
> Cheers
> Pari
>
>
>
> On 22 June 2016 at 12:34, Lohith Samaga M 
> wrote:
>
> > Hi,
> > You can use Storm also. Here you have the option of rotating the
> > file. You can also write to Hive directly.
> >
> > Best regards / Mit freundlichen Grüßen / Sincères salutations
> > M. Lohith Samaga
> >
> >
> >
> >
> > -Original Message-
> > From: Mudit Kumar [mailto:mudit.ku...@askme.in]
> > Sent: Wednesday, June 22, 2016 12.32
> > To: us...@kafka.apache.org; dev@kafka.apache.org
> > Subject: Re: Kafka HDFS Connector
> >
> > I think you can use flume also.
> >
> > Thanks,
> > Mudit
> >
> >
> >
> >
> > On 6/22/16, 12:29 PM, "Pariksheet Barapatre" 
> > wrote:
> >
> > >Anybody have any idea on this?
> > >
> > >Thanks
> > >Pari
> > >
> > >On 20 June 2016 at 14:36, Pariksheet Barapatre 
> > >wrote:
> > >
> > >> Hello All,
> > >>
> > >> I have data coming from sensors into the kafka cluster in text format,
> > >> delimited by commas.
> > >>
> > >> How do I offload this data from Kafka to Hive periodically? I guess
> > >> Kafka Connect should solve my problem, but when I checked the
> > >> documentation, the examples have only avro-formatted data. Can you
> > >> please provide some knowledge on this.
> > >>
> > >> Many Thanks
> > >> Pari
> > >>
> >
> >
>



-- 
Dustin Cote
confluent.io


[jira] [Resolved] (KAFKA-3891) A KTable with Long values with a numeric filter apparently may retain null values

2016-06-22 Thread Phil Derome (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Derome resolved KAFKA-3891.

Resolution: Invalid

My mistake.
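
For context on why the nulls show up at all: a KTable is a changelog, and when a key that previously passed the predicate stops passing it, filter() forwards (key, null) as a tombstone so downstream state for that key can be deleted. That is exactly why the example needs its second, null-removing filter; a sketch of that step, reusing the issue's own types:

{noformat}
// A region whose count drops below 2 produces (region, null) from the first
// filter; the console-friendly stream must drop those tombstones explicitly.
KStream<String, Long> regionCountsForConsole = regionCounts
    .toStream()
    .filter((regionName, count) -> count != null);
{noformat}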

> A KTable with Long values with a numeric filter apparently may retain null 
> values
> -
>
> Key: KAFKA-3891
> URL: https://issues.apache.org/jira/browse/KAFKA-3891
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Phil Derome
>Assignee: Guozhang Wang
>Priority: Minor
>
> See Confluent's UserRegionLambdaExample for full detail. Not sure if this 
> qualifies as a bug as I am new to the community, but to me it looks like a bug 
> (resolved KAFKA-739 and KAFKA-2026 also pertain to undesirable nulls and they 
> were deemed Major Bugs).
> The first filter on the KTable for count below should filter correctly for null, 
> since null does not satisfy the predicate count >= 2.
> Variable regionCounts apparently contains some null values despite the filter 
> on count, given the second filter that takes place. It's quite confusing. Why 
> would we want to publish these null values on any topic, given the filter's 
> intent should be quite clear?
> // Aggregate the user counts by region
> KTable<String, Long> regionCounts = userRegions
>     // Count by region; we do not need to specify any explicit serdes
>     // because the key and value types do not change
>     .groupBy((userId, region) -> KeyValue.pair(region, region))
>     .count("CountsByRegion")
>     // discard any regions with only 1 user
>     .filter((regionName, count) -> count >= 2);
> // Note: The following operations would NOT be needed for the actual
> // users-per-region computation, which would normally stop at the filter()
> // above. We use the operations below only to "massage" the output data so
> // it is easier to inspect on the console via kafka-console-consumer.
> KStream<String, Long> regionCountsForConsole = regionCounts
>     // get rid of windows (and the underlying KTable) by transforming the
>     // KTable to a KStream
>     .toStream()
>     // sanitize the output by removing null record values (again, we do this
>     // only so that the output is easier to read via kafka-console-consumer
>     // combined with LongDeserializer, because LongDeserializer fails on null
>     // values, and even though we could configure kafka-console-consumer to
>     // skip messages on error the output still wouldn't look pretty)
>     .filter((regionName, count) -> count != null);
> // write to the result topic; we need to override the value serializer
> // for type long
> regionCountsForConsole.to(stringSerde, longSerde, "LargeRegions");





[jira] [Updated] (KAFKA-3891) A KTable with Long values with a numeric filter apparently may retain null values

2016-06-22 Thread Phil Derome (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Derome updated KAFKA-3891:
---

Please reject, my mistake.

> A KTable with Long values with a numeric filter apparently may retain null 
> values
> -
>
> Key: KAFKA-3891
> URL: https://issues.apache.org/jira/browse/KAFKA-3891
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Phil Derome
>Assignee: Guozhang Wang
>Priority: Minor
>
> See Confluent's UserRegionLambdaExample for full detail. Not sure if this 
> qualifies as a bug as I am new to the community, but to me it looks like a bug 
> (resolved KAFKA-739 and KAFKA-2026 also pertain to undesirable nulls and they 
> were deemed Major Bugs).
> The first filter on the KTable for count below should filter correctly for null, 
> since null does not satisfy the predicate count >= 2.
> Variable regionCounts apparently contains some null values despite the filter 
> on count, given the second filter that takes place. It's quite confusing. Why 
> would we want to publish these null values on any topic, given the filter's 
> intent should be quite clear?
> // Aggregate the user counts by region
> KTable<String, Long> regionCounts = userRegions
>     // Count by region; we do not need to specify any explicit serdes
>     // because the key and value types do not change
>     .groupBy((userId, region) -> KeyValue.pair(region, region))
>     .count("CountsByRegion")
>     // discard any regions with only 1 user
>     .filter((regionName, count) -> count >= 2);
> // Note: The following operations would NOT be needed for the actual
> // users-per-region computation, which would normally stop at the filter()
> // above. We use the operations below only to "massage" the output data so
> // it is easier to inspect on the console via kafka-console-consumer.
> KStream<String, Long> regionCountsForConsole = regionCounts
>     // get rid of windows (and the underlying KTable) by transforming the
>     // KTable to a KStream
>     .toStream()
>     // sanitize the output by removing null record values (again, we do this
>     // only so that the output is easier to read via kafka-console-consumer
>     // combined with LongDeserializer, because LongDeserializer fails on null
>     // values, and even though we could configure kafka-console-consumer to
>     // skip messages on error the output still wouldn't look pretty)
>     .filter((regionName, count) -> count != null);
> // write to the result topic; we need to override the value serializer
> // for type long
> regionCountsForConsole.to(stringSerde, longSerde, "LargeRegions");





[jira] [Created] (KAFKA-3891) A KTable with Long values with a numeric filter apparently may retain null values

2016-06-22 Thread Phil Derome (JIRA)
Phil Derome created KAFKA-3891:
--

 Summary: A KTable with Long values with a numeric filter 
apparently may retain null values
 Key: KAFKA-3891
 URL: https://issues.apache.org/jira/browse/KAFKA-3891
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Phil Derome
Assignee: Guozhang Wang
Priority: Minor


See Confluent's UserRegionLambdaExample for full detail. Not sure if this 
qualifies as a bug as I am new to community, but to me it looks like a bug 
(resolved KAFKA-739 and KAFKA-2026 also pertain to undesirable nulls and they 
were deemed Major Bugs).

The first filter on KTable for count below should filter correctly for null 
since null does not satisfy predicate count >= 2.

Variable regionCounts apparently contain some null values despite the filter on 
count given the second filter that takes place. It's quite confusing. Why would 
we want to publish these null values on any topic given the filter's intent 
should be quite clear?

  // Aggregate the user counts of by region
KTable regionCounts = userRegions
// Count by region
// We do not need to specify any explict serdes because the key and 
value types do not change
.groupBy((userId, region) -> KeyValue.pair(region, region))
.count("CountsByRegion")
// discard any regions with only 1 user
.filter((regionName, count) -> count >= 2);

// Note: The following operations would NOT be needed for the actual users-per-region
// computation, which would normally stop at the filter() above.  We use the operations
// below only to "massage" the output data so it is easier to inspect on the console via
// kafka-console-consumer.
//
KStream<String, Long> regionCountsForConsole = regionCounts
    // get rid of windows (and the underlying KTable) by transforming the KTable to a KStream
    .toStream()
    // sanitize the output by removing null record values (again, we do this only so that the
    // output is easier to read via kafka-console-consumer combined with LongDeserializer,
    // because LongDeserializer fails on null values, and even though we could configure
    // kafka-console-consumer to skip messages on error the output still wouldn't look pretty)
    .filter((regionName, count) -> count != null);

// write to the result topic; we need to override the value serializer for type long
regionCountsForConsole.to(stringSerde, longSerde, "LargeRegions");
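
For illustration, here is the record-by-record behavior I believe produces the 
nulls, traced on made-up input (assuming 0.10.0.0 semantics):

// hypothetical input records on userRegions:
//   ("alice", "europe"), ("bob", "europe"), ("bob", "asia")
// resulting changelog of the CountsByRegion KTable:
//   ("europe", 1), ("europe", 2), ("europe", 1), ("asia", 1)
// what filter((regionName, count) -> count >= 2) forwards downstream:
//   ("europe", null), ("europe", 2), ("europe", null), ("asia", null)
//
// Whenever the current count fails the predicate, the KTable filter forwards a
// null value (a "tombstone") so that downstream views delete the key; this is
// why the extra count != null filter above is needed before the console topic.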






Re: Kafka HDFS Connector

2016-06-22 Thread Pariksheet Barapatre
Thanks for your suggestions. If Kafka Connect provides the same
functionality as Flume and Storm, why should we invest in yet another
piece of infrastructure?

Kafka Connect effectively copies data from a Kafka topic to HDFS through a
connector. It supports Avro as well as Parquet; I am looking into whether
we can use it to load plain-text data.
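
For reference, the kind of minimal sink configuration I have in mind is below
(a sketch only, assuming the Confluent HDFS sink connector's standard
properties; the topic name and HDFS URL are made up, and whether the bundled
output formats can write plain text is exactly what I need to confirm):

name=hdfs-sink-sensors
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# made-up topic carrying the comma-delimited sensor records
topics=sensor-readings
hdfs.url=hdfs://namenode:8020
flush.size=1000
# treat keys and values as plain strings rather than Avro
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter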

Cheers
Pari



On 22 June 2016 at 12:34, Lohith Samaga M  wrote:

> Hi,
> You can use Storm also, Here you have the option of rotating the
> file. You can also write to Hive directly.
>
> Best regards / Mit freundlichen Grüßen / Sincères salutations
> M. Lohith Samaga
>
>
>
>
> -Original Message-
> From: Mudit Kumar [mailto:mudit.ku...@askme.in]
> Sent: Wednesday, June 22, 2016 12.32
> To: us...@kafka.apache.org; dev@kafka.apache.org
> Subject: Re: Kafka HDFS Connector
>
> I think you can use Flume also.
>
> Thanks,
> Mudit
>
>
>
>
> On 6/22/16, 12:29 PM, "Pariksheet Barapatre" 
> wrote:
>
> >Anybody have any idea on this?
> >
> >Thanks
> >Pari
> >
> >On 20 June 2016 at 14:36, Pariksheet Barapatre 
> >wrote:
> >
> >> Hello All,
> >>
> >> I have data coming from sensors into a Kafka cluster, in comma-delimited
> >> text format.
> >>
> >> How can I offload this data from Kafka to Hive periodically? I guess
> >> Kafka Connect should solve my problem, but when I checked the
> >> documentation, the examples have only Avro-formatted data. Can you
> >> please share some pointers on this?
> >>
> >> Many Thanks
> >> Pari
> >>
>


Re: [VOTE] KIP-55: Secure quotas for authenticated users

2016-06-22 Thread Rajini Sivaram
Ismael, Jun,

Thank you both for the feedback. I have updated the KIP to add dynamic
default quotas for client-id, deprecating the existing static default
properties.
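
As a concrete example, a dynamic client-id default under the updated proposal
would be set via kafka-configs.sh along these lines (a sketch of the proposed
syntax only; the byte rates are arbitrary):

bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' \
  --entity-type clients --entity-default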


On Wed, Jun 22, 2016 at 12:50 AM, Jun Rao  wrote:

> Yes, for consistency, perhaps we can allow client-id quota to be configured
> dynamically too and mark the static config in the broker as deprecated. If
> both are set, the dynamic one wins.
>
> Thanks,
>
> Jun
>
> On Tue, Jun 21, 2016 at 3:56 AM, Ismael Juma  wrote:
>
> > On Tue, Jun 21, 2016 at 12:50 PM, Rajini Sivaram <
> > rajinisiva...@googlemail.com> wrote:
> >
> > > It is actually quite tempting to do the same for client-id quotas as
> > well,
> > > but I suppose we can't break existing users who have configured
> defaults
> > in
> > > server.properties and providing two ways of setting client-id defaults
> > > would be just too confusing.
> > >
> >
> > Using two different approaches for client-id versus user quota defaults
> is
> > also not great. We could deprecate the server.properties default configs
> > for client-id quotas and remove them in the future. In the meantime, we
> > would have to live with 2 level defaults.
> >
> > Jun, what are your thoughts on this?
> >
> > Ismael
> >
>



-- 
Regards,

Rajini


Build failed in Jenkins: kafka-trunk-jdk8 #716

2016-06-22 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: KAFKA-3176 follow-up to fix minor issues

--
[...truncated 531 lines...]
kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] PASSED

kafka.log.OffsetIndexTest > lookupExtremeCases STARTED

kafka.log.OffsetIndexTest > lookupExtremeCases PASSED

kafka.log.OffsetIndexTest > appendTooMany STARTED

kafka.log.OffsetIndexTest > appendTooMany PASSED

kafka.log.OffsetIndexTest > randomLookupTest STARTED

kafka.log.OffsetIndexTest > randomLookupTest PASSED

kafka.log.OffsetIndexTest > testReopen STARTED

kafka.log.OffsetIndexTest > testReopen PASSED

kafka.log.OffsetIndexTest > appendOutOfOrder STARTED

kafka.log.OffsetIndexTest > appendOutOfOrder PASSED

kafka.log.OffsetIndexTest > truncate STARTED

kafka.log.OffsetIndexTest > truncate PASSED

kafka.log.OffsetMapTest > testClear STARTED

kafka.log.OffsetMapTest > testClear PASSED

kafka.log.OffsetMapTest > testGetWhenFull STARTED

kafka.log.OffsetMapTest > testGetWhenFull PASSED

kafka.log.OffsetMapTest > testBasicValidation STARTED

kafka.log.OffsetMapTest > testBasicValidation PASSED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize STARTED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory 
STARTED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory 
PASSED

kafka.log.LogManagerTest > testGetNonExistentLog STARTED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails STARTED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment STARTED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED

kafka.log.LogManagerTest > testCleanupExpiredSegments STARTED

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints STARTED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testTimeBasedFlush STARTED


[GitHub] kafka pull request #1536: MINOR: KAFKA-3176 follow-up to fix minor issues

2016-06-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1536




[jira] [Commented] (KAFKA-3176) Allow console consumer to consume from particular partitions when new consumer is used.

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343897#comment-15343897
 ] 

ASF GitHub Bot commented on KAFKA-3176:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1536


> Allow console consumer to consume from particular partitions when new 
> consumer is used.
> ---
>
> Key: KAFKA-3176
> URL: https://issues.apache.org/jira/browse/KAFKA-3176
> Project: Kafka
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Vahid Hashemian
> Fix For: 0.10.1.0
>
>
> Previously we had the simple consumer shell, which can consume from a particular 
> partition. Moving forward we will deprecate the simple consumer, so it would be 
> useful to allow the console consumer to consume from a particular partition when 
> the new consumer is used.
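
With the pull request above merged, usage is expected to look roughly like the
following (a sketch, assuming the --partition and --offset options this ticket
introduces; the topic name is made up):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic test --partition 0 --offset earliest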





[jira] [Comment Edited] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-06-22 Thread kambiz shahri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343864#comment-15343864
 ] 

kambiz shahri edited comment on KAFKA-3824 at 6/22/16 7:50 AM:
---

My gripe is threefold:
1. The reporter of the issue simply did not bother to reply.
2. It is as clear as mud.
3. We all should have a chance to understand who has done what to the Consumer 
with regard to auto-commit and how it affects at-least-once delivery; there are 
production systems running out there that might be adversely affected by a 
subtlety that someone has not bothered to document, or even to answer a request 
for clarification about.

Internal or external, there should be clarity on such an important class.

Also, farming out Javadoc or documentation work to a newbie, who then does 
her/his due diligence and requests clarification, which means the reporter has 
to do more work, only for the reporter to reassign the issue because he cannot 
be bothered to explain the changes he made, reflects very poorly on Kafka.

So please, related or not, all changes to the Consumer should be made clear 
and explained so they can be documented.
Assigning it back to myself, to be fobbed off further, is not of interest.


was (Author: beez):
My gripe is 3 fold:
1. The reporter of the issue simply did not bother to reply 
2. It is as clear as mud
3. We all should have a chance to understand who has done what to the Consumer 
with regards to autocommit and how it affects least once delivery; there are 
production systems running out there, that might be adversely affected, by a 
subtlety that someone is not bothered to document or even reply to a request 
for a clarification.

Internal or external, there should be clarity on such an important class.

Also, farming out JavaDoc or Documentation to a Newbie, who then does her/his 
due-diligence, and requests clarification, which means the reporter has to do 
more work, and then just reassigns it, because he cannot be bothered to explain 
changes he made, is a very poor reflection on Kafka.

So please, related or not, all changes to the Consumer should be made clear, 
and explained so it can be documented.
Assigning it back to myself, to be fobbed off more, is not of interest.

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks at least 
> once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
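
For context, the 0.10 behavior is at-least-once because auto-commit runs inside
poll() (and close()) and only commits offsets of records that a previous poll()
already returned to the application. A minimal sketch of a consume loop that
relies on this (the topic, group, and process() handler are hypothetical):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");                 // hypothetical group
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Arrays.asList("example-topic")); // hypothetical topic
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records)
            process(record); // hypothetical handler; completes before next poll()
        // The next poll() may auto-commit the offsets of the records above, so
        // a crash before that point only causes reprocessing, never lost records.
    }
}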





[jira] [Comment Edited] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-06-22 Thread kambiz shahri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343864#comment-15343864
 ] 

kambiz shahri edited comment on KAFKA-3824 at 6/22/16 7:49 AM:
---

My gripe is 3 fold:
1. The reporter of the issue simply did not bother to reply 
2. It is as clear as mud
3. We all should have a chance to understand who has done what to the Consumer 
with regards to autocommit and how it potentially least once delivery; there 
are production systems running out there, that might be adversely affected, by 
a subtlety that someone is not bothered to document or even reply to a request 
for a clarification.

Internal or external, there should be clarity on such an important class.

Also, farming out JavaDoc or Documentation to a Newbie, who then does her/his 
due-diligence, and requests clarification, which means the reporter has to do 
more work, and then just reassigns it, because he cannot be bothered to explain 
changes he made, is a very poor reflection on Kafka.

So please, related or not, all changes to the Consumer should be made clear, 
and explained so it can be documented.
Assigning it back to myself, to be fobbed off more, is not of interest.


was (Author: beez):
My gripe is 3 fold:
1. The report of the issue simply did not bother to reply 
2. It is as clear as mud
3. We all should have a chance to understand who has done what to the Consumer 
with regards to autocommit and how it potentially least once delivery; there 
are production systems running out there, that might be adversely affected, by 
a subtlety that someone is not bothered to document or even reply to a request 
for a clarification.

Internal or external, there should be clarity on such an important class.

Also, farming out JavaDoc or Documentation to a Newbie, who then does her/his 
due-diligence, and requests clarification, which means the reporter has to do 
more work, and then just reassigns it, because he cannot be bothered to explain 
changes he made, is a very poor reflection on Kafka.

So please, related or not, all changes to the Consumer should be made clear, 
and explained so it can be documented.
Assigning it back to myself, to be fobbed off more, is not of interest.

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks at least 
> once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html





[jira] [Comment Edited] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-06-22 Thread kambiz shahri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343864#comment-15343864
 ] 

kambiz shahri edited comment on KAFKA-3824 at 6/22/16 7:49 AM:
---

My gripe is 3 fold:
1. The reporter of the issue simply did not bother to reply 
2. It is as clear as mud
3. We all should have a chance to understand who has done what to the Consumer 
with regards to autocommit and how it affects least once delivery; there are 
production systems running out there, that might be adversely affected, by a 
subtlety that someone is not bothered to document or even reply to a request 
for a clarification.

Internal or external, there should be clarity on such an important class.

Also, farming out JavaDoc or Documentation to a Newbie, who then does her/his 
due-diligence, and requests clarification, which means the reporter has to do 
more work, and then just reassigns it, because he cannot be bothered to explain 
changes he made, is a very poor reflection on Kafka.

So please, related or not, all changes to the Consumer should be made clear, 
and explained so it can be documented.
Assigning it back to myself, to be fobbed off more, is not of interest.


was (Author: beez):
My gripe is 3 fold:
1. The reporter of the issue simply did not bother to reply 
2. It is as clear as mud
3. We all should have a chance to understand who has done what to the Consumer 
with regards to autocommit and how it potentially least once delivery; there 
are production systems running out there, that might be adversely affected, by 
a subtlety that someone is not bothered to document or even reply to a request 
for a clarification.

Internal or external, there should be clarity on such an important class.

Also, farming out JavaDoc or Documentation to a Newbie, who then does her/his 
due-diligence, and requests clarification, which means the reporter has to do 
more work, and then just reassigns it, because he cannot be bothered to explain 
changes he made, is a very poor reflection on Kafka.

So please, related or not, all changes to the Consumer should be made clear, 
and explained so it can be documented.
Assigning it back to myself, to be fobbed off more, is not of interest.

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks at least 
> once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html





[jira] [Commented] (KAFKA-3824) Docs indicate auto.commit breaks at least once delivery but that is incorrect

2016-06-22 Thread kambiz shahri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343864#comment-15343864
 ] 

kambiz shahri commented on KAFKA-3824:
--

My gripe is 3 fold:
1. The report of the issue simply did not bother to reply 
2. It is as clear as mud
3. We all should have a chance to understand who has done what to the Consumer 
with regards to autocommit and how it potentially least once delivery; there 
are production systems running out there, that might be adversely affected, by 
a subtlety that someone is not bothered to document or even reply to a request 
for a clarification.

Internal or external, there should be clarity on such an important class.

Also, farming out JavaDoc or Documentation to a Newbie, who then does her/his 
due-diligence, and requests clarification, which means the reporter has to do 
more work, and then just reassigns it, because he cannot be bothered to explain 
changes he made, is a very poor reflection on Kafka.

So please, related or not, all changes to the Consumer should be made clear, 
and explained so it can be documented.
Assigning it back to myself, to be fobbed off more, is not of interest.

> Docs indicate auto.commit breaks at least once delivery but that is incorrect
> -
>
> Key: KAFKA-3824
> URL: https://issues.apache.org/jira/browse/KAFKA-3824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Affects Versions: 0.10.0.0
>Reporter: Jay Kreps
>Assignee: Jason Gustafson
>  Labels: newbie
> Fix For: 0.10.1.0, 0.10.0.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The javadocs for the new consumer indicate that auto commit breaks at least 
> once delivery. This is no longer correct as of 0.10. 
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html





Re: Kafka HDFS Connector

2016-06-22 Thread Pariksheet Barapatre
Anybody have any idea on this?

Thanks
Pari

On 20 June 2016 at 14:36, Pariksheet Barapatre 
wrote:

> Hello All,
>
> I have data coming from sensors into a Kafka cluster, in comma-delimited
> text format.
>
> How can I offload this data from Kafka to Hive periodically? I guess
> Kafka Connect should solve my problem, but when I checked the
> documentation, the examples have only Avro-formatted data. Can you
> please share some pointers on this?
>
> Many Thanks
> Pari
>