[jira] [Updated] (KAFKA-4822) Kafka producer implementation without additional threads and control of when data is sent to kafka (similar to sync producer of 0.8.)

2017-03-01 Thread Giri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giri updated KAFKA-4822:

Summary: Kafka producer implementation without additional threads and 
control of when data is sent to kafka (similar to sync producer of 0.8.)  (was: 
Kafka producer implementation without additional threads and control of when 
data is sent to kafka, similar to sync producer of 0.8.)

> Kafka producer implementation without additional threads and control of when 
> data is sent to kafka (similar to sync producer of 0.8.)
> -
>
> Key: KAFKA-4822
> URL: https://issues.apache.org/jira/browse/KAFKA-4822
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>Reporter: Giri
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4822) Kafka producer implementation without additional threads and control of when data is sent to kafka, similar to sync producer of 0.8.

2017-03-01 Thread Giri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giri updated KAFKA-4822:

Summary: Kafka producer implementation without additional threads and 
control of when data is sent to kafka, similar to sync producer of 0.8.  (was: 
Kafka producer implementation without additional threads, similar to sync 
producer of 0.8.)

> Kafka producer implementation without additional threads and control of when 
> data is sent to kafka, similar to sync producer of 0.8.
> 
>
> Key: KAFKA-4822
> URL: https://issues.apache.org/jira/browse/KAFKA-4822
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>Reporter: Giri
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4092) retention.bytes should not be allowed to be less than segment.bytes

2017-03-01 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891667#comment-15891667
 ] 

Ewen Cheslack-Postava commented on KAFKA-4092:
--

[~ijuma] Since this is still marked for a released version, can we retarget the 
fix version since it's been reopened? And possibly the status as well -- is 
there a blocker here, or still minor? At a minimum, we might need to remove the 
fix version if the change has been reverted.

> retention.bytes should not be allowed to be less than segment.bytes
> ---
>
> Key: KAFKA-4092
> URL: https://issues.apache.org/jira/browse/KAFKA-4092
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> Right now retention.bytes can be as small as the user wants, but it doesn't 
> really get acted on for the active segment if retention.bytes is smaller than 
> segment.bytes.  We shouldn't allow retention.bytes to be less than 
> segment.bytes, and we should validate that at startup.
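The startup check proposed above could be sketched roughly as follows. This is a hypothetical illustration, not Kafka's actual LogConfig code: the class and method names are invented, the configs are assumed to arrive as a plain map, and retention.bytes = -1 (no size-based retention) is exempt from the check.

```java
import java.util.Map;

// Hypothetical sketch of the proposed startup validation: reject a topic
// config whose retention.bytes is smaller than segment.bytes.
public class LogConfigCheck {

    /** Throws IllegalArgumentException if retention.bytes < segment.bytes. */
    public static void validate(Map<String, Long> config) {
        Long retention = config.get("retention.bytes");
        Long segment = config.get("segment.bytes");
        // retention.bytes == -1 means "no size-based retention", so skip the check
        if (retention != null && segment != null && retention >= 0 && retention < segment) {
            throw new IllegalArgumentException(
                "retention.bytes (" + retention + ") must not be less than segment.bytes ("
                    + segment + ")");
        }
    }
}
```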



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-128: Add ByteArrayConverter for Kafka Connect

2017-03-01 Thread Gwen Shapira
Yeah, you are right, it is best not to make converters actively break the
data structures :)

On Wed, Mar 1, 2017 at 4:55 PM, Ewen Cheslack-Postava 
wrote:

> Guozhang,
>
> I'm fine w/ adjusting if people want to, but it ends up being more code
> since we also need to convert SerializationExceptions to DataExceptions and
> the only thing the toConnectData method even does is specific to Connect
> (adding the SchemaAndValue).
>
> Gwen -- isn't that an SMT? ExtractField?
>
> -Ewen
>
> On Wed, Mar 1, 2017 at 1:58 PM, Gwen Shapira  wrote:
>
> > Hi Ewen,
> >
> > Thanks for the KIP, I think it will be useful :)
> >
> > I'm just wondering if we can add support not just for bytes schema,
> > but also for a struct that contains bytes? I'm thinking of the
> > scenario of using a connector to grab BLOBs out of a DB - I think you
> > end up with this structure if you use a JDBC connector and custom
> > query...
> >
> > Maybe even support Maps with generic objects using Java's default
> > serialization? I'm not sure if this is useful.
> >
> > Gwen
> >
> >
> >
> > On Sat, Feb 25, 2017 at 2:25 PM, Ewen Cheslack-Postava
> >  wrote:
> > > Hi all,
> > >
> > > I've added a pretty trivial KIP for adding a pass-through Converter for
> > > Kafka Connect, similar to ByteArraySerializer/Deserializer.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 128%3A+Add+ByteArrayConverter+for+Kafka+Connect
> > >
> > > This wasn't added with the framework originally because the idea was to
> > > deal with structured data for the most part. However, we've seen a
> couple
> > > of use cases arise as the framework got more traction and I think it
> > makes
> > > sense to provide this out of the box now so people stop reinventing the
> > > wheel (and using a different fully-qualified class name) for each
> > connector
> > > that needs this functionality.
> > >
> > > I imagine this will be a rather uncontroversial addition, so if I don't
> > see
> > > any comments in the next day or two I'll just start the vote thread.
> > >
> > > -Ewen
> >
> >
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog
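The pass-through behaviour KIP-128 proposes can be illustrated with a self-contained sketch. The real ByteArrayConverter implements org.apache.kafka.connect.storage.Converter; here that interface and the SchemaAndValue pair are modelled minimally (invented, simplified stand-ins) so the example compiles on its own.

```java
// Illustrative sketch only: a pass-through converter whose serialized form and
// Connect representation are both raw bytes, per the KIP-128 discussion above.
public class PassThroughConverterSketch {

    /** Minimal stand-in for Connect's SchemaAndValue pair. */
    public static class SchemaAndValue {
        public final String schema;
        public final Object value;
        SchemaAndValue(String schema, Object value) { this.schema = schema; this.value = value; }
    }

    /** Minimal stand-in for the Connect Converter interface. */
    public interface Converter {
        byte[] fromConnectData(String topic, String schema, Object value);
        SchemaAndValue toConnectData(String topic, byte[] value);
    }

    /** Pass-through: bytes in, bytes out, no copy and no transformation. */
    public static class ByteArrayConverter implements Converter {
        @Override
        public byte[] fromConnectData(String topic, String schema, Object value) {
            if (value != null && !(value instanceof byte[]))
                throw new IllegalArgumentException("ByteArrayConverter only supports byte[] data");
            return (byte[]) value;
        }

        @Override
        public SchemaAndValue toConnectData(String topic, byte[] value) {
            // the only Connect-specific work: wrap the raw bytes with a schema marker
            return new SchemaAndValue("OPTIONAL_BYTES", value);
        }
    }
}
```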



Build failed in Jenkins: kafka-trunk-jdk8 #1314

2017-03-01 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-4677 Follow Up: add optimization to StickyTaskAssignor for 
rolling

[wangguoz] MINOR: improve MinTimestampTrackerTest and fix NPE when null element

--
[...truncated 693.66 KB...]

org.apache.kafka.streams.state.internals.ChangeLoggingSegmentedBytesStoreTest > shouldDelegateToUnderlyingStoreWhenFetching STARTED

org.apache.kafka.streams.state.internals.ChangeLoggingSegmentedBytesStoreTest > shouldDelegateToUnderlyingStoreWhenFetching PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > shouldPutAllKeyValuePairs STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > shouldPutAllKeyValuePairs PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testEvict STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testEvict PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > shouldUpdateValuesForExistingKeysOnPutAll STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > shouldUpdateValuesForExistingKeysOnPutAll PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testSize STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testSize PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testPutIfAbsent STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testPutIfAbsent PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testRestoreWithDefaultSerdes STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testRestoreWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testRestore STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testRestore PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testPutGetRange STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testPutGetRange PASSED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testPutGetRangeWithDefaultSerdes STARTED

org.apache.kafka.streams.state.internals.InMemoryLRUCacheStoreTest > testPutGetRangeWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldUseCustomRocksDbConfigSetter STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldUseCustomRocksDbConfigSetter PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldPerformAllQueriesWithCachingDisabled STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldPerformAllQueriesWithCachingDisabled PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldPerformRangeQueriesWithCachingDisabled STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldPerformRangeQueriesWithCachingDisabled PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldCloseOpenIteratorsWhenStoreClosedAndThrowInvalidStateStoreOnHasNextAndNext STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > shouldCloseOpenIteratorsWhenStoreClosedAndThrowInvalidStateStoreOnHasNextAndNext PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testSize STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testSize PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testPutIfAbsent STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testPutIfAbsent PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testRestoreWithDefaultSerdes STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testRestoreWithDefaultSerdes PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testRestore STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testRestore PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testPutGetRange STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testPutGetRange PASSED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testPutGetRangeWithDefaultSerdes STARTED

org.apache.kafka.streams.state.internals.RocksDBKeyValueStoreTest > testPutGetRangeWithDefaultSerdes PASSED

org.apache.kafka.streams.KeyValueTest > shouldHaveSaneEqualsAndHashCode STARTED

org.apache.kafka.streams.KeyValueTest > shouldHaveSaneEqualsAndHashCode PASSED

org.apache.kafka.streams.KafkaStreamsTest > testIllegalMetricsConfig STARTED

org.apache.kafka.streams.KafkaStreamsTest > testIllegalMetricsConfig PASSED

org.apache.kafka.streams.KafkaStreamsTest > shouldNotGetAllTasksWhenNotRunning STARTED

Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-01 Thread Gwen Shapira
+1

On Tue, Feb 28, 2017 at 2:40 AM, Ismael Juma  wrote:

> Hi everyone,
>
> Since the few who responded in the discuss thread were in favour and there
> were no objections, I would like to initiate the voting process for
> KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
>
> The vote will run for a minimum of 72 hours.
>
> Thanks,
> Ismael
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-01 Thread Ismael Juma
Hi Becket,

I think there's a misunderstanding. Scala 2.11 has been the recommended
version since Kafka 0.9.0.0 as you can see in the downloads page:

https://kafka.apache.org/downloads

Scala 2.10 is the default build version for the same reason that we
recommend building with Java 7 when developing: we build with the
lowest supported version to avoid accidentally using methods from a newer
version.

If we continue supporting 2.10 in 0.11.0, then we won't be able to drop
support until Kafka 0.12.0 (or 1.0.0 if we go from 0.11.0 to 1.0.0). If
this is important to LinkedIn, we should consider it. I want to make sure
that the facts are clear though. :)

Ismael

On Thu, Mar 2, 2017 at 1:49 AM, Becket Qin  wrote:

> Hey Ismael,
>
> Sorry for being late on this KIP. At LinkedIn we are still in the process
> of migrating from Scala 2.10 to 2.11. I am not sure how many other existing
> Kafka users are still on Scala 2.10. But given that until now the default
> Scala version is still 2.10.6, I feel it is a little too aggressive to move
> the default version to 2.11 and drop 2.10 support at the same time. We are
> essentially dropping support for the current default Scala version.
>
> We have a policy for API deprecation: add the new interface and mark the
> old one as deprecated in release X, then drop the old interface in release
> X+1. It seems reasonable to do the same for the Scala version here. So
> should we consider only making Scala 2.11 the default in Kafka 0.11.0 and
> dropping support for Scala 2.10 in Kafka 0.11.1?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Mar 1, 2017 at 4:42 PM, Apurva Mehta  wrote:
>
> > +1 (non-binding)
> >
> > On Wed, Mar 1, 2017 at 4:41 PM, Ewen Cheslack-Postava  >
> > wrote:
> >
> > > +1 (binding)
> > >
> > > -Ewen
> > >
> > > On Wed, Mar 1, 2017 at 4:56 AM, Jozef.koval  >
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Jozef
> > > >
> > > > P.S. I volunteer to help with this KIP.
> > > >
> > > >
> > > > Sent from [ProtonMail](https://protonmail.ch), encrypted email based
> > in
> > > > Switzerland.
> > > >
> > > >
> > > >
> > > >  Original Message 
> > > > Subject: Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka
> 0.11
> > > > Local Time: March 1, 2017 1:46 PM
> > > > UTC Time: March 1, 2017 12:46 PM
> > > > From: bbej...@gmail.com
> > > > To: dev@kafka.apache.org
> > > >
> > > > +1
> > > >
> > > > Thanks,
> > > > Bill
> > > >
> > > > On Wed, Mar 1, 2017 at 5:42 AM, Eno Thereska  >
> > > > wrote:
> > > >
> > > > > +1, thanks.
> > > > >
> > > > > Eno
> > > > > > On 28 Feb 2017, at 17:17, Guozhang Wang 
> > wrote:
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > On Tue, Feb 28, 2017 at 4:28 AM, Tom Crayford <
> > tcrayf...@heroku.com>
> > > > > wrote:
> > > > > >
> > > > > >> +1 (non-binding)
> > > > > >>
> > > > > >> On Tue, 28 Feb 2017 at 11:42, Molnár Bálint <
> > molnarcsi...@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> +1
> > > > > >>>
> > > > > >>> 2017-02-28 12:17 GMT+01:00 Dongjin Lee :
> > > > > >>>
> > > > > 
> > > > > 
> > > > >  +1.
> > > > > 
> > > > > 
> > > > > 
> > > > >  Best,
> > > > > 
> > > > >  Dongjin
> > > > > 
> > > > > 
> > > > >  --
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > >  Dongjin Lee
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > >  Software developer in Line+.
> > > > > 
> > > > >  So interested in massive-scale machine learning.
> > > > > 
> > > > > 
> > > > > 
> > > > >  facebook: www.facebook.com/dongjin.lee.kr (
> > > http://www.facebook.com/
> > > > >  dongjin.lee.kr)
> > > > > 
> > > > >  linkedin: kr.linkedin.com/in/dongjinleekr (
> > > > > >> http://kr.linkedin.com/in/
> > > > >  dongjinleekr)
> > > > > 
> > > > > 
> > > > >  github: (http://goog_969573159/)github.com/dongjinleekr (
> > > > >  http://github.com/dongjinleekr)
> > > > > 
> > > > >  twitter: www.twitter.com/dongjinleekr (
> http://www.twitter.com/
> > > > >  dongjinleekr)
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > >
> > > > > > On Feb 28, 2017 at 6:40 PM,  > > ism...@juma.me.uk
> > > > > >> )>
> > > > >  wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Since the few who responded in the discuss thread were in
> > favour
> > > > and
> > > > >  there
> > > > > > were no objections, I would like to initiate the voting
> process
> > > for
> > > > > > KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > >  119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> > > > > >
> > > > > > The vote will run for a minimum of 72 hours.

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-03-01 Thread Colin McCabe
On Wed, Mar 1, 2017, at 15:52, radai wrote:
> quick comment on the request objects:
> 
> i see "abstract class NewTopic" and "class NewTopicWithReplication" and "
> NewTopicWithReplicaAssignments"
> 
> 1. since the result object is called CreateTopicResults should these be
> called *Request?

Hi radai,

Thanks for taking a look.

I think using the name "request" would be very confusing here, because
we have a whole family of internal Request classes such as
CreateTopicsRequest, TopicMetadataRequest, etc. which are used for RPCs.

> 2. this seems like a suboptimal approach to me. imagine we add a
> NewTopicWithSecurity, and then we would need
> NewTopicWithReplicationAndSecurity? (or any composable "traits").
> this won't really scale. Wouldn't it be better to have a single (rather
> complicated) CreateTopicRequest, and use a builder pattern to deal with
> the complexity and options? like so:
> 
> CreateTopicRequest req =
> AdminRequests.newTopic("bob").replicationFactor(2).withPartitionAssignment(1,
> "broker7", "broker10").withOption(...).build();
> 
> the builder would validate any potentially conflicting options and would
> allow piling on the complexity in a more manageable way (note - my code
> above intends to demonstrate both a general replication factor and a
> specific assignment for a particular partition of that topic, which may
> be too much freedom).

We don't need to express every optional bell and whistle by creating a
subclass.  In fact, the proposal already had setConfigs() in the base
class, since it applies to every new topic creation.

Thinking about it a little more, though, the subclasses don't really add
that much value, so we should probably just have NewTopic and no
subclasses.  I removed the subclasses.

best,
Colin
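For illustration, the builder idea radai sketches above could look roughly like this. All names are hypothetical and not part of the KIP; the point is that build() is the single place where conflicting options (a general replication factor versus an explicit assignment) get rejected.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a single request type whose builder validates
// conflicting topic-creation options, per the quoted suggestion.
public class NewTopicBuilder {
    private final String name;
    private int replicationFactor = -1;                       // -1 == unset
    private final Map<Integer, List<String>> assignment = new HashMap<>();

    public NewTopicBuilder(String name) { this.name = name; }

    public NewTopicBuilder replicationFactor(int rf) {
        this.replicationFactor = rf;
        return this;
    }

    public NewTopicBuilder withPartitionAssignment(int partition, String... brokers) {
        assignment.put(partition, List.of(brokers));
        return this;
    }

    /** Validates the combination of options; returns a stand-in for a real request object. */
    public String build() {
        if (replicationFactor >= 0 && !assignment.isEmpty())
            throw new IllegalStateException(
                "specify either replicationFactor or an explicit assignment, not both");
        return name;
    }
}
```

This keeps one request class instead of a combinatorial explosion of NewTopicWithX subclasses, at the cost of moving validation from the type system into build().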

> 
> On Wed, Mar 1, 2017 at 1:58 PM, Colin McCabe  wrote:
> 
> > Hi all,
> >
> > Thanks for commenting, everyone.  Does anyone have more questions or
> > comments, or should we vote?  The latest proposal is up at
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+
> > AdminClient+API+for+Kafka+admin+operations
> >
> > best,
> > Colin
> >
> >
> > On Thu, Feb 16, 2017, at 15:00, Colin McCabe wrote:
> > > On Thu, Feb 16, 2017, at 14:11, Dong Lin wrote:
> > > > Hey Colin,
> > > >
> > > > Thanks for the update. I have two comments:
> > > >
> > > > - I actually think it is simpler and good enough to have per-topic API
> > > > instead of batch-of-topic API. This is different from the argument for
> > > > batch-of-partition API because, unlike operation on topic, people
> > usually
> > > > operate on multiple partitions (e.g. seek(), purge()) at a time. Is
> > there
> > > > performance concern with per-topic API? I am wondering if we should do
> > > > per-topic API until there is use-case or performance benefits of
> > > > batch-of-topic API.
> > >
> > > Yes, there is a performance concern with only supporting operations on
> > > one topic at a time.  Jay expressed this in some of his earlier emails
> > > and some other people did as well.  We have cases in mind for management
> > > software where many topics are created at once.
> > >
> > > >
> > > > - Currently we have interface "Consumer" and "Producer". And we also
> > have
> > > > implementations of these two interfaces as "KafkaConsumer" and
> > > > "KafkaProducer". If we follow the same naming pattern, should we have
> > > > interface "AdminClient" and the implementation "KafkaAdminClient",
> > > > instead
> > > > of the other way around?
> > >
> > > That's a good point.  We should do that for consistency.
> > >
> > > best,
> > > Colin
> > >
> > > >
> > > > Dong
> > > >
> > > >
> > > > On Thu, Feb 16, 2017 at 10:19 AM, Colin McCabe 
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > So I think people have made some very good points so far.  There
> > seems
> > > > > to be agreement that we need to have explicit batch APIs for the
> > sake of
> > > > > efficiency, so I added that back.
> > > > >
> > > > > Contexts seem a little more complex than we thought, so I removed
> > that
> > > > > from the proposal.
> > > > >
> > > > > I removed the Impl class.  Instead, we now have a KafkaAdminClient
> > > > > interface and an AdminClient implementation.  I think this matches
> > our
> > > > > other code better, as Jay commented.
> > > > >
> > > > > Each call now has an "Options" object that is passed in.  This will
> > > > > allow us to easily add new parameters to the calls without having
> > tons
> > > > > of function overloads.  Similarly, each call now has a Results
> > object,
> > > > > which will let us easily extend the results we are returning if
> > needed.
> > > > >
> > > > > Many people made the point that Java 7 Futures are not that useful,
> > but
> > > > > Java 8 CompletableFutures are.  With CompletableFutures, you can
> > chain
> > > > > calls, adapt them, join them-- basically all the stuff people are
> > doing
> > > > > in Node.js and Twisted Python.  Java 7 Futures don't 
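The CompletableFuture composition described above can be shown in plain Java, independent of any AdminClient API (the method names here are illustrative only):

```java
import java.util.concurrent.CompletableFuture;

// Why CompletableFuture beats a Java 7 Future for an async client: results can
// be chained and joined declaratively instead of blocking on get().
public class FutureComposition {

    /** Chain: transform one async result into another. */
    public static String describe(CompletableFuture<Integer> partitions) {
        return partitions.thenApply(p -> "partitions=" + p).join();
    }

    /** Join: combine two independent async results into one. */
    public static int totalReplicas(CompletableFuture<Integer> partitions,
                                    CompletableFuture<Integer> replicas) {
        return partitions.thenCombine(replicas, (p, r) -> p * r).join();
    }
}
```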

Re: [VOTE] KIP-107: Add purgeDataBefore() API in AdminClient

2017-03-01 Thread Becket Qin
Thanks for the update. The changes sound reasonable.

On Wed, Mar 1, 2017 at 1:57 PM, Dong Lin  wrote:

> Hi all,
>
> I have updated the KIP to include a script that allows the user to purge data
> by providing a map from partition to offset. I think this script may be
> convenient and useful, e.g., if the user simply wants to purge all data of
> given partitions from the command line. I am wondering if anyone objects to
> this script or has suggestions on the interface.
>
> Besides, Ismael commented in the pull request that it may be better to
> rename purgeDataBefore() to deleteDataBefore() and rename PurgeRequest to
> DeleteRequest. I think it may be a good idea because kafka-topics.sh
> already uses "delete" as an option. Personally I don't have a strong
> preference between "purge" and "delete". I am wondering if anyone objects to
> this change.
>
> Thanks,
> Dong
>
>
>
> On Wed, Mar 1, 2017 at 9:46 AM, Dong Lin  wrote:
>
> > Hi Ismael,
> >
> > I actually mean log_start_offset. I realized that it is a better name
> > after I started the implementation, because "logStartOffset" is already used in
> > Log.scala and LogCleanerManager.scala. So I changed it from
> > log_begin_offset to log_start_offset in the patch. But I forgot to update
> > the KIP and specify it in the mailing thread.
> >
> > Thanks for catching this. Let me update the KIP to reflect this change.
> >
> > Dong
> >
> >
> > On Wed, Mar 1, 2017 at 6:15 AM, Ismael Juma  wrote:
> >
> >> Hi Dong,
> >>
> >> When you say "logStartOffset", do you mean "log_begin_offset "? I could
> >> only find the latter in the KIP. If so, would log_start_offset be a
> better
> >> name?
> >>
> >> Ismael
> >>
> >> On Tue, Feb 28, 2017 at 4:26 AM, Dong Lin  wrote:
> >>
> >> > Hi Jun and everyone,
> >> >
> >> > I would like to change the KIP in the following way. Currently, if any
> >> > replica is offline, the purge result for a partition will
> >> > be NotEnoughReplicasException and its low_watermark will be 0. The
> >> > motivation for this approach is that we want to guarantee that the
> data
> >> > before purgedOffset has been deleted on all replicas of this partition
> >> if
> >> > purge result indicates success.
> >> >
> >> > But this approach seems too conservative. It should be sufficient in
> >> > most cases to just tell the user success and set low_watermark to the
> >> > minimum logStartOffset of all live replicas in the PurgeResponse, if the
> >> > logStartOffset of all live replicas has reached purgedOffset. This is
> >> > because for an offline replica to become online and be elected leader,
> >> > it should have received one FetchResponse from the current leader,
> >> > which should tell it to purge beyond purgedOffset. The benefit of this
> >> > change is that we can allow the purge operation to succeed when some
> >> > replica is offline.
> >> >
> >> > Are you OK with this change? If so, I will go ahead to update the KIP
> >> and
> >> > implement this behavior.
> >> >
> >> > Thanks,
> >> > Dong
> >> >
> >> >
> >> >
> >> > On Tue, Jan 17, 2017 at 10:18 AM, Dong Lin 
> wrote:
> >> >
> >> > > Hey Jun,
> >> > >
> >> > > Do you have time to review the KIP again or vote for it?
> >> > >
> >> > > Hey Ewen,
> >> > >
> >> > > Can you also review the KIP again or vote for it? I have discussed
> >> > > with Radai and Becket regarding your concern. We still think putting
> >> > > it in the AdminClient seems more intuitive, because there are use
> >> > > cases where an application that manages topics or produces data may
> >> > > also want to purge data. It seems weird if they need to create a
> >> > > consumer to do this.
> >> > >
> >> > > Thanks,
> >> > > Dong
> >> > >
> >> > > On Thu, Jan 12, 2017 at 9:34 AM, Mayuresh Gharat <
> >> > > gharatmayures...@gmail.com> wrote:
> >> > >
> >> > >> +1 (non-binding)
> >> > >>
> >> > >> Thanks,
> >> > >>
> >> > >> Mayuresh
> >> > >>
> >> > >> On Wed, Jan 11, 2017 at 1:03 PM, Dong Lin 
> >> wrote:
> >> > >>
> >> > >> > Sorry for the duplicated email. It seems that gmail will put the
> >> > voting
> >> > >> > email in this thread if I simply replace DISCUSS with VOTE in the
> >> > >> subject.
> >> > >> >
> >> > >> > On Wed, Jan 11, 2017 at 12:57 PM, Dong Lin 
> >> > wrote:
> >> > >> >
> >> > >> > > Hi all,
> >> > >> > >
> >> > >> > > It seems that there is no further concern with the KIP-107. At
> >> this
> >> > >> point
> >> > >> > > we would like to start the voting process. The KIP can be found
> >> at
> >> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107
> >> > >> > > %3A+Add+purgeDataBefore%28%29+API+in+AdminClient.
> >> > >> > >
> >> > >> > > Thanks,
> >> > >> > > Dong
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >>
> >> > >>
> >> > >> --
> >> > >> -Regards,
> >> > >> Mayuresh R. Gharat
> >> > >> (862) 250-7125
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
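As a rough illustration of the purge semantics discussed in this thread: records below the requested purge offset are dropped and the partition's low watermark advances. The names and the in-memory stand-in for a partition log are hypothetical; the real API lives in the AdminClient (the KIP's purgeDataBefore()).

```java
import java.util.TreeMap;

// Toy model of purging a single partition: the TreeMap maps offset -> record.
public class PurgeSketch {

    /**
     * Deletes all records with offsets strictly below purgeOffset and returns
     * the partition's new low watermark.
     */
    public static long purgeBefore(TreeMap<Long, String> log, long purgeOffset) {
        log.headMap(purgeOffset).clear();       // headMap view is exclusive of purgeOffset
        return log.isEmpty() ? purgeOffset      // nothing left: watermark is the purge point
                             : log.firstKey();  // otherwise, the first surviving offset
    }
}
```

A command-line script taking a partition-to-offset map, as Dong describes, would simply invoke this operation once per entry.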


Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-01 Thread Becket Qin
Hey Ismael,

Sorry for being late on this KIP. At LinkedIn we are still in the process
of migrating from Scala 2.10 to 2.11. I am not sure how many other existing
Kafka users are still on Scala 2.10. But given that until now the default
Scala version is still 2.10.6, I feel it is a little too aggressive to move
the default version to 2.11 and drop 2.10 support at the same time. We are
essentially dropping support for the current default Scala version.

We have a policy for API deprecation: add the new interface and mark the
old one as deprecated in release X, then drop the old interface in release
X+1. It seems reasonable to do the same for the Scala version here. So
should we consider only making Scala 2.11 the default in Kafka 0.11.0 and
dropping support for Scala 2.10 in Kafka 0.11.1?

Thanks,

Jiangjie (Becket) Qin

On Wed, Mar 1, 2017 at 4:42 PM, Apurva Mehta  wrote:

> +1 (non-binding)
>
> On Wed, Mar 1, 2017 at 4:41 PM, Ewen Cheslack-Postava 
> wrote:
>
> > +1 (binding)
> >
> > -Ewen
> >
> > On Wed, Mar 1, 2017 at 4:56 AM, Jozef.koval 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Jozef
> > >
> > > P.S. I volunteer to help with this KIP.
> > >
> > >
> > > Sent from [ProtonMail](https://protonmail.ch), encrypted email based
> in
> > > Switzerland.
> > >
> > >
> > >
> > >  Original Message 
> > > Subject: Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11
> > > Local Time: March 1, 2017 1:46 PM
> > > UTC Time: March 1, 2017 12:46 PM
> > > From: bbej...@gmail.com
> > > To: dev@kafka.apache.org
> > >
> > > +1
> > >
> > > Thanks,
> > > Bill
> > >
> > > On Wed, Mar 1, 2017 at 5:42 AM, Eno Thereska 
> > > wrote:
> > >
> > > > +1, thanks.
> > > >
> > > > Eno
> > > > > On 28 Feb 2017, at 17:17, Guozhang Wang 
> wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On Tue, Feb 28, 2017 at 4:28 AM, Tom Crayford <
> tcrayf...@heroku.com>
> > > > wrote:
> > > > >
> > > > >> +1 (non-binding)
> > > > >>
> > > > >> On Tue, 28 Feb 2017 at 11:42, Molnár Bálint <
> molnarcsi...@gmail.com
> > >
> > > > >> wrote:
> > > > >>
> > > > >>> +1
> > > > >>>
> > > > >>> 2017-02-28 12:17 GMT+01:00 Dongjin Lee :
> > > > >>>
> > > > 
> > > > 
> > > >  +1.
> > > > 
> > > > 
> > > > 
> > > >  Best,
> > > > 
> > > >  Dongjin
> > > > 
> > > > 
> > > >  --
> > > > 
> > > > 
> > > > 
> > > > 
> > > >  Dongjin Lee
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > >  Software developer in Line+.
> > > > 
> > > >  So interested in massive-scale machine learning.
> > > > 
> > > > 
> > > > 
> > > >  facebook: www.facebook.com/dongjin.lee.kr (
> > http://www.facebook.com/
> > > >  dongjin.lee.kr)
> > > > 
> > > >  linkedin: kr.linkedin.com/in/dongjinleekr (
> > > > >> http://kr.linkedin.com/in/
> > > >  dongjinleekr)
> > > > 
> > > > 
> > > >  github: (http://goog_969573159/)github.com/dongjinleekr (
> > > >  http://github.com/dongjinleekr)
> > > > 
> > > >  twitter: www.twitter.com/dongjinleekr (http://www.twitter.com/
> > > >  dongjinleekr)
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > >
> > > > > On Feb 28, 2017 at 6:40 PM,  > ism...@juma.me.uk
> > > > >> )>
> > > >  wrote:
> > > > >
> > > > >
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > Since the few who responded in the discuss thread were in
> favour
> > > and
> > > >  there
> > > > > were no objections, I would like to initiate the voting process
> > for
> > > > > KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >  119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> > > > >
> > > > > The vote will run for a minimum of 72 hours.
> > > > >
> > > > > Thanks,
> > > > > Ismael
> > > > >
> > > > 
> > > > >>>
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > >
> > > >
> > >
> >
>


[jira] [Comment Edited] (KAFKA-4396) Seeing offsets not resetting even when reset policy is configured explicitly

2017-03-01 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891428#comment-15891428
 ] 

Helena Edelson edited comment on KAFKA-4396 at 3/2/17 1:39 AM:
---

I ran into this as well. Re-deploying with a new consumer group.id fixed it, but 
that's not a sustainable solution.
Kafka 0.10.1.0.

@jrmiller I recommend opening a related ticket against Spark (spark-streaming-kafka).


was (Author: helena_e):
I ran into this as well. Re-deploying with a new consumer group.id fixed it but 
that's not a sustainable solution.
Kafka 0.10.1.0

> Seeing offsets not resetting even when reset policy is configured explicitly
> 
>
> Key: KAFKA-4396
> URL: https://issues.apache.org/jira/browse/KAFKA-4396
> Project: Kafka
>  Issue Type: Bug
>Reporter: Justin Miller
>
> I've been seeing a curious error with kafka 0.10 (spark 2.11), these may be 
> two separate errors, I'm not sure. What's puzzling is that I'm setting 
> auto.offset.reset to latest and it's still throwing an 
> OffsetOutOfRangeException, behavior that's contrary to the code. Please help! 
> :)
> {code}
> val kafkaParams = Map[String, Object](
>   "group.id" -> consumerGroup,
>   "bootstrap.servers" -> bootstrapServers,
>   "key.deserializer" -> classOf[ByteArrayDeserializer],
>   "value.deserializer" -> classOf[MessageRowDeserializer],
>   "auto.offset.reset" -> "latest",
>   "enable.auto.commit" -> (false: java.lang.Boolean),
>   "max.poll.records" -> persisterConfig.maxPollRecords,
>   "request.timeout.ms" -> persisterConfig.requestTimeoutMs,
>   "session.timeout.ms" -> persisterConfig.sessionTimeoutMs,
>   "heartbeat.interval.ms" -> persisterConfig.heartbeatIntervalMs,
>   "connections.max.idle.ms"-> persisterConfig.connectionsMaxIdleMs
> )
> {code}
> {code}
> 16/11/09 23:10:17 INFO BlockManagerInfo: Added broadcast_154_piece0 in memory 
> on xyz (size: 146.3 KB, free: 8.4 GB)
> 16/11/09 23:10:23 WARN TaskSetManager: Lost task 15.0 in stage 151.0 (TID 
> 38837, xyz): org.apache.kafka.clients.consumer.OffsetOutOfRangeException: 
> Offsets out of range with no configured reset policy for partitions: 
> {topic=231884473}
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseFetchedData(Fetcher.java:588)
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:354)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1000)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
> at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.poll(CachedKafkaConsumer.scala:99)
> at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:70)
> at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
> at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
> at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 16/11/09 23:10:29 INFO TaskSetManager: Finished task 10.0 in stage 154.0 (TID 
> 39388) in 12043 ms on xyz (1/16)
> 16/11/09 23:10:31 INFO TaskSetManager: Finished task 0.0 in stage 154.0 (TID 
> 39375) in 13444 ms on xyz (2/16)
> 16/11/09 23:10:44 WARN TaskSetManager: Lost task 1.0 in stage 151.0 (TID 
> 38843, xyz): java.util.ConcurrentModificationException: KafkaConsumer is not 
> safe for multi-threaded access

[jira] [Commented] (KAFKA-4396) Seeing offsets not resetting even when reset policy is configured explicitly

2017-03-01 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891428#comment-15891428
 ] 

Helena Edelson commented on KAFKA-4396:
---

I ran into this as well. Re-deploying with a new consumer group.id fixed it, 
but that's not a sustainable solution.
Kafka 0.10.1.0.

> Seeing offsets not resetting even when reset policy is configured explicitly
> 
>
> Key: KAFKA-4396
> URL: https://issues.apache.org/jira/browse/KAFKA-4396
> Project: Kafka
>  Issue Type: Bug
>Reporter: Justin Miller
>
> I've been seeing a curious error with kafka 0.10 (spark 2.11), these may be 
> two separate errors, I'm not sure. What's puzzling is that I'm setting 
> auto.offset.reset to latest and it's still throwing an 
> OffsetOutOfRangeException, behavior that's contrary to the code. Please help! 
> :)
> {code}
> val kafkaParams = Map[String, Object](
>   "group.id" -> consumerGroup,
>   "bootstrap.servers" -> bootstrapServers,
>   "key.deserializer" -> classOf[ByteArrayDeserializer],
>   "value.deserializer" -> classOf[MessageRowDeserializer],
>   "auto.offset.reset" -> "latest",
>   "enable.auto.commit" -> (false: java.lang.Boolean),
>   "max.poll.records" -> persisterConfig.maxPollRecords,
>   "request.timeout.ms" -> persisterConfig.requestTimeoutMs,
>   "session.timeout.ms" -> persisterConfig.sessionTimeoutMs,
>   "heartbeat.interval.ms" -> persisterConfig.heartbeatIntervalMs,
>   "connections.max.idle.ms"-> persisterConfig.connectionsMaxIdleMs
> )
> {code}
> {code}
> 16/11/09 23:10:17 INFO BlockManagerInfo: Added broadcast_154_piece0 in memory 
> on xyz (size: 146.3 KB, free: 8.4 GB)
> 16/11/09 23:10:23 WARN TaskSetManager: Lost task 15.0 in stage 151.0 (TID 
> 38837, xyz): org.apache.kafka.clients.consumer.OffsetOutOfRangeException: 
> Offsets out of range with no configured reset policy for partitions: 
> {topic=231884473}
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseFetchedData(Fetcher.java:588)
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:354)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1000)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:938)
> at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.poll(CachedKafkaConsumer.scala:99)
> at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:70)
> at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
> at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
> at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 16/11/09 23:10:29 INFO TaskSetManager: Finished task 10.0 in stage 154.0 (TID 
> 39388) in 12043 ms on xyz (1/16)
> 16/11/09 23:10:31 INFO TaskSetManager: Finished task 0.0 in stage 154.0 (TID 
> 39375) in 13444 ms on xyz (2/16)
> 16/11/09 23:10:44 WARN TaskSetManager: Lost task 1.0 in stage 151.0 (TID 
> 38843, xyz): java.util.ConcurrentModificationException: KafkaConsumer is not 
> safe for multi-threaded access
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:1431)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:929)
> 

Re: [DISCUSS] KIP-128: Add ByteArrayConverter for Kafka Connect

2017-03-01 Thread Ewen Cheslack-Postava
Guozhang,

I'm fine with adjusting if people want to, but it ends up being more code,
since we also need to convert SerializationExceptions to DataExceptions, and
the only thing the toConnectData method even does is specific to Connect
(adding the SchemaAndValue).

Gwen -- isn't that an SMT? ExtractField?

-Ewen
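For reference, the pass-through converter KIP-128 proposes can be sketched roughly as below. This is a hedged, self-contained sketch: the Schema and SchemaAndValue types here are simplified stand-ins so the snippet compiles on its own, whereas the real class would implement org.apache.kafka.connect.storage.Converter and use Connect's own Schema.OPTIONAL_BYTES_SCHEMA. As Ewen notes, the only Connect-specific work is in toConnectData, where the bytes get wrapped with a schema:

```java
import java.util.Map;

// Simplified stand-ins for Connect's Schema/SchemaAndValue, just to make
// this sketch self-contained; not the actual Connect API types.
public class ByteArrayConverterSketch {
    public enum Schema { OPTIONAL_BYTES }

    public static final class SchemaAndValue {
        public final Schema schema;
        public final Object value;
        public SchemaAndValue(Schema schema, Object value) {
            this.schema = schema;
            this.value = value;
        }
    }

    public void configure(Map<String, ?> configs, boolean isKey) {
        // nothing to configure for a pass-through converter
    }

    // Serialization direction: the bytes pass straight through.
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        if (value != null && !(value instanceof byte[]))
            throw new IllegalArgumentException(
                "This converter only handles byte[] data");
        return (byte[]) value;
    }

    // Deserialization direction: the only Connect-specific step is
    // wrapping the raw bytes with a schema.
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        return new SchemaAndValue(Schema.OPTIONAL_BYTES, value);
    }
}
```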

On Wed, Mar 1, 2017 at 1:58 PM, Gwen Shapira  wrote:

> Hi Ewen,
>
> Thanks for the KIP, I think it will be useful :)
>
> I'm just wondering if we can add support not just for bytes schema,
> but also for a struct that contains bytes? I'm thinking of the
> scenario of using a connector to grab BLOBs out of a DB - I think you
> end up with this structure if you use a JDBC connector and custom
> query...
>
> Maybe even support Maps with generic objects using Java's default
> serialization? I'm not sure if this is useful.
>
> Gwen
>
>
>
> On Sat, Feb 25, 2017 at 2:25 PM, Ewen Cheslack-Postava
>  wrote:
> > Hi all,
> >
> > I've added a pretty trivial KIP for adding a pass-through Converter for
> > Kafka Connect, similar to ByteArraySerializer/Deserializer.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-128%3A+Add+ByteArrayConverter+for+Kafka+Connect
> >
> > This wasn't added with the framework originally because the idea was to
> > deal with structured data for the most part. However, we've seen a couple
> > of use cases arise as the framework got more traction and I think it
> makes
> > sense to provide this out of the box now so people stop reinventing the
> > wheel (and using a different fully-qualified class name) for each
> connector
> > that needs this functionality.
> >
> > I imagine this will be a rather uncontroversial addition, so if I don't
> see
> > any comments in the next day or two I'll just start the vote thread.
> >
> > -Ewen
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-03-01 Thread Sriram Subramanian
+1

Thanks Ismael for volunteering to run the next time based release.

On Wed, Mar 1, 2017 at 4:51 PM, Ismael Juma  wrote:

> Thanks everyone for the feedback. Since everyone was in favour and we had
> covered this particular case in the time-based release plan[1], I went
> ahead and updated the wiki page to specify the next version as 0.11.0.0.
> I'll update JIRA versions and the KIP page soon.
>
> Ismael
>
> [1] "We change the message format, in which case we bump to 0.11.x" in
> https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan
>
> On Tue, Feb 28, 2017 at 3:47 AM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > With 0.10.2.0 out of the way, I would like to volunteer to be the release
> > manager for our next time-based release. See
> > https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan
> > if you missed previous
> > communication on time-based releases or need a reminder.
> >
> > I put together a draft release plan with June 2017 as the release month
> > (as previously agreed) and a list of KIPs that have already been voted:
> >
> > *https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68716876*
> >
> > I haven't set exact dates for the various stages (feature freeze, code
> > freeze, etc.) for now as Ewen is going to send out an email with some
> > suggested tweaks based on his experience as release manager for 0.10.2.0.
> > We can set the exact dates after that discussion.
> >
> > As we are starting the process early this time, we should expect the
> > number of KIPs in the plan to grow (so don't worry if your KIP is not
> there
> > yet), but it's good to see that we already have 10 (including 2 merged
> and
> > 2 with PR reviews in progress).
> >
> > Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and
> KIP-101
> > (Leader Generation in Replication) require message format changes, which
> > typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
> > it makes sense to also include KIP-106 (Unclean leader election should be
> > false by default) and KIP-118 (Drop support for Java 7). We would also
> take
> > the chance to remove deprecated code, in that case.
> >
> > Given the above, how do people feel about 0.11.0.0 as the next Kafka
> > version? Please share your thoughts.
> >
> > Thanks,
> > Ismael
> >
>


Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-03-01 Thread Ismael Juma
Thanks everyone for the feedback. Since everyone was in favour and we had
covered this particular case in the time-based release plan[1], I went
ahead and updated the wiki page to specify the next version as 0.11.0.0.
I'll update JIRA versions and the KIP page soon.

Ismael

[1] "We change the message format, in which case we bump to 0.11.x" in
https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan

On Tue, Feb 28, 2017 at 3:47 AM, Ismael Juma  wrote:

> Hi all,
>
> With 0.10.2.0 out of the way, I would like to volunteer to be the release
> manager for our next time-based release. See
> https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan
> if you missed previous
> communication on time-based releases or need a reminder.
>
> I put together a draft release plan with June 2017 as the release month
> (as previously agreed) and a list of KIPs that have already been voted:
>
> *https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68716876*
>
> I haven't set exact dates for the various stages (feature freeze, code
> freeze, etc.) for now as Ewen is going to send out an email with some
> suggested tweaks based on his experience as release manager for 0.10.2.0.
> We can set the exact dates after that discussion.
>
> As we are starting the process early this time, we should expect the
> number of KIPs in the plan to grow (so don't worry if your KIP is not there
> yet), but it's good to see that we already have 10 (including 2 merged and
> 2 with PR reviews in progress).
>
> Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and KIP-101
> (Leader Generation in Replication) require message format changes, which
> typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
> it makes sense to also include KIP-106 (Unclean leader election should be
> false by default) and KIP-118 (Drop support for Java 7). We would also take
> the chance to remove deprecated code, in that case.
>
> Given the above, how do people feel about 0.11.0.0 as the next Kafka
> version? Please share your thoughts.
>
> Thanks,
> Ismael
>


Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-01 Thread Apurva Mehta
+1 (non-binding)

On Wed, Mar 1, 2017 at 4:41 PM, Ewen Cheslack-Postava 
wrote:

> +1 (binding)
>
> -Ewen
>
> On Wed, Mar 1, 2017 at 4:56 AM, Jozef.koval 
> wrote:
>
> > +1 (non-binding)
> >
> > Jozef
> >
> > P.S. I volunteer to help with this KIP.
> >
> >
> > Sent from [ProtonMail](https://protonmail.ch), encrypted email based in
> > Switzerland.
> >
> >
> >
> >  Original Message 
> > Subject: Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11
> > Local Time: March 1, 2017 1:46 PM
> > UTC Time: March 1, 2017 12:46 PM
> > From: bbej...@gmail.com
> > To: dev@kafka.apache.org
> >
> > +1
> >
> > Thanks,
> > Bill
> >
> > On Wed, Mar 1, 2017 at 5:42 AM, Eno Thereska 
> > wrote:
> >
> > > +1, thanks.
> > >
> > > Eno
> > > > On 28 Feb 2017, at 17:17, Guozhang Wang  wrote:
> > > >
> > > > +1
> > > >
> > > > On Tue, Feb 28, 2017 at 4:28 AM, Tom Crayford 
> > > wrote:
> > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> On Tue, 28 Feb 2017 at 11:42, Molnár Bálint
> > > >> wrote:
> > > >>
> > > >>> +1
> > > >>>
> > > >>> 2017-02-28 12:17 GMT+01:00 Dongjin Lee :
> > > >>>
> > > 
> > > 
> > >  +1.
> > > 
> > > 
> > > 
> > >  Best,
> > > 
> > >  Dongjin
> > > 
> > > 
> > >  --
> > > 
> > > 
> > > 
> > > 
> > >  Dongjin Lee
> > > 
> > > 
> > > 
> > > 
> > > 
> > >  Software developer in Line+.
> > > 
> > >  So interested in massive-scale machine learning.
> > > 
> > > 
> > > 
> > >  facebook: www.facebook.com/dongjin.lee.kr (
> http://www.facebook.com/
> > >  dongjin.lee.kr)
> > > 
> > >  linkedin: kr.linkedin.com/in/dongjinleekr (
> > > >> http://kr.linkedin.com/in/
> > >  dongjinleekr)
> > > 
> > > 
> > >  github: github.com/dongjinleekr (http://github.com/dongjinleekr)
> > > 
> > >  twitter: www.twitter.com/dongjinleekr (http://www.twitter.com/
> > >  dongjinleekr)
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > >
> > > > > On Feb 28, 2017 at 6:40 PM, ism...@juma.me.uk wrote:
> > > >
> > > >
> > > >
> > > > Hi everyone,
> > > >
> > > > Since the few who responded in the discuss thread were in favour
> > and
> > >  there
> > > > were no objections, I would like to initiate the voting process
> for
> > > > KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> > > >
> > > > The vote will run for a minimum of 72 hours.
> > > >
> > > > Thanks,
> > > > Ismael
> > > >
> > > 
> > > >>>
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> > >
> >
>


Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-01 Thread Ewen Cheslack-Postava
+1 (binding)

-Ewen

On Wed, Mar 1, 2017 at 4:56 AM, Jozef.koval 
wrote:

> +1 (non-binding)
>
> Jozef
>
> P.S. I volunteer to help with this KIP.
>
>
> Sent from [ProtonMail](https://protonmail.ch), encrypted email based in
> Switzerland.
>
>
>
>  Original Message 
> Subject: Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11
> Local Time: March 1, 2017 1:46 PM
> UTC Time: March 1, 2017 12:46 PM
> From: bbej...@gmail.com
> To: dev@kafka.apache.org
>
> +1
>
> Thanks,
> Bill
>
> On Wed, Mar 1, 2017 at 5:42 AM, Eno Thereska 
> wrote:
>
> > +1, thanks.
> >
> > Eno
> > > On 28 Feb 2017, at 17:17, Guozhang Wang  wrote:
> > >
> > > +1
> > >
> > > On Tue, Feb 28, 2017 at 4:28 AM, Tom Crayford 
> > wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> On Tue, 28 Feb 2017 at 11:42, Molnár Bálint 
> > >> wrote:
> > >>
> > >>> +1
> > >>>
> > >>> 2017-02-28 12:17 GMT+01:00 Dongjin Lee :
> > >>>
> > 
> > 
> >  +1.
> > 
> > 
> > 
> >  Best,
> > 
> >  Dongjin
> > 
> > 
> >  --
> > 
> > 
> > 
> > 
> >  Dongjin Lee
> > 
> > 
> > 
> > 
> > 
> >  Software developer in Line+.
> > 
> >  So interested in massive-scale machine learning.
> > 
> > 
> > 
> >  facebook: www.facebook.com/dongjin.lee.kr (http://www.facebook.com/
> >  dongjin.lee.kr)
> > 
> >  linkedin: kr.linkedin.com/in/dongjinleekr (
> > >> http://kr.linkedin.com/in/
> >  dongjinleekr)
> > 
> > 
> >  github: github.com/dongjinleekr (http://github.com/dongjinleekr)
> > 
> >  twitter: www.twitter.com/dongjinleekr (http://www.twitter.com/
> >  dongjinleekr)
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > >
> > > On Feb 28, 2017 at 6:40 PM, ism...@juma.me.uk wrote:
> > >
> > >
> > >
> > > Hi everyone,
> > >
> > > Since the few who responded in the discuss thread were in favour
> and
> >  there
> > > were no objections, I would like to initiate the voting process for
> > > KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> > >
> > > The vote will run for a minimum of 72 hours.
> > >
> > > Thanks,
> > > Ismael
> > >
> > 
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
> >
>


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-03-01 Thread mjuarez (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891355#comment-15891355
 ] 

mjuarez commented on KAFKA-2729:


We are also running into this problem in our staging cluster, running Kafka 
0.10.0.1.  Basically it looks like this happened yesterday: 

{noformat}
[2017-02-28 18:41:33,513] INFO Client session timed out, have not heard from 
server in 7799ms for sessionid 0x159d7893eab0088, closing socket connection and 
attempting reconnect (org.apache.zookeeper.ClientCnxn)
{noformat}

I'm attributing that to a transient network issue, since we haven't seen any 
other issues.  And less than a minute later, we started seeing these errors:

{noformat}
[2017-02-28 18:42:45,739] INFO Partition 
[analyticsInfrastructure_KafkaAvroUserMessage,16] on broker 101: Shrinking ISR 
for partition [analyticsInfrastructure_KafkaAvroUserMessage,16] from 
102,101,105 to 101 (kafka.cluster.Partition)
[2017-02-28 18:42:45,751] INFO Partition 
[analyticsInfrastructure_KafkaAvroUserMessage,16] on broker 101: Cached 
zkVersion [94] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-02-28 18:42:45,751] INFO Partition 
[qa_exporter11_slingshot_salesforce_invoice,6] on broker 101: Shrinking ISR for 
partition [qa_exporter11_slingshot_salesforce_invoice,6] from 101,105,104 to 
101 (kafka.cluster.Partition)
[2017-02-28 18:42:45,756] INFO Partition 
[qa_exporter11_slingshot_salesforce_invoice,6] on broker 101: Cached zkVersion 
[237] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-02-28 18:42:45,756] INFO Partition [GNRDEV_counters_singleCount,2] on 
broker 101: Shrinking ISR for partition [GNRDEV_counters_singleCount,2] from 
101,105,104 to 101 (kafka.cluster.Partition)
[2017-02-28 18:42:45,761] INFO Partition [GNRDEV_counters_singleCount,2] on 
broker 101: Cached zkVersion [334] not equal to that in zookeeper, skip 
updating ISR (kafka.cluster.Partition)
[2017-02-28 18:42:45,761] INFO Partition [sod-spins-spark-local,1] on broker 
101: Shrinking ISR for partition [sod-spins-spark-local,1] from 101,103,104 to 
101 (kafka.cluster.Partition)
[2017-02-28 18:42:45,764] INFO Partition [sod-spins-spark-local,1] on broker 
101: Cached zkVersion [379] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2017-02-28 18:42:45,764] INFO Partition [sod-spins-spark-local,11] on broker 
101: Shrinking ISR for partition [sod-spins-spark-local,11] from 102,101,105 to 
101 (kafka.cluster.Partition)
[2017-02-28 18:42:45,767] INFO Partition [sod-spins-spark-local,11] on broker 
101: Cached zkVersion [237] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
{noformat}

The "current" server is 101. It thinks it's the leader for basically every 
partition on that node, but it refuses to update the ISRs because the cached 
zkVersion doesn't match the one in ZooKeeper. This causes permanently 
under-replicated partitions: the server never catches up, since it doesn't 
think there's a problem. Also, the metadata that broker 101 reports to 
consumers indicates it thinks it's part of the ISR, but every other broker 
disagrees.

Let me know if more logs/details would be helpful.  I'll try to fix this by 
restarting the node, and hopefully it fixes the issue.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the effected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-03-01 Thread radai
A quick comment on the request objects:

I see "abstract class NewTopic", "class NewTopicWithReplication" and
"NewTopicWithReplicaAssignments".

1. Since the result object is called CreateTopicResults, should these be
called *Request?
2. This seems like a suboptimal approach to me: imagine we add a
NewTopicWithSecurity; would we then need
NewTopicWithReplicationAndSecurity (or any composable "traits")? This won't
really scale. Wouldn't it be better to have a single (rather complicated)
CreateTopicRequest, and use a builder pattern to deal with the complexity
and options? Like so:

CreateTopicRequest req =
AdminRequests.newTopic("bob").replicationFactor(2).withPartitionAssignment(1,
"broker7", "broker10").withOption(...).build();

the builder would validate any potentially conflicting options and would
allow piling on the complexity in a more manageable way (note: my code
above intends to demonstrate both a general replication factor and a
specific assignment for a particular partition of that topic, which may be
too much freedom).
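To make the builder suggestion above concrete, here is a minimal, self-contained sketch. All of the names (CreateTopicRequest, withPartitionAssignment, the factory folded into the request class instead of a separate AdminRequests helper) are hypothetical, taken from the pseudocode above rather than any real Kafka API, and the validation rule shown (rejecting a blanket replication factor combined with explicit per-partition assignments) is just one possible policy for the "too much freedom" concern:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request object: immutable, produced only via its builder.
public class CreateTopicRequest {
    private final String name;
    private final int replicationFactor; // -1 when explicit assignments are used
    private final Map<Integer, List<String>> assignments;

    private CreateTopicRequest(String name, int replicationFactor,
                               Map<Integer, List<String>> assignments) {
        this.name = name;
        this.replicationFactor = replicationFactor;
        this.assignments = assignments;
    }

    public String name() { return name; }
    public int replicationFactor() { return replicationFactor; }
    public Map<Integer, List<String>> assignments() { return assignments; }

    public static Builder newTopic(String name) { return new Builder(name); }

    public static class Builder {
        private final String name;
        private int replicationFactor = -1;
        private final Map<Integer, List<String>> assignments = new HashMap<>();

        private Builder(String name) { this.name = name; }

        public Builder replicationFactor(int rf) {
            this.replicationFactor = rf;
            return this;
        }

        public Builder withPartitionAssignment(int partition, String... brokers) {
            assignments.put(partition, Arrays.asList(brokers));
            return this;
        }

        // build() is where conflicting options are rejected, so new options
        // compose without multiplying subclasses.
        public CreateTopicRequest build() {
            if (replicationFactor >= 0 && !assignments.isEmpty())
                throw new IllegalStateException(
                    "replicationFactor conflicts with explicit partition assignments");
            return new CreateTopicRequest(name, replicationFactor, assignments);
        }
    }
}
```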

On Wed, Mar 1, 2017 at 1:58 PM, Colin McCabe  wrote:

> Hi all,
>
> Thanks for commenting, everyone.  Does anyone have more questions or
> comments, or should we vote?  The latest proposal is up at
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations
>
> best,
> Colin
>
>
> On Thu, Feb 16, 2017, at 15:00, Colin McCabe wrote:
> > On Thu, Feb 16, 2017, at 14:11, Dong Lin wrote:
> > > Hey Colin,
> > >
> > > Thanks for the update. I have two comments:
> > >
> > > - I actually think it is simpler and good enough to have per-topic API
> > > instead of batch-of-topic API. This is different from the argument for
> > > batch-of-partition API because, unlike operation on topic, people
> usually
> > > operate on multiple partitions (e.g. seek(), purge()) at a time. Is
> there
> > > performance concern with per-topic API? I am wondering if we should do
> > > per-topic API until there is use-case or performance benefits of
> > > batch-of-topic API.
> >
> > Yes, there is a performance concern with only supporting operations on
> > one topic at a time.  Jay expressed this in some of his earlier emails
> > and some other people did as well.  We have cases in mind for management
> > software where many topics are created at once.
> >
> > >
> > > - Currently we have interface "Consumer" and "Producer". And we also
> have
> > > implementations of these two interfaces as "KafkaConsumer" and
> > > "KafkaProducer". If we follow the same naming pattern, should we have
> > > interface "AdminClient" and the implementation "KafkaAdminClient",
> > > instead
> > > of the other way around?
> >
> > That's a good point.  We should do that for consistency.
> >
> > best,
> > Colin
> >
> > >
> > > Dong
> > >
> > >
> > > On Thu, Feb 16, 2017 at 10:19 AM, Colin McCabe 
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > So I think people have made some very good points so far.  There
> seems
> > > > to be agreement that we need to have explicit batch APIs for the
> sake of
> > > > efficiency, so I added that back.
> > > >
> > > > Contexts seem a little more complex than we thought, so I removed
> that
> > > > from the proposal.
> > > >
> > > > I removed the Impl class.  Instead, we now have a KafkaAdminClient
> > > > interface and an AdminClient implementation.  I think this matches
> our
> > > > other code better, as Jay commented.
> > > >
> > > > Each call now has an "Options" object that is passed in.  This will
> > > > allow us to easily add new parameters to the calls without having
> tons
> > > > of function overloads.  Similarly, each call now has a Results
> object,
> > > > which will let us easily extend the results we are returning if
> needed.
> > > >
> > > > Many people made the point that Java 7 Futures are not that useful,
> but
> > > > Java 8 CompletableFutures are.  With CompletableFutures, you can
> chain
> > > > calls, adapt them, join them-- basically all the stuff people are
> doing
> > > > in Node.js and Twisted Python.  Java 7 Futures don't really let you
> do
> > > > anything but poll for a value or block.  So I felt that it was
> better to
> > > > just go with a CompletableFuture-based API.
> > > >
> > > > People also made the point that they would like an easy API for
> waiting
> > > > on complete success of a batch call.  For example, an API that would
> > > > fail if even one topic wasn't created in createTopics.  So I came up
> > > > with Result objects that provide multiple futures that you can wait
> on.
> > > > You can wait on a future that fires when everything is complete, or
> you
> > > > can wait on futures for individual topics being created.
> > > >
> > > > I updated the wiki, so please take a look.  Note that this new API
> > > > requires JDK8.  It seems like JDK8 is coming soon, though, and the
> > > > disadvantages of sticking to Java 7 are pretty big here, I think.
> > > >
> > > > best,
> > > > Colin
> >
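As a side note on the CompletableFuture point quoted above, the per-call Results object Colin describes (an individual future per topic plus an aggregate you can wait on) can be sketched with plain JDK 8 types. The class and method names here are illustrative, not the actual KIP-117 API, and complete() is only a test hook standing in for the broker response:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch of a "Results" object with one future per topic.
public class CreateTopicsResultSketch {
    private final Map<String, CompletableFuture<Void>> futures = new HashMap<>();

    public CreateTopicsResultSketch(Collection<String> topics) {
        for (String t : topics)
            futures.put(t, new CompletableFuture<>());
    }

    // Wait on (or chain off) the creation of a single topic.
    public CompletableFuture<Void> values(String topic) {
        return futures.get(topic);
    }

    // Fails/completes only when every topic's future has completed,
    // which is the "complete success of a batch call" use case.
    public CompletableFuture<Void> all() {
        return CompletableFuture.allOf(
            futures.values().toArray(new CompletableFuture[0]));
    }

    // Test hook: simulate the broker acknowledging one topic creation.
    public void complete(String topic) {
        futures.get(topic).complete(null);
    }
}
```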

Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-03-01 Thread Florian Hussonnois
Hi Matthias,

First, I will answer your last question.

The main reason to have both TaskState#assignment and
TaskState#consumedOffsetsByPartition is that tasks have no consumed offsets
until at least one message has been consumed for each partition, even if
previous offsets exist for the consumer group.
So yes, these methods are redundant, as they only diverge at application startup.
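The startup divergence described above can be sketched in a few lines; the names mirror the proposed TaskState accessors but are hypothetical stand-ins, not the KIP-130 API. At startup the assignment is known, consumedOffsetsByPartition is empty, and the two views converge only once every assigned partition has consumed at least one record:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a task's state: assignment is fixed up front,
// consumed offsets fill in as records arrive.
public class TaskStateSketch {
    private final Set<String> assignment; // topic-partitions assigned to the task
    private final Map<String, Long> consumedOffsets = new HashMap<>();

    public TaskStateSketch(Collection<String> assignment) {
        this.assignment = new HashSet<>(assignment);
    }

    public void recordConsumed(String topicPartition, long offset) {
        consumedOffsets.put(topicPartition, offset);
    }

    public Set<String> assignment() { return assignment; }

    public Map<String, Long> consumedOffsetsByPartition() { return consumedOffsets; }

    // True only once every assigned partition has consumed something,
    // i.e. the two views stop diverging after startup.
    public boolean viewsAgree() {
        return assignment.equals(consumedOffsets.keySet());
    }
}
```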

About the use case: we are currently developing for a customer a small
framework based on Kafka Streams which transforms/denormalizes data before
ingesting it into Hadoop.

We have a cluster of workers (Spring Boot) which instantiate KStreams
topologies dynamically based on dataflow configurations.
Each configuration describes a topic to consume and how to process its
messages (this looks like the NiFi processors API).

Our architecture is inspired by Kafka Connect. We have topics for configs
and states which are consumed by each worker (we have actually reused some
internal classes of the Connect API).

Now, we would like to develop UIs to visualize topics and partitions
consumed by our worker applications.

Also, I think it would be nice to be able, in the future, to develop web
UIs similar to Spark's but for Kafka Streams, to visualize DAGs... so maybe
this KIP is just a first step.

Thanks,

2017-03-01 22:52 GMT+01:00 Matthias J. Sax :

> Thanks for the KIP.
>
> I am wondering a little bit, why you need to expose this information.
> Can you describe some use cases?
>
> Would it be worth to unify this new API with KafkaStreams#state() to get
> the overall state of an application without the need to call two
> different methods? Not sure how this unified API might look like though.
>
>
> One minor comment about the API: TaskState#assignment seems to be
> redundant. It should be the same as
> TaskState#consumedOffsetsByPartition.keySet()
>
> Or do I miss something?
>
>
> -Matthias
>
> On 3/1/17 5:19 AM, Florian Hussonnois wrote:
> > Hi Eno,
> >
> > Yes, but the state() method only returns the global state of the
> > KafkaStream application (ie: CREATED, RUNNING, REBALANCING,
> > PENDING_SHUTDOWN, NOT_RUNNING).
> >
> > An alternative to this KIP would be to change this method to return more
> > information instead of adding a new method.
> >
> > 2017-03-01 13:46 GMT+01:00 Eno Thereska :
> >
> >> Thanks Florian,
> >>
> >> Have you had a chance to look at the new state methods in 0.10.2, e.g.,
> >> KafkaStreams.state()?
> >>
> >> Thanks
> >> Eno
> >>> On 1 Mar 2017, at 11:54, Florian Hussonnois 
> >> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I have just created KIP-130 to add a new method to the KafkaStreams API
> >> in
> >>> order to expose the states of threads and active tasks.
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
> >> 130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>> Florian HUSSONNOIS
> >>
> >>
> >
> >
>
>


-- 
Florian HUSSONNOIS
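Matthias's observation in the quoted message above, that TaskState#assignment would duplicate consumedOffsetsByPartition.keySet(), can be seen in a standalone sketch. The class and field names follow the KIP draft but are illustrative, and TopicPartition is simplified to a String:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the proposed TaskState (not a real Kafka Streams
// class). It shows that a separate "assignment" field would duplicate
// consumedOffsetsByPartition.keySet().
public class TaskState {
    private final Map<String, Long> consumedOffsetsByPartition;

    public TaskState(Map<String, Long> consumedOffsetsByPartition) {
        this.consumedOffsetsByPartition =
                Collections.unmodifiableMap(consumedOffsetsByPartition);
    }

    public Map<String, Long> consumedOffsetsByPartition() {
        return consumedOffsetsByPartition;
    }

    // The assigned partitions are exactly the keys of the offset map,
    // so no extra field is needed.
    public Set<String> assignment() {
        return consumedOffsetsByPartition.keySet();
    }
}
```

Deriving the assignment from the offset map keeps the two views from ever disagreeing.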


[GitHub] kafka pull request #2603: MINOR: Minor reduce unnecessary calls to time.mill...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2603


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-01 Thread Colin McCabe
That makes sense.  I didn't realize that this field already existed until I
saw some of the replies -- good clarification.

best,
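Colin's remark below that an int32 "should still be more than enough if using microseconds" can be sanity-checked with quick arithmetic. This standalone snippet (not Kafka code) computes the longest throttle a signed 32-bit field can express in each unit:

```java
// Back-of-the-envelope check for the int32 throttle_time field: how long
// a throttle can a signed 32-bit value express in each unit?
public class ThrottleFieldHeadroom {
    // Integer.MAX_VALUE milliseconds, expressed in whole minutes (~24.8 days).
    public static long maxMillisAsMinutes() {
        return Integer.MAX_VALUE / 1000L / 60L;
    }

    // Integer.MAX_VALUE microseconds, expressed in whole minutes (~35.8 min).
    public static long maxMicrosAsMinutes() {
        return Integer.MAX_VALUE / 1_000_000L / 60L;
    }

    public static void main(String[] args) {
        System.out.println("int32 in ms caps a throttle at ~"
                + maxMillisAsMinutes() + " minutes");
        System.out.println("int32 in us caps a throttle at ~"
                + maxMicrosAsMinutes() + " minutes");
    }
}
```

Even in microseconds a single throttle could run roughly 35 minutes, which leaves plenty of headroom for per-request throttling.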


On Wed, Mar 1, 2017, at 05:41, Rajini Sivaram wrote:
> Colin,
> 
> Thank you for the feedback. Since we are reusing the existing
> throttle_time_ms field for produce/fetch responses, changing this to
> microseconds would be a breaking change. Since we don't currently plan to
> throttle at sub-millisecond intervals, perhaps it makes sense to keep the
> value consistent with the existing responses (and metrics which expose
> this
> value) and change them all together in future if required?
> 
> Regards,
> 
> Rajini
> 
> On Tue, Feb 28, 2017 at 5:58 PM, Colin McCabe  wrote:
> 
> > I noticed that the throttle_time_ms added to all the message responses
> > is in milliseconds.  Does it make sense to express this in microseconds
> > in case we start doing more fine-grained CPU throttling later on?  An
> > int32 should still be more than enough if using microseconds.
> >
> > best,
> > Colin
> >
> >
> > On Fri, Feb 24, 2017, at 10:31, Jun Rao wrote:
> > > Hi, Jay,
> > >
> > > 2. Regarding request.unit vs request.percentage. I started with
> > > request.percentage too. The reasoning for request.unit is the following.
> > > Suppose that the capacity has been reached on a broker and the admin
> > > needs
> > > to add a new user. A simple way to increase the capacity is to increase
> > > the
> > > number of io threads, assuming there are still enough cores. If the limit
> > > is based on percentage, the additional capacity automatically gets
> > > distributed to existing users and we haven't really carved out any
> > > additional resource for the new user. Now, is it easy for a user to
> > > reason
> > > about 0.1 unit vs 10%? My feeling is that both are hard and have to be
> > > configured empirically. Not sure if percentage is obviously easier to
> > > reason about.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
> > >
> > > > A couple of quick points:
> > > >
> > > > 1. Even though the implementation of this quota is only using io thread
> > > > time, i think we should call it something like "request-time". This
> > will
> > > > give us flexibility to improve the implementation to cover network
> > threads
> > > > in the future and will avoid exposing internal details like our thread
> > > > pools on the server.
> > > >
> > > > 2. Jun/Roger, I get what you are trying to fix but the idea of
> > thread/units
> > > > is super unintuitive as a user-facing knob. I had to read the KIP like
> > > > eight times to understand this. I'm not sure that your point, that
> > > > increasing the number of threads is a problem with a percentage-based
> > > > value, holds; it really depends on whether the user thinks about the
> > "percentage
> > > > of request processing time" or "thread units". If they think "I have
> > > > allocated 10% of my request processing time to user x" then it is a bug
> > > > that increasing the thread count decreases that percent as it does in
> > the
> > > > current proposal. As a practical matter I think the only way to
> > actually
> > > > reason about this is as a percent---I just don't believe people are
> > going
> > > > to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> > > > think they have to understand this thread unit concept, figure out what
> > > > they have set in number of threads, compute a percent and then come up
> > with
> > > > the number of thread units, and these will all be wrong if that thread
> > > > count changes. I also think this ties us to throttling the I/O thread
> > pool,
> > > > which may not be where we want to end up.
> > > >
> > > > 3. For what it's worth I do think having a single throttle_ms field in
> > all
> > > > the responses that combines all throttling from all quotas is probably
> > the
> > > > simplest. There could be a use case for having separate fields for
> > each,
> > > > but I think that is actually harder to use/monitor in the common case
> > so
> > > > unless someone has a use case I think just one should be fine.
> > > >
> > > > -Jay
> > > >
> > > > On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > I have updated the KIP based on the discussions so far.
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> > > > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thank you all for the feedback.
> > > > > >
> > > > > > Ismael #1. It makes sense not to throttle inter-broker requests
> > like
> > > > > > LeaderAndIsr etc. The simplest way to ensure that clients cannot
> > use
> > > > > these
> > > > > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> > > > prevent
> > > > > > clients from using these requests and unauthorized requests are
> > > > included
> > > > > > towards quotas.
> > > > > >
> > 
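The arithmetic behind Jay's percentage-vs-thread-units point in the quoted discussion can be shown standalone (the numbers are illustrative, not from the KIP): a quota fixed in thread units silently shrinks as a share of capacity when the io thread count grows, which is exactly the behavior he calls a bug.

```java
// Standalone illustration: the same thread-unit quota means a different
// percentage of request-processing capacity depending on the io thread count.
public class QuotaUnits {
    // Share of total request-processing capacity for a thread-unit quota.
    public static double shareOfCapacity(double threadUnits, int ioThreads) {
        return threadUnits / ioThreads;
    }

    public static void main(String[] args) {
        double quota = 0.8; // thread units granted to one user
        System.out.println("8 io threads:  "
                + shareOfCapacity(quota, 8));   // 10% of capacity
        System.out.println("16 io threads: "
                + shareOfCapacity(quota, 16));  // silently drops to 5%
    }
}
```

With a percentage-based quota the user's share stays 10% no matter how many io threads the admin configures.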

[jira] [Commented] (KAFKA-4828) ProcessorTopologyTestDriver does not work when using .through()

2017-03-01 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891151#comment-15891151
 ] 

Matthias J. Sax commented on KAFKA-4828:


I am not sure if the JIRA title and error message align... Do you know about 
this FAQ: 
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whatdoesexception"Store'schangelog(-changelog)doesnotcontainpartition"mean?

Does this help?

> ProcessorTopologyTestDriver does not work when using .through()
> ---
>
> Key: KAFKA-4828
> URL: https://issues.apache.org/jira/browse/KAFKA-4828
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Hamidreza Afzali
>Assignee: Hamidreza Afzali
>  Labels: unit-test
>
> *Problem:*
> ProcessorTopologyTestDriver does not work when testing a topology that uses 
> through().
> {code}
> org.apache.kafka.streams.errors.StreamsException: Store count2's change log 
> (count2-topic) does not contain partition 1
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.validatePartitionExists(StoreChangelogReader.java:81)
> {code}
> *Example:*
> {code}
> object Topology1 {
>   def main(args: Array[String]): Unit = {
>     val inputTopic = "input"
>     val stateStore = "count"
>     val stateStore2 = "count2"
>     val outputTopic2 = "count2-topic"
>     val inputs = Seq[(String, Integer)](("A", 1), ("A", 2))
>     val props = new Properties
>     props.put(StreamsConfig.APPLICATION_ID_CONFIG, UUID.randomUUID().toString)
>     props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
>     val builder = new KStreamBuilder
>     builder.stream(Serdes.String, Serdes.Integer, inputTopic)
>       .groupByKey(Serdes.String, Serdes.Integer)
>       .count(stateStore)
>       .through(Serdes.String, Serdes.Long, outputTopic2, stateStore2)
>     val driver = new ProcessorTopologyTestDriver(new StreamsConfig(props),
>       builder, stateStore, stateStore2)
>     inputs.foreach {
>       case (key, value) => {
>         driver.process(inputTopic, key, value, Serdes.String.serializer,
>           Serdes.Integer.serializer)
>         val record = driver.readOutput(outputTopic2,
>           Serdes.String.deserializer, Serdes.Long.deserializer)
>         println(record)
>       }
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-01 Thread Dong Lin
Hey Jun,

Do you think it is OK to keep the existing wire protocol in the KIP? I am
wondering if we can initiate a vote for this KIP.

Thanks,
Dong



On Tue, Feb 28, 2017 at 2:41 PM, Dong Lin  wrote:

> Hey Jun,
>
> I just realized that StopReplicaRequest itself doesn't specify the
> replicaId in the wire protocol, so the controller would need to log the
> brokerId with each StopReplicaRequest. It may therefore be
> reasonable for the controller to do the same with LeaderAndIsrRequest and only
> specify isNewReplica for the broker that receives the LeaderAndIsrRequest.
>
> Thanks,
> Dong
>
> On Tue, Feb 28, 2017 at 2:14 PM, Dong Lin  wrote:
>
>> Hi Jun,
>>
>> Yeah, there is a tradeoff between the controller's implementation complexity
>> and the wire-protocol complexity. I personally think it is more important to
>> keep the wire protocol concise and only add information to it if
>> necessary. It seems fine to add a little complexity to the controller's
>> implementation, e.g. logging the destination broker per LeaderAndIsrRequest. Becket
>> also shares this opinion with me. Is the only purpose of doing so to make
>> controller log simpler?
>>
>> And certainly, I have added Todd's comment in the wiki.
>>
>> Thanks,
>> Dong
>>
>>
>> On Tue, Feb 28, 2017 at 1:37 PM, Jun Rao  wrote:
>>
>>> Hi, Dong,
>>>
>>> 52. What you suggested would work. However, I am thinking that it's
>>> probably simpler to just set isNewReplica at the replica level. That way,
>>> the LeaderAndIsrRequest can be created a bit simpler. When reading a
>>> LeaderAndIsrRequest in the controller log, it's easier to see which
>>> replicas are new without looking at which broker the request is intended
>>> for.
>>>
>>> Could you also add those additional points from Todd's on 1 broker per
>>> disk
>>> vs JBOD vs RAID5/6 to the KIP?
>>>
>>> Thanks,
>>>
>>> Hi, Todd,
>>>
>>> Thanks for the feedback. That's very useful.
>>>
>>> Jun
>>>
>>> On Tue, Feb 28, 2017 at 10:25 AM, Dong Lin  wrote:
>>>
>>> > Hey Jun,
>>> >
>>> > Certainly, I have added Todd to reply to the thread. And I have
>>> updated the
>>> > item to in the wiki.
>>> >
>>> > 50. The full statement is "Broker assumes a log directory to be good
>>> after
>>> > it starts, and mark log directory as bad once there is IOException when
>>> > broker attempts to access (i.e. read or write) the log directory". This
>>> > statement seems reasonable, right? If a log directory is actually bad,
>>> then
>>> > the broker will first assume it is OK, try to read logs on this log
>>> > directory, encounter IOException, and then mark it as bad.
>>> >
>>> > 51. My bad. I thought I removed it but I didn't. It is removed now.
>>> >
>>> > 52. I don't think so.. The isNewReplica field in the
>>> LeaderAndIsrRequest is
>>> > only relevant to the replica (i.e. broker) that receives the
>>> > LeaderAndIsrRequest. There is no need to specify whether each replica
>>> is
>>> > new inside LeaderAndIsrRequest. In other words, if the controller sends
>>> > LeaderAndIsrRequest to three different replicas of a given partition,
>>> the
>>> > isNewReplica field can be different across these three requests.
>>> >
>>> > Yeah, I would definitely want to start discussion on KIP-113 after we
>>> have
>>> > reached agreement on KIP-112. I have actually opened KIP-113 discussion
>>> > thread on 1/12 together with this thread. I have yet to add the
>>> ability to
>>> > list offline directories in KIP-113 which we discussed in this thread.
>>> >
>>> > Thanks for all your reviews! Is there further concern with the latest
>>> KIP?
>>> >
>>> > Thanks!
>>> > Dong
>>> >
>>> > On Tue, Feb 28, 2017 at 9:40 AM, Jun Rao  wrote:
>>> >
>>> > > Hi, Dong,
>>> > >
>>> > > RAID6 is an improvement over RAID5 and can tolerate 2 disks failure.
>>> > Eno's
>>> > > point is that the rebuild of RAID5/RAID6 requires reading more data
>>> > > compared with RAID10, which increases the probability of error during
>>> > > rebuild. This makes sense. In any case, do you think you could ask
>>> the
>>> > SREs
>>> > > at LinkedIn to share their opinions on RAID5/RAID6?
>>> > >
>>> > > Yes, when a replica is offline due to a bad disk, it makes sense to
>>> > handle
>>> > > it immediately as if a StopReplicaRequest is received (i.e., replica
>>> is
>>> > no
>>> > > longer considered a leader and is removed from any replica fetcher
>>> > thread).
>>> > > Could you add that detail in item 2. in the wiki?
>>> > >
>>> > > 50. The wiki says "Broker assumes a log directory to be good after it
>>> > > starts" : A log directory actually could be bad during startup.
>>> > >
>>> > > 51. In item 4, the wiki says "The controller watches the path
>>> > > /log_dir_event_notification for new znode.". This doesn't seem to be
>>> > > needed now?
>>> > > now?
>>> > >
>>> > > 52. The isNewReplica field in LeaderAndIsrRequest should be for each
>>> > > replica inside the replicas field, right?
>>> > >
>>> > > Other than those, the current KIP looks good to me. Do you want to
>>> start
>>> > a
>>> > > separate discuss

Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-03-01 Thread Colin McCabe
Hi all,

Thanks for commenting, everyone.  Does anyone have more questions or
comments, or should we vote?  The latest proposal is up at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations

best,
Colin


On Thu, Feb 16, 2017, at 15:00, Colin McCabe wrote:
> On Thu, Feb 16, 2017, at 14:11, Dong Lin wrote:
> > Hey Colin,
> > 
> > Thanks for the update. I have two comments:
> > 
> > - I actually think it is simpler and good enough to have a per-topic API
> > instead of a batch-of-topics API. This is different from the argument for
> > a batch-of-partitions API because, unlike operations on topics, people
> > usually operate on multiple partitions (e.g. seek(), purge()) at a time.
> > Is there a performance concern with a per-topic API? I am wondering if we
> > should use a per-topic API until there is a use case or performance
> > benefit for a batch-of-topics API.
> 
> Yes, there is a performance concern with only supporting operations on
> one topic at a time.  Jay expressed this in some of his earlier emails
> and some other people did as well.  We have cases in mind for management
> software where many topics are created at once.
> 
> > 
> > - Currently we have interface "Consumer" and "Producer". And we also have
> > implementations of these two interfaces as "KafkaConsumer" and
> > "KafkaProducer". If we follow the same naming pattern, should we have
> > interface "AdminClient" and the implementation "KafkaAdminClient",
> > instead
> > of the other way around?
> 
> That's a good point.  We should do that for consistency.
> 
> best,
> Colin
> 
> > 
> > Dong
> > 
> > 
> > On Thu, Feb 16, 2017 at 10:19 AM, Colin McCabe 
> > wrote:
> > 
> > > Hi all,
> > >
> > > So I think people have made some very good points so far.  There seems
> > > to be agreement that we need to have explicit batch APIs for the sake of
> > > efficiency, so I added that back.
> > >
> > > Contexts seem a little more complex than we thought, so I removed that
> > > from the proposal.
> > >
> > > I removed the Impl class.  Instead, we now have a KafkaAdminClient
> > > interface and an AdminClient implementation.  I think this matches our
> > > other code better, as Jay commented.
> > >
> > > Each call now has an "Options" object that is passed in.  This will
> > > allow us to easily add new parameters to the calls without having tons
> > > of function overloads.  Similarly, each call now has a Results object,
> > > which will let us easily extend the results we are returning if needed.
> > >
> > > Many people made the point that Java 7 Futures are not that useful, but
> > > Java 8 CompletableFutures are.  With CompletableFutures, you can chain
> > > calls, adapt them, join them-- basically all the stuff people are doing
> > > in Node.js and Twisted Python.  Java 7 Futures don't really let you do
> > > anything but poll for a value or block.  So I felt that it was better to
> > > just go with a CompletableFuture-based API.
> > >
> > > People also made the point that they would like an easy API for waiting
> > > on complete success of a batch call.  For example, an API that would
> > > fail if even one topic wasn't created in createTopics.  So I came up
> > > with Result objects that provide multiple futures that you can wait on.
> > > You can wait on a future that fires when everything is complete, or you
> > > can wait on futures for individual topics being created.
> > >
> > > I updated the wiki, so please take a look.  Note that this new API
> > > requires JDK8.  It seems like JDK8 is coming soon, though, and the
> > > disadvantages of sticking to Java 7 are pretty big here, I think.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Mon, Feb 13, 2017, at 11:51, Colin McCabe wrote:
> > > > On Sun, Feb 12, 2017, at 09:21, Jay Kreps wrote:
> > > > > Hey Colin,
> > > > >
> > > > > Thanks for the hard work on this. I know going back and forth on APIs
> > > is
> > > > > kind of frustrating but we're at the point where these things live 
> > > > > long
> > > > > enough and are used by enough people that it is worth the pain. I'm
> > > sure
> > > > > it'll come down in the right place eventually. A couple things I've
> > > found
> > > > > helped in the past:
> > > > >
> > > > >1. The burden of evidence needs to fall on the complicator. i.e. if
> > > > >person X thinks the api should be async they need to produce a set
> > > of
> > > > >common use cases that require this. Otherwise you are perpetually
> > > > >having to
> > > > >think "we might need x". I think it is good to have a rule of
> > > "simple
> > > > >until
> > > > >proven insufficient".
> > > > >2. Make sure we frame things for the intended audience. At this
> > > point
> > > > >our apis get used by a very broad set of Java engineers. This is a
> > > > >very
> > > > >different audience from our developer mailing list. These people
> > > code
> > > > >for a
> > > > >living not necessarily 
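The batch-result pattern Colin describes above, one future per topic plus an aggregate that fails if any creation fails, can be sketched with plain java.util.concurrent.CompletableFuture. Class and method names here are illustrative, not the actual KIP-117 API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Illustrative "Results object" for a batch createTopics call: callers can
// wait on one topic or on the whole batch.
public class CreateTopicsResults {
    private final Map<String, CompletableFuture<Void>> perTopic = new HashMap<>();

    public CreateTopicsResults(List<String> topics) {
        for (String topic : topics) {
            perTopic.put(topic, new CompletableFuture<Void>());
        }
    }

    // Future for a single topic's creation.
    public CompletableFuture<Void> result(String topic) {
        return perTopic.get(topic);
    }

    // Completes only when every per-topic future has completed, and fails
    // if any of them fails -- the all-or-nothing view requested in the thread.
    public CompletableFuture<Void> all() {
        return CompletableFuture.allOf(
                perTopic.values().toArray(new CompletableFuture[0]));
    }

    // Stand-in for the broker response completing one topic's future.
    public void complete(String topic) {
        perTopic.get(topic).complete(null);
    }
}
```

Because these are CompletableFutures, callers can also chain follow-up work with thenRun/thenCompose instead of blocking.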

Re: [DISCUSS] KIP-128: Add ByteArrayConverter for Kafka Connect

2017-03-01 Thread Gwen Shapira
Hi Ewen,

Thanks for the KIP, I think it will be useful :)

I'm just wondering if we can add support not just for a bytes schema,
but also for a struct that contains bytes? I'm thinking of the
scenario of using a connector to grab BLOBs out of a DB - I think you
end up with this structure if you use a JDBC connector and custom
query...

Maybe even support Maps with generic objects using Java's default
serialization? I'm not sure if this is useful.

Gwen



On Sat, Feb 25, 2017 at 2:25 PM, Ewen Cheslack-Postava
 wrote:
> Hi all,
>
> I've added a pretty trivial KIP for adding a pass-through Converter for
> Kafka Connect, similar to ByteArraySerializer/Deserializer.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-128%3A+Add+ByteArrayConverter+for+Kafka+Connect
>
> This wasn't added with the framework originally because the idea was to
> deal with structured data for the most part. However, we've seen a couple
> of use cases arise as the framework got more traction and I think it makes
> sense to provide this out of the box now so people stop reinventing the
> wheel (and using a different fully-qualified class name) for each connector
> that needs this functionality.
>
> I imagine this will be a rather uncontroversial addition, so if I don't see
> any comments in the next day or two I'll just start the vote thread.
>
> -Ewen



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog
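The pass-through behavior proposed in this thread is small enough to sketch standalone. The real Connect Converter interface also carries a Schema and SchemaAndValue; this sketch elides that, so the class and method names are illustrative:

```java
// Minimal standalone sketch of a pass-through converter in the spirit of
// the proposed ByteArrayConverter: both directions hand the bytes through
// untouched, with no serialization step at all.
public class PassThroughConverter {
    public byte[] fromConnectData(String topic, byte[] value) {
        return value; // the Connect value already is the raw payload
    }

    public byte[] toConnectData(String topic, byte[] value) {
        return value; // symmetric: the raw bytes become the value
    }
}
```

Shipping one blessed implementation avoids every connector reinventing this class under a different fully-qualified name.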


Re: [VOTE] KIP-107: Add purgeDataBefore() API in AdminClient

2017-03-01 Thread Dong Lin
Hi all,

I have updated the KIP to include a script that allows a user to purge data
by providing a map from partition to offset. I think this script may be
convenient and useful, e.g., if a user simply wants to purge all data of
given partitions from the command line. I am wondering if anyone objects to
this script or has suggestions on the interface.

Besides, Ismael commented in the pull request that it may be better to
rename PurgeDataBefore() to DeleteDataBefore() and rename PurgeRequest to
DeleteRequest. I think it may be a good idea because kafka-topics.sh
already uses "delete" as an option. Personally I don't have a strong
preference between "purge" and "delete". I am wondering if anyone objects to
this change.

Thanks,
Dong
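The relaxed success rule discussed in the quoted messages below, reporting low_watermark as the minimum logStartOffset of the live replicas once all of them have passed the purged offset, can be sketched standalone (names are illustrative, not the KIP-107 wire format):

```java
import java.util.Collection;

// Illustrative sketch of the relaxed purge-result rule: offline replicas
// are excluded, and success is declared once every live replica has
// advanced its logStartOffset past the purged offset.
public class PurgeResult {
    // low_watermark reported to the caller: the minimum logStartOffset
    // across the replicas that are currently live.
    public static long lowWatermark(Collection<Long> liveReplicaLogStartOffsets) {
        long min = Long.MAX_VALUE;
        for (long offset : liveReplicaLogStartOffsets) {
            min = Math.min(min, offset);
        }
        return min;
    }

    // The purge is reported successful once the low watermark has reached
    // the purged offset.
    public static boolean purgeComplete(Collection<Long> liveReplicaLogStartOffsets,
                                        long purgedOffset) {
        return lowWatermark(liveReplicaLogStartOffsets) >= purgedOffset;
    }
}
```

An offline replica that later rejoins must fetch from the current leader first, which is what makes excluding it from the minimum safe.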



On Wed, Mar 1, 2017 at 9:46 AM, Dong Lin  wrote:

> Hi Ismael,
>
> I actually mean log_start_offset. I realized that it is a better name
> after I started the implementation because "logStartOffset" is already used in
> Log.scala and LogCleanerManager.scala. So I changed it from
> log_begin_offset to log_start_offset in the patch. But I forgot to update
> the KIP and specify it in the mailing thread.
>
> Thanks for catching this. Let me update the KIP to reflect this change.
>
> Dong
>
>
> On Wed, Mar 1, 2017 at 6:15 AM, Ismael Juma  wrote:
>
>> Hi Dong,
>>
>> When you say "logStartOffset", do you mean "log_begin_offset "? I could
>> only find the latter in the KIP. If so, would log_start_offset be a better
>> name?
>>
>> Ismael
>>
>> On Tue, Feb 28, 2017 at 4:26 AM, Dong Lin  wrote:
>>
>> > Hi Jun and everyone,
>> >
>> > I would like to change the KIP in the following way. Currently, if any
>> > replica is offline, the purge result for a partition will
>> > be NotEnoughReplicasException and its low_watermark will be 0. The
>> > motivation for this approach is that we want to guarantee that the data
>> > before purgedOffset has been deleted on all replicas of this partition
>> if
>> > purge result indicates success.
>> >
>> > But this approach seems too conservative. It should be sufficient in
>> most
>> > cases to just tell user success and set low_watermark to minimum
>> > logStartOffset of all live replicas in the PurgeResponse if
>> logStartOffset
>> > of all live replicas have reached purgedOffset. This is because for an
>> > offline replica to come back online and be elected leader, it should have
>> > received a FetchResponse from the current leader which should tell it
>> to
>> > purge beyond purgedOffset. The benefit of doing this change is that we
>> can
>> > allow purge operation to succeed when some replica is offline.
>> >
>> > Are you OK with this change? If so, I will go ahead to update the KIP
>> and
>> > implement this behavior.
>> >
>> > Thanks,
>> > Dong
>> >
>> >
>> >
>> > On Tue, Jan 17, 2017 at 10:18 AM, Dong Lin  wrote:
>> >
>> > > Hey Jun,
>> > >
>> > > Do you have time to review the KIP again or vote for it?
>> > >
>> > > Hey Ewen,
>> > >
>> > > Can you also review the KIP again or vote for it? I have discussed
>> with
>> > > Radai and Becket regarding your concern. We still think putting it in
>> > Admin
>> > > Client seems more intuitive because there are use cases where an
>> > application
>> > > that manages topics or produces data may also want to purge data. It
>> > seems
>> > > weird if they need to create a consumer to do this.
>> > >
>> > > Thanks,
>> > > Dong
>> > >
>> > > On Thu, Jan 12, 2017 at 9:34 AM, Mayuresh Gharat <
>> > > gharatmayures...@gmail.com> wrote:
>> > >
>> > >> +1 (non-binding)
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Mayuresh
>> > >>
>> > >> On Wed, Jan 11, 2017 at 1:03 PM, Dong Lin 
>> wrote:
>> > >>
>> > >> > Sorry for the duplicated email. It seems that gmail will put the
>> > voting
>> > >> > email in this thread if I simply replace DISCUSS with VOTE in the
>> > >> subject.
>> > >> >
>> > >> > On Wed, Jan 11, 2017 at 12:57 PM, Dong Lin 
>> > wrote:
>> > >> >
>> > >> > > Hi all,
>> > >> > >
>> > >> > > It seems that there is no further concern with the KIP-107. At
>> this
>> > >> point
>> > >> > > we would like to start the voting process. The KIP can be found
>> at
>> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107
>> > >> > > %3A+Add+purgeDataBefore%28%29+API+in+AdminClient.
>> > >> > >
>> > >> > > Thanks,
>> > >> > > Dong
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> -Regards,
>> > >> Mayuresh R. Gharat
>> > >> (862) 250-7125
>> > >>
>> > >
>> > >
>> >
>>
>
>


Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-03-01 Thread Matthias J. Sax
Thanks for the KIP.

I am wondering a little bit, why you need to expose this information.
Can you describe some use cases?

Would it be worthwhile to unify this new API with KafkaStreams#state() to get
the overall state of an application without the need to call two
different methods? Not sure what this unified API might look like, though.


One minor comment about the API: TaskState#assignment seems to be
redundant. It should be the same as
TaskState#consumedOffsetsByPartition.keySet()

Or am I missing something?


-Matthias

On 3/1/17 5:19 AM, Florian Hussonnois wrote:
> Hi Eno,
> 
> Yes, but the state() method only returns the global state of the
> KafkaStreams application (i.e. CREATED, RUNNING, REBALANCING,
> PENDING_SHUTDOWN, NOT_RUNNING).
> 
> An alternative to this KIP would be to change this method to return more
> information instead of adding a new method.
> 
> 2017-03-01 13:46 GMT+01:00 Eno Thereska :
> 
>> Thanks Florian,
>>
>> Have you had a chance to look at the new state methods in 0.10.2, e.g.,
>> KafkaStreams.state()?
>>
>> Thanks
>> Eno
>>> On 1 Mar 2017, at 11:54, Florian Hussonnois 
>> wrote:
>>>
>>> Hi all,
>>>
>>> I have just created KIP-130 to add a new method to the KafkaStreams API
>> in
>>> order to expose the states of threads and active tasks.
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
>> 130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API
>>>
>>>
>>> Thanks,
>>>
>>> --
>>> Florian HUSSONNOIS
>>
>>
> 
> 





[jira] [Commented] (KAFKA-4706) Unify StreamsKafkaClient instances

2017-03-01 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891048#comment-15891048
 ] 

Matthias J. Sax commented on KAFKA-4706:


Yes.

> Unify StreamsKafkaClient instances
> --
>
> Key: KAFKA-4706
> URL: https://issues.apache.org/jira/browse/KAFKA-4706
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Matthias J. Sax
>Assignee: Sharad
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>
> Kafka Streams currently used two instances of {{StreamsKafkaClient}} (one in 
> {{KafkaStreams}} and one in {{InternalTopicManager}}).
> We want to unify both such that only a single instance is used.





[jira] [Commented] (KAFKA-4677) Avoid unnecessary task movement across threads during rebalance

2017-03-01 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891031#comment-15891031
 ] 

Matthias J. Sax commented on KAFKA-4677:


Streams uses its own partition assigner implementation and does not use the
Consumer's default partition assigner, because Streams has additional
requirements/restrictions with regard to group management. Thus, any change
from KIP-54 will not affect Streams, as it does not use it.

> Avoid unnecessary task movement across threads during rebalance
> ---
>
> Key: KAFKA-4677
> URL: https://issues.apache.org/jira/browse/KAFKA-4677
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.3.0
>
>
> StreamPartitionAssigner tries to follow a sticky assignment policy to avoid 
> expensive task migration. Currently, it does this on a best-effort basis.
> We observed a case in which tasks migrated for no good reason, thus
> we assume that the current implementation could be improved to be more sticky.
> The concrete scenario is as follows:
> assume we have topology with 3 tasks, A, B, C
> assume we have 3 threads, each executing one task: 1-A, 2-B, 3-C
> for some reason, thread 1 goes down and a rebalance gets triggered
> thread 2 and 3 get their partitions revoked
> sometimes (not sure what the exact condition for this is), the new assignment 
> flips the assignment for task B and C (task A is newly assigned to either 
> thread 2 or 3)
> => possible new assignment 2-(A,C) and 3-B
> There is no obvious reason (like load-balancing) why the task assignment for 
> B and C does change to the other thread resulting in unnecessary task 
> migration.





Re: [DISCUSS] KIP-82 - Add Record Headers

2017-03-01 Thread Becket Qin
Hi Ismael,

Thanks for the reply. Please see the comments inline.

On Wed, Mar 1, 2017 at 6:47 AM, Ismael Juma  wrote:

> Hi Becket,
>
> Thanks for sharing your thoughts. More inline.
>
> On Wed, Mar 1, 2017 at 2:54 AM, Becket Qin  wrote:
>
> > As you can imagine, if the ProducerRecord has a List as its value, then
> > Interceptor.onSend() can actually add an element to the List. If the
> > producer.send() is called on the same ProducerRecord again, the value
> list
> > would have one more element than the previously sent ProducerRecord even
> > though the ProducerRecord itself is not mutable, right? Same thing can
> > apply to any modifiable class type.
> >
>
> The difference is that the user chooses the value type. They are free to
> choose a mutable or immutable type. A generic interceptor cannot mutate the
> value because it doesn't know the type (and it could be immutable). One
> could write an interceptor that checked the type of the value at runtime
> and did things based on that. But again, the user who creates the record is
> in control.
>
But there is no generic interceptor, right? The interceptor always takes
specific K, V types.


> From this standpoint allowing headers to be mutable doesn't really weaken
> > the mutability we already have. Admittedly a mutable header is kind of
> > guiding user towards to change the headers in the existing object instead
> > of creating a new one.
>
>
> Yes, with headers, we are providing a model for the user (the user doesn't
> get to choose it like with keys and values) and for the interceptors. So, I
> think it's not the same.


>
> > But I think reusing an object while it is possible
> > to be modified by user code is a risk that users themselves are willing
> to
> > take. And we do not really protect against that.
>
>
> If users want to take that risk, it's fine. But we give them the option to
> protect themselves. With mutable headers, there is no option.

If we want to let the users control the mutability, users can always call
headers.close() before calling producer.send() and that will force the
interceptor to create a new ProducerRecord object.
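That close() idea can be sketched standalone: a mutable Headers that the user can seal before send(), after which any interceptor must build a new record instead of mutating the shared one. Names are illustrative, not the final KIP-82 API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of user-controlled header mutability: mutable by
// default (so interceptors avoid copying the record), sealable on demand.
public class Headers {
    private final List<String> entries = new ArrayList<>();
    private boolean closed = false;

    // Interceptors add headers here; throws once the user has sealed the
    // headers, forcing the interceptor to build a fresh record instead.
    public Headers add(String header) {
        if (closed) {
            throw new IllegalStateException("headers closed; create a new record");
        }
        entries.add(header);
        return this;
    }

    // Called by the user before producer.send() to opt out of in-place
    // mutation by interceptors.
    public void close() {
        closed = true;
    }

    public int size() {
        return entries.size();
    }
}
```

This keeps the common case allocation-free while still giving cautious users an opt-out.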

Because the headers are mostly useful for interceptors, unless the users do
not want the interceptors to change their records, it seems reasonable to
say that by default modification of headers is allowed for the
interceptors.

>
>
> > But there still seems
> > value to allow the users to not pay the overhead of creating tons of
> > objects if they do not reuse an object to send it twice, which is
> probably
> > a more common case.
> >
>
> I think the assumption that there would be tons of objects created is
> incorrect (I suggested a solution that would only involve one additional
> reference in the `Header` instance). The usability of the immutable API
> seemed to be more of an issue.
>
If we do not allow users to add headers to an existing ProducerRecord
object, each interceptor that wants to add headers will have to create a
new ProducerRecord. So we would create NUM_INTERCEPTOR new ProducerRecord
objects per message; if a producer is sending 100K messages per second, that
would be a lot of new objects, right?

>
> In any case, if we do add the `close()` method, then we need to add a note
> to the compatibility section stating that once a producer record is sent,
> it cannot be sent again as this would cause interceptors that add headers
> to fail.
>
Agreed, clear documentation is important.

>
> Ismael
>


[GitHub] kafka pull request #2625: KAFKA:4623- Change default unclean.leader.election...

2017-03-01 Thread sharad-develop
Github user sharad-develop closed the pull request at:

https://github.com/apache/kafka/pull/2625


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Issue Comment Deleted] (KAFKA-4623) Change Default unclean.leader.election.enabled from True to False

2017-03-01 Thread Sharad (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad updated KAFKA-4623:
--
Comment: was deleted

(was: https://github.com/apache/kafka/pull/2470)

> Change Default unclean.leader.election.enabled from True to False
> -
>
> Key: KAFKA-4623
> URL: https://issues.apache.org/jira/browse/KAFKA-4623
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>Assignee: Sharad
> Fix For: 0.11.0.0
>
>
> See KIP-106
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-106+-+Change+Default+unclean.leader.election.enabled+from+True+to+False



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4623) Change Default unclean.leader.election.enabled from True to False

2017-03-01 Thread Sharad (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891025#comment-15891025
 ] 

Sharad commented on KAFKA-4623:
---

Rebased.
PR submitted:
https://github.com/apache/kafka/pull/2625


> Change Default unclean.leader.election.enabled from True to False
> -
>
> Key: KAFKA-4623
> URL: https://issues.apache.org/jira/browse/KAFKA-4623
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>Assignee: Sharad
> Fix For: 0.11.0.0
>
>
> See KIP-106
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-106+-+Change+Default+unclean.leader.election.enabled+from+True+to+False





[jira] [Commented] (KAFKA-4706) Unify StreamsKafkaClient instances

2017-03-01 Thread Sharad (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890965#comment-15890965
 ] 

Sharad commented on KAFKA-4706:
---

[~mjsax] I need a bit of review of the solution. Is it OK if I use StreamThread
to share the instance of StreamsKafkaClient between KafkaStreams and
InternalTopicManager?

> Unify StreamsKafkaClient instances
> --
>
> Key: KAFKA-4706
> URL: https://issues.apache.org/jira/browse/KAFKA-4706
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Matthias J. Sax
>Assignee: Sharad
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>
> Kafka Streams currently uses two instances of {{StreamsKafkaClient}} (one in 
> {{KafkaStreams}} and one in {{InternalTopicManager}}).
> We want to unify both so that only a single instance is used.





[GitHub] kafka pull request #2611: MINOR: improve MinTimestampTrackerTest and fix NPE...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2611




[jira] [Commented] (KAFKA-4677) Avoid unnecessary task movement across threads during rebalance

2017-03-01 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890935#comment-15890935
 ] 

Jeff Widman commented on KAFKA-4677:


Is KIP-54 related since it also implements a sticky partition assigner, albeit 
within the Kafka consumer? 

I haven't worked directly with Streams, but I thought it piggy-backs on top
of the Kafka consumer. If so, wouldn't it be better to just inherit the KIP-54
implementation?

Again, I don't really understand the internals, so I'm not sure.

> Avoid unnecessary task movement across threads during rebalance
> ---
>
> Key: KAFKA-4677
> URL: https://issues.apache.org/jira/browse/KAFKA-4677
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.3.0
>
>
> StreamPartitionAssigner tries to follow a sticky assignment policy to avoid 
> expensive task migration. Currently, it does this in a best-effort approach.
> We could observe a case, for which tasks did migrate for no good reason, thus 
> we assume that the current implementation could be improved to be more sticky.
> The concrete scenario is as follows:
> assume we have topology with 3 tasks, A, B, C
> assume we have 3 threads, each executing one task: 1-A, 2-B, 3-C
> for some reason, thread 1 goes down and a rebalance gets triggered
> thread 2 and 3 get their partitions revoked
> sometimes (not sure what the exact condition for this is), the new assignment 
> flips the assignment for task B and C (task A is newly assigned to either 
> thread 2 or 3)
> > possible new assignment: 2-(A,C) and 3-B
> There is no obvious reason (like load-balancing) why the task assignment for 
> B and C does change to the other thread resulting in unnecessary task 
> migration.
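The stickier behaviour being asked for can be sketched with a toy model (this is not the actual StreamPartitionAssigner code; names and the round-robin fallback are illustrative): each task stays on its previous thread whenever that thread is still alive, and only orphaned tasks get redistributed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of a sticky task assignment: tasks keep their previous
// owner if it survived the rebalance; only orphaned tasks move.
public class StickyAssignmentSketch {
    static Map<String, String> assign(Map<String, String> previous,
                                      List<String> liveThreads) {
        Map<String, String> assignment = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : previous.entrySet()) {
            String task = e.getKey(), prevThread = e.getValue();
            if (liveThreads.contains(prevThread)) {
                assignment.put(task, prevThread);  // sticky: unchanged
            } else {
                // Round-robin orphaned tasks over the live threads.
                assignment.put(task, liveThreads.get(next++ % liveThreads.size()));
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, String> prev = new LinkedHashMap<>();
        prev.put("A", "1"); prev.put("B", "2"); prev.put("C", "3");
        // Thread 1 died: B and C stay put, only A moves.
        System.out.println(assign(prev, List.of("2", "3")));
    }
}
```

In the scenario from the ticket, this rule would never flip B and C between threads 2 and 3; only the orphaned task A would be reassigned.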





[jira] [Created] (KAFKA-4831) Extract WindowedSerde to public APIs

2017-03-01 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-4831:


 Summary: Extract WindowedSerde to public APIs
 Key: KAFKA-4831
 URL: https://issues.apache.org/jira/browse/KAFKA-4831
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


Now that we have augmented WindowedSerde with non-arg parameters, the next step 
is to extract it as part of the public APIs so that users who want to read and 
write windowed streams can use it.





[GitHub] kafka pull request #2625: KAFKA:4623- Change default unclean.leader.election...

2017-03-01 Thread sharad-develop
GitHub user sharad-develop opened a pull request:

https://github.com/apache/kafka/pull/2625

KAFKA:4623- Change default unclean.leader.election.enabled from True to 
False

KAFKA:4623- Change default unclean.leader.election.enabled from True to 
False

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sharad-develop/kafka KAFKA-4623

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2625


commit 0aa2549ed5973cb724b4d5a16ee9bc2fdf16f523
Author: sharad.develop 
Date:   2017-03-01T20:03:02Z

KAFKA:4623- Change default unclean.leader.election.enabled from True to 
False






[jira] [Created] (KAFKA-4830) Augment KStream.print() to allow users pass in extra parameters in the printed string

2017-03-01 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-4830:


 Summary: Augment KStream.print() to allow users pass in extra 
parameters in the printed string
 Key: KAFKA-4830
 URL: https://issues.apache.org/jira/browse/KAFKA-4830
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


Today {{KStream.print}} uses the hard-coded result string:

{code}
"[" + this.streamName + "]: " + keyToPrint + " , " + valueToPrint
{code}

And some users are asking to augment this so that they can customize the output 
string via {{KStream.print(KeyValueMapper)}}:

{code}
"[" + this.streamName + "]: " + mapper.apply(keyToPrint, valueToPrint)
{code}
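The requested change can be sketched side by side (the KeyValueMapper interface below only mirrors the shape of the Streams one so the example is self-contained; method names are illustrative):

```java
// Stand-in for org.apache.kafka.streams.kstream.KeyValueMapper,
// included here so the sketch compiles on its own.
interface KeyValueMapper<K, V, R> {
    R apply(K key, V value);
}

public class PrintMapperSketch {
    // Current hard-coded behaviour of KStream.print().
    static String defaultFormat(String streamName, Object key, Object value) {
        return "[" + streamName + "]: " + key + " , " + value;
    }

    // Proposed behaviour: a user-supplied mapper controls the printed string.
    static <K, V> String customFormat(String streamName, K key, V value,
                                      KeyValueMapper<K, V, String> mapper) {
        return "[" + streamName + "]: " + mapper.apply(key, value);
    }

    public static void main(String[] args) {
        System.out.println(defaultFormat("clicks", "user-1", 42));
        // → [clicks]: user-1 , 42
        System.out.println(customFormat("clicks", "user-1", 42,
                (k, v) -> k + " clicked " + v + " times"));
        // → [clicks]: user-1 clicked 42 times
    }
}
```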





[jira] [Commented] (KAFKA-4829) Improve logging of StreamTask commits

2017-03-01 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890917#comment-15890917
 ] 

Guozhang Wang commented on KAFKA-4829:
--

I'd like to expand the scope of this JIRA a bit to make a pass over all the 
log4j entries in Streams and see:

1. If some INFO logging is too verbose / frequent and can be reduced.
2. If some TRACE / DEBUG logging is vital and hence should be promoted to 
DEBUG / INFO, as long as it is not too frequent.

> Improve logging of StreamTask commits
> -
>
> Key: KAFKA-4829
> URL: https://issues.apache.org/jira/browse/KAFKA-4829
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Steven Schlansker
>Priority: Minor
>  Labels: user-experience
>
> Currently I see this every commit interval:
> {code}
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 1_31
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 2_31
> {code}
> We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
> This means every commit interval we log a few hundred lines of the above
> which is an order of magnitude chattier than anything else in the log
> during normal operations.
> To improve visibility of important messages, we should reduce the chattiness 
> of normal commits and highlight abnormal commits.  An example proposal:
> existing message is fine at TRACE level for diagnostics
> {{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}
> normal fast case, wrap them all up into one summary line
> {{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}
> some kind of threshold / messaging in case it doesn't complete quickly or 
> logs an exception
> {{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}
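The summary-line proposal quoted above could be sketched roughly as follows (the threshold, message wording, and the idea of collecting per-task timings first are all illustrative, not actual StreamThread code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the proposal: collect per-task commit timings, emit one
// INFO summary line, and escalate only the slow commits.
public class CommitLogSummary {
    static String summarize(Map<String, Long> commitMillis, long slowThresholdMs) {
        long total = 0;
        StringBuilder slow = new StringBuilder();
        for (Map.Entry<String, Long> e : commitMillis.entrySet()) {
            total += e.getValue();
            if (e.getValue() > slowThresholdMs) {
                slow.append("ERROR StreamTask ").append(e.getKey())
                    .append(" did not commit in ").append(slowThresholdMs)
                    .append("ms\n");
            }
        }
        return slow + "INFO " + commitMillis.size()
                + " stream tasks committed in " + total + "ms";
    }

    public static void main(String[] args) {
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("1_31", 5L);
        timings.put("2_31", 7L);
        timings.put("1_32", 120L);  // slow outlier
        System.out.println(summarize(timings, 100));
    }
}
```

With hundreds of tasks this collapses a few hundred per-commit lines into one summary, while the abnormal case still stands out at ERROR level.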





Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Matthias J. Sax
It should be:

groupBy -> always trigger repartitioning
groupByKey -> maybe trigger repartitioning

And there will not be two repartitioning topics. The repartitioning will
be done by the groupBy/groupByKey operation, and thus, in the
aggregation step we know that data is correctly partitioned and there
will be no second repartitioning topic.
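The rule can be modeled as a single predicate (a minimal sketch, not the actual topology-building code; the parameter names are made up):

```java
// Minimal model of the repartitioning rule: groupByKey repartitions
// only if a key-changing operation (map, selectKey, flatMap, transform)
// preceded it; groupBy is itself a key-changing operation, so it
// always repartitions.
public class RepartitionRule {
    static boolean needsRepartition(boolean keyChangedUpstream, boolean usesGroupBy) {
        return usesGroupBy || keyChangedUpstream;
    }

    public static void main(String[] args) {
        System.out.println(needsRepartition(false, false)); // groupByKey, keys untouched
        System.out.println(needsRepartition(true, false));  // groupByKey after selectKey
        System.out.println(needsRepartition(false, true));  // groupBy: always
    }
}
```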



-Matthias

On 3/1/17 11:25 AM, Michael Noll wrote:
> FYI: The difference between `groupBy` (may trigger re-partitioning) vs.
> `groupByKey` (does not trigger re-partitioning) also applies to:
> 
> - `map` vs. `mapValues`
> - `flatMap` vs. `flatMapValues`
> 
> 
> 
> On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy  wrote:
> 
>> If you use stream.groupByKey() then there will be no repartitioning as long
>> as there have been no key changing operations preceding it, i.e, map,
>> selectKey, flatMap, transform. If you use stream.groupBy(...) then we see
>> it as a key changing operation, hence we need to repartition the data.
>>
>> On Wed, 1 Mar 2017 at 18:59 Tianji Li  wrote:
>>
>>> Hi there,
>>>
>>> I wonder if it makes sense to give the option to disable auto
>>> repartitioning while doing groupBy.
>>>
>>> I understand with https://issues.apache.org/jira/browse/KAFKA-3561,
>>> an internal topic for repartition will be automatically created and
>> synced
>>> to brokers, which is useful when aggregation keys are not the ones used
>>> when ingesting raw data.
>>>
>>> However, in my case, I have carefully partitioned the data when ingesting
>>> my raw topics. If I do groupBy followed by aggregation, there will be TWO
>>> change log topics, one for groupBy and another for aggregation.
>>>
>>> Does it make sense to make the groupBy one configurable?
>>>
>>> Thanks
>>> Tianji
>>>
>>
> 
> 
> 





[jira] [Updated] (KAFKA-4829) Improve logging of StreamTask commits

2017-03-01 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4829:
-
Labels: user-experience  (was: )

> Improve logging of StreamTask commits
> -
>
> Key: KAFKA-4829
> URL: https://issues.apache.org/jira/browse/KAFKA-4829
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Steven Schlansker
>Priority: Minor
>  Labels: user-experience
>
> Currently I see this every commit interval:
> {code}
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 1_31
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 2_31
> {code}
> We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
> This means every commit interval we log a few hundred lines of the above
> which is an order of magnitude chattier than anything else in the log
> during normal operations.
> To improve visibility of important messages, we should reduce the chattiness 
> of normal commits and highlight abnormal commits.  An example proposal:
> existing message is fine at TRACE level for diagnostics
> {{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}
> normal fast case, wrap them all up into one summary line
> {{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}
> some kind of threshold / messaging in case it doesn't complete quickly or 
> logs an exception
> {{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}





[jira] [Updated] (KAFKA-4829) Improve logging of StreamTask commits

2017-03-01 Thread Steven Schlansker (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Schlansker updated KAFKA-4829:
-
Description: 
Currently I see this every commit interval:

{code}
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
task StreamTask 1_31
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
task StreamTask 2_31
{code}

We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
This means every commit interval we log a few hundred lines of the above
which is an order of magnitude chattier than anything else in the log
during normal operations.

To improve visibility of important messages, we should reduce the chattiness of 
normal commits and highlight abnormal commits.  An example proposal:

existing message is fine at TRACE level for diagnostics
{{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}

normal fast case, wrap them all up into one summary line
{{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}

some kind of threshold / messaging in case it doesn't complete quickly or logs 
an exception
{{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}

  was:
Currently I see this every commit interval:

{code}
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
task StreamTask 1_31
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
task StreamTask 2_31
{code}

We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
This means every commit interval we log a few hundred lines of the above
which is an order of magnitude chattier than anything else in the log
during normal operations.

To improve visibility of important messages, we should reduce the chattiness of 
normal commits and highlight abnormal commits.  An example proposal:

existing message is fine at TRACE level for diagnostics
{{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}

normal fast case, wrap them all up into one summary line
{{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}

some kind of threshold / messaging in case it doesn't complete quickly
or logs an exception
{{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}

Thoughts?


> Improve logging of StreamTask commits
> -
>
> Key: KAFKA-4829
> URL: https://issues.apache.org/jira/browse/KAFKA-4829
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Steven Schlansker
>Priority: Minor
>
> Currently I see this every commit interval:
> {code}
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 1_31
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 2_31
> {code}
> We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
> This means every commit interval we log a few hundred lines of the above
> which is an order of magnitude chattier than anything else in the log
> during normal operations.
> To improve visibility of important messages, we should reduce the chattiness 
> of normal commits and highlight abnormal commits.  An example proposal:
> existing message is fine at TRACE level for diagnostics
> {{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}
> normal fast case, wrap them all up into one summary line
> {{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}
> some kind of threshold / messaging in case it doesn't complete quickly or 
> logs an exception
> {{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}





[jira] [Created] (KAFKA-4829) Improve logging of StreamTask commits

2017-03-01 Thread Steven Schlansker (JIRA)
Steven Schlansker created KAFKA-4829:


 Summary: Improve logging of StreamTask commits
 Key: KAFKA-4829
 URL: https://issues.apache.org/jira/browse/KAFKA-4829
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Steven Schlansker
Priority: Minor


Currently I see this every commit interval:

2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread
- stream-thread [StreamThread-1] Committing task StreamTask 1_31
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread
- stream-thread [StreamThread-1] Committing task StreamTask 2_31

We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
This means every commit interval we log a few hundred lines of the above
which is an order of magnitude chattier than anything else in the log
during normal operations.

To improve visibility of important messages, we should reduce the chattiness of 
normal commits and highlight abnormal commits.  An example proposal:

existing message is fine at TRACE level for diagnostics
{{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}

normal fast case, wrap them all up into one summary line
{{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}

some kind of threshold / messaging in case it doesn't complete quickly
or logs an exception
{{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}

Thoughts?





[jira] [Updated] (KAFKA-4829) Improve logging of StreamTask commits

2017-03-01 Thread Steven Schlansker (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Schlansker updated KAFKA-4829:
-
Description: 
Currently I see this every commit interval:

{code}
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
task StreamTask 1_31
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
task StreamTask 2_31
{code}

We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
This means every commit interval we log a few hundred lines of the above
which is an order of magnitude chattier than anything else in the log
during normal operations.

To improve visibility of important messages, we should reduce the chattiness of 
normal commits and highlight abnormal commits.  An example proposal:

existing message is fine at TRACE level for diagnostics
{{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}

normal fast case, wrap them all up into one summary line
{{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}

some kind of threshold / messaging in case it doesn't complete quickly
or logs an exception
{{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}

Thoughts?

  was:
Currently I see this every commit interval:

2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread
- stream-thread [StreamThread-1] Committing task StreamTask 1_31
2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
o.a.k.s.p.internals.StreamThread
- stream-thread [StreamThread-1] Committing task StreamTask 2_31

We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
This means every commit interval we log a few hundred lines of the above
which is an order of magnitude chattier than anything else in the log
during normal operations.

To improve visibility of important messages, we should reduce the chattiness of 
normal commits and highlight abnormal commits.  An example proposal:

existing message is fine at TRACE level for diagnostics
{{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}

normal fast case, wrap them all up into one summary line
{{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}

some kind of threshold / messaging in case it doesn't complete quickly
or logs an exception
{{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}

Thoughts?


> Improve logging of StreamTask commits
> -
>
> Key: KAFKA-4829
> URL: https://issues.apache.org/jira/browse/KAFKA-4829
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Steven Schlansker
>Priority: Minor
>
> Currently I see this every commit interval:
> {code}
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 1_31
> 2017-02-28T21:27:16.659Z INFO <> [StreamThread-1] 
> o.a.k.s.p.internals.StreamThread - stream-thread [StreamThread-1] Committing 
> task StreamTask 2_31
> {code}
> We have ~10 tasks in our topology, 4 topics, and 32 partitions per topic.
> This means every commit interval we log a few hundred lines of the above
> which is an order of magnitude chattier than anything else in the log
> during normal operations.
> To improve visibility of important messages, we should reduce the chattiness 
> of normal commits and highlight abnormal commits.  An example proposal:
> existing message is fine at TRACE level for diagnostics
> {{TRACE o.a.k.s.p.i.StreamThread - Committing task StreamTask 1_31}}
> normal fast case, wrap them all up into one summary line
> {{INFO o.a.k.s.p.i.StreamThreads - 64 stream tasks committed in 25ms}}
> some kind of threshold / messaging in case it doesn't complete quickly
> or logs an exception
> {{ERROR o.a.k.s.p.i.StreamThread - StreamTask 1_32 did not commit in 100ms}}
> Thoughts?





Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Michael Noll
FYI: The difference between `groupBy` (may trigger re-partitioning) vs.
`groupByKey` (does not trigger re-partitioning) also applies to:

- `map` vs. `mapValues`
- `flatMap` vs. `flatMapValues`



On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy  wrote:

> If you use stream.groupByKey() then there will be no repartitioning as long
> as there have been no key changing operations preceding it, i.e, map,
> selectKey, flatMap, transform. If you use stream.groupBy(...) then we see
> it as a key changing operation, hence we need to repartition the data.
>
> On Wed, 1 Mar 2017 at 18:59 Tianji Li  wrote:
>
> > Hi there,
> >
> > I wonder if it makes sense to give the option to disable auto
> > repartitioning while doing groupBy.
> >
> > I understand with https://issues.apache.org/jira/browse/KAFKA-3561,
> > an internal topic for repartition will be automatically created and
> synced
> > to brokers, which is useful when aggregation keys are not the ones used
> > when ingesting raw data.
> >
> > However, in my case, I have carefully partitioned the data when ingesting
> > my raw topics. If I do groupBy followed by aggregation, there will be TWO
> > change log topics, one for groupBy and another for aggregation.
> >
> > Does it make sense to make the groupBy one configurable?
> >
> > Thanks
> > Tianji
> >
>



-- 
*Michael G. Noll*
Product Manager | Confluent
+1 650 453 5860 | @miguno
Follow us: Twitter | Blog


[jira] [Commented] (KAFKA-4677) Avoid unnecessary task movement across threads during rebalance

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890856#comment-15890856
 ] 

ASF GitHub Bot commented on KAFKA-4677:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2609


> Avoid unnecessary task movement across threads during rebalance
> ---
>
> Key: KAFKA-4677
> URL: https://issues.apache.org/jira/browse/KAFKA-4677
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.3.0
>
>
> StreamPartitionAssigner tries to follow a sticky assignment policy to avoid 
> expensive task migration. Currently, it does this in a best-effort approach.
> We could observe a case, for which tasks did migrate for no good reason, thus 
> we assume that the current implementation could be improved to be more sticky.
> The concrete scenario is as follows:
> assume we have topology with 3 tasks, A, B, C
> assume we have 3 threads, each executing one task: 1-A, 2-B, 3-C
> for some reason, thread 1 goes down and a rebalance gets triggered
> thread 2 and 3 get their partitions revoked
> sometimes (not sure what the exact condition for this is), the new assignment 
> flips the assignment for task B and C (task A is newly assigned to either 
> thread 2 or 3)
> > possible new assignment: 2-(A,C) and 3-B
> There is no obvious reason (like load-balancing) why the task assignment for 
> B and C does change to the other thread resulting in unnecessary task 
> migration.





[GitHub] kafka pull request #2609: KAFKA-4677: [Follow Up] add optimization to Sticky...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2609




[jira] [Commented] (KAFKA-4743) Add a tool to Reset Consumer Group Offsets

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890839#comment-15890839
 ] 

ASF GitHub Bot commented on KAFKA-4743:
---

GitHub user jeqo opened a pull request:

https://github.com/apache/kafka/pull/2624

KAFKA-4743: Add Reset Consumer Group Offsets tooling [KIP-122]



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeqo/kafka 
feature/rewind-consumer-group-offset

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2624


commit ef14ccfc99360a389bf6597a7eb92cf8a1076c69
Author: Jorge Quilcate 
Date:   2017-02-17T15:20:59Z

Add reset-offset option

commit 25b3e758f32375431b56e0d810d7e66a154f8d78
Author: Jorge Quilcate 
Date:   2017-02-17T15:58:18Z

Add reset-offset print assignments

commit 398bd76580d4c937430a2c5fbee0e27659acc5da
Author: Jorge Quilcate 
Date:   2017-02-20T02:28:12Z

Draft Reset to specific commit

commit 3aa04620600b71c1d4fc6c6bf37f8cfd95bc6974
Author: Jorge Quilcate 
Date:   2017-02-20T11:48:29Z

Add plus, minus, earliest, latest cases

commit 01a65569046bba6f2425cb4429484afbcd548f2d
Author: Jorge Quilcate 
Date:   2017-02-20T16:17:22Z

change period by duration

commit 43fef5a88956fca084b8c43912be145215468daa
Author: Jorge Quilcate 
Date:   2017-02-20T22:07:17Z

add export/import reset plan

commit 308b2b7623169427c926cf61d77ec3dba1f2626a
Author: Jorge Quilcate 
Date:   2017-02-20T22:10:50Z

fix print

commit 34376f18078092584028b9c704a07b28c3d41143
Author: Jorge Quilcate 
Date:   2017-02-21T10:45:59Z

rename options, change json to csv format

commit bafc4e0ebd7e0673e638f39e39a7666735687420
Author: Jorge Quilcate 
Date:   2017-02-21T11:32:28Z

fix csv generation

commit d1b4588ac4f497a4e3605ad429dc17dd41535468
Author: Jorge Quilcate 
Date:   2017-03-01T17:50:06Z

add test cases

commit 93badea267b91d33c0ddbba64847dd53ca580f47
Author: Jorge Quilcate 
Date:   2017-03-01T18:23:04Z

add test cases for duration and datetime

commit 0af432fec9e800f2c207b81b71927b7ec73a7edf
Author: Jorge Quilcate 
Date:   2017-03-01T18:55:44Z

add test case for export import

commit 8178a4a8612cd86d5516cde9f0f61d5df9971dfb
Author: Jorge Quilcate 
Date:   2017-03-01T19:05:15Z

fix messages




> Add a tool to Reset Consumer Group Offsets
> --
>
> Key: KAFKA-4743
> URL: https://issues.apache.org/jira/browse/KAFKA-4743
> Project: Kafka
>  Issue Type: New Feature
>  Components: consumer, core, tools
>Reporter: Jorge Quilcate
>  Labels: kip
>
> Add an external tool to reset Consumer Group offsets, enabling rewind over 
> the topics without changing client-side code.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2624: KAFKA-4743: Add Reset Consumer Group Offsets tooli...

2017-03-01 Thread jeqo
GitHub user jeqo opened a pull request:

https://github.com/apache/kafka/pull/2624

KAFKA-4743: Add Reset Consumer Group Offsets tooling [KIP-122]



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeqo/kafka 
feature/rewind-consumer-group-offset

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2624






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4743) Add a tool to Reset Consumer Group Offsets

2017-03-01 Thread Jorge Quilcate (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Quilcate updated KAFKA-4743:
--
Description: 
Add an external tool to reset Consumer Group offsets, enabling rewind over 
the topics without changing client-side code.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling

  was:
Add an external tool to reset Consumer Group offsets, and achieve rewind over 
the topics, without changing client-side code.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+a+tool+to+Reset+Consumer+Group+Offsets


> Add a tool to Reset Consumer Group Offsets
> --
>
> Key: KAFKA-4743
> URL: https://issues.apache.org/jira/browse/KAFKA-4743
> Project: Kafka
>  Issue Type: New Feature
>  Components: consumer, core, tools
>Reporter: Jorge Quilcate
>  Labels: kip
>
> Add an external tool to reset Consumer Group offsets, enabling rewind over 
> the topics without changing client-side code.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Damian Guy
If you use stream.groupByKey() then there will be no repartitioning, as long
as there have been no key-changing operations preceding it, i.e., map,
selectKey, flatMap, transform. If you use stream.groupBy(...) then we treat
it as a key-changing operation, hence we need to repartition the data.
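Why re-keying forces a shuffle can be shown with a stdlib-only sketch (all names here are invented for illustration; Kafka's real default partitioner hashes serialized key bytes with murmur2): a record's partition is a function of its key, so grouping by the unchanged key never moves data, while grouping by a derived key can send records for the same group to different partitions.

```java
import java.nio.charset.StandardCharsets;

public class RepartitionSketch {
    static final int NUM_PARTITIONS = 4;

    // Simplified stand-in for Kafka's default partitioner (which really
    // uses murmur2 over the serialized key bytes).
    static int partitionFor(String key) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        return Math.floorMod(hash, NUM_PARTITIONS);
    }

    // groupByKey(): the grouping key IS the record key, so every record is
    // already on the partition its group lives on -- no repartition topic.
    // groupBy(...): the grouping key is derived, so records of one group
    // may sit on different partitions and must be shuffled.
    static boolean needsRepartition(String recordKey, String groupingKey) {
        return !recordKey.equals(groupingKey);
    }

    public static void main(String[] args) {
        String recordKey = "user-42";
        // groupByKey: grouping key equals record key -> no repartition
        System.out.println(needsRepartition(recordKey, recordKey));    // false
        // groupBy(v -> v.country): grouping key derived from the value
        System.out.println(needsRepartition(recordKey, "country-DE")); // true
    }
}
```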

On Wed, 1 Mar 2017 at 18:59 Tianji Li  wrote:

> Hi there,
>
> I wonder if it makes sense to give the option to disable auto
> repartitioning while doing groupBy.
>
> I understand with https://issues.apache.org/jira/browse/KAFKA-3561,
> an internal topic for repartition will be automatically created and synced
> to brokers, which is useful when aggregation keys are not the ones used
> when ingesting raw data.
>
> However, in my case, I have carefully partitioned the data when ingesting
> my raw topics. If I do groupBy followed by aggregation, there will be TWO
> changelog topics, one for groupBy and another for aggregation.
>
> Does it make sense to make the groupBy one configurable?
>
> Thanks
> Tianji
>


groupBy without auto-repartition topics for Kafka Streams

2017-03-01 Thread Tianji Li
Hi there,

I wonder if it makes sense to give the option to disable auto
repartitioning while doing groupBy.

I understand with https://issues.apache.org/jira/browse/KAFKA-3561,
an internal topic for repartition will be automatically created and synced
to brokers, which is useful when aggregation keys are not the ones used
when ingesting raw data.

However, in my case, I have carefully partitioned the data when ingesting
my raw topics. If I do groupBy followed by aggregation, there will be TWO
changelog topics, one for groupBy and another for aggregation.

Does it make sense to make the groupBy one configurable?

Thanks
Tianji


Re: [DISCUSS] KIP-82 - Add Record Headers

2017-03-01 Thread radai
@michael:

i used void because im used to java beans. thinking about it, i dont see
much use for returning false from adding a header: if the headers are in
read-only you should probably thrown an IllegalStateException because lets
face it, 99% of users dont check return values.
returning "this" is probably more useful because it would allow chaining:

Headers.add().add().remove()
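A minimal stdlib sketch of that contract (hypothetical class and method names, not the actual KIP-82 interface): mutators return `this` for chaining and throw `IllegalStateException` once the headers have been made read-only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a chainable Headers API as discussed above:
// mutators return `this` so calls can be chained, and throw
// IllegalStateException once the headers are read-only.
public class Headers {
    private final List<String[]> entries = new ArrayList<>();
    private boolean readOnly = false;

    public Headers add(String key, String value) {
        checkWritable();
        entries.add(new String[] {key, value});
        return this; // enables Headers.add(...).add(...).remove(...)
    }

    public Headers remove(String key) {
        checkWritable();
        entries.removeIf(e -> e[0].equals(key));
        return this;
    }

    public void close() { // make read-only; further mutation throws
        readOnly = true;
    }

    public int size() {
        return entries.size();
    }

    private void checkWritable() {
        if (readOnly) {
            throw new IllegalStateException("headers are read-only");
        }
    }
}
```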

On Wed, Mar 1, 2017 at 6:47 AM, Ismael Juma  wrote:

> Hi Becket,
>
> Thanks for sharing your thoughts. More inline.
>
> On Wed, Mar 1, 2017 at 2:54 AM, Becket Qin  wrote:
>
> > As you can imagine if the ProducerRecord has a value as a List and the
> > Interceptor.onSend() can actually add an element to the List. If the
> > producer.send() is called on the same ProducerRecord again, the value
> list
> > would have one more element than the previously sent ProducerRecord even
> > though the ProducerRecord itself is not mutable, right? Same thing can
> > apply to any modifiable class type.
> >
>
> The difference is that the user chooses the value type. They are free to
> choose a mutable or immutable type. A generic interceptor cannot mutate the
> value because it doesn't know the type (and it could be immutable). One
> could write an interceptor that checked the type of the value at runtime
> and did things based on that. But again, the user who creates the record is
> in control.
>
> From this standpoint allowing headers to be mutable doesn't really weaken
> > the mutability we already have. Admittedly a mutable header is kind of
> > guiding the user towards changing the headers in the existing object instead
> > of creating a new one.
>
>
> Yes, with headers, we are providing a model for the user (the user doesn't
> get to choose it like with keys and values) and for the interceptors. So, I
> think it's not the same.
>
>
> > But I think reusing an object while it is possible
> > to be modified by user code is a risk that users themselves are willing
> to
> > take. And we do not really protect against that.
>
>
> If users want to take that risk, it's fine. But we give them the option to
> protect themselves. With mutable headers, there is no option.
>
>
> > But there still seems
> > value to allow the users to not pay the overhead of creating tons of
> > objects if they do not reuse an object to send it twice, which is
> probably
> > a more common case.
> >
>
> I think the assumption that there would be tons of objects created is
> incorrect (I suggested a solution that would only involve one additional
> reference in the `Header` instance). The usability of the immutable API
> seemed to be more of an issue.
>
> In any case, if we do add the `close()` method, then we need to add a note
> to the compatibility section stating that once a producer record is sent,
> it cannot be sent again as this would cause interceptors that add headers
> to fail.
>
> Ismael
>
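The double-send hazard described above can be made concrete with a stdlib-only simulation (the names `Record`, `onSend`, and `send` are illustrative, not the real Kafka interceptor API): if an interceptor mutates a mutable value in place, re-sending the same record object observably changes what goes on the wire.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only simulation: a record whose value is a mutable List, and an
// "interceptor" that appends to it on send. The record object itself is
// "immutable" (final field), yet reuse is still unsafe.
public class MutableValueDemo {
    static class Record {
        final List<String> value;
        Record(List<String> value) { this.value = value; }
    }

    // Interceptor that mutates the value in place.
    static Record onSend(Record r) {
        r.value.add("trace-header");
        return r;
    }

    static int send(Record r) {
        return onSend(r).value.size(); // size of what actually gets sent
    }

    public static void main(String[] args) {
        Record r = new Record(new ArrayList<>(List.of("payload")));
        System.out.println(send(r)); // 2
        // Sending the SAME record again: the interceptor's earlier mutation
        // is still there, so the second send carries one more element.
        System.out.println(send(r)); // 3
    }
}
```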


[jira] [Created] (KAFKA-4828) ProcessorTopologyTestDriver does not work when using .through()

2017-03-01 Thread Hamidreza Afzali (JIRA)
Hamidreza Afzali created KAFKA-4828:
---

 Summary: ProcessorTopologyTestDriver does not work when using 
.through()
 Key: KAFKA-4828
 URL: https://issues.apache.org/jira/browse/KAFKA-4828
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Hamidreza Afzali
Assignee: Hamidreza Afzali


*Problem:*

ProcessorTopologyTestDriver does not work when testing a topology that uses 
through().

{code}
org.apache.kafka.streams.errors.StreamsException: Store count2's change log 
(count2-topic) does not contain partition 1
at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.validatePartitionExists(StoreChangelogReader.java:81)
{code}

*Example:*

{code}
object Topology1 {

  def main(args: Array[String]): Unit = {

    val inputTopic = "input"
    val stateStore = "count"
    val stateStore2 = "count2"
    val outputTopic2 = "count2-topic"
    val inputs = Seq[(String, Integer)](("A", 1), ("A", 2))

    val props = new Properties
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, UUID.randomUUID().toString)
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

    val builder = new KStreamBuilder
    builder.stream(Serdes.String, Serdes.Integer, inputTopic)
      .groupByKey(Serdes.String, Serdes.Integer)
      .count(stateStore)
      .through(Serdes.String, Serdes.Long, outputTopic2, stateStore2)

    val driver = new ProcessorTopologyTestDriver(new StreamsConfig(props),
      builder, stateStore, stateStore2)
    inputs.foreach {
      case (key, value) => {
        driver.process(inputTopic, key, value, Serdes.String.serializer,
          Serdes.Integer.serializer)
        val record = driver.readOutput(outputTopic2,
          Serdes.String.deserializer, Serdes.Long.deserializer)
        println(record)
      }
    }
  }
}
{code}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4827) Kafka connect: Escape special characters in connector name

2017-03-01 Thread Aymeric Bouvet (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aymeric Bouvet updated KAFKA-4827:
--
Description: 
When creating a connector, if the connector name (and possibly other 
properties) ends with a carriage return, Kafka Connect will create the config 
but report an error.

{code}
cat << EOF > file-connector.json
{
  "name": "file-connector\r",
  "config": {
"topic": "kafka-connect-logs\r",
"tasks.max": "1",
"file": "/var/log/ansible-confluent/connect.log",
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector"
  }
}
EOF
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" 
-d @file-connector.json localhost:8083/connectors 
{code}

returns an HTTP 500 error and logs the following

{code}
[2017-03-01 18:25:23,895] WARN  (org.eclipse.jetty.servlet.ServletHandler)
javax.servlet.ServletException: java.lang.IllegalArgumentException: Illegal 
character in path at index 27: /connectors/file-connector4
at 
org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
at 
org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at 
org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:159)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Illegal character in path at 
index 27: /connectors/file-connector4
at java.net.URI.create(URI.java:852)
at 
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
at 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
at 
org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
at org.glassfish.jersey.internal.Errors$1.call(
{code}

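The root cause is visible with nothing but the JDK: a carriage return is an illegal character in a URI path, so building the status URI for the new connector fails. A minimal reproduction (assuming only `java.net.URI`; the helper method name is invented):

```java
import java.net.URI;

// Reproduces the failure mode described above: a connector name ending in
// a carriage return yields an illegal URI path character, so URI.create
// throws IllegalArgumentException (wrapping URISyntaxException).
public class ConnectorNameDemo {
    public static boolean isValidConnectorPath(String connectorName) {
        try {
            URI.create("/connectors/" + connectorName);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidConnectorPath("file-connector"));   // true
        System.out.println(isValidConnectorPath("file-connector\r")); // false
    }
}
```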
[jira] [Updated] (KAFKA-4827) Kafka connect: error with special characters in connector name

2017-03-01 Thread Aymeric Bouvet (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aymeric Bouvet updated KAFKA-4827:
--
Summary: Kafka connect: error with special characters in connector name  
(was: Kafka connect: Escape special characters in connector name)

> Kafka connect: error with special characters in connector name
> --
>
> Key: KAFKA-4827
> URL: https://issues.apache.org/jira/browse/KAFKA-4827
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Aymeric Bouvet
>Priority: Minor
>

[jira] [Updated] (KAFKA-4826) Fix some findbugs warnings in Kafka Streams

2017-03-01 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4826:
---
Component/s: streams

> Fix some findbugs warnings in Kafka Streams
> ---
>
> Key: KAFKA-4826
> URL: https://issues.apache.org/jira/browse/KAFKA-4826
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Fix some findbugs warnings in Kafka Streams



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4827) Kafka connect: Escape special characters in connector name

2017-03-01 Thread Aymeric Bouvet (JIRA)
Aymeric Bouvet created KAFKA-4827:
-

 Summary: Kafka connect: Escape special characters in connector name
 Key: KAFKA-4827
 URL: https://issues.apache.org/jira/browse/KAFKA-4827
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.1.0
Reporter: Aymeric Bouvet
Priority: Minor



[jira] [Commented] (KAFKA-3686) Kafka producer is not fault tolerant

2017-03-01 Thread Xu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890746#comment-15890746
 ] 

Xu Zhang commented on KAFKA-3686:
-

Any update there? I think we have the same issue.

> Kafka producer is not fault tolerant
> 
>
> Key: KAFKA-3686
> URL: https://issues.apache.org/jira/browse/KAFKA-3686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Luca Bruno
>
> *Setup*
> I have a cluster of 3 kafka servers, a topic with 12 partitions with replica 
> 2, and a zookeeper cluster of 3 nodes.
> Producer config:
> {code}
>  props.put("bootstrap.servers", "k1:9092,k2:9092,k3:9092");
>  props.put("acks", "1");
>  props.put("batch.size", 16384);
>  props.put("retries", 3);
>  props.put("buffer.memory", 33554432);
>  props.put("key.serializer", 
> "org.apache.kafka.common.serialization.StringSerializer");
>  props.put("value.serializer", 
> "org.apache.kafka.common.serialization.StringSerializer");
> {code}
> Producer code:
> {code}
>  Producer<String, String> producer = new KafkaProducer<>(props);
>  for (int i = 0; i < 10; i++) {
>  Future<RecordMetadata> f = producer.send(new ProducerRecord<String, 
> String>("topic", null, Integer.toString(i)));
>  f.get();
>  }
> {code}
> *Problem*
> Cut the network between the producer (p1) and one of the kafka servers (say 
> k1).
> The cluster is healthy, hence the kafka bootstrap tells the producer that 
> there are 3 kafka servers (as I understood it), and the leaders of the 
> partitions of the topic.
> So the producer will send messages to all of the 3 leaders for each 
> partition. If the leader happens to be k1 for a message, the producer raises 
> the following exception after request.timeout.ms:
> {code}
> Exception in thread "main" java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Batch Expired
>   at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56)
>   at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43)
>   at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
>   at Test.main(Test.java:25)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Batch Expired
> {code}
> In theory, the application should handle the failure. In practice, messages 
> are getting lost, even though there are other 2 leaders available for writing.
> I tried with values of acks both 1 and -1.
> *What I expected*
> Given the client library automatically decides the hashing / round-robin 
> scheme for the partition, I would say it's not very important which partition 
> the message is being sent to.
> I expect the client library to handle the failure by sending the message to a 
> partition of a different leader.
> Neither kafka-clients nor rdkafka handle this failure. Given those are the 
> main client libraries being used for kafka as far as I know, I find it a 
> serious problem in terms of fault tolerance.
> EDIT: I cannot add comments to this issue, don't understand why. To answer 
> [~fpj] yes, I want the first. In the case of network partitions I want to 
> ensure my messages are stored. If the libraries don't do that, it means I 
> have to reimplement them. Or otherwise, postpone sending such messages until 
> the network partition resolves (which means implementing some kind of backlog 
> on disk of the producer, which should instead be the kafka purpose after 
> all). In both cases, it's something that is not documented and it's very 
> inconvenient.
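The failover the reporter asks for can be sketched without the Kafka client libraries: track partitions whose leader is currently unreachable and route around them when choosing the next partition. This is an illustrative stdlib-only sketch with invented names; a real fix would live inside the client's Partitioner and retry path.

```java
import java.util.HashSet;
import java.util.Set;

// Stdlib-only sketch of "send to a partition of a different leader":
// a round-robin partition chooser that skips partitions whose leader is
// currently marked unreachable.
public class FailoverPartitioner {
    private final int numPartitions;
    private final Set<Integer> unreachable = new HashSet<>();
    private int counter = 0;

    public FailoverPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public void markUnreachable(int partition) {
        unreachable.add(partition);
    }

    /** Next partition in round-robin order, skipping unreachable leaders. */
    public int nextPartition() {
        for (int i = 0; i < numPartitions; i++) {
            int candidate = (counter++) % numPartitions;
            if (!unreachable.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no reachable partition leaders");
    }
}
```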



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4826) Fix some findbugs warnings in Kafka Streams

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890725#comment-15890725
 ] 

ASF GitHub Bot commented on KAFKA-4826:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2623

KAFKA-4826. Fix some findbugs warnings in Kafka Streams



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4826

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2623.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2623


commit e8f7f125a21625def8e1826e14858a655b3bba45
Author: Colin P. Mccabe 
Date:   2017-03-01T18:14:04Z

KAFKA-4826. Fix some findbugs warnings in Kafka Streams




> Fix some findbugs warnings in Kafka Streams
> ---
>
> Key: KAFKA-4826
> URL: https://issues.apache.org/jira/browse/KAFKA-4826
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Fix some findbugs warnings in Kafka Streams



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2623: KAFKA-4826. Fix some findbugs warnings in Kafka St...

2017-03-01 Thread cmccabe


[jira] [Created] (KAFKA-4826) Fix some findbugs warnings in Kafka Streams

2017-03-01 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4826:
--

 Summary: Fix some findbugs warnings in Kafka Streams
 Key: KAFKA-4826
 URL: https://issues.apache.org/jira/browse/KAFKA-4826
 Project: Kafka
  Issue Type: Bug
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Fix some findbugs warnings in Kafka Streams





[jira] [Commented] (KAFKA-4823) Creating Kafka Producer on application running on Java 1.6

2017-03-01 Thread Aneesh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890699#comment-15890699
 ] 

Aneesh commented on KAFKA-4823:
---

[~ijuma] Kafka 0.8.2.0 worked for me. I did look into the REST proxy, but 
couldn't work out how to implement it. Can you provide some step-by-step 
documentation and an example producer/consumer for it?

> Creating Kafka Producer on application running on Java 1.6
> --
>
> Key: KAFKA-4823
> URL: https://issues.apache.org/jira/browse/KAFKA-4823
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
> Environment: Application running on Java 1.6
>Reporter: Aneesh
>
> I have an application running on Java 1.6 which cannot be upgraded. This 
> application needs interfaces to post messages to a remote Kafka server (as a 
> producer) and to receive messages (as a consumer). The code runs fine in my 
> local environment, which is on Java 1.7, but the same producer and consumer 
> fail when executed within the application, with the error:
> Caused by: java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major 
> version; class=org/apache/kafka/clients/producer/KafkaProducer, offset=6
> Is there some way I can still do it?





[jira] [Commented] (KAFKA-4825) Likely Data Loss in ReassignPartitionsTest System Test

2017-03-01 Thread Ben Stopford (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890661#comment-15890661
 ] 

Ben Stopford commented on KAFKA-4825:
-

This could be a result of KIP-101 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation)

> Likely Data Loss in ReassignPartitionsTest System Test
> --
>
> Key: KAFKA-4825
> URL: https://issues.apache.org/jira/browse/KAFKA-4825
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
> Attachments: problem.zip
>
>
> A failure in the below test may imply a genuine missing message. 
> kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=True.security_protocol=PLAINTEXT
> The test - which reassigns partitions whilst bouncing cluster members - 
> reconciles messages ack'd with messages received in the consumer. 
> The interesting part is that we received two ack's for the same offset, with 
> different messages:
> {"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7447","time_ms":1488349980718,"offset":372,"key":null}
> {"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7487","time_ms":1488349981780,"offset":372,"key":null}
> When searching the log files, via kafka.tools.DumpLogSegments, only the later 
> message is found. 
> The missing message lies midway through the test and appears to occur after a 
> leader moves (after 7447 is sent there is a ~1s pause, then 7487 is sent, 
> along with a backlog of messages for partitions 11, 16, 6). 
> The overall implication is a message appears to be acknowledged but later 
> lost. 
> Looking at the test itself it seems valid. The producer is initialised with 
> acks = -1. The callback checks for an exception in the onCompletion callback 
> and uses this to track acknowledgement in the test. 
> https://jenkins.confluent.io/job/system-test-kafka/521/console
> http://testing.confluent.io/confluent-kafka-system-test-results/?prefix=2017-03-01--001.1488363091--apache--trunk--c9872cb/ReassignPartitionsTest/test_reassign_partitions/bounce_brokers=True.security_protocol=PLAINTEXT/





[jira] [Updated] (KAFKA-4823) Creating Kafka Producer on application running on Java 1.6

2017-03-01 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-4823:
-
Component/s: (was: KafkaConnect)

> Creating Kafka Producer on application running on Java 1.6
> --
>
> Key: KAFKA-4823
> URL: https://issues.apache.org/jira/browse/KAFKA-4823
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
> Environment: Application running on Java 1.6
>Reporter: Aneesh
>
> I have an application running on Java 1.6 which cannot be upgraded. This 
> application needs interfaces to post messages to a remote Kafka server (as a 
> producer) and to receive messages (as a consumer). The code runs fine in my 
> local environment, which is on Java 1.7, but the same producer and consumer 
> fail when executed within the application, with the error:
> Caused by: java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major 
> version; class=org/apache/kafka/clients/producer/KafkaProducer, offset=6
> Is there some way I can still do it?





Re: [VOTE] KIP-107: Add purgeDataBefore() API in AdminClient

2017-03-01 Thread Dong Lin
Hi Ismael,

I actually mean log_start_offset. I realized that it is a better name after
I started the implementation, because "logStartOffset" is already used in
Log.scala and LogCleanerManager.scala. So I changed it from
log_begin_offset to log_start_offset in the patch, but I forgot to update
the KIP and mention the change on the mailing thread.

Thanks for catching this. Let me update the KIP to reflect this change.

Dong


On Wed, Mar 1, 2017 at 6:15 AM, Ismael Juma  wrote:

> Hi Dong,
>
> When you say "logStartOffset", do you mean "log_begin_offset "? I could
> only find the latter in the KIP. If so, would log_start_offset be a better
> name?
>
> Ismael
>
> On Tue, Feb 28, 2017 at 4:26 AM, Dong Lin  wrote:
>
> > Hi Jun and everyone,
> >
> > I would like to change the KIP in the following way. Currently, if any
> > replica is offline, the purge result for a partition will
> > be NotEnoughReplicasException and its low_watermark will be 0. The
> > motivation for this approach is that we want to guarantee that the data
> > before purgedOffset has been deleted on all replicas of this partition if
> > purge result indicates success.
> >
> > But this approach seems too conservative. It should be sufficient in most
> > cases to just tell user success and set low_watermark to minimum
> > logStartOffset of all live replicas in the PurgeResponse if
> logStartOffset
> > of all live replicas have reached purgedOffset. This is because for an
> > offline replica to become online and be elected leader, it should have
> > received one FetchResponse from the current leader which should tell it to
> > purge beyond purgedOffset. The benefit of doing this change is that we
> can
> > allow purge operation to succeed when some replica is offline.
> >
> > Are you OK with this change? If so, I will go ahead to update the KIP and
> > implement this behavior.
> >
> > Thanks,
> > Dong
> >
> >
> >
> > On Tue, Jan 17, 2017 at 10:18 AM, Dong Lin  wrote:
> >
> > > Hey Jun,
> > >
> > > Do you have time to review the KIP again or vote for it?
> > >
> > > Hey Ewen,
> > >
> > > Can you also review the KIP again or vote for it? I have discussed with
> > > Radai and Becket regarding your concern. We still think putting it in
> > Admin
> > > Client seems more intuitive because there is use-case where application
> > > which manages topic or produces data may also want to purge data. It
> > seems
> > > weird if they need to create a consumer to do this.
> > >
> > > Thanks,
> > > Dong
> > >
> > > On Thu, Jan 12, 2017 at 9:34 AM, Mayuresh Gharat <
> > > gharatmayures...@gmail.com> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Thanks,
> > >>
> > >> Mayuresh
> > >>
> > >> On Wed, Jan 11, 2017 at 1:03 PM, Dong Lin 
> wrote:
> > >>
> > >> > Sorry for the duplicated email. It seems that gmail will put the
> > voting
> > >> > email in this thread if I simply replace DISCUSS with VOTE in the
> > >> subject.
> > >> >
> > >> > On Wed, Jan 11, 2017 at 12:57 PM, Dong Lin 
> > wrote:
> > >> >
> > >> > > Hi all,
> > >> > >
> > >> > > It seems that there is no further concern with the KIP-107. At
> this
> > >> point
> > >> > > we would like to start the voting process. The KIP can be found at
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107
> > >> > > %3A+Add+purgeDataBefore%28%29+API+in+AdminClient.
> > >> > >
> > >> > > Thanks,
> > >> > > Dong
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -Regards,
> > >> Mayuresh R. Gharat
> > >> (862) 250-7125
> > >>
> > >
> > >
> >
>
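Dong's proposed relaxation can be sketched as a small pure function: the low
watermark is the minimum logStartOffset over the live replicas, and the purge
reports success once that minimum has reached the requested offset. This is an
editorial sketch of the semantics under discussion (the function and parameter
names are made up), not the actual PurgeResponse code.

```python
def purge_result(purged_offset, live_replica_log_start_offsets):
    """Sketch of the relaxed KIP-107 purge semantics discussed above.

    Assumes at least one live replica. Offline replicas are ignored: once
    they rejoin, they learn the leader's state via FetchResponse and purge
    past purged_offset before they can be elected leader.
    """
    # low_watermark is the smallest log start offset among live replicas.
    low_watermark = min(live_replica_log_start_offsets)
    # Success once every live replica has purged up to the requested offset.
    return low_watermark >= purged_offset, low_watermark
```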


[jira] [Created] (KAFKA-4825) Likely Data Loss in ReassignPartitionsTest System Test

2017-03-01 Thread Ben Stopford (JIRA)
Ben Stopford created KAFKA-4825:
---

 Summary: Likely Data Loss in ReassignPartitionsTest System Test
 Key: KAFKA-4825
 URL: https://issues.apache.org/jira/browse/KAFKA-4825
 Project: Kafka
  Issue Type: Bug
Reporter: Ben Stopford
 Attachments: problem.zip

A failure in the below test may imply a genuine missing message. 

kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=True.security_protocol=PLAINTEXT

The test - which reassigns partitions whilst bouncing cluster members - 
reconciles messages ack'd with messages received in the consumer. 

The interesting part is that we received two ack's for the same offset, with 
different messages:

{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7447","time_ms":1488349980718,"offset":372,"key":null}

{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7487","time_ms":1488349981780,"offset":372,"key":null}

When searching the log files, via kafka.tools.DumpLogSegments, only the later 
message is found. 

The missing message lies midway through the test and appears to occur after a 
leader moves (after 7447 is sent there is a ~1s pause, then 7487 is sent, along 
with a backlog of messages for partitions 11, 16, 6). 

The overall implication is a message appears to be acknowledged but later lost. 

Looking at the test itself it seems valid. The producer is initialised with 
acks = -1. The callback checks for an exception in the onCompletion callback 
and uses this to track acknowledgement in the test. 


https://jenkins.confluent.io/job/system-test-kafka/521/console
http://testing.confluent.io/confluent-kafka-system-test-results/?prefix=2017-03-01--001.1488363091--apache--trunk--c9872cb/ReassignPartitionsTest/test_reassign_partitions/bounce_brokers=True.security_protocol=PLAINTEXT/
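The reconciliation the test performs, matching acked offsets against consumed
messages, can be illustrated with a short script that flags the anomaly seen
here: the same (partition, offset) acknowledged for two different values. This
is an illustrative sketch, not the system-test harness's actual code.

```python
import json
from collections import defaultdict

def find_conflicting_acks(lines):
    """Group producer_send_success events by (partition, offset) and
    return any slot acknowledged with more than one distinct value,
    i.e. the duplicate-offset anomaly described in this ticket.
    Illustrative sketch, not the ducktape test's actual code."""
    acks = defaultdict(set)
    for line in lines:
        rec = json.loads(line)
        if rec.get("name") == "producer_send_success":
            acks[(rec["partition"], rec["offset"])].add(rec["value"])
    return {slot: values for slot, values in acks.items() if len(values) > 1}
```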






[jira] [Commented] (KAFKA-4092) retention.bytes should not be allowed to be less than segment.bytes

2017-03-01 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890540#comment-15890540
 ] 

Andrew Olson commented on KAFKA-4092:
-

Note that this Jira is being reverted in 0.10.2.1 (see KAFKA-4788 for details).
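For context, the validation this ticket added (and which KAFKA-4788 reverts)
can be sketched as a simple startup check. The helper name is hypothetical and
the treatment of -1 as "unlimited" follows the documented config semantics;
this is not the broker's actual code.

```python
def validate_log_config(retention_bytes: int, segment_bytes: int) -> None:
    """Reject configs where retention.bytes is smaller than segment.bytes.

    retention.bytes == -1 means unlimited retention, so it is exempt.
    Hypothetical helper illustrating the KAFKA-4092 check.
    """
    if 0 <= retention_bytes < segment_bytes:
        raise ValueError(
            "retention.bytes (%d) must be at least segment.bytes (%d)"
            % (retention_bytes, segment_bytes)
        )
```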

> retention.bytes should not be allowed to be less than segment.bytes
> ---
>
> Key: KAFKA-4092
> URL: https://issues.apache.org/jira/browse/KAFKA-4092
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Dustin Cote
>Assignee: Dustin Cote
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> Right now retention.bytes can be as small as the user wants but it doesn't 
> really get acted on for the active segment if retention.bytes is smaller than 
> segment.bytes.  We shouldn't allow retention.bytes to be less than 
> segment.bytes and validate that at startup.





[jira] [Commented] (KAFKA-4823) Creating Kafka Producer on application running on Java 1.6

2017-03-01 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890468#comment-15890468
 ] 

Ismael Juma commented on KAFKA-4823:


If you can't upgrade the Java version in the client, the options are: use a 
REST proxy (search for "kafka rest proxy") or use the producer from Kafka 
0.8.2.0 (not recommended since a lot of bugs have been fixed since that 
release).

> Creating Kafka Producer on application running on Java 1.6
> --
>
> Key: KAFKA-4823
> URL: https://issues.apache.org/jira/browse/KAFKA-4823
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.1
> Environment: Application running on Java 1.6
>Reporter: Aneesh
>
> I have an application running on Java 1.6 which cannot be upgraded. This 
> application needs interfaces to post messages to a remote Kafka server (as a 
> producer) and to receive messages (as a consumer). The code runs fine in my 
> local environment, which is on Java 1.7, but the same producer and consumer 
> fail when executed within the application, with the error:
> Caused by: java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major 
> version; class=org/apache/kafka/clients/producer/KafkaProducer, offset=6
> Is there some way I can still do it?





[jira] [Commented] (KAFKA-4823) Creating Kafka Producer on application running on Java 1.6

2017-03-01 Thread Aneesh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890422#comment-15890422
 ] 

Aneesh commented on KAFKA-4823:
---

[~bernie huang] I use JDK 1.6 to compile my app jar; it still fails.

> Creating Kafka Producer on application running on Java 1.6
> --
>
> Key: KAFKA-4823
> URL: https://issues.apache.org/jira/browse/KAFKA-4823
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.1
> Environment: Application running on Java 1.6
>Reporter: Aneesh
>
> I have an application running on Java 1.6 which cannot be upgraded. This 
> application needs interfaces to post messages to a remote Kafka server (as a 
> producer) and to receive messages (as a consumer). The code runs fine in my 
> local environment, which is on Java 1.7, but the same producer and consumer 
> fail when executed within the application, with the error:
> Caused by: java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major 
> version; class=org/apache/kafka/clients/producer/KafkaProducer, offset=6
> Is there some way I can still do it?





[jira] [Resolved] (KAFKA-4824) add unique identifier to clientId

2017-03-01 Thread Clemens Valiente (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clemens Valiente resolved KAFKA-4824.
-
Resolution: Duplicate

> add unique identifier to clientId
> -
>
> Key: KAFKA-4824
> URL: https://issues.apache.org/jira/browse/KAFKA-4824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, streams
>Affects Versions: 0.10.2.0
>Reporter: Clemens Valiente
>Assignee: Clemens Valiente
>Priority: Minor
>
> As discussed with [~damianguy]
> The streams clientId is set to applicationId-threadId-consumer, which of 
> course wouldn't be unique when running multiple instances per host.
> *Example use case:*
> Deployment as a Docker container on Marathon, where the application might 
> end up running multiple times on the same host, all with the same 
> configuration. All clients on the same host would have the same clientId.
> *Problem:*
> Consumers running with the same client id make it hard to identify lagging 
> consumers correctly.
> *Potential solution:*
> Internally we do have a processId. It might make sense to attach that to 
> the clientId; it would then be unique across instances.





[jira] [Closed] (KAFKA-4824) add unique identifier to clientId

2017-03-01 Thread Clemens Valiente (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clemens Valiente closed KAFKA-4824.
---

> add unique identifier to clientId
> -
>
> Key: KAFKA-4824
> URL: https://issues.apache.org/jira/browse/KAFKA-4824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, streams
>Affects Versions: 0.10.2.0
>Reporter: Clemens Valiente
>Assignee: Clemens Valiente
>Priority: Minor
>
> As discussed with [~damianguy]
> The streams clientId is set to applicationId-threadId-consumer, which of 
> course wouldn't be unique when running multiple instances per host.
> *Example use case:*
> Deployment as a Docker container on Marathon, where the application might 
> end up running multiple times on the same host, all with the same 
> configuration. All clients on the same host would have the same clientId.
> *Problem:*
> Consumers running with the same client id make it hard to identify lagging 
> consumers correctly.
> *Potential solution:*
> Internally we do have a processId. It might make sense to attach that to 
> the clientId; it would then be unique across instances.





[jira] [Commented] (KAFKA-4824) add unique identifier to clientId

2017-03-01 Thread Clemens Valiente (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890387#comment-15890387
 ] 

Clemens Valiente commented on KAFKA-4824:
-

duplicate of KAFKA-4117

> add unique identifier to clientId
> -
>
> Key: KAFKA-4824
> URL: https://issues.apache.org/jira/browse/KAFKA-4824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, streams
>Affects Versions: 0.10.2.0
>Reporter: Clemens Valiente
>Priority: Minor
>
> As discussed with [~damianguy]
> The streams clientId is set to applicationId-threadId-consumer, which of 
> course wouldn't be unique when running multiple instances per host.
> *Example use case:*
> Deployment as a Docker container on Marathon, where the application might 
> end up running multiple times on the same host, all with the same 
> configuration. All clients on the same host would have the same clientId.
> *Problem:*
> Consumers running with the same client id make it hard to identify lagging 
> consumers correctly.
> *Potential solution:*
> Internally we do have a processId. It might make sense to attach that to 
> the clientId; it would then be unique across instances.





[jira] [Assigned] (KAFKA-4824) add unique identifier to clientId

2017-03-01 Thread Clemens Valiente (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clemens Valiente reassigned KAFKA-4824:
---

Assignee: Clemens Valiente

> add unique identifier to clientId
> -
>
> Key: KAFKA-4824
> URL: https://issues.apache.org/jira/browse/KAFKA-4824
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, streams
>Affects Versions: 0.10.2.0
>Reporter: Clemens Valiente
>Assignee: Clemens Valiente
>Priority: Minor
>
> As discussed with [~damianguy]
> streams clientId is set to applicationId-threadId-consumer which of course 
> wouldn’t be unique if running multiple instances per host.
> *Example use case:*
> deployment as docker container on marathon so it might end up multiple times 
> on the same host and of course they're all running with the same 
> configuration. All clients on the same host would have the same clientid
> *Problem*
> Consumers running with the same client id make it hard to identify lagging 
> consumers correctly 
> *Potential solution:*
> Internally we do have a processId. It might make sense if we attached that to 
> the clientId. it would then be unique across instances





[jira] [Created] (KAFKA-4824) add unique identifier to clientId

2017-03-01 Thread Clemens Valiente (JIRA)
Clemens Valiente created KAFKA-4824:
---

 Summary: add unique identifier to clientId
 Key: KAFKA-4824
 URL: https://issues.apache.org/jira/browse/KAFKA-4824
 Project: Kafka
  Issue Type: Improvement
  Components: consumer, streams
Affects Versions: 0.10.2.0
Reporter: Clemens Valiente
Priority: Minor


As discussed with [~damianguy]
The streams clientId is set to applicationId-threadId-consumer, which of 
course wouldn't be unique when running multiple instances per host.

*Example use case:*
Deployment as a Docker container on Marathon, where the application might end 
up running multiple times on the same host, all with the same configuration. 
All clients on the same host would have the same clientId.

*Problem:*
Consumers running with the same client id make it hard to identify lagging 
consumers correctly.

*Potential solution:*
Internally we do have a processId. It might make sense to attach that to the 
clientId; it would then be unique across instances.
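The potential solution can be sketched as follows. The suffix used here (pid
plus a short random token) is an assumption standing in for the internal
processId the ticket mentions; this is not Kafka Streams' actual code.

```python
import os
import uuid

def unique_client_id(application_id: str, thread_id: int) -> str:
    """Build a clientId that stays unique when several identically
    configured instances run on one host. The pid+UUID suffix is a
    stand-in for the internal processId mentioned in the ticket."""
    process_suffix = "%d-%s" % (os.getpid(), uuid.uuid4().hex[:8])
    return "%s-%d-consumer-%s" % (application_id, thread_id, process_suffix)
```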





[jira] [Commented] (KAFKA-4823) Creating Kafka Producer on application running on Java 1.6

2017-03-01 Thread Aneesh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890350#comment-15890350
 ] 

Aneesh commented on KAFKA-4823:
---

[~bernie huang] If I use JDK 1.7 to package my jar file, this application 
won't compile it. I think I have done something like this in the past. I can 
give it a try now and update.

> Creating Kafka Producer on application running on Java 1.6
> --
>
> Key: KAFKA-4823
> URL: https://issues.apache.org/jira/browse/KAFKA-4823
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.1
> Environment: Application running on Java 1.6
>Reporter: Aneesh
>
> I have an application running on Java 1.6 which cannot be upgraded. This 
> application needs interfaces to post messages to a remote Kafka server (as a 
> producer) and to receive messages (as a consumer). The code runs fine in my 
> local environment, which is on Java 1.7, but the same producer and consumer 
> fail when executed within the application, with the error:
> Caused by: java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major 
> version; class=org/apache/kafka/clients/producer/KafkaProducer, offset=6
> Is there some way I can still do it?





Re: [DISCUSS] KIP-82 - Add Record Headers

2017-03-01 Thread Ismael Juma
Hi Becket,

Thanks for sharing your thoughts. More inline.

On Wed, Mar 1, 2017 at 2:54 AM, Becket Qin  wrote:

> As you can imagine if the ProducerRecord has a value as a List and the
> Interceptor.onSend() can actually add an element to the List. If the
> producer.send() is called on the same ProducerRecord again, the value list
> would have one more element than the previously sent ProducerRecord even
> though the ProducerRecord itself is not mutable, right? Same thing can
> apply to any modifiable class type.
>

The difference is that the user chooses the value type. They are free to
choose a mutable or immutable type. A generic interceptor cannot mutate the
value because it doesn't know the type (and it could be immutable). One
could write an interceptor that checked the type of the value at runtime
and did things based on that. But again, the user who creates the record is
in control.

> From this standpoint allowing headers to be mutable doesn't really weaken
> the mutability we already have. Admittedly a mutable header is kind of
> guiding user towards to change the headers in the existing object instead
> of creating a new one.


Yes, with headers, we are providing a model for the user (the user doesn't
get to choose it like with keys and values) and for the interceptors. So, I
think it's not the same.


> But I think reusing an object while it is possible
> to be modified by user code is a risk that users themselves are willing to
> take. And we do not really protect against that.


If users want to take that risk, it's fine. But we give them the option to
protect themselves. With mutable headers, there is no option.


> But there still seems
> value to allow the users to not pay the overhead of creating tons of
> objects if they do not reuse an object to send it twice, which is probably
> a more common case.
>

I think the assumption that there would be tons of objects created is
incorrect (I suggested a solution that would only involve one additional
reference in the `Header` instance). The usability of the immutable API
seemed to be more of an issue.

In any case, if we do add the `close()` method, then we need to add a note
to the compatibility section stating that once a producer record is sent,
it cannot be sent again as this would cause interceptors that add headers
to fail.

Ismael
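The close() semantics referred to above can be modeled with a toy container:
headers stay mutable while the record is being built, and freezing them on
first send makes a second send of the same record fail fast when an
interceptor tries to append another header. This is a simplified model of the
trade-off under discussion, not Kafka's actual Headers API.

```python
class Headers:
    """Toy model of mutable-until-sent record headers (KIP-82 discussion).

    Not the real Kafka API; it only illustrates why a record that has
    been sent once cannot safely be sent again when interceptors append
    headers on each send.
    """

    def __init__(self):
        self._entries = []
        self._closed = False

    def add(self, key, value):
        if self._closed:
            raise RuntimeError("headers closed: record was already sent")
        self._entries.append((key, value))

    def close(self):
        # Called by the (hypothetical) producer on first send().
        self._closed = True
```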


[jira] [Commented] (KAFKA-4823) Creating Kafka Producer on application running on Java 1.6

2017-03-01 Thread bernie huang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890242#comment-15890242
 ] 

bernie huang commented on KAFKA-4823:
-

Did you use JDK 1.7 to package your application jar file, and then run that 
jar on JDK 1.6? If so, since a lower JDK version can't run classes compiled 
for a higher version, you could package your application jar with JDK 1.6 and 
test it again.
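A quick way to confirm which JDK produced a class file is to read the major
version from its header (Java 6 classes are major version 50, Java 7 is 51);
`javap -verbose` reports the same field. A minimal sketch, assuming a
well-formed class file:

```python
import struct

def class_major_version(path: str) -> int:
    """Return the major version from a Java .class file header.

    Layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
    Java 6 -> 50, Java 7 -> 51, Java 8 -> 52.
    """
    with open(path, "rb") as f:
        magic, _minor, major = struct.unpack(">IHH", f.read(8))
    if magic != 0xCAFEBABE:
        raise ValueError("%s is not a class file" % path)
    return major
```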

> Creating Kafka Producer on application running on Java 1.6
> --
>
> Key: KAFKA-4823
> URL: https://issues.apache.org/jira/browse/KAFKA-4823
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.1
> Environment: Application running on Java 1.6
>Reporter: Aneesh
>
> I have an application running on Java 1.6 which cannot be upgraded. This 
> application needs interfaces to post messages to a remote Kafka server (as a 
> producer) and to receive messages (as a consumer). The code runs fine in my 
> local environment, which is on Java 1.7, but the same producer and consumer 
> fail when executed within the application, with the error:
> Caused by: java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major 
> version; class=org/apache/kafka/clients/producer/KafkaProducer, offset=6
> Is there some way I can still do it?





Re: [VOTE] KIP-107: Add purgeDataBefore() API in AdminClient

2017-03-01 Thread Ismael Juma
Hi Dong,

When you say "logStartOffset", do you mean "log_begin_offset "? I could
only find the latter in the KIP. If so, would log_start_offset be a better
name?

Ismael

On Tue, Feb 28, 2017 at 4:26 AM, Dong Lin  wrote:

> Hi Jun and everyone,
>
> I would like to change the KIP in the following way. Currently, if any
> replica is offline, the purge result for a partition will
> be NotEnoughReplicasException and its low_watermark will be 0. The
> motivation for this approach is that we want to guarantee that the data
> before purgedOffset has been deleted on all replicas of this partition if
> purge result indicates success.
>
> But this approach seems too conservative. It should be sufficient in most
> cases to just tell user success and set low_watermark to minimum
> logStartOffset of all live replicas in the PurgeResponse if logStartOffset
> of all live replicas have reached purgedOffset. This is because for an
> offline replica to become online and be elected leader, it should have
> received one FetchResponse from the current leader which should tell it to
> purge beyond purgedOffset. The benefit of doing this change is that we can
> allow purge operation to succeed when some replica is offline.
>
> Are you OK with this change? If so, I will go ahead to update the KIP and
> implement this behavior.
>
> Thanks,
> Dong
>
>
>
> On Tue, Jan 17, 2017 at 10:18 AM, Dong Lin  wrote:
>
> > Hey Jun,
> >
> > Do you have time to review the KIP again or vote for it?
> >
> > Hey Ewen,
> >
> > Can you also review the KIP again or vote for it? I have discussed with
> > Radai and Becket regarding your concern. We still think putting it in
> Admin
> > Client seems more intuitive because there is use-case where application
> > which manages topic or produces data may also want to purge data. It
> seems
> > weird if they need to create a consumer to do this.
> >
> > Thanks,
> > Dong
> >
> > On Thu, Jan 12, 2017 at 9:34 AM, Mayuresh Gharat <
> > gharatmayures...@gmail.com> wrote:
> >
> >> +1 (non-binding)
> >>
> >> Thanks,
> >>
> >> Mayuresh
> >>
> >> On Wed, Jan 11, 2017 at 1:03 PM, Dong Lin  wrote:
> >>
> >> > Sorry for the duplicated email. It seems that gmail will put the
> voting
> >> > email in this thread if I simply replace DISCUSS with VOTE in the
> >> subject.
> >> >
> >> > On Wed, Jan 11, 2017 at 12:57 PM, Dong Lin 
> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > It seems that there is no further concern with the KIP-107. At this
> >> point
> >> > > we would like to start the voting process. The KIP can be found at
> >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107
> >> > > %3A+Add+purgeDataBefore%28%29+API+in+AdminClient.
> >> > >
> >> > > Thanks,
> >> > > Dong
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> -Regards,
> >> Mayuresh R. Gharat
> >> (862) 250-7125
> >>
> >
> >
>


[jira] [Commented] (KAFKA-4631) Refresh consumer metadata more frequently for unknown subscribed topics

2017-03-01 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890234#comment-15890234
 ] 

Rajini Sivaram commented on KAFKA-4631:
---

The attached PR solves the issue behind the second failure in KAFKA-4779. The 
system test failure was due to unavailable topics whose leader is not known, 
rather than nonexistent topics. The PR requests a metadata refresh for both 
unavailable and nonexistent topics (see 
https://github.com/apache/kafka/pull/2608 for details).
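The PR's decision rule can be sketched as a pure predicate over the consumer's
current metadata view. The names and data shapes here are assumptions for
illustration, not the actual consumer Fetcher/Metadata code.

```python
def needs_metadata_refresh(subscribed_topics, known_topics, leaders_by_topic):
    """Request an immediate metadata refresh (instead of waiting for
    metadata.max.age.ms to expire) if a subscribed topic is missing from
    the cluster metadata, or any of its partitions has no known leader.
    Sketch of the behavior in apache/kafka PR #2622, not its code."""
    for topic in subscribed_topics:
        if topic not in known_topics:
            return True
        if any(leader is None for leader in leaders_by_topic.get(topic, [])):
            return True
    return False
```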

> Refresh consumer metadata more frequently for unknown subscribed topics
> ---
>
> Key: KAFKA-4631
> URL: https://issues.apache.org/jira/browse/KAFKA-4631
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> By default, the consumer refreshes metadata every 5 minutes. In testing, it 
> can often happen that a topic is created at about the same time that the 
> consumer is started. In the worst case, creation finishes after the consumer 
> fetches metadata, and the test must wait 5 minutes for the consumer to 
> refresh metadata in order to discover the topic. To address this problem, 
> users can decrease the metadata refresh interval, but this means more 
> frequent refreshes even after all topics are known. An improvement would be 
> to internally let the consumer fetch metadata more frequently when the 
> consumer encounters unknown topics. Perhaps every 5-10 seconds would be 
> reasonable, for example.





Re: [VOTE] KIP-123: Allow per stream/table timestamp extractor

2017-03-01 Thread Michael Noll
+1 (non-binding)

Thanks for the KIP!

On Wed, Mar 1, 2017 at 1:49 PM, Bill Bejeck  wrote:

> +1
>
> Thanks
> Bill
>
> On Wed, Mar 1, 2017 at 5:06 AM, Eno Thereska 
> wrote:
>
> > +1 (non binding).
> >
> > Thanks
> > Eno
> > > On 28 Feb 2017, at 17:22, Matthias J. Sax 
> wrote:
> > >
> > > +1
> > >
> > > Thanks a lot for the KIP!
> > >
> > > -Matthias
> > >
> > >
> > > On 2/28/17 1:35 AM, Damian Guy wrote:
> > >> Thanks for the KIP Jeyhun!
> > >>
> > >> +1
> > >>
> > >> On Tue, 28 Feb 2017 at 08:59 Jeyhun Karimov 
> > wrote:
> > >>
> > >>> Dear community,
> > >>>
> > >>> I'd like to start the vote for KIP-123:
> > >>> https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=68714788
> > >>>
> > >>>
> > >>> Cheers,
> > >>> Jeyhun
> > >>> --
> > >>> -Cheers
> > >>>
> > >>> Jeyhun
> > >>>
> > >>
> > >
> >
> >
>


[jira] [Commented] (KAFKA-4631) Refresh consumer metadata more frequently for unknown subscribed topics

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890223#comment-15890223
 ] 

ASF GitHub Bot commented on KAFKA-4631:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/2622

KAFKA-4631: Request metadata in consumer if topic/partitions unavailable

If the leader node of one or more partitions in a consumer subscription is 
temporarily unavailable, request a metadata refresh so that partitions skipped 
for assignment don't have to wait for metadata expiry before reassignment. 
A metadata refresh is also requested if a subscribed topic or assigned partition 
doesn't exist.
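The two conditions the PR description names can be sketched as a single check. This is a hedged Python model of the logic, not the actual client code; here the cluster metadata is represented as a topic -> per-partition-leader map, with None standing for an unknown leader.

```python
def needs_metadata_refresh(subscribed_topics, cluster):
    """Request a refresh if a subscribed topic is missing from the metadata,
    or if any of its partitions currently has no known leader."""
    for topic in subscribed_topics:
        leaders = cluster.get(topic)   # per-partition leader ids; None if unknown
        if leaders is None:
            return True                # topic doesn't exist (or isn't visible yet)
        if any(leader is None for leader in leaders):
            return True                # leader temporarily unavailable
    return False
```

Both cases trigger the same refresh request, which is why the fix covers the KAFKA-4779 failure (unavailable leaders) as well as the original nonexistent-topic case.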

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4631

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2622


commit 406427953b2789e5a717136bac6b0b3ada5eb9eb
Author: Rajini Sivaram 
Date:   2017-02-28T12:09:44Z

KAFKA-4631: Request metadata in consumer if topic/partitions unavailable




> Refresh consumer metadata more frequently for unknown subscribed topics
> ---
>
> Key: KAFKA-4631
> URL: https://issues.apache.org/jira/browse/KAFKA-4631
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Jason Gustafson
>Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> By default, the consumer refreshes metadata every 5 minutes. In testing, it 
> can often happen that a topic is created at about the same time that the 
> consumer is started. In the worst case, creation finishes after the consumer 
> fetches metadata, and the test must wait 5 minutes for the consumer to 
> refresh metadata in order to discover the topic. To address this problem, 
> users can decrease the metadata refresh interval, but this means more 
> frequent refreshes even after all topics are known. An improvement would be 
> to internally let the consumer fetch metadata more frequently when the 
> consumer encounters unknown topics. Perhaps every 5-10 seconds would be 
> reasonable, for example.





[GitHub] kafka pull request #2622: KAFKA-4631: Request metadata in consumer if topic/...

2017-03-01 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/2622

KAFKA-4631: Request metadata in consumer if topic/partitions unavailable

If the leader node of one or more partitions in a consumer subscription is 
temporarily unavailable, request a metadata refresh so that partitions skipped 
for assignment don't have to wait for metadata expiry before reassignment. 
A metadata refresh is also requested if a subscribed topic or assigned partition 
doesn't exist.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4631

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2622


commit 406427953b2789e5a717136bac6b0b3ada5eb9eb
Author: Rajini Sivaram 
Date:   2017-02-28T12:09:44Z

KAFKA-4631: Request metadata in consumer if topic/partitions unavailable




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4779) Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890202#comment-15890202
 ] 

ASF GitHub Bot commented on KAFKA-4779:
---

Github user rajinisivaram closed the pull request at:

https://github.com/apache/kafka/pull/2608


> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> 
>
> Key: KAFKA-4779
> URL: https://issues.apache.org/jira/browse/KAFKA-4779
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Rajini Sivaram
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 100, in run_produce_consume_validate
> self.stop_producer_and_consumer()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 87, in stop_producer_and_consumer
> self.check_alive()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in check_alive
> raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out: 
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> A bunch of these security_rolling_upgrade tests failed, and in all cases, the 
> producer produced ~15k messages, of which ~7k were acked, and the consumer 
> only got around ~2600 before timing out. 
> There are a lot of messages like the following for different request types on 
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in 
> produce request on partition test_topic-0. The topic/partition may not exist 
> or the user may not have Describe access to it 
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}





[GitHub] kafka pull request #2608: KAFKA-4779: Request metadata in consumer if partit...

2017-03-01 Thread rajinisivaram
Github user rajinisivaram closed the pull request at:

https://github.com/apache/kafka/pull/2608




Re: [DISCUSS] KIP-124: Request rate quotas

2017-03-01 Thread Rajini Sivaram
Colin,

Thank you for the feedback. Since we are reusing the existing
throttle_time_ms field for produce/fetch responses, changing this to
microseconds would be a breaking change. Since we don't currently plan to
throttle at sub-millisecond intervals, perhaps it makes sense to keep the
value consistent with the existing responses (and metrics which expose this
value) and change them all together in future if required?

Regards,

Rajini

On Tue, Feb 28, 2017 at 5:58 PM, Colin McCabe  wrote:

> I noticed that the throttle_time_ms added to all the message responses
> is in milliseconds.  Does it make sense to express this in microseconds
> in case we start doing more fine-grained CPU throttling later on?  An
> int32 should still be more than enough if using microseconds.
>
> best,
> Colin
>
>
> On Fri, Feb 24, 2017, at 10:31, Jun Rao wrote:
> > Hi, Jay,
> >
> > 2. Regarding request.unit vs request.percentage. I started with
> > request.percentage too. The reasoning for request.unit is the following.
> > Suppose that the capacity has been reached on a broker and the admin
> > needs
> > to add a new user. A simple way to increase the capacity is to increase
> > the
> > number of io threads, assuming there are still enough cores. If the limit
> > is based on percentage, the additional capacity automatically gets
> > distributed to existing users and we haven't really carved out any
> > additional resource for the new user. Now, is it easy for a user to
> > reason
> > about 0.1 unit vs 10%. My feeling is that both are hard and have to be
> > configured empirically. Not sure if percentage is obviously easier to
> > reason about.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps  wrote:
> >
> > > A couple of quick points:
> > >
> > > 1. Even though the implementation of this quota is only using io thread
> > > time, i think we should call it something like "request-time". This
> will
> > > give us flexibility to improve the implementation to cover network
> threads
> > > in the future and will avoid exposing internal details like our thread
> > > pools on the server.
> > >
> > > 2. Jun/Roger, I get what you are trying to fix but the idea of
> thread/units
> > > is super unintuitive as a user-facing knob. I had to read the KIP like
> > > eight times to understand this. I'm not sure that your point that
> > > increasing the number of threads is a problem with a percentage-based
> > > value, it really depends on whether the user thinks about the
> "percentage
> > > of request processing time" or "thread units". If they think "I have
> > > allocated 10% of my request processing time to user x" then it is a bug
> > > that increasing the thread count decreases that percent as it does in
> the
> > > current proposal. As a practical matter I think the only way to
> actually
> > > reason about this is as a percent---I just don't believe people are
> going
> > > to think, "ah, 4.3 thread units, that is the right amount!". Instead I
> > > think they have to understand this thread unit concept, figure out what
> > > they have set in number of threads, compute a percent and then come up
> with
> > > the number of thread units, and these will all be wrong if that thread
> > > count changes. I also think this ties us to throttling the I/O thread
> pool,
> > > which may not be where we want to end up.
> > >
> > > 3. For what it's worth I do think having a single throttle_ms field in
> all
> > > the responses that combines all throttling from all quotas is probably
> the
> > > simplest. There could be a use case for having separate fields for
> each,
> > > but I think that is actually harder to use/monitor in the common case
> so
> > > unless someone has a use case I think just one should be fine.
> > >
> > > -Jay
> > >
> > > On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > I have updated the KIP based on the discussions so far.
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > > On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <
> > > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thank you all for the feedback.
> > > > >
> > > > > Ismael #1. It makes sense not to throttle inter-broker requests
> like
> > > > > LeaderAndIsr etc. The simplest way to ensure that clients cannot
> use
> > > > these
> > > > > requests to bypass quotas for DoS attacks is to ensure that ACLs
> > > prevent
> > > > > clients from using these requests and unauthorized requests are
> > > included
> > > > > towards quotas.
> > > > >
> > > > > Ismael #2, Jay #1 : I was thinking that these quotas can return a
> > > > separate
> > > > > throttle time, and all utilization based quotas could use the same
> > > field
> > > > > (we won't add another one for network thread utilization for
> instance).
> > > > But
> > > > > perhaps it makes sense to keep byte rate quotas separate in
> > > produce/fetch
> > > > > responses to provide separate metrics? Agre
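The thread-units vs. percentage trade-off debated above can be made concrete with a small sketch (hypothetical numbers; "units" here follows the KIP discussion, where one unit equals one I/O thread's full time):

```python
def share_of_request_time(quota_units, io_threads):
    """Fraction of total request-handling time a 'unit' quota grants."""
    return quota_units / io_threads

# A quota of 0.8 units on an 8-thread broker grants 10% of request time;
# doubling the thread pool silently halves that share to 5%. That shrinking
# share is the behavior Jay objects to as unintuitive, while Jun sees the
# freed capacity as deliberately carved out for new users.
share_before = share_of_request_time(0.8, 8)
share_after = share_of_request_time(0.8, 16)
```

Under a percentage-based quota the user's share would instead stay fixed at 10% as threads are added, which is exactly why the two sides disagree about which behavior is correct.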

[jira] [Commented] (KAFKA-4822) Kafka producer implementation without additional threads, similar to sync producer of 0.8.

2017-03-01 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890172#comment-15890172
 ] 

Ismael Juma commented on KAFKA-4822:


The data is sent as soon as possible by default (linger.ms=0), so if you wait 
for the Future to complete or use a callback, it should behave the same as 
before (the fact that the send is being done by a background thread is an 
implementation detail.

> Kafka producer implementation without additional threads, similar to sync 
> producer of 0.8.
> --
>
> Key: KAFKA-4822
> URL: https://issues.apache.org/jira/browse/KAFKA-4822
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>Reporter: Giri
>Priority: Minor
>






[jira] [Comment Edited] (KAFKA-4822) Kafka producer implementation without additional threads, similar to sync producer of 0.8.

2017-03-01 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890172#comment-15890172
 ] 

Ismael Juma edited comment on KAFKA-4822 at 3/1/17 1:38 PM:


The data is sent as soon as possible by default (linger.ms=0), so if you wait 
for the Future to complete or use a callback, it should behave the same as 
before (the fact that the send is being done by a background thread is an 
implementation detail).


was (Author: ijuma):
The data is sent as soon as possible by default (linger.ms=0), so if you wait 
for the Future to complete or use a callback, it should behave the same as 
before (the fact that the send is being done by a background thread is an 
implementation detail.

> Kafka producer implementation without additional threads, similar to sync 
> producer of 0.8.
> --
>
> Key: KAFKA-4822
> URL: https://issues.apache.org/jira/browse/KAFKA-4822
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>Reporter: Giri
>Priority: Minor
>






Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-03-01 Thread Florian Hussonnois
Hi Eno,

Yes, but the state() method only returns the global state of the
KafkaStreams application (i.e., CREATED, RUNNING, REBALANCING,
PENDING_SHUTDOWN, NOT_RUNNING).

An alternative to this KIP would be to change this method to return more
information instead of adding a new method.

2017-03-01 13:46 GMT+01:00 Eno Thereska :

> Thanks Florian,
>
> Have you had a chance to look at the new state methods in 0.10.2, e.g.,
> KafkaStreams.state()?
>
> Thanks
> Eno
> > On 1 Mar 2017, at 11:54, Florian Hussonnois 
> wrote:
> >
> > Hi all,
> >
> > I have just created KIP-130 to add a new method to the KafkaStreams API
> in
> > order to expose the states of threads and active tasks.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP+
> 130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API
> >
> >
> > Thanks,
> >
> > --
> > Florian HUSSONNOIS
>
>


-- 
Florian HUSSONNOIS


[jira] [Commented] (KAFKA-4822) Kafka producer implementation without additional threads, similar to sync producer of 0.8.

2017-03-01 Thread Giri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890139#comment-15890139
 ] 

Giri commented on KAFKA-4822:
-

A producer implementation with finer control would be helpful in cases like the 
one above.

> Kafka producer implementation without additional threads, similar to sync 
> producer of 0.8.
> --
>
> Key: KAFKA-4822
> URL: https://issues.apache.org/jira/browse/KAFKA-4822
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>Reporter: Giri
>Priority: Minor
>






[jira] [Commented] (KAFKA-4822) Kafka producer implementation without additional threads, similar to sync producer of 0.8.

2017-03-01 Thread Giri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890130#comment-15890130
 ] 

Giri commented on KAFKA-4822:
-

I understand that sync behavior can be achieved with the above workaround, but 
another thread is still spawned per producer to send the data in the background.

I have a case where a predefined set of threads (40 to 80, depending on the 
machine configuration) receive data from a persistent medium and send it to 
Kafka (one producer per thread), and I have to commit the position in the 
persistent medium to protect against restarts. I achieved this with the 0.8 sync 
producer by storing the data from the persistent medium in a list; once enough 
content was cached, I sent it to Kafka as a batch and committed the position in 
the persistent medium. But now I have no explicit control over when the data is 
sent to Kafka, since sending is handled entirely by the new network thread, and 
the batch-full and new-batch events are not visible to the user (they are lost 
inside KafkaProducer.doSend()).
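The 0.8-era pattern described above — cache records, send them as one blocking batch, then commit the source position — can be sketched as follows. This is an illustrative model with hypothetical callback names (sync_send_batch, commit_position), not a real Kafka API.

```python
def drain(source, sync_send_batch, commit_position, batch_size=100):
    """Read (position, record) pairs, flush in batches, and commit the source
    position only after the batch send returns, so a restart replays at most
    one uncommitted batch instead of losing acknowledged data."""
    buffer, last_position = [], None
    for position, record in source:
        buffer.append(record)
        last_position = position
        if len(buffer) >= batch_size:
            sync_send_batch(list(buffer))   # blocks until the batch is acked
            commit_position(last_position)  # safe: everything before is in Kafka
            buffer.clear()
    if buffer:                              # flush the tail on shutdown
        sync_send_batch(list(buffer))
        commit_position(last_position)
```

The key ordering is that commit_position runs strictly after the blocking batch send, which is exactly the control point the new producer's background network thread hides.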

> Kafka producer implementation without additional threads, similar to sync 
> producer of 0.8.
> --
>
> Key: KAFKA-4822
> URL: https://issues.apache.org/jira/browse/KAFKA-4822
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>Reporter: Giri
>Priority: Minor
>






Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-01 Thread Jozef.koval
+1 (non-binding)

Jozef

P.S. I volunteer to help with this KIP.


Sent from [ProtonMail](https://protonmail.ch), encrypted email based in 
Switzerland.



 Original Message 
Subject: Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11
Local Time: March 1, 2017 1:46 PM
UTC Time: March 1, 2017 12:46 PM
From: bbej...@gmail.com
To: dev@kafka.apache.org

+1

Thanks,
Bill

On Wed, Mar 1, 2017 at 5:42 AM, Eno Thereska  wrote:

> +1, thanks.
>
> Eno
> > On 28 Feb 2017, at 17:17, Guozhang Wang  wrote:
> >
> > +1
> >
> > On Tue, Feb 28, 2017 at 4:28 AM, Tom Crayford 
> wrote:
> >
> >> +1 (non-binding)
> >>
> >> On Tue, 28 Feb 2017 at 11:42, Molnár Bálint 
> >> wrote:
> >>
> >>> +1
> >>>
> >>> 2017-02-28 12:17 GMT+01:00 Dongjin Lee :
> >>>
> 
> 
>  +1.
> 
> 
> 
>  Best,
> 
>  Dongjin
> 
> 
>  --
> 
> 
> 
> 
>  Dongjin Lee
> 
> 
> 
> 
> 
>  Software developer in Line+.
> 
>  So interested in massive-scale machine learning.
> 
> 
> 
>  facebook: www.facebook.com/dongjin.lee.kr (http://www.facebook.com/
>  dongjin.lee.kr)
> 
>  linkedin: kr.linkedin.com/in/dongjinleekr (
> >> http://kr.linkedin.com/in/
>  dongjinleekr)
> 
> 
>  github: github.com/dongjinleekr (
>  http://github.com/dongjinleekr)
> 
>  twitter: www.twitter.com/dongjinleekr (http://www.twitter.com/
>  dongjinleekr)
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> >
> > On Feb 28, 2017 at 6:40 PM, mailto:ism...@juma.me.uk
> >> )>
>  wrote:
> >
> >
> >
> > Hi everyone,
> >
> > Since the few who responded in the discuss thread were in favour and
>  there
> > were no objections, I would like to initiate the voting process for
> > KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>  119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> >
> > The vote will run for a minimum of 72 hours.
> >
> > Thanks,
> > Ismael
> >
> 
> >>>
> >>
> >
> >
> >
> > --
> > -- Guozhang
>
>

Re: [VOTE] KIP-123: Allow per stream/table timestamp extractor

2017-03-01 Thread Bill Bejeck
+1

Thanks
Bill

On Wed, Mar 1, 2017 at 5:06 AM, Eno Thereska  wrote:

> +1 (non binding).
>
> Thanks
> Eno
> > On 28 Feb 2017, at 17:22, Matthias J. Sax  wrote:
> >
> > +1
> >
> > Thanks a lot for the KIP!
> >
> > -Matthias
> >
> >
> > On 2/28/17 1:35 AM, Damian Guy wrote:
> >> Thanks for the KIP Jeyhun!
> >>
> >> +1
> >>
> >> On Tue, 28 Feb 2017 at 08:59 Jeyhun Karimov 
> wrote:
> >>
> >>> Dear community,
> >>>
> >>> I'd like to start the vote for KIP-123:
> >>> https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=68714788
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>> --
> >>> -Cheers
> >>>
> >>> Jeyhun
> >>>
> >>
> >
>
>


Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-03-01 Thread Eno Thereska
Thanks Florian,

Have you had a chance to look at the new state methods in 0.10.2, e.g., 
KafkaStreams.state()? 

Thanks
Eno
> On 1 Mar 2017, at 11:54, Florian Hussonnois  wrote:
> 
> Hi all,
> 
> I have just created KIP-130 to add a new method to the KafkaStreams API in
> order to expose the states of threads and active tasks.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API
> 
> 
> Thanks,
> 
> -- 
> Florian HUSSONNOIS



Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-01 Thread Bill Bejeck
+1

Thanks,
Bill

On Wed, Mar 1, 2017 at 5:42 AM, Eno Thereska  wrote:

> +1, thanks.
>
> Eno
> > On 28 Feb 2017, at 17:17, Guozhang Wang  wrote:
> >
> > +1
> >
> > On Tue, Feb 28, 2017 at 4:28 AM, Tom Crayford 
> wrote:
> >
> >> +1 (non-binding)
> >>
> >> On Tue, 28 Feb 2017 at 11:42, Molnár Bálint 
> >> wrote:
> >>
> >>> +1
> >>>
> >>> 2017-02-28 12:17 GMT+01:00 Dongjin Lee :
> >>>
> 
> 
>  +1.
> 
> 
> 
>  Best,
> 
>  Dongjin
> 
> 
>  --
> 
> 
> 
> 
>  Dongjin Lee
> 
> 
> 
> 
> 
>  Software developer in Line+.
> 
>  So interested in massive-scale machine learning.
> 
> 
> 
>  facebook:   www.facebook.com/dongjin.lee.kr (http://www.facebook.com/
>  dongjin.lee.kr)
> 
>  linkedin:   kr.linkedin.com/in/dongjinleekr (
> >> http://kr.linkedin.com/in/
>  dongjinleekr)
> 
> 
>  github:   github.com/dongjinleekr (
>  http://github.com/dongjinleekr)
> 
>  twitter:   www.twitter.com/dongjinleekr (http://www.twitter.com/
>  dongjinleekr)
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> >
> > On Feb 28, 2017 at 6:40 PM,  mailto:ism...@juma.me.uk
> >> )>
>  wrote:
> >
> >
> >
> > Hi everyone,
> >
> > Since the few who responded in the discuss thread were in favour and
>  there
> > were no objections, I would like to initiate the voting process for
> > KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>  119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> >
> > The vote will run for a minimum of 72 hours.
> >
> > Thanks,
> > Ismael
> >
> 
> >>>
> >>
> >
> >
> >
> > --
> > -- Guozhang
>
>


Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-03-01 Thread Tom Crayford
+1 (non-binding)

On Tue, Feb 28, 2017 at 6:56 PM, Apurva Mehta  wrote:

> +1 (non-binding) for 0.11.0
>
> I do agree with Ismael's point that exactly-once should go through one
> release of stabilization before bumping the version to 1.0.
>
> Thanks,
> Apurva
>
> On Mon, Feb 27, 2017 at 7:47 PM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > With 0.10.2.0 out of the way, I would like to volunteer to be the release
> > manager for our next time-based release. See https://cwiki.apache.org/c
> > onfluence/display/KAFKA/Time+Based+Release+Plan if you missed previous
> > communication on time-based releases or need a reminder.
> >
> > I put together a draft release plan with June 2017 as the release month
> (as
> > previously agreed) and a list of KIPs that have already been voted:
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68716876
> >
> > I haven't set exact dates for the various stages (feature freeze, code
> > freeze, etc.) for now as Ewen is going to send out an email with some
> > suggested tweaks based on his experience as release manager for 0.10.2.0.
> > We can set the exact dates after that discussion.
> >
> > As we are starting the process early this time, we should expect the
> number
> > of KIPs in the plan to grow (so don't worry if your KIP is not there
> yet),
> > but it's good to see that we already have 10 (including 2 merged and 2
> with
> > PR reviews in progress).
> >
> > Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and
> KIP-101
> > (Leader Generation in Replication) require message format changes, which
> > typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
> > it makes sense to also include KIP-106 (Unclean leader election should be
> > false by default) and KIP-118 (Drop support for Java 7). We would also
> take
> > the chance to remove deprecated code, in that case.
> >
> > Given the above, how do people feel about 0.11.0.0 as the next Kafka
> > version? Please share your thoughts.
> >
> > Thanks,
> > Ismael
> >
>


[DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

2017-03-01 Thread Florian Hussonnois
Hi all,

I have just created KIP-130 to add a new method to the KafkaStreams API in
order to expose the states of threads and active tasks.

https://cwiki.apache.org/confluence/display/KAFKA/KIP+130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API


Thanks,

-- 
Florian HUSSONNOIS


[jira] [Commented] (KAFKA-4800) Streams State transition ASCII diagrams need fixing and polishing

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889997#comment-15889997
 ] 

ASF GitHub Bot commented on KAFKA-4800:
---

GitHub user cvaliente opened a pull request:

https://github.com/apache/kafka/pull/2621

KAFKA-4800: Streams State transition ASCII diagrams need fixing and 
polishing

Added <pre> tags so that the javadoc display of the ASCII diagrams is not 
broken. See the broken ASCII here:

https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/KafkaStreams.State.html

The fix can be checked by running gradle :streams:javadoc and then inspecting 
streams/build/docs/javadoc/org/apache/kafka/streams/KafkaStreams.State.html

I also fixed the diagram in StreamThread.java; however, no javadoc is currently 
generated for that one (since it's internal).

@enothereska please have a look

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cvaliente/kafka KAFKA-4800-ASCII-diagrams

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2621.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2621


commit 88e83d4633b782f8554f5d3b871e703c0a580657
Author: Clemens Valiente 
Date:   2017-03-01T10:47:21Z

KAFKA-4800 fix display and transitions of ASCII diagrams




> Streams State transition ASCII diagrams need fixing and polishing
> -
>
> Key: KAFKA-4800
> URL: https://issues.apache.org/jira/browse/KAFKA-4800
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Clemens Valiente
>Priority: Minor
>  Labels: newbie
> Fix For: 0.10.3.0
>
>
> The ASCII transition diagram in KafkaStreams.java on top of "public enum 
> State" does not read well in Javadoc. Also the self-loops to running and 
> rebalancing are not necessary. Same with the StreamThread.java diagram.




