kafka0.10 spark2.10

2017-06-21 Thread lk_kafka
Hi all:
when I run my streaming application, after a few minutes I get this error:

17/06/22 10:34:56 INFO ConsumerCoordinator: Revoking previously assigned 
partitions [comment-0, profile-1, profile-3, cwb-3, bizs-1, cwb-1, 
weibocomment-0, bizs-2, pages-0, bizs-4, pages-2, weibo-0, pages-4, weibo-4, 
clicks-1, comment-1, weibo-2, clicks-3, weibocomment-4, weibocomment-2, 
profile-0, profile-2, profile-4, cwb-4, cwb-2, cwb-0, bizs-0, bizs-3, pages-1, 
weibo-1, pages-3, clicks-2, weibo-3, clicks-4, comment-2, weibocomment-3, 
clicks-0, weibocomment-1] for group youedata1
17/06/22 10:34:56 INFO AbstractCoordinator: (Re-)joining group youedata1
17/06/22 10:34:56 INFO AbstractCoordinator: Successfully joined group youedata1 
with generation 3
17/06/22 10:34:56 INFO ConsumerCoordinator: Setting newly assigned partitions 
[comment-0, profile-1, profile-3, cwb-3, bizs-1, cwb-1, weibocomment-0, bizs-2, 
bizs-4, pages-4, weibo-4, clicks-1, comment-1, clicks-3, weibocomment-4, 
weibocomment-2, profile-0, profile-2, profile-4, cwb-4, cwb-2, cwb-0, bizs-0, 
bizs-3, pages-3, clicks-2, weibo-3, clicks-4, comment-2, weibocomment-3, 
clicks-0, weibocomment-1] for group youedata1
17/06/22 10:34:56 ERROR JobScheduler: Error generating jobs for time 
1498098896000 ms
java.lang.IllegalStateException: No current assignment for partition pages-2
 at 
org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:264)
 at 
org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:336)
 at 
org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1236)
 at 
org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:197)
 at 
org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:214)
 at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
 at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
 at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
 at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
 at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
 at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
 at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
 at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
 at scala.Option.orElse(Option.scala:289)
 at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
 at 
org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
 at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:117)
 at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
 at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
 at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
 at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
 at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
 at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:249)
 at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
 at scala.util.Try$.apply(Try.scala:192)
 at 
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
 at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
 at 
org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
 at 
org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


Each topic has 5 partitions and 2 replicas. I don't know why this error happens.
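From the log above, the group youedata1 went through a rebalance ("Revoking previously assigned 
partitions" followed by "(Re-)joining group"), and the new assignment no longer contains pages-2 
(or several other partitions), which usually means another consumer instance has joined the same 
group and taken them; the DirectKafkaInputDStream then fails when it calls seekToEnd on a 
partition it no longer owns. For reference, a minimal sketch of the direct-stream setup this 
applies to, assuming spark-streaming-kafka-0-10 and a group.id used by this one application only 
(topic names are taken from the log, broker addresses are placeholders):

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// jssc is the application's JavaStreamingContext
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "broker1:9092,broker2:9092");  // placeholder brokers
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "youedata1");   // must not be shared with any other consumer
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", false);

Collection<String> topics = Arrays.asList(
    "comment", "profile", "cwb", "bizs", "pages", "weibo", "clicks", "weibocomment");

JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));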


2017-06-22


lk_kafka 

RE: Kafka MirrorMaker - not replicating messages after being brought up

2017-06-21 Thread ext-gfenol...@eramet-sln.nc
Hello,

I have the same problem with Kafka 0.10.1.0, except that MirrorMaker is not replicating 
anything at all, and without any error message.
I've been scratching my head for half a dozen hours now, and I can't work out 
what's going on with my setup; my hundreds of topics remain unmirrored on my 
destination brokers... so maybe it's related.



-----Original Message-----
From: Richard Shaw [mailto:rich...@aggress.net]
Sent: Thursday, June 22, 2017 15:06
To: users@kafka.apache.org
Subject: Re: Kafka MirrorMaker - not replicating messages after being brought up

Karan, have you got auto.offset.reset in your consumer.properties?

https://kafka.apache.org/documentation/#newconsumerconfigs

On Thu, Jun 22, 2017 at 2:00 AM, karan alang  wrote:

> Hi All,
>
> I've 2 Kafka clusters (Kafka 10) & I'm trying to test the MirrorMaker
> functionality.
>
> Here is what i did :
>
> 1) I have identical topics Topic1 on 2 Kafka clusters - Cluster1 &
> Cluster2
> 2) On Cluster1, I publish 100 messages on Topic1
>
> 3) I've 2 consumers reading messages from the 2 topics on Cluster1 &
> Cluster2
>
> 4) I start MirrorMaker AFTER THE MESSAGES HAVE BEEN PUBLISHED ON
> Cluster1
>
> $KAFKA10_HOME1/bin/kafka-run-class.sh kafka.tools.MirrorMaker
> --consumer.config $KAFKA10_HOME1/config/mmConsumer.config
> --num.streams 3 --producer.config
> $KAFKA10_HOME1/config/mmProducer.config
> --whitelist="mmtopic1" --abort.on.send.failure true
>
>
> I expected that Once the MirrorMaker was started, the 100 messages
> published on Cluster1, would be published on Cluster2
>
> However that is not happening...
>
> What needs to be done to enable this ?
>



Re: Kafka MirrorMaker - not replicating messages after being brought up

2017-06-21 Thread Richard Shaw
Karan, have you got auto.offset.reset in your consumer.properties?

https://kafka.apache.org/documentation/#newconsumerconfigs
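For reference, a hedged example of the relevant consumer.properties lines for MirrorMaker 
(broker address and group.id are placeholders). With the new consumer the default 
auto.offset.reset is "latest", so a MirrorMaker instance started after the messages were 
produced will not pick them up unless this is set to "earliest":

# consumer.properties passed to MirrorMaker via --consumer.config (illustrative)
bootstrap.servers=source-broker1:9092
group.id=mirror-maker-group
auto.offset.reset=earliest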

On Thu, Jun 22, 2017 at 2:00 AM, karan alang  wrote:

> Hi All,
>
> I've 2 Kafka clusters (Kafka 10) & I'm trying to test the MirrorMaker
> functionality.
>
> Here is what i did :
>
> 1) I have identical topics Topic1 on 2 Kafka clusters - Cluster1 & Cluster2
> 2) On Cluster1, I publish 100 messages on Topic1
>
> 3) I've 2 consumers reading messages from the 2 topics on Cluster1 &
> Cluster2
>
> 4) I start MirrorMaker AFTER THE MESSAGES HAVE BEEN PUBLISHED ON Cluster1
>
> $KAFKA10_HOME1/bin/kafka-run-class.sh kafka.tools.MirrorMaker
> --consumer.config $KAFKA10_HOME1/config/mmConsumer.config --num.streams 3
> --producer.config $KAFKA10_HOME1/config/mmProducer.config
> --whitelist="mmtopic1" --abort.on.send.failure true
>
>
> I expected that Once the MirrorMaker was started, the 100 messages
> published on Cluster1, would be published on Cluster2
>
> However that is not happening...
>
> What needs to be done to enable this ?
>


Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Guozhang Wang
I have been thinking about reducing all these overloaded functions for
stateful operations (there are other places that introduce overloaded
functions, but let's focus on these in this discussion). What I used to
have in mind is a "materialize" function on the KTables, like:

---

// specifying the topology

KStream stream1 = builder.stream();
KTable table1 = stream1.groupby(...).aggregate(initializer, aggregator,
sessionMerger, sessionWindows);  // do not allow to pass-in a state store
supplier here any more

// additional specs along with the topology above

table1.materialize("queryableStoreName"); // or..
table1.materialize("queryableStoreName").enableCaching().enableLogging();
// or..
table1.materialize(stateStoreSupplier); // add the metrics / logging /
caching / windowing functionalities on top of the store, or..
table1.materialize(stateStoreSupplier).enableCaching().enableLogging(); //
etc..

---

But thinking about it more, I feel Damian's first proposal is better, since
my proposal would likely break the chaining (e.g. we may not be
able to do something like "table1.filter().map().groupBy().aggregate()" if we
want to use different specs for the intermediate filtered KTable).


But since this is an incompatible change, and we are going to remove the
compatibility annotations soon, we only have one chance and really have to
get it right. So I'd encourage everyone to try rewriting your examples / demo
code with the proposed new API and see if it feels natural. For example, if I
want to use a different storage engine than the default RocksDB engine, how
could I easily specify that with the proposed APIs?
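For context, a custom store is currently passed in as a StateStoreSupplier; a hedged sketch of
how one is typically built with the existing Stores factory (store name and serdes are
illustrative), which is the kind of object a call like table1.materialize(stateStoreSupplier)
above would accept. A truly different storage engine would mean implementing StateStoreSupplier
and StateStore directly:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.processor.StateStoreSupplier;
import org.apache.kafka.streams.state.Stores;

// RocksDB-backed key-value store supplier built via the existing factory
StateStoreSupplier supplier = Stores.create("queryableStoreName")
    .withKeys(Serdes.String())
    .withValues(Serdes.Long())
    .persistent()
    .build();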

Meanwhile, Damian, could you provide a formal set of APIs for people to
exercise? Also, could you briefly describe how custom storage
engines could be swapped in with the above APIs?



Guozhang


On Wed, Jun 21, 2017 at 9:08 AM, Eno Thereska 
wrote:

> To make it clear, it’s outlined by Damian, I just copy pasted what he told
> me in person :)
>
> Eno
>
> > On Jun 21, 2017, at 4:40 PM, Bill Bejeck  wrote:
> >
> > +1 for the approach outlined above by Eno.
> >
> > On Wed, Jun 21, 2017 at 11:28 AM, Damian Guy 
> wrote:
> >
> >> Thanks Eno.
> >>
> >> Yes i agree. We could apply this same approach to most of the operations
> >> where we have multiple overloads, i.e., we have a single method for each
> >> operation that takes the required parameters and everything else is
> >> specified as you have done above.
> >>
> >> On Wed, 21 Jun 2017 at 16:24 Eno Thereska 
> wrote:
> >>
> >>> (cc’ing user-list too)
> >>>
> >>> Given that we already have StateStoreSuppliers that are configurable
> >> using
> >>> the fluent-like API, probably it’s worth discussing the other examples
> >> with
> >>> joins and serdes first since those have many overloads and are in need
> of
> >>> some TLC.
> >>>
> >>> So following your example, I guess you’d have something like:
> >>> .join()
> >>>   .withKeySerdes(…)
> >>>   .withValueSerdes(…)
> >>>   .withJoinType(“outer”)
> >>>
> >>> etc?
> >>>
> >>> I like the approach since it still remains declarative and it’d reduce
> >> the
> >>> number of overloads by quite a bit.
> >>>
> >>> Eno
> >>>
>  On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> 
>  Hi,
> 
>  I'd like to get a discussion going around some of the API choices
> we've
>  made in the DLS. In particular those that relate to stateful
> operations
>  (though this could expand).
>  As it stands we lean heavily on overloaded methods in the API, i.e,
> >> there
>  are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and
> >> i
>  feel it is only going to get worse as we add more optional params. In
>  particular we've had some requests to be able to turn caching off, or
>  change log configs,  on a per operator basis (note this can be done
> now
> >>> if
>  you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> 
>  So this is a bit of an open question. How can we change the DSL
> >> overloads
>  so that it flows, is simple to use and understand, and is easily
> >> extended
>  in the future?
> 
>  One option would be to use a fluent API approach for providing the
> >>> optional
>  params, so something like this:
> 
>  groupedStream.count()
>   .withStoreName("name")
>   .withCachingEnabled(false)
>   .withLoggingEnabled(config)
>   .table()
> 
> 
> 
>  Another option would be to provide a Builder to the count method, so
> it
>  would look something like this:
>  groupedStream.count(new
>  CountBuilder("storeName").withCachingEnabled(false).build())
> 
>  Another option is to say: Hey we don't need this, what are you on
> >> about!
> 
>  The 

Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-21 Thread Guozhang Wang
Thanks for the updated KIP, some more comments:

1. The config name is "default.deserialization.exception.handler" while the
interface class name is "RecordExceptionHandler", which is more general than
the intended purpose. Could we rename the class accordingly?

2. Could you describe the full implementation of "DefaultExceptionHandler"?
Currently it is not clear to me how it is implemented with the configured
value.

In addition, I think we do not need to include an additional
"DEFAULT_DESERIALIZATION_EXCEPTION_RESPONSE_CONFIG", as the configure()
function is mainly used for users to pass customized parameters that are
outside the Streams library; plus, adding such an additional config sounds
over-complicated for a default exception handler. Instead, I'd suggest we
just provide two handlers (or three, if people feel strongly about the
LogAndThresholdExceptionHandler): a FailOnExceptionHandler and a
LogAndContinueOnExceptionHandler. And we can set
LogAndContinueOnExceptionHandler
by default.
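A purely hypothetical sketch of what such a pair of built-in handlers could look like; the
interface name, method signature, and return type below are illustrative only (none of this is
the settled KIP-161 API):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public interface RecordExceptionHandler {
    enum Response { CONTINUE, FAIL }
    Response handle(ConsumerRecord<byte[], byte[]> record, Exception exception);
}

public class FailOnExceptionHandler implements RecordExceptionHandler {
    @Override
    public Response handle(ConsumerRecord<byte[], byte[]> record, Exception exception) {
        return Response.FAIL;  // stop processing on the first bad record
    }
}

public class LogAndContinueOnExceptionHandler implements RecordExceptionHandler {
    private static final Logger log = LoggerFactory.getLogger(LogAndContinueOnExceptionHandler.class);

    @Override
    public Response handle(ConsumerRecord<byte[], byte[]> record, Exception exception) {
        log.warn("Skipping record from {}-{} at offset {} that failed to deserialize",
                 record.topic(), record.partition(), record.offset(), exception);
        return Response.CONTINUE;  // drop the bad record and keep processing
    }
}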


Guozhang








On Wed, Jun 21, 2017 at 1:39 AM, Eno Thereska 
wrote:

> Thanks Guozhang,
>
> I’ve updated the KIP and hopefully addressed all the comments so far. In
> the process also changed the name of the KIP to reflect its scope better:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>
> Any other feedback appreciated, otherwise I’ll start the vote soon.
>
> Thanks
> Eno
>
> > On Jun 12, 2017, at 6:28 AM, Guozhang Wang  wrote:
> >
> > Eno, Thanks for bringing this proposal up and sorry for getting late on
> > this. Here are my two cents:
> >
> > 1. First some meta comments regarding "fail fast" v.s. "making
> progress". I
> > agree that in general we should better "enforce user to do the right
> thing"
> > in system design, but we also need to keep in mind that Kafka is a
> > multi-tenant system, i.e. from a Streams app's pov you probably would not
> > control the whole streaming processing pipeline end-to-end. E.g. Your
> input
> > data may not be controlled by yourself; it could be written by another
> app,
> > or another team in your company, or even a different organization, and if
> > an error happens maybe you cannot fix "to do the right thing" just by
> > yourself in time. In such an environment I think it is important to leave
> > the door open to let users be more resilient. So I find the current
> > proposal which does leave the door open for either fail-fast or make
> > progress quite reasonable.
> >
> > 2. On the other hand, if the question is whether we should provide a
> > built-in "send to bad queue" handler from the library, I think that might
> > be an overkill: with some tweaks (see my detailed comments below) on the
> > API we can allow users to implement such handlers pretty easily. In
> fact, I
> > feel even "LogAndThresholdExceptionHandler" is not necessary as a
> built-in
> > handler, as it would then require users to specify the threshold via
> > configs, etc. I think letting people provide such "eco-libraries" may be
> > better.
> >
> > 3. Regarding the CRC error: today we validate CRC on both the broker end
> > upon receiving produce requests and on consumer end upon receiving fetch
> > responses; and if the CRC validation fails in the former case it would
> not
> > be appended to the broker logs. So if we do see a CRC failure on the
> > consumer side it has to be that either we have a flipped bit on the
> broker
> > disks or over the wire. For the first case it is fatal while for the
> second
> > it is retriable. Unfortunately we cannot tell which case it is when
> seeing
> > CRC validation failures. But in either case, just skipping and making
> > progress seems not a good choice here, and hence I would personally
> exclude
> > these errors from the general serde errors to NOT leave the door open of
> > making progress.
> >
> > Currently such errors are thrown as KafkaException that wraps an
> > InvalidRecordException, which may be too general and we could consider
> just
> > throwing the InvalidRecordException directly. But that could be an
> > orthogonal discussion if we agrees that CRC failures should not be
> > considered in this KIP.
> >
> > 
> >
> > Now some detailed comments:
> >
> > 4. Could we consider adding the processor context in the handle()
> function
> > as well? This context will be wrapping as the source node that is about
> to
> > process the record. This could expose more info like which task / source
> > node sees this error, which timestamp of the message, etc, and also can
> > allow users to implement their handlers by exposing some metrics, by
> > calling context.forward() to implement the "send to bad queue" behavior
> etc.
> >
> > 5. Could you add the string name of
> > StreamsConfig.DEFAULT_RECORD_EXCEPTION_HANDLER as 

Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Garrett Barton
Getting good concurrency in a webapp is more than doable.  Check out these
benchmarks:
https://www.techempower.com/benchmarks/#section=data-r14=ph=db
I linked to the single-query one because that's closest to a single
operation like you will be doing.

I'd also note that if data delivery does not need to be guaranteed, you could
go faster by switching the web servers over to UDP and using async mode on the
Kafka producers.
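For reference, with the Java producer every send() is already asynchronous; "async mode" in
practice means not blocking on the returned Future and tuning batching. A hedged example of
throughput-oriented producer settings (the values are illustrative starting points, not
recommendations):

acks=0                    # fire-and-forget; weakest delivery guarantee
linger.ms=50              # wait up to 50 ms to form larger batches
batch.size=262144         # 256 KB batches
compression.type=lz4
buffer.memory=67108864    # 64 MB of buffer to absorb bursts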

On Wed, Jun 21, 2017 at 2:23 PM, Tauzell, Dave  wrote:

> I’m not really familiar with Netty so I won’t be of much help.   Maybe try
> posting on a Netty forum to see what they think?
> -Dave
>
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 10:28 AM
> To: Tauzell, Dave
> Cc: users@kafka.apache.org; senthilec...@apache.org; d...@kafka.apache.org
> Subject: Re: Handling 2 to 3 Million Events before Kafka
>
> So netty would work for this case ?  I do have netty server and seems to
> be i'm not getting the expected results .. here is the git
> https://github.com/senthilec566/netty4-server , is this right
> implementation ?
>
> Cheers,
> Senthil
>
> On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
> I see.
>
> 1.   You don’t want the 100k machines sending directly to kafka.
>
> 2.   You can only have a small number of web servers
>
> People certainly have web-servers handling over 100k concurrent
> connections.  See this for some examples:  https://github.com/smallnest/
> C1000K-Servers .
>
> It seems possible with the right sort of kafka producer tuning.
>
> -Dave
>
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 8:55 AM
> To: Tauzell, Dave
> Cc: users@kafka.apache.org;
> senthilec...@apache.org;
> d...@kafka.apache.org; Senthil kumar
> Subject: Re: Handling 2 to 3 Million Events before Kafka
>
> Thanks Jeyhun. Yes http server would be problematic here w.r.t network ,
> memory ..
>
> Hi Dave ,  The problem is not with Kafka , it's all about how do you
> handle huge data before kafka.  I did a simple test with 5 node Kafka
> Cluster which gives good result ( ~950 MB/s ) ..So Kafka side i dont see a
> scaling issue ...
>
> All we are trying is before kafka how do we handle messages from different
> servers ...  Webservers can send fast to kafka but still i can handle only
> 50k events per second which is less for my use case.. also i can't deploy
> 20 webservers to handle this load. I'm looking for an option what could be
> the best candidate before kafka , it should be super fast in getting all
> and send it to kafka producer ..
>
>
> --Senthil
>
> On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
> What are your configurations?
>
> - production
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
> -Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: users@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil
> kumar; d...@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>
>
>


Kafka MirrorMaker - not replicating messages after being brought up

2017-06-21 Thread karan alang
Hi All,

I've 2 Kafka clusters (Kafka 10) & I'm trying to test the MirrorMaker
functionality.

Here is what I did:

1) I have identical topics Topic1 on 2 Kafka clusters - Cluster1 & Cluster2
2) On Cluster1, I publish 100 messages on Topic1

3) I've 2 consumers reading messages from the 2 topics on Cluster1 &
Cluster2

4) I start MirrorMaker AFTER THE MESSAGES HAVE BEEN PUBLISHED ON Cluster1

$KAFKA10_HOME1/bin/kafka-run-class.sh kafka.tools.MirrorMaker
--consumer.config $KAFKA10_HOME1/config/mmConsumer.config --num.streams 3
--producer.config $KAFKA10_HOME1/config/mmProducer.config
--whitelist="mmtopic1" --abort.on.send.failure true


I expected that once MirrorMaker was started, the 100 messages
published on Cluster1 would be published on Cluster2.

However, that is not happening...

What needs to be done to enable this?


Re: Max message size and compression

2017-06-21 Thread mayank rathi
If you are compressing messages, then the size of the compressed message should be
less than what's specified in these parameters.
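For reference, the limits live at different levels; a hedged summary of the usual 0.10.x
settings (the values shown are the defaults as I recall them, so double-check against the
documentation):

# broker (server.properties); for compressed batches this applies to the compressed wrapper message
message.max.bytes=1000012
# per-topic override of the broker limit
max.message.bytes=1000012
# producer: upper bound on the size of a single request / record batch
max.request.size=1048576
# new consumer: per-partition fetch size, must be at least as large as the largest message
max.partition.fetch.bytes=1048576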

On Sat, Jun 17, 2017 at 7:46 PM, Eli Jordan 
wrote:

> Hi
>
> max.message.bytes controls the maximum message size the kafka server will
> process
>
> message.max.bytes controls the maximum message size the consumer will
> process
>
> max.request.size controls the maximum request size for the producer
>
> Whats not clear to me (and I can't find documented anywhere) is if the
> message size limits are imposed on compressed or uncompressed messages,
> when compression is enabled.
>
> Note: I'm use Kafka 0.10.2.0 if that makes any difference.
>
> Any pointers or advice on this would be greatly appreciated.
>
> Thanks
> Eli






Re: [DISCUSS] KIP-163: Lower the Minimum Required ACL Permission of OffsetFetch

2017-06-21 Thread Vahid S Hashemian
I appreciate everyone's feedback so far on this KIP.

Before starting a vote, I'd like to also ask for feedback on the 
"Additional Food for Thought" section in the KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-163%3A+Lower+the+Minimum+Required+ACL+Permission+of+OffsetFetch#KIP-163:LowertheMinimumRequiredACLPermissionofOffsetFetch-AdditionalFoodforThought
I just added some more details in that section, which I hope further 
clarifies the suggestion there.

Thanks.
--Vahid



From:   Vahid S Hashemian/Silicon Valley/IBM
To: d...@kafka.apache.org
Cc: "Kafka User" 
Date:   06/08/2017 11:29 AM
Subject:[DISCUSS] KIP-163: Lower the Minimum Required ACL 
Permission of OffsetFetch


Hi all,

I'm resending my earlier note hoping it would spark some conversation this 
time around :)

Thanks.
--Vahid





From:   "Vahid S Hashemian" 
To: dev , "Kafka User" 
Date:   05/30/2017 08:33 AM
Subject:KIP-163: Lower the Minimum Required ACL Permission of 
OffsetFetch



Hi,

I started a new KIP to improve the minimum required ACL permissions of 
some of the APIs: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-163%3A+Lower+the+Minimum+Required+ACL+Permission+of+OffsetFetch

The KIP is to address KAFKA-4585.

Feedback and suggestions are welcome!

Thanks.
--Vahid








Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Tom Crayford
That looks better than mine, nice! I think the tooling matters a lot to the
usability of the product we're shipping; being able to test out Kafka's
features on your own hardware/setup is very important to knowing whether it
can work.

On Wed, Jun 21, 2017 at 8:01 PM, Apurva Mehta  wrote:

> Hi Tom,
>
> I actually made modifications to the produce performance tool to do real
> transactions earlier this week as part of our benchmarking (results
> published here: bit.ly/kafka-eos-perf). I just submitted that patch here:
> https://github.com/apache/kafka/pull/3400/files
>
> I think my version is more complete since it runs the full gamut of APIs:
> initTransactions, beginTransaction, commitTransaction. Also, it is the
> version used for our published benchmarks.
>
> I am not sure that this tool is a blocker for the release though, since it
> doesn't really affect the usability of the feature any way.
>
> Thanks,
> Apurva
>
> On Wed, Jun 21, 2017 at 11:12 AM, Tom Crayford 
> wrote:
>
> > Hi there,
> >
> > I'm -1 (non-binding) on shipping this RC.
> >
> > Heroku has carried on performance testing with 0.11 RC1. We have updated
> > our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
> > transactional features enabled, we get slightly better performance than
> > 0.10.2.1 with 10.2.1 client libraries.
> >
> > However, we attempted to run a performance test today with transactions,
> > idempotence and consumer read_committed enabled, but couldn't, because
> > enabling transactions requires the producer to call `initTransactions`
> > before starting to send messages, and the producer performance tool
> doesn't
> > allow for that.
> >
> > I'm -1 (non-binding) on shipping this RC in this state, because users
> > expect to be able to use the inbuilt performance testing tools, and
> > preventing them from testing the impact of the new features using the
> > inbuilt tools isn't great. I made a PR for this:
> > https://github.com/apache/kafka/pull/3398 (the change is very small).
> > Happy
> > to make a jira as well, if that makes sense.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> > > Hi Ismael,
> > >
> > > Thanks for running the release.
> > >
> > > Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> > > these checkstyle errors:
> > >
> > > :clients:checkstyleMain
> > > [ant:checkstyle] [ERROR]
> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > > Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> > > [ApiExceptionBuilder, BrokerNotAvailableException,
> > > ClusterAuthorizationException, ConcurrentTransactionsException,
> > > ControllerMovedException, CoordinatorLoadInProgressException,
> > > CoordinatorNotAvailableException, CorruptRecordException,
> > > DuplicateSequenceNumberException, GroupAuthorizationException,
> > > IllegalGenerationException, IllegalSaslStateException,
> > > InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> > > InvalidConfigurationException, InvalidFetchSizeException,
> > > InvalidGroupIdException, InvalidPartitionsException,
> > > InvalidPidMappingException, InvalidReplicaAssignmentException,
> > > InvalidReplicationFactorException, InvalidRequestException,
> > > InvalidRequiredAcksException, InvalidSessionTimeoutException,
> > > InvalidTimestampException, InvalidTopicException,
> > > InvalidTxnStateException, InvalidTxnTimeoutException,
> > > LeaderNotAvailableException, NetworkException, NotControllerException,
> > > NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> > > NotEnoughReplicasException, NotLeaderForPartitionException,
> > > OffsetMetadataTooLarge, OffsetOutOfRangeException,
> > > OperationNotAttemptedException, OutOfOrderSequenceException,
> > > PolicyViolationException, ProducerFencedException,
> > > RebalanceInProgressException, RecordBatchTooLargeException,
> > > RecordTooLargeException, ReplicaNotAvailableException,
> > > SecurityDisabledException, TimeoutException,
> TopicAuthorizationException,
> > > TopicExistsException, TransactionCoordinatorFencedException,
> > > TransactionalIdAuthorizationException, UnknownMemberIdException,
> > > UnknownServerException, UnknownTopicOrPartitionException,
> > > UnsupportedForMessageFormatException, UnsupportedSaslMechanismExcept
> ion,
> > > UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> > > [ant:checkstyle] [ERROR]
> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > > Class Fan-Out Complexity is 60 (max allowed is 40).
> > > [ClassFanOutComplexity]
> > > [ant:checkstyle] [ERROR]
> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > > java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> > > 

Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Apurva Mehta
Hi Tom,

I actually made modifications to the produce performance tool to do real
transactions earlier this week as part of our benchmarking (results
published here: bit.ly/kafka-eos-perf). I just submitted that patch here:
https://github.com/apache/kafka/pull/3400/files

I think my version is more complete since it runs the full gamut of APIs:
initTransactions, beginTransaction, commitTransaction. Also, it is the
version used for our published benchmarks.
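For reference, a minimal sketch of that API sequence (illustrative only; the topic,
transactional.id, and counts are placeholders, and this is not the actual ProducerPerformance
patch):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("transactional.id", "perf-producer-1");  // required to use transactions
props.put("enable.idempotence", "true");

KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
producer.initTransactions();                        // one-time setup, before any sends

byte[] payload = new byte[100];
for (int tx = 0; tx < 1000; tx++) {
    producer.beginTransaction();
    for (int i = 0; i < 100; i++) {
        producer.send(new ProducerRecord<>("perf-test-topic", payload));
    }
    producer.commitTransaction();                   // or abortTransaction() on error
}
producer.close();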

I am not sure that this tool is a blocker for the release though, since it
doesn't really affect the usability of the feature in any way.

Thanks,
Apurva

On Wed, Jun 21, 2017 at 11:12 AM, Tom Crayford  wrote:

> Hi there,
>
> I'm -1 (non-binding) on shipping this RC.
>
> Heroku has carried on performance testing with 0.11 RC1. We have updated
> our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
> transactional features enabled, we get slightly better performance than
> 0.10.2.1 with 10.2.1 client libraries.
>
> However, we attempted to run a performance test today with transactions,
> idempotence and consumer read_committed enabled, but couldn't, because
> enabling transactions requires the producer to call `initTransactions`
> before starting to send messages, and the producer performance tool doesn't
> allow for that.
>
> I'm -1 (non-binding) on shipping this RC in this state, because users
> expect to be able to use the inbuilt performance testing tools, and
> preventing them from testing the impact of the new features using the
> inbuilt tools isn't great. I made a PR for this:
> https://github.com/apache/kafka/pull/3398 (the change is very small).
> Happy
> to make a jira as well, if that makes sense.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi Ismael,
> >
> > Thanks for running the release.
> >
> > Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> > these checkstyle errors:
> >
> > :clients:checkstyleMain
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> > [ApiExceptionBuilder, BrokerNotAvailableException,
> > ClusterAuthorizationException, ConcurrentTransactionsException,
> > ControllerMovedException, CoordinatorLoadInProgressException,
> > CoordinatorNotAvailableException, CorruptRecordException,
> > DuplicateSequenceNumberException, GroupAuthorizationException,
> > IllegalGenerationException, IllegalSaslStateException,
> > InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> > InvalidConfigurationException, InvalidFetchSizeException,
> > InvalidGroupIdException, InvalidPartitionsException,
> > InvalidPidMappingException, InvalidReplicaAssignmentException,
> > InvalidReplicationFactorException, InvalidRequestException,
> > InvalidRequiredAcksException, InvalidSessionTimeoutException,
> > InvalidTimestampException, InvalidTopicException,
> > InvalidTxnStateException, InvalidTxnTimeoutException,
> > LeaderNotAvailableException, NetworkException, NotControllerException,
> > NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> > NotEnoughReplicasException, NotLeaderForPartitionException,
> > OffsetMetadataTooLarge, OffsetOutOfRangeException,
> > OperationNotAttemptedException, OutOfOrderSequenceException,
> > PolicyViolationException, ProducerFencedException,
> > RebalanceInProgressException, RecordBatchTooLargeException,
> > RecordTooLargeException, ReplicaNotAvailableException,
> > SecurityDisabledException, TimeoutException, TopicAuthorizationException,
> > TopicExistsException, TransactionCoordinatorFencedException,
> > TransactionalIdAuthorizationException, UnknownMemberIdException,
> > UnknownServerException, UnknownTopicOrPartitionException,
> > UnsupportedForMessageFormatException, UnsupportedSaslMechanismException,
> > UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > Class Fan-Out Complexity is 60 (max allowed is 40).
> > [ClassFanOutComplexity]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> > Class Fan-Out Complexity is 43 (max allowed is 40).
> > [ClassFanOutComplexity]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\requests\AbstractResponse.java:26:1:
> > Class Fan-Out Complexity is 42 (max allowed is 40).
> > [ClassFanOutComplexity]
> > :clients:checkstyleMain FAILED
> >
> > I wonder if there is an issue with my VM since I don't get similar errors
> > on Ubuntu or Mac.
> >
> > --Vahid
> >
> >
> >
> >
> > From: 

Re: Kafka MirrorMaker - errors/warning

2017-06-21 Thread karan alang
Hi All - here is the update on this.

I was able to fix the following warnings -


1) WARN Property bootstrap.servers is not valid
(kafka.utils.VerifiableProperties)

-> removed bootstrap.servers from mmConsumer.config (IT IS REQUIRED
ONLY IN mmProducer.config)

2) zk.connectiontimeout.ms is not valid
(kafka.utils.VerifiableProperties)

   replaced zk.connectiontimeout.ms with ->
zookeeper.connection.timeout.ms=100 (in mmConsumer.config)

3) #queue.enqueueTimeout.ms=-1
replaced with queue.enqueue.timeout.ms=-1 (in
mmProducer.config)


However, couple are still pending ->

a) producer.type = async was supplied but isn't a known config

*How do I ensure the producer is an "async" producer?*


b) queue.time = 100 was supplied but isn't a known config


*What needs to be done for this?*


On Tue, Jun 20, 2017 at 12:40 PM, karan alang  wrote:

> Hi All - i've setup Kafka MirrorMaker using link - Kafka Mirror Maker
> best Practices
> 
> ,
>
> and am getting the following warnings ->
>
> [2017-06-20 12:29:31,381] WARN The configuration queue.time = 100 was
> supplied but isn't a known config. (org.apache.kafka.clients.
> producer.ProducerConfig)
> [2017-06-20 12:29:31,381] WARN The configuration producer.type = async was
> supplied but isn't a known config. (org.apache.kafka.clients.
> producer.ProducerConfig)
> [2017-06-20 12:29:31,381] WARN The configuration queue.enqueueTimeout.ms
> = -1 was supplied but isn't a known config. (org.apache.kafka.clients.
> producer.ProducerConfig)
> [2017-06-20 12:29:31,460] WARN Property bootstrap.servers is not valid
> (kafka.utils.VerifiableProperties)
> [2017-06-20 12:29:31,460] WARN Property zk.connectiontimeout.ms is not
> valid (kafka.utils.VerifiableProperties)
> [2017-06-20 12:29:31,627] WARN Property bootstrap.servers is not valid
> (kafka.utils.VerifiableProperties)
> [2017-06-20 12:29:31,627] WARN Property zk.connectiontimeout.ms is not
> valid (kafka.utils.VerifiableProperties)
> [2017-06-20 12:29:31,635] WARN Property bootstrap.servers is not valid
> (kafka.utils.VerifiableProperties)
> [2017-06-20 12:29:31,636] WARN Property zk.connectiontimeout.ms is not
> valid (kafka.utils.VerifiableProperties)
>
>
>
> *Kafka version - 0.9*
>
> *Producer Config ->*
>
> bootstrap.servers=localhost:7092,localhost:7093,localhost:
> 7094,localhost:7095
> producer.type=async
> queue.time=100
> queue.enqueueTimeout.ms=-1
>
> *Consumer Config :*
>
> zookeeper.connect=localhost:2181
> zk.connectiontimeout.ms=100
> bootstrap.servers=localhost:9092,localhost:9093,localhost:
> 9094,localhost:9095
> consumer.timeout.ms=-1
> #security.protocol=PLAINTEXTSASL
> group.id=kafka-mirror
>
>
> Command used to start Mirror Maker :
>
> $KAFKA_HOME1/bin/kafka-run-class.sh kafka.tools.MirrorMaker
> --consumer.config $KAFKA_HOME1/config/mmConsumer.config --num.streams 3
> --producer.config $KAFKA_HOME1/config/mmProducer.config
> --whitelist="mmtopic" --abort.on.send.failure true
>
>
> Any ideas what needs to be done to remove these warning .. i'm trying to
> test the optimizations specified in the link above.
>
>
>


New Kafka Producer or the Old One ???

2017-06-21 Thread karan alang
Hello All -

I've *Kafka 0.9* & I'm running this command to publish records to Kafka
topics -

$KAFKA_HOME/bin/kafka-verifiable-producer.sh --topic mmtopic1
--max-messages 500 --broker-list
localhost:9092,localhost:9093,localhost:9094,localhost:9095
--producer.config $KAFKA_HOME/config/producer.properties

Does it use the new Kafka producer or the old (Simple or High Level) producer?
Is there a configuration that determines this?


Also, how do I configure this to be an async producer?
When I put the following in the config file, it does not recognize it.

   producer.type=async

Thanks for your help in advance !


RE: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Tauzell, Dave
I’m not really familiar with Netty so I won’t be of much help.   Maybe try 
posting on a Netty forum to see what they think?
-Dave

From: SenthilKumar K [mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 10:28 AM
To: Tauzell, Dave
Cc: users@kafka.apache.org; senthilec...@apache.org; d...@kafka.apache.org
Subject: Re: Handling 2 to 3 Million Events before Kafka

So netty would work for this case ?  I do have netty server and seems to be i'm 
not getting the expected results .. here is the git 
https://github.com/senthilec566/netty4-server , is this right implementation ?

Cheers,
Senthil

On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave 
> wrote:
I see.

1.   You don’t want the 100k machines sending directly to kafka.

2.   You can only have a small number of web servers

People certainly have web-servers handling over 100k concurrent connections.  
See this for some examples:  https://github.com/smallnest/C1000K-Servers .

It seems possible with the right sort of kafka producer tuning.

-Dave

From: SenthilKumar K 
[mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 8:55 AM
To: Tauzell, Dave
Cc: users@kafka.apache.org; 
senthilec...@apache.org; 
d...@kafka.apache.org; Senthil kumar
Subject: Re: Handling 2 to 3 Million Events before Kafka

Thanks Jeyhun. Yes http server would be problematic here w.r.t network , memory 
..

Hi Dave ,  The problem is not with Kafka , it's all about how do you handle 
huge data before kafka.  I did a simple test with 5 node Kafka Cluster which 
gives good result ( ~950 MB/s ) ..So Kafka side i dont see a scaling issue ...

All we are trying is before kafka how do we handle messages from different 
servers ...  Webservers can send fast to kafka but still i can handle only 50k 
events per second which is less for my use case.. also i can't deploy 20 
webservers to handle this load. I'm looking for an option what could be the 
best candidate before kafka , it should be super fast in getting all and send 
it to kafka producer ..


--Senthil

On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave 
> wrote:
What are your configurations?

- production
- brokers
- consumers

Is the problem that web servers cannot send to Kafka fast enough or your 
consumers cannot process messages off of kafka fast enough?
What is the average size of these messages?

-Dave

-Original Message-
From: SenthilKumar K 
[mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 7:58 AM
To: users@kafka.apache.org
Cc: senthilec...@apache.org; Senthil kumar; 
d...@kafka.apache.org
Subject: Handling 2 to 3 Million Events before Kafka

Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...

I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka is 
really good candidate for us to handle this ingestion rate ..


100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..

I see the problem in Http Server where it can't handle beyond 50K events per 
instance ..  I'm thinking some other solution would be right choice before 
Kafka ..

Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?

--Senthil




Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Tom Crayford
Hi there,

I'm -1 (non-binding) on shipping this RC.

Heroku has carried on performance testing with 0.11 RC1. We have updated
our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
transactional features enabled, we get slightly better performance than
0.10.2.1 with 10.2.1 client libraries.

However, we attempted to run a performance test today with transactions,
idempotence and consumer read_committed enabled, but couldn't, because
enabling transactions requires the producer to call `initTransactions`
before starting to send messages, and the producer performance tool doesn't
allow for that.

I'm -1 (non-binding) on shipping this RC in this state, because users
expect to be able to use the inbuilt performance testing tools, and
preventing them from testing the impact of the new features using the
inbuilt tools isn't great. I made a PR for this:
https://github.com/apache/kafka/pull/3398 (the change is very small). Happy
to make a jira as well, if that makes sense.

Thanks

Tom Crayford
Heroku Kafka

On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi Ismael,
>
> Thanks for running the release.
>
> Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> these checkstyle errors:
>
> :clients:checkstyleMain
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\protocol\Errors.java:89:1:
> Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> [ApiExceptionBuilder, BrokerNotAvailableException,
> ClusterAuthorizationException, ConcurrentTransactionsException,
> ControllerMovedException, CoordinatorLoadInProgressException,
> CoordinatorNotAvailableException, CorruptRecordException,
> DuplicateSequenceNumberException, GroupAuthorizationException,
> IllegalGenerationException, IllegalSaslStateException,
> InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> InvalidConfigurationException, InvalidFetchSizeException,
> InvalidGroupIdException, InvalidPartitionsException,
> InvalidPidMappingException, InvalidReplicaAssignmentException,
> InvalidReplicationFactorException, InvalidRequestException,
> InvalidRequiredAcksException, InvalidSessionTimeoutException,
> InvalidTimestampException, InvalidTopicException,
> InvalidTxnStateException, InvalidTxnTimeoutException,
> LeaderNotAvailableException, NetworkException, NotControllerException,
> NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> NotEnoughReplicasException, NotLeaderForPartitionException,
> OffsetMetadataTooLarge, OffsetOutOfRangeException,
> OperationNotAttemptedException, OutOfOrderSequenceException,
> PolicyViolationException, ProducerFencedException,
> RebalanceInProgressException, RecordBatchTooLargeException,
> RecordTooLargeException, ReplicaNotAvailableException,
> SecurityDisabledException, TimeoutException, TopicAuthorizationException,
> TopicExistsException, TransactionCoordinatorFencedException,
> TransactionalIdAuthorizationException, UnknownMemberIdException,
> UnknownServerException, UnknownTopicOrPartitionException,
> UnsupportedForMessageFormatException, UnsupportedSaslMechanismException,
> UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\protocol\Errors.java:89:1:
> Class Fan-Out Complexity is 60 (max allowed is 40).
> [ClassFanOutComplexity]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> Class Fan-Out Complexity is 43 (max allowed is 40).
> [ClassFanOutComplexity]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\requests\AbstractResponse.java:26:1:
> Class Fan-Out Complexity is 42 (max allowed is 40).
> [ClassFanOutComplexity]
> :clients:checkstyleMain FAILED
>
> I wonder if there is an issue with my VM since I don't get similar errors
> on Ubuntu or Mac.
>
> --Vahid
>
>
>
>
> From:   Ismael Juma 
> To: d...@kafka.apache.org, Kafka Users ,
> kafka-clients 
> Date:   06/18/2017 03:32 PM
> Subject:[VOTE] 0.11.0.0 RC1
> Sent by:isma...@gmail.com
>
>
>
> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 0.11.0.0.
>
> This is a major version release of Apache Kafka. It includes 32 new KIPs.
> See
> the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.11.0.0) for more details. A few
> feature
> highlights:
>
> * Exactly-once delivery and transactional messaging
> * Streams exactly-once semantics
> * Admin client with support for topic, ACLs and config management
> * Record headers
> * Request rate quotas
> * Improved resiliency: replication 

Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Eno Thereska
To make it clear, it was outlined by Damian; I just copy-pasted what he told me 
in person :)

Eno

> On Jun 21, 2017, at 4:40 PM, Bill Bejeck  wrote:
> 
> +1 for the approach outlined above by Eno.
> 
> On Wed, Jun 21, 2017 at 11:28 AM, Damian Guy  wrote:
> 
>> Thanks Eno.
>> 
>> Yes i agree. We could apply this same approach to most of the operations
>> where we have multiple overloads, i.e., we have a single method for each
>> operation that takes the required parameters and everything else is
>> specified as you have done above.
>> 
>> On Wed, 21 Jun 2017 at 16:24 Eno Thereska  wrote:
>> 
>>> (cc’ing user-list too)
>>> 
>>> Given that we already have StateStoreSuppliers that are configurable
>> using
>>> the fluent-like API, probably it’s worth discussing the other examples
>> with
>>> joins and serdes first since those have many overloads and are in need of
>>> some TLC.
>>> 
>>> So following your example, I guess you’d have something like:
>>> .join()
>>>   .withKeySerdes(…)
>>>   .withValueSerdes(…)
>>>   .withJoinType(“outer”)
>>> 
>>> etc?
>>> 
>>> I like the approach since it still remains declarative and it’d reduce
>> the
>>> number of overloads by quite a bit.
>>> 
>>> Eno
>>> 
 On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
 
 Hi,
 
 I'd like to get a discussion going around some of the API choices we've
 made in the DLS. In particular those that relate to stateful operations
 (though this could expand).
 As it stands we lean heavily on overloaded methods in the API, i.e,
>> there
 are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and
>> i
 feel it is only going to get worse as we add more optional params. In
 particular we've had some requests to be able to turn caching off, or
 change log configs,  on a per operator basis (note this can be done now
>>> if
 you pass in a StateStoreSupplier, but this can be a bit cumbersome).
 
 So this is a bit of an open question. How can we change the DSL
>> overloads
 so that it flows, is simple to use and understand, and is easily
>> extended
 in the future?
 
 One option would be to use a fluent API approach for providing the
>>> optional
 params, so something like this:
 
 groupedStream.count()
  .withStoreName("name")
  .withCachingEnabled(false)
  .withLoggingEnabled(config)
  .table()
 
 
 
 Another option would be to provide a Builder to the count method, so it
 would look something like this:
 groupedStream.count(new
 CountBuilder("storeName").withCachingEnabled(false).build())
 
 Another option is to say: Hey we don't need this, what are you on
>> about!
 
 The above has focussed on state store related overloads, but the same
>>> ideas
 could  be applied to joins etc, where we presently have many join
>> methods
 and many overloads.
 
 Anyway, i look forward to hearing your opinions.
 
 Thanks,
 Damian
>>> 
>>> 
>> 



Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Bill Bejeck
+1 for the approach outlined above by Eno.

On Wed, Jun 21, 2017 at 11:28 AM, Damian Guy  wrote:

> Thanks Eno.
>
> Yes i agree. We could apply this same approach to most of the operations
> where we have multiple overloads, i.e., we have a single method for each
> operation that takes the required parameters and everything else is
> specified as you have done above.
>
> On Wed, 21 Jun 2017 at 16:24 Eno Thereska  wrote:
>
> > (cc’ing user-list too)
> >
> > Given that we already have StateStoreSuppliers that are configurable
> using
> > the fluent-like API, probably it’s worth discussing the other examples
> with
> > joins and serdes first since those have many overloads and are in need of
> > some TLC.
> >
> > So following your example, I guess you’d have something like:
> > .join()
> >.withKeySerdes(…)
> >.withValueSerdes(…)
> >.withJoinType(“outer”)
> >
> > etc?
> >
> > I like the approach since it still remains declarative and it’d reduce
> the
> > number of overloads by quite a bit.
> >
> > Eno
> >
> > > On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> > >
> > > Hi,
> > >
> > > I'd like to get a discussion going around some of the API choices we've
> > > made in the DLS. In particular those that relate to stateful operations
> > > (though this could expand).
> > > As it stands we lean heavily on overloaded methods in the API, i.e,
> there
> > > are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and
> i
> > > feel it is only going to get worse as we add more optional params. In
> > > particular we've had some requests to be able to turn caching off, or
> > > change log configs,  on a per operator basis (note this can be done now
> > if
> > > you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> > >
> > > So this is a bit of an open question. How can we change the DSL
> overloads
> > > so that it flows, is simple to use and understand, and is easily
> extended
> > > in the future?
> > >
> > > One option would be to use a fluent API approach for providing the
> > optional
> > > params, so something like this:
> > >
> > > groupedStream.count()
> > >   .withStoreName("name")
> > >   .withCachingEnabled(false)
> > >   .withLoggingEnabled(config)
> > >   .table()
> > >
> > >
> > >
> > > Another option would be to provide a Builder to the count method, so it
> > > would look something like this:
> > > groupedStream.count(new
> > > CountBuilder("storeName").withCachingEnabled(false).build())
> > >
> > > Another option is to say: Hey we don't need this, what are you on
> about!
> > >
> > > The above has focussed on state store related overloads, but the same
> > ideas
> > > could  be applied to joins etc, where we presently have many join
> methods
> > > and many overloads.
> > >
> > > Anyway, i look forward to hearing your opinions.
> > >
> > > Thanks,
> > > Damian
> >
> >
>


Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
So would Netty work for this case? I do have a Netty server, but it seems I'm
not getting the expected results. Here is the git repo:
https://github.com/senthilec566/netty4-server , is this the right
implementation?

Cheers,
Senthil

On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave  wrote:

> I see.
>
> 1.   You don’t want the 100k machines sending directly to kafka.
>
> 2.   You can only have a small number of web servers
>
>
>
> People certainly have web-servers handling over 100k concurrent
> connections.  See this for some examples:  https://github.com/smallnest/
> C1000K-Servers .
>
>
>
> It seems possible with the right sort of kafka producer tuning.
>
>
>
> -Dave
>
>
>
> *From:* SenthilKumar K [mailto:senthilec...@gmail.com]
> *Sent:* Wednesday, June 21, 2017 8:55 AM
> *To:* Tauzell, Dave
> *Cc:* users@kafka.apache.org; senthilec...@apache.org;
> d...@kafka.apache.org; Senthil kumar
> *Subject:* Re: Handling 2 to 3 Million Events before Kafka
>
>
>
> Thanks Jeyhun. Yes http server would be problematic here w.r.t network ,
> memory ..
>
>
>
> Hi Dave ,  The problem is not with Kafka , it's all about how do you
> handle huge data before kafka.  I did a simple test with 5 node Kafka
> Cluster which gives good result ( ~950 MB/s ) ..So Kafka side i dont see a
> scaling issue ...
>
>
>
> All we are trying is before kafka how do we handle messages from different
> servers ...  Webservers can send fast to kafka but still i can handle only
> 50k events per second which is less for my use case.. also i can't deploy
> 20 webservers to handle this load. I'm looking for an option what could be
> the best candidate before kafka , it should be super fast in getting all
> and send it to kafka producer ..
>
>
>
>
>
> --Senthil
>
>
>
> On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
>
> What are your configurations?
>
> - production
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
>
> -Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: users@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil kumar; d...@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>
>
>
>


Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Damian Guy
Thanks Eno.

Yes, I agree. We could apply this same approach to most of the operations
where we have multiple overloads, i.e., have a single method for each
operation that takes the required parameters, with everything else
specified as you have done above.
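As a purely illustrative sketch of that shape, the optional parameters for
count() could live on a small fluent options object passed to a single
count(...) method; the Count class and its with...() names below are
hypothetical, not an existing Kafka Streams API:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical options holder: one required argument (the store name) via a
// static factory, everything else optional and fluent, so KGroupedStream would
// only need a single count(Count) overload.
public final class Count {
    private final String storeName;
    private boolean cachingEnabled = true;
    private Map<String, String> logConfig = Collections.emptyMap();

    private Count(final String storeName) {
        this.storeName = storeName;
    }

    public static Count as(final String storeName) {
        return new Count(storeName);
    }

    public Count withCachingEnabled(final boolean enabled) {
        this.cachingEnabled = enabled;
        return this;
    }

    public Count withLoggingEnabled(final Map<String, String> config) {
        this.logConfig = new HashMap<>(config);
        return this;
    }

    public String storeName() { return storeName; }
    public boolean cachingEnabled() { return cachingEnabled; }
    public Map<String, String> logConfig() { return Collections.unmodifiableMap(logConfig); }
}

// Hypothetical call site:
// KTable<String, Long> counts =
//     groupedStream.count(Count.as("store-name").withCachingEnabled(false));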

On Wed, 21 Jun 2017 at 16:24 Eno Thereska  wrote:

> (cc’ing user-list too)
>
> Given that we already have StateStoreSuppliers that are configurable using
> the fluent-like API, probably it’s worth discussing the other examples with
> joins and serdes first since those have many overloads and are in need of
> some TLC.
>
> So following your example, I guess you’d have something like:
> .join()
>.withKeySerdes(…)
>.withValueSerdes(…)
>.withJoinType(“outer”)
>
> etc?
>
> I like the approach since it still remains declarative and it’d reduce the
> number of overloads by quite a bit.
>
> Eno
>
> > On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> >
> > Hi,
> >
> > I'd like to get a discussion going around some of the API choices we've
> > made in the DLS. In particular those that relate to stateful operations
> > (though this could expand).
> > As it stands we lean heavily on overloaded methods in the API, i.e, there
> > are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and i
> > feel it is only going to get worse as we add more optional params. In
> > particular we've had some requests to be able to turn caching off, or
> > change log configs,  on a per operator basis (note this can be done now
> if
> > you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> >
> > So this is a bit of an open question. How can we change the DSL overloads
> > so that it flows, is simple to use and understand, and is easily extended
> > in the future?
> >
> > One option would be to use a fluent API approach for providing the
> optional
> > params, so something like this:
> >
> > groupedStream.count()
> >   .withStoreName("name")
> >   .withCachingEnabled(false)
> >   .withLoggingEnabled(config)
> >   .table()
> >
> >
> >
> > Another option would be to provide a Builder to the count method, so it
> > would look something like this:
> > groupedStream.count(new
> > CountBuilder("storeName").withCachingEnabled(false).build())
> >
> > Another option is to say: Hey we don't need this, what are you on about!
> >
> > The above has focussed on state store related overloads, but the same
> ideas
> > could  be applied to joins etc, where we presently have many join methods
> > and many overloads.
> >
> > Anyway, i look forward to hearing your opinions.
> >
> > Thanks,
> > Damian
>
>


Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Eno Thereska
(cc’ing user-list too)

Given that we already have StateStoreSuppliers that are configurable using the 
fluent-like API, probably it’s worth discussing the other examples with joins 
and serdes first since those have many overloads and are in need of some TLC.

So following your example, I guess you’d have something like:
.join()
   .withKeySerdes(…)
   .withValueSerdes(…)
   .withJoinType(“outer”)

etc?

I like the approach since it still remains declarative and it’d reduce the 
number of overloads by quite a bit.

Eno

> On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> 
> Hi,
> 
> I'd like to get a discussion going around some of the API choices we've
> made in the DLS. In particular those that relate to stateful operations
> (though this could expand).
> As it stands we lean heavily on overloaded methods in the API, i.e, there
> are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and i
> feel it is only going to get worse as we add more optional params. In
> particular we've had some requests to be able to turn caching off, or
> change log configs,  on a per operator basis (note this can be done now if
> you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> 
> So this is a bit of an open question. How can we change the DSL overloads
> so that it flows, is simple to use and understand, and is easily extended
> in the future?
> 
> One option would be to use a fluent API approach for providing the optional
> params, so something like this:
> 
> groupedStream.count()
>   .withStoreName("name")
>   .withCachingEnabled(false)
>   .withLoggingEnabled(config)
>   .table()
> 
> 
> 
> Another option would be to provide a Builder to the count method, so it
> would look something like this:
> groupedStream.count(new
> CountBuilder("storeName").withCachingEnabled(false).build())
> 
> Another option is to say: Hey we don't need this, what are you on about!
> 
> The above has focussed on state store related overloads, but the same ideas
> could  be applied to joins etc, where we presently have many join methods
> and many overloads.
> 
> Anyway, i look forward to hearing your opinions.
> 
> Thanks,
> Damian



RE: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Tauzell, Dave
I see.

1.   You don’t want the 100k machines sending directly to kafka.

2.   You can only have a small number of web servers

People certainly have web-servers handling over 100k concurrent connections.  
See this for some examples:  https://github.com/smallnest/C1000K-Servers .

It seems possible with the right sort of kafka producer tuning.

-Dave
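The producer tuning being alluded to is mostly batching, compression, and ack
settings. A rough sketch of the properties one might start from (the broker
list and all values below are illustrative placeholders, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
// Batch aggressively so each web-server instance makes few, large broker requests.
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 256 * 1024);        // bytes per partition batch
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);                 // wait up to 20 ms to fill a batch
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");       // cheap CPU, large network win
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 256L * 1024 * 1024);
props.put(ProducerConfig.ACKS_CONFIG, "1");                     // trades durability for throughput
KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);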

From: SenthilKumar K [mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 8:55 AM
To: Tauzell, Dave
Cc: users@kafka.apache.org; senthilec...@apache.org; d...@kafka.apache.org; Senthil kumar
Subject: Re: Handling 2 to 3 Million Events before Kafka

Thanks Jeyhun. Yes http server would be problematic here w.r.t network , memory 
..

Hi Dave ,  The problem is not with Kafka , it's all about how do you handle 
huge data before kafka.  I did a simple test with 5 node Kafka Cluster which 
gives good result ( ~950 MB/s ) ..So Kafka side i dont see a scaling issue ...

All we are trying is before kafka how do we handle messages from different 
servers ...  Webservers can send fast to kafka but still i can handle only 50k 
events per second which is less for my use case.. also i can't deploy 20 
webservers to handle this load. I'm looking for an option what could be the 
best candidate before kafka , it should be super fast in getting all and send 
it to kafka producer ..


--Senthil

On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave 
> wrote:
What are your configurations?

- production
- brokers
- consumers

Is the problem that web servers cannot send to Kafka fast enough or your 
consumers cannot process messages off of kafka fast enough?
What is the average size of these messages?

-Dave

-Original Message-
From: SenthilKumar K [mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 7:58 AM
To: users@kafka.apache.org
Cc: senthilec...@apache.org; Senthil kumar; d...@kafka.apache.org
Subject: Handling 2 to 3 Million Events before Kafka

Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...

I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka is 
really good candidate for us to handle this ingestion rate ..


100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..

I see the problem in Http Server where it can't handle beyond 50K events per 
instance ..  I'm thinking some other solution would be right choice before 
Kafka ..

Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?

--Senthil



Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
Thanks Jeyhun. Yes, an HTTP server would be problematic here w.r.t. network and
memory.

Hi Dave, the problem is not with Kafka; it's all about how you handle huge
data before Kafka. I did a simple test with a 5-node Kafka cluster which gives
a good result (~950 MB/s), so on the Kafka side I don't see a scaling issue.

All we are trying to work out is how to handle messages from the different
servers before Kafka. Web servers can send to Kafka fast, but I can still
handle only 50K events per second per instance, which is too little for my use
case, and I can't deploy 20 web servers to handle this load. I'm looking for
the best candidate to sit in front of Kafka; it should be super fast at
receiving everything and handing it to a Kafka producer.


--Senthil

On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave  wrote:

> What are your configurations?
>
> - production
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
> -Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: users@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil kumar; d...@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>


RE: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Tauzell, Dave
What are your configurations?

- production
- brokers
- consumers

Is the problem that web servers cannot send to Kafka fast enough or your 
consumers cannot process messages off of kafka fast enough?
What is the average size of these messages?

-Dave

-Original Message-
From: SenthilKumar K [mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 7:58 AM
To: users@kafka.apache.org
Cc: senthilec...@apache.org; Senthil kumar; d...@kafka.apache.org
Subject: Handling 2 to 3 Million Events before Kafka

Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...

I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka is 
really good candidate for us to handle this ingestion rate ..


100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..

I see the problem in Http Server where it can't handle beyond 50K events per 
instance ..  I'm thinking some other solution would be right choice before 
Kafka ..

Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?

--Senthil


Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Jeyhun Karimov
Hi,

With Kafka you can increase overall throughput by increasing the number of
nodes in a cluster.
I had a similar issue, where we needed to ingest vast amounts of data into a
streaming system.
In our case, Kafka was the bottleneck because of disk I/O. To solve it, we
implemented a (simple) distributed pub-sub system in C which keeps data in
memory. You should also take into account your network bandwidth and the
(upper-bound) capability of your processing engine or HTTP server.


Cheers,
Jeyhun


On Wed, Jun 21, 2017 at 2:58 PM SenthilKumar K  wrote:

> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>
-- 
-Cheers

Jeyhun


Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
Hi Team, sorry if this question is irrelevant to the Kafka group ...

I have been trying to solve the problem of handling 5 GB/sec of ingestion.
Kafka is a really good candidate for us to handle this ingestion rate.


100K machines --> { HTTP Server (Jetty/Netty) } --> Kafka Cluster

I see the problem in the HTTP server, which can't handle beyond 50K events
per instance. I'm thinking some other solution would be the right choice in
front of Kafka.

Has anyone worked on a similar use case and a similar load? Suggestions/Thoughts?

--Senthil
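(A rough back-of-the-envelope check using only the numbers quoted in this
thread: taking the 2 to 3 million events as a per-second rate and roughly 50K
events/sec per HTTP instance, the front end needs on the order of 40 to 60
instances; on the Kafka side, 5 GB/sec against the ~950 MB/s measured elsewhere
in the thread on a 5-broker cluster is only about 5 to 6 times that cluster's
capacity, i.e. roughly 25 to 30 brokers. So the HTTP ingestion tier, not Kafka,
is the layer that has to scale here.)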


Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-21 Thread Eno Thereska
Thanks Guozhang,

I’ve updated the KIP and hopefully addressed all the comments so far. In the
process I also changed the name of the KIP to reflect its scope better:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers

Any other feedback appreciated, otherwise I’ll start the vote soon.

Thanks
Eno
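To make the handler discussion below more concrete, here is a rough sketch of
a "log and continue" handler along the lines being discussed; the interface
name, method signature, and config key are placeholders based on this thread,
not a finalized API:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Placeholder interface roughly matching the shape discussed in this thread
// (point 4 below: pass the processor context along with the failed record).
// Not the final KIP-161 API.
interface RecordExceptionHandler {
    enum Response { CONTINUE, FAIL }
    Response handle(ProcessorContext context, ConsumerRecord<byte[], byte[]> record, Exception exception);
}

// A "log and continue" implementation: record the failure (here just a log
// line, but the context gives access to the task id, and the record gives
// topic/partition/offset for metrics or a dead-letter topic) and keep the
// pipeline making progress.
public class LogAndContinueHandler implements RecordExceptionHandler {
    private static final Logger log = LoggerFactory.getLogger(LogAndContinueHandler.class);

    @Override
    public Response handle(final ProcessorContext context,
                           final ConsumerRecord<byte[], byte[]> record,
                           final Exception exception) {
        log.warn("Skipping record at {}-{} offset {} in task {}: {}",
                record.topic(), record.partition(), record.offset(),
                context.taskId(), exception.toString());
        return Response.CONTINUE;
    }
}

// Registration would presumably go through a streams config entry, e.g. the
// StreamsConfig.DEFAULT_RECORD_EXCEPTION_HANDLER key mentioned in point 5 below:
// props.put(StreamsConfig.DEFAULT_RECORD_EXCEPTION_HANDLER, LogAndContinueHandler.class);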

> On Jun 12, 2017, at 6:28 AM, Guozhang Wang  wrote:
> 
> Eno, Thanks for bringing this proposal up and sorry for getting late on
> this. Here are my two cents:
> 
> 1. First some meta comments regarding "fail fast" v.s. "making progress". I
> agree that in general we should better "enforce user to do the right thing"
> in system design, but we also need to keep in mind that Kafka is a
> multi-tenant system, i.e. from a Streams app's pov you probably would not
> control the whole streaming processing pipeline end-to-end. E.g. Your input
> data may not be controlled by yourself; it could be written by another app,
> or another team in your company, or even a different organization, and if
> an error happens maybe you cannot fix "to do the right thing" just by
> yourself in time. In such an environment I think it is important to leave
> the door open to let users be more resilient. So I find the current
> proposal which does leave the door open for either fail-fast or make
> progress quite reasonable.
> 
> 2. On the other hand, if the question is whether we should provide a
> built-in "send to bad queue" handler from the library, I think that might
> be an overkill: with some tweaks (see my detailed comments below) on the
> API we can allow users to implement such handlers pretty easily. In fact, I
> feel even "LogAndThresholdExceptionHandler" is not necessary as a built-in
> handler, as it would then require users to specify the threshold via
> configs, etc. I think letting people provide such "eco-libraries" may be
> better.
> 
> 3. Regarding the CRC error: today we validate CRC on both the broker end
> upon receiving produce requests and on consumer end upon receiving fetch
> responses; and if the CRC validation fails in the former case it would not
> be appended to the broker logs. So if we do see a CRC failure on the
> consumer side it has to be that either we have a flipped bit on the broker
> disks or over the wire. For the first case it is fatal while for the second
> it is retriable. Unfortunately we cannot tell which case it is when seeing
> CRC validation failures. But in either case, just skipping and making
> progress seems not a good choice here, and hence I would personally exclude
> these errors from the general serde errors to NOT leave the door open of
> making progress.
> 
> Currently such errors are thrown as KafkaException that wraps an
> InvalidRecordException, which may be too general and we could consider just
> throwing the InvalidRecordException directly. But that could be an
> orthogonal discussion if we agrees that CRC failures should not be
> considered in this KIP.
> 
> 
> 
> Now some detailed comments:
> 
> 4. Could we consider adding the processor context in the handle() function
> as well? This context will be wrapping as the source node that is about to
> process the record. This could expose more info like which task / source
> node sees this error, which timestamp of the message, etc, and also can
> allow users to implement their handlers by exposing some metrics, by
> calling context.forward() to implement the "send to bad queue" behavior etc.
> 
> 5. Could you add the string name of
> StreamsConfig.DEFAULT_RECORD_EXCEPTION_HANDLER as well in the KIP?
> Personally I find "default" prefix a bit misleading since we do not allow
> users to override it per-node yet. But I'm okay either way as I can see we
> may extend it in the future and probably would like to not rename the
> config again. Also from the experience of `default partitioner` and
> `default timestamp extractor` we may also make sure that the passed in
> object can be either a string "class name" or a class object?
> 
> 
> Guozhang
> 
> 
> On Wed, Jun 7, 2017 at 2:16 PM, Jan Filipiak 
> wrote:
> 
>> Hi Eno,
>> 
>> On 07.06.2017 22:49, Eno Thereska wrote:
>> 
>>> Comments inline:
>>> 
>>> On 5 Jun 2017, at 18:19, Jan Filipiak  wrote:
>>>> 
>>>> Hi
>>>> 
>>>> just my few thoughts
>>>> 
>>>> On 05.06.2017 11:44, Eno Thereska wrote:
>>>> 
>>>>> Hi there,
>>>>> 
>>>>> Sorry for the late reply, I was out this past week. Looks like good
>>>>> progress was made with the discussions either way. Let me recap a couple of
>>>>> points I saw into one big reply:
>>>>> 
>>>>> 1. Jan mentioned CRC errors. I think this is a good point. As these
>>>>> happen in Kafka, before Kafka Streams gets a chance to inspect anything,
>>>>> I'd like to hear the opinion of more Kafka folks like 

Re: ticketing system Design

2017-06-21 Thread Michal Borowiecki
If your business flow involves human actions, personally I would look at 
a business process engine like the open source camunda.


Even if you don't choose to use it in production, you can use it to 
prototype and evolve your design at the inception stage.


There's a simple to run example that integrates with kafka here:

https://github.com/flowing/flowing-retail

And a tutorial that involves a human action in the flow here (but no kafka):

https://docs.camunda.org/get-started/bpmn20/

(NB. My personal interest in camunda is for integrating it as a process 
manager/saga element in an event-sourced service at some point)


Cheers,
Michał

On 21/06/17 03:25, Tarun Garg wrote:

Need some more input on this.


Kafka is a queue; it doesn't take any action.


The sender (producer) sends data to Kafka and the consumer pulls data from the
Kafka queue, so there is no assignment of data to any particular consumer.

If a process/human can't take any action, then Kafka can't help in this case.

Hope that answers it.


From: Abhimanyu Nagrath 
Sent: Monday, June 19, 2017 8:01 PM
To: users@kafka.apache.org
Subject: Re: ticketing system Design

Hi ,

Can anyone suggest me where I can get the answer for these type of
questions?


Regards,
Abhimanyu

On Thu, Jun 8, 2017 at 6:49 PM, Abhimanyu Nagrath <
abhimanyunagr...@gmail.com> wrote:


Hi ,

Can Apache Kafka, along with Storm, be used to design a ticketing system?
By ticketing system, I mean that there are millions of tasks stored in
Kafka queues and there are processes/humans that take some action on each
task. There are constraints that the same task should not be assigned to
two processes/humans, and that if a task flows to a process/human and no
action is performed it should be reassigned.
  I am not sure whether this can be solved using Kafka. Any help is
appreciated.



Regards,
Abhimanyu


