Re: Kafka Streaming: RocksDbSessionBytesStoreSupplier seems lost data in Kubernetes

2023-06-29 Thread Matthias J. Sax
The class `RocksDbSessionBytesStoreSupplier` is in an `internal` package 
and thus you should not use it directly. Instead, you should use the 
public factory class `org.apache.kafka.streams.state.Stores`.


However, your usage seems correct in general.

Not sure why you pass in the supplier directly though? In the end, if 
you want to set a name for the store, you can use 
`Materialized.as("...")`, and you can set the retention time via 
`Materialized#withRetention(...)` (which would be the proper usage of the 
API).
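
For reference, a minimal sketch of the equivalent setup using only the public API. The store name "pft-session-store" is a made-up example, the `List<String>` value type is an assumption, and `stream`, `stringSerde` and `listSerde` are taken to be the already-grouped stream and the serdes from your snippet:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.state.SessionStore;

// Public-API equivalent of the supplier-based code: store name, retention and serdes
// are all set through Materialized.
Materialized<String, List<String>, SessionStore<Bytes, byte[]>> materialized =
    Materialized.<String, List<String>, SessionStore<Bytes, byte[]>>as("pft-session-store")
        .withRetention(Duration.ofDays(7))      // replaces the supplier's retention argument
        .withKeySerde(stringSerde)
        .withValueSerde(listSerde);

stream.windowedBy(SessionWindows.with(Duration.ofMillis(100)).grace(Duration.ofMillis(50)))
      .aggregate(ArrayList::new,
                 (k, v, list) -> { list.add(v); return list; },
                 (k, l1, l2) -> { l1.addAll(l2); return l1; },
                 materialized);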


Besides this, the store should be backed by a changelog topic and thus 
you should never lose any data, independent of your deployment.


Of course, I would recommend using a StatefulSet and re-attaching storage 
to the pod to avoid re-creating the store from the changelog.
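
If the pods do get persistent volumes, pointing the Streams state directory at the mounted volume (and optionally keeping a standby replica) keeps restores short. A minimal sketch; the application id, bootstrap servers and mount path below are assumptions:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pft-aggregator");     // hypothetical app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");      // hypothetical bootstrap
// Keep local state on the StatefulSet's persistent volume mount (path is an assumption).
props.put(StreamsConfig.STATE_DIR_CONFIG, "/data/kafka-streams");
// Optional: keep a warm copy of the state on another instance to shorten failover.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);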


HTH,

-Matthias


On 6/28/23 8:49 AM, An, Hongguo (CORP) wrote:

Hi:
I am using RocksDbSessionBytesStoreSupplier in my kafka streaming application 
for an aggregation like this:


var materialized =
    Materialized.>as(
            new RocksDbSessionBytesStoreSupplier(
                env.getProperty("messages.cdc.pft.topic", "NASHCM.PAYROLL.PFT.FILENUMBER"),
                Duration.parse(env.getProperty("pft.duration", "P7D")).toMillis()))
        .withKeySerde(stringSerde)
        .withValueSerde(listSerde);

stream.windowedBy(SessionWindows
            .with(Duration.parse(env.getProperty("pft.gap", "PT0.1S")))
            .grace(Duration.parse(env.getProperty("pft.duration", "PT0.05S"))))
      .aggregate(ArrayList::new,
            (k, v, list) -> { list.add(v); return list; },
            (k, list1, list2) -> { list1.addAll(list2); return list1; },
            materialized)
      .toStream().foreach((key, value) -> {
            // sometimes value is null, but this should never happen – and we do see some
            // messages not processed.
      });



The application runs on Kubernetes; should we not use 
RocksDbSessionBytesStoreSupplier?



Thanks

Andrew





Kafka Streaming: RocksDbSessionBytesStoreSupplier seems lost data in Kubernetes

2023-06-28 Thread An, Hongguo (CORP)
Hi:
I am using RocksDbSessionBytesStoreSupplier in my kafka streaming application 
for an aggregation like this:


var materialized =
    Materialized.>as(
            new RocksDbSessionBytesStoreSupplier(
                env.getProperty("messages.cdc.pft.topic", "NASHCM.PAYROLL.PFT.FILENUMBER"),
                Duration.parse(env.getProperty("pft.duration", "P7D")).toMillis()))
        .withKeySerde(stringSerde)
        .withValueSerde(listSerde);

stream.windowedBy(SessionWindows
            .with(Duration.parse(env.getProperty("pft.gap", "PT0.1S")))
            .grace(Duration.parse(env.getProperty("pft.duration", "PT0.05S"))))
      .aggregate(ArrayList::new,
            (k, v, list) -> { list.add(v); return list; },
            (k, list1, list2) -> { list1.addAll(list2); return list1; },
            materialized)
      .toStream().foreach((key, value) -> {
            // sometimes value is null, but this should never happen – and we do see some
            // messages not processed.
      });



The application runs on Kubernetes; should we not use 
RocksDbSessionBytesStoreSupplier?



Thanks

Andrew





Re: Kafka streaming partition assignment

2018-05-13 Thread Liam Clarke
Thank you Matthias - we're using version 1.0. I can tell my team to relax
and look at upgrading :)


On Mon, May 14, 2018 at 3:48 PM, Matthias J. Sax 
wrote:

> It depends on your version. The behavior is known and we put one
> improvement into 1.1 release: https://github.com/apache/kafka/pull/4410
>
> Thus, it's "by design" (for 1.0 and older) but we want to improve it.
> Cf: https://issues.apache.org/jira/browse/KAFKA-4969
>
> -Matthias
>
> On 5/13/18 7:52 PM, Liam Clarke wrote:
> > Hi all,
> >
> > We are running a KStreaming app with a basic topology of
> >
> > consume from topic A -> transform and write through topic B (making the
> app
> > a consumer of topic B also) ->  finally write to topic C
> >
> > We are running it with two instances of the application. Topic A has 100
> > partitions, topics B and C both have 10 partitions.
> >
> > What we've noticed in the consumer group info that is confusing us is
> that
> > instance 1 of the app will own 45 partitions of topic A and all 10
> > partitions of topic B, whereas the 2nd instance will own 55 partitions of
> > topic A and 0 of topic B.
> >
> > Our expectation is that we'd see an even split of partitions across both
> > instances - is this a realistic expectation? Or is this working as
> intended?
> >
> > Regards,
> >
> > Liam Clarke
> >
>
>


Re: Kafka streaming partition assignment

2018-05-13 Thread Matthias J. Sax
It depends on your version. The behavior is known and we put one
improvement into 1.1 release: https://github.com/apache/kafka/pull/4410

Thus, it's "by design" (for 1.0 and older) but we want to improve it.
Cf: https://issues.apache.org/jira/browse/KAFKA-4969

-Matthias

On 5/13/18 7:52 PM, Liam Clarke wrote:
> Hi all,
> 
> We are running a KStreaming app with a basic topology of
> 
> consume from topic A -> transform and write through topic B (making the app
> a consumer of topic B also) ->  finally write to topic C
> 
> We are running it with two instances of the application. Topic A has 100
> partitions, topics B and C both have 10 partitions.
> 
> What we've noticed in the consumer group info that is confusing us is that
> instance 1 of the app will own 45 partitions of topic A and all 10
> partitions of topic B, whereas the 2nd instance will own 55 partitions of
> topic A and 0 of topic B.
> 
> Our expectation is that we'd see an even split of partitions across both
> instances - is this a realistic expectation? Or is this working as intended?
> 
> Regards,
> 
> Liam Clarke
> 





Kafka streaming partition assignment

2018-05-13 Thread Liam Clarke
Hi all,

We are running a KStreaming app with a basic topology of

consume from topic A -> transform and write through topic B (making the app
a consumer of topic B also) ->  finally write to topic C

We are running it with two instances of the application. Topic A has 100
partitions, topics B and C both have 10 partitions.

What we've noticed in the consumer group info that is confusing us is that
instance 1 of the app will own 45 partitions of topic A and all 10
partitions of topic B, whereas the 2nd instance will own 55 partitions of
topic A and 0 of topic B.

Our expectation is that we'd see an even split of partitions across both
instances - is this a realistic expectation? Or is this working as intended?

Regards,

Liam Clarke


Re: how to enhance Kafka streaming Consumer rate ?

2018-02-14 Thread Matthias J. Sax
Is your network saturated? If yes, you can try to start more instances
of Kafka Streams instead of running with multiple threads within one
instance, to increase the available network capacity.
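
In config terms that means leaving `num.stream.threads` low (or at its default of 1) and scaling out by starting the same application on additional hosts. A small sketch, with the property values being assumptions:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // hypothetical
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // hypothetical
// One thread per instance; parallelism (and network capacity) comes from running
// more instances with the same application.id on different machines.
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);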


-Matthias

On 2/8/18 12:30 AM, ? ? wrote:
> Hi:
> I use Kafka streaming for real-time analysis.
> I set the number of stream threads equal to the number of partitions of the topic
> and set ConsumerConfig max.poll.records = 500.
> I only use the foreach method in this topology,
> but with a large number of records in Kafka the consumer LAG sometimes gets big and
> triggers a Kafka topic rebalance.
> How can I enhance the Kafka streaming consumer rate?
> 
> funk...@live.com
> 





how to enhance Kafka streaming Consumer rate ?

2018-02-08 Thread ? ?
Hi:
I use Kafka streaming for real-time analysis.
I set the number of stream threads equal to the number of partitions of the topic
and set ConsumerConfig max.poll.records = 500.
I only use the foreach method in this topology,
but with a large number of records in Kafka the consumer LAG sometimes gets big and
triggers a Kafka topic rebalance.
How can I enhance the Kafka streaming consumer rate?

funk...@live.com


Re: Kafka Streaming Query

2017-11-06 Thread Dan Markhasin
You can add more instances of your application, to allow processing the
incoming data in parallel.

On 6 November 2017 at 20:11, Ranjit Kumar  wrote:

> Hi,
>
> I am using Kafka streaming and a state store in my application (in Java), but
> my application logic is taking more time (around 5 ms) to pick up the next
> packet from the queue; due to that, data is piling up in the topic queue (the mean
> waiting time to pick the next packet from the queue keeps increasing).
>
> Can you please let me know how to solve this and improve the performance.
>
> Thanks & Regards,
> Ranjit
>


Kafka Streaming Query

2017-11-06 Thread Ranjit Kumar
Hi,

I am using Kafka streaming and a state store in my application (in Java), but
my application logic is taking more time (around 5 ms) to pick up the next
packet from the queue; due to that, data is piling up in the topic queue (the mean
waiting time to pick the next packet from the queue keeps increasing).

Can you please let me know how to solve this and improve the performance.

Thanks & Regards,
Ranjit


Kafka streaming from RDBMS to RDBMS

2017-04-18 Thread Venkata B Nagothi
Hi Kafka Community,

I would like to know whether KAFKA can stream out data including large
objects (with images and videos - traditionally known as BLOBs) from one
RDBMS system to another with good performance?

Is it something KAFKA is good at?

I understand that KAFKA can do it and would like to know how it compares
to traditional ETLs.

I have used ETLs like Talend or similar to migrate data from one RDBMS to
another; I am looking for an option which is more real-time and faster, with
better performance. Any help would be appreciated.

Regards,

Venkata B N
Database Consultant


Re: System went OutOfMemory when running Kafka-Streaming

2017-02-17 Thread Eno Thereska
Hi Deepak,

Could you tell us a bit more on what your app is doing (e.g., aggregates, or 
joins)? Also which version of Kafka are you using?
Have you changed any default configs?

Thanks
Eno


> On 17 Feb 2017, at 05:30, Deepak Pengoria  wrote:
> 
> Hi,
> I ran my application on an AWS machine having 32 GB of RAM. The application is
> written in Java using the Kafka-Streaming low-level API. When it runs for 72
> hours or more while load (messages) keeps being pumped into the Kafka topic, the
> entire 32 GB of memory of the AWS machine gets exhausted, even though I run the
> Java application with -Xms4g and -Xmx12g. As per Kafka-Streaming memory
> management, RocksDB uses off-heap space, which might be causing this issue.
> Can you please help me to resolve this problem.
> 
> Regards,
> Deepak



System went OutOfMemory when running Kafka-Streaming

2017-02-17 Thread Deepak Pengoria
Hi,
I ran my application on an AWS machine having 32 GB of RAM. The application is
written in Java using the Kafka-Streaming low-level API. When it runs for 72
hours or more while load (messages) keeps being pumped into the Kafka topic, the
entire 32 GB of memory of the AWS machine gets exhausted, even though I run the
Java application with -Xms4g and -Xmx12g. As per Kafka-Streaming memory
management, RocksDB uses off-heap space, which might be causing this issue.
Can you please help me to resolve this problem.

Regards,
Deepak
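
One common way to bound RocksDB's off-heap usage in Kafka Streams is a custom RocksDBConfigSetter registered via the "rocksdb.config.setter" config. A minimal sketch follows; the class name and the sizes are illustrative assumptions, and the exact RocksDB option methods depend on the rocksdbjni version bundled with the Kafka release in use:

import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // Shrink the per-store memtable footprint (sizes are illustrative, not tuned values).
        options.setWriteBufferSize(8 * 1024 * 1024L);      // 8 MB memtables
        options.setMaxWriteBufferNumber(2);
        // Cap the block cache per store.
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(16 * 1024 * 1024L);  // 16 MB
        options.setTableFormatConfig(tableConfig);
    }
}

// Registered in the Streams config, e.g.:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);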


Re: How to log/analyze the consumer lag in kafka streaming application

2017-02-04 Thread Joris Meijer
Hi Sachin,

If you check kafka-run-class.bat you can see that when environment variable
KAFKA_LOG4J_OPTS is not provided, a default log4j configuration under
"tools" will be loaded. So setting the environment variable to something
like 
"-Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4js.properties"
should do the trick for you.

Joris

On Sat, Feb 4, 2017 at 11:38 AM Sachin Mittal  wrote:

Hi,
As suggested this is how I am starting my stream

D:\kafka_2.10-0.10.1.1>bin\windows\kafka-run-class.bat -Dlog4j.debug
-Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4js.properties
TestKafkaWindowStream
log4j: Using URL
[file:D:/kafka_2.10-0.10.1.1/config/tools-log4j.properties] for automatic
log4j configuration.

You can see it somehow does not pick my log4j properties but always picks
some tools-log4j.properties.

I have tried many different ways to specify log4j.configurationFile but it
always seems to be picking up a different log4j properties file.

So is it something that we should always configure our log4j under
tools-log4j.properties?

Thanks
Sachin



On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy  wrote:

> -Dlog4j.configuration=your-log4j.properties
>


Re: How to log/analyze the consumer lag in kafka streaming application

2017-02-04 Thread Sachin Mittal
Hi,
As suggested this is how I am starting my stream

D:\kafka_2.10-0.10.1.1>bin\windows\kafka-run-class.bat -Dlog4j.debug
-Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4js.properties
TestKafkaWindowStream
log4j: Using URL
[file:D:/kafka_2.10-0.10.1.1/config/tools-log4j.properties] for automatic
log4j configuration.

You can see it somehow does not pick my log4j properties but always picks
some tools-log4j.properties.

I have tried many different ways to specify log4j.configurationFile but it
always seems to be picking up a different log4j properties file.

So is it something that we should always configure our log4j under
tools-log4j.properties?

Thanks
Sachin



On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy  wrote:

> -Dlog4j.configuration=your-log4j.properties
>


Re: How to log/analyze the consumer lag in kafka streaming application

2017-01-27 Thread Damian Guy
If you are using jmxterm then you are going to connect to a running jvm and
you don't need to set StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG. You
need to connect jmxterm to the MBean server that will be running in the jvm
of your streams app. You'll need to provide an appropriate jmx port for it
to connect to. So, when running your streams application you should start
it with: -Dcom.sun.management.jmxremote.port=your-port

Thanks

On Fri, 27 Jan 2017 at 10:24 Sachin Mittal  wrote:

> Hi,
> I understood what I need to do.
> I think it is not clear though regarding
> StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG
>
> Say I decide to use jmxterm which is cli based client which I can easily
> use where my streams app is running.
> With respect to that what value should I assign it to the
> METRICS_REPORTER_CLASSES_CONFIG
> property?
>
> I am following this guide
> https://cwiki.apache.org/confluence/display/KAFKA/jmxterm+quickstart
>
> However it is not clear what config I need to provide.
>
> Thanks
> Sachin
>
>
>
> On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy  wrote:
>
> > Hi Sachin,
> >
> > You can configure an implementation of org.apache.kafka.common.Metrics.
> > This is done via StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG
> >
> > There is a list of jmx reporters here:
> > https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
> > I'm sure there are plenty more available on github. It is also fairly
> > simple to write your own.
> >
> > As for your log4j.properties. You should be able to run with:
> > -Dlog4j.configuration=your-log4j.properties
> >
> > Thanks,
> > Damian
> >
> > On Fri, 27 Jan 2017 at 07:59 Sachin Mittal  wrote:
> >
> > > Hi,
> > > Thanks for sharing the info.
> > >
> > > I am reading this document for more understanding:
> > > http://kafka.apache.org/documentation.html#monitoring
> > >
> > > Is there any special way I need to start my kafka cluster or streams
> > > application (or configure them) to report these metrics.
> > >
> > > I suppose both cluster and streams application report separate
> metrics. I
> > > mean that to collect streams metrics I need to connect to the jmx port
> on
> > > machine where my streams is running right?
> > >
> > > One issue I see is that the machines where both cluster and streams
> > > application are running are not accessible from outside where I can run
> > any
> > > UI based application like jconsole to report on these metrics.
> > >
> > > So what are other possible option. can I log the metrics values to a
> log
> > > file. or if can I enable logging in general. If yes where do I place my
> > > log4j.properties. I tried making it part of the jar which has my main
> > class
> > > but I don't see any logs getting generated.
> > >
> > > Thanks
> > > Sachin
> > >
> > >
> > >
> > > On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > > > You should check out Kafka Streams Metrics (for upcoming 0.10.2 they
> > are
> > > > even more detailed).
> > > >
> > > > There is not a lot of documentation for 0.10.0 or 0.10.1, but it work
> > > > the same way as for consumer/producer metric that are documented.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 1/24/17 10:38 PM, Sachin Mittal wrote:
> > > > > Hi All,
> > > > > I am running a kafka streaming application with a simple pipeline
> of:
> > > > > source topic -> group -> aggregate by key -> for each > save to a
> > sink.
> > > > >
> > > > > I source topic gets message at rate of 5000 - 1 messages per
> > > second.
> > > > > During peak load we see the delay reaching to 3 million messages.
> > > > >
> > > > > So I need to figure out where delay might be happening.
> > > > >
> > > > > 1.  Is there any mechanism in kafka streams to log time spent
> within
> > > each
> > > > > pipeline stage.
> > > > >
> > > > > 2.  Also if I want to turn on custom logging to log some times how
> > can
> > > I
> > > > do
> > > > > the same.
> > > > >
> > > > > I have a log4j.properties and I am packaging it inside a jar which
> > has
> > > > the
> > > > > main class.
> > > > > I place that jar in libs folder of kafka installation.
> > > > >
> > > > > However I see no logs generated under logs folder.
> > > > >
> > > > > So where are we suppose to add the log4j.properties.
> > > > >
> > > > > Thanks
> > > > > Sachin
> > > > >
> > > >
> > > >
> > >
> >
>


Re: How to log/analyze the consumer lag in kafka streaming application

2017-01-27 Thread Sachin Mittal
Hi,
I understood what I need to do.
I think it is not clear though regarding
StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG

Say I decide to use jmxterm which is cli based client which I can easily
use where my streams app is running.
With respect to that what value should I assign it to the
METRICS_REPORTER_CLASSES_CONFIG
property?

I am following this guide
https://cwiki.apache.org/confluence/display/KAFKA/jmxterm+quickstart

However it is not clear what config I need to provide.

Thanks
Sachin



On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy  wrote:

> Hi Sachin,
>
> You can configure an implementation of org.apache.kafka.common.Metrics.
> This is done via StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG
>
> There is a list of jmx reporters here:
> https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
> I'm sure there are plenty more available on github. It is also fairly
> simple to write your own.
>
> As for your log4j.properties. You should be able to run with:
> -Dlog4j.configuration=your-log4j.properties
>
> Thanks,
> Damian
>
> On Fri, 27 Jan 2017 at 07:59 Sachin Mittal  wrote:
>
> > Hi,
> > Thanks for sharing the info.
> >
> > I am reading this document for more understanding:
> > http://kafka.apache.org/documentation.html#monitoring
> >
> > Is there any special way I need to start my kafka cluster or streams
> > application (or configure them) to report these metrics.
> >
> > I suppose both cluster and streams application report separate metrics. I
> > mean that to collect streams metrics I need to connect to the jmx port on
> > machine where my streams is running right?
> >
> > One issue I see is that the machines where both cluster and streams
> > application are running are not accessible from outside where I can run
> any
> > UI based application like jconsole to report on these metrics.
> >
> > So what are other possible option. can I log the metrics values to a log
> > file. or if can I enable logging in general. If yes where do I place my
> > log4j.properties. I tried making it part of the jar which has my main
> class
> > but I don't see any logs getting generated.
> >
> > Thanks
> > Sachin
> >
> >
> >
> > On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax 
> > wrote:
> >
> > > You should check out Kafka Streams Metrics (for upcoming 0.10.2 they
> are
> > > even more detailed).
> > >
> > > There is not a lot of documentation for 0.10.0 or 0.10.1, but it work
> > > the same way as for consumer/producer metric that are documented.
> > >
> > >
> > > -Matthias
> > >
> > > On 1/24/17 10:38 PM, Sachin Mittal wrote:
> > > > Hi All,
> > > > I am running a kafka streaming application with a simple pipeline of:
> > > > source topic -> group -> aggregate by key -> for each > save to a
> sink.
> > > >
> > > > I source topic gets message at rate of 5000 - 1 messages per
> > second.
> > > > During peak load we see the delay reaching to 3 million messages.
> > > >
> > > > So I need to figure out where delay might be happening.
> > > >
> > > > 1.  Is there any mechanism in kafka streams to log time spent within
> > each
> > > > pipeline stage.
> > > >
> > > > 2.  Also if I want to turn on custom logging to log some times how
> can
> > I
> > > do
> > > > the same.
> > > >
> > > > I have a log4j.properties and I am packaging it inside a jar which
> has
> > > the
> > > > main class.
> > > > I place that jar in libs folder of kafka installation.
> > > >
> > > > However I see no logs generated under logs folder.
> > > >
> > > > So where are we suppose to add the log4j.properties.
> > > >
> > > > Thanks
> > > > Sachin
> > > >
> > >
> > >
> >
>


Re: How to log/analyze the consumer lag in kafka streaming application

2017-01-27 Thread Damian Guy
Hi Sachin,

You can configure an implementation of org.apache.kafka.common.Metrics.
This is done via StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG

There is a list of jmx reporters here:
https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
I'm sure there are plenty more available on github. It is also fairly
simple to write your own.

As for your log4j.properties. You should be able to run with:
-Dlog4j.configuration=your-log4j.properties

Thanks,
Damian
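
For reference, a minimal sketch of such a reporter configuration in a Streams app; the application id and bootstrap servers are placeholders, and the class shown is Kafka's bundled JmxReporter (registered by default anyway), so this line only matters when plugging in a different reporter implementation:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // hypothetical
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");    // hypothetical
// Classes listed here must implement org.apache.kafka.common.metrics.MetricsReporter.
props.put("metric.reporters", "org.apache.kafka.common.metrics.JmxReporter");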

On Fri, 27 Jan 2017 at 07:59 Sachin Mittal  wrote:

> Hi,
> Thanks for sharing the info.
>
> I am reading this document for more understanding:
> http://kafka.apache.org/documentation.html#monitoring
>
> Is there any special way I need to start my kafka cluster or streams
> application (or configure them) to report these metrics.
>
> I suppose both cluster and streams application report separate metrics. I
> mean that to collect streams metrics I need to connect to the jmx port on
> machine where my streams is running right?
>
> One issue I see is that the machines where both cluster and streams
> application are running are not accessible from outside where I can run any
> UI based application like jconsole to report on these metrics.
>
> So what are other possible option. can I log the metrics values to a log
> file. or if can I enable logging in general. If yes where do I place my
> log4j.properties. I tried making it part of the jar which has my main class
> but I don't see any logs getting generated.
>
> Thanks
> Sachin
>
>
>
> On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax 
> wrote:
>
> > You should check out Kafka Streams Metrics (for upcoming 0.10.2 they are
> > even more detailed).
> >
> > There is not a lot of documentation for 0.10.0 or 0.10.1, but it work
> > the same way as for consumer/producer metric that are documented.
> >
> >
> > -Matthias
> >
> > On 1/24/17 10:38 PM, Sachin Mittal wrote:
> > > Hi All,
> > > I am running a kafka streaming application with a simple pipeline of:
> > > source topic -> group -> aggregate by key -> for each > save to a sink.
> > >
> > > I source topic gets message at rate of 5000 - 1 messages per
> second.
> > > During peak load we see the delay reaching to 3 million messages.
> > >
> > > So I need to figure out where delay might be happening.
> > >
> > > 1.  Is there any mechanism in kafka streams to log time spent within
> each
> > > pipeline stage.
> > >
> > > 2.  Also if I want to turn on custom logging to log some times how can
> I
> > do
> > > the same.
> > >
> > > I have a log4j.properties and I am packaging it inside a jar which has
> > the
> > > main class.
> > > I place that jar in libs folder of kafka installation.
> > >
> > > However I see no logs generated under logs folder.
> > >
> > > So where are we suppose to add the log4j.properties.
> > >
> > > Thanks
> > > Sachin
> > >
> >
> >
>


Re: How to log/analyze the consumer lag in kafka streaming application

2017-01-26 Thread Sachin Mittal
Hi,
Thanks for sharing the info.

I am reading this document for more understanding:
http://kafka.apache.org/documentation.html#monitoring

Is there any special way I need to start my kafka cluster or streams
application (or configure them) to report these metrics.

I suppose both cluster and streams application report separate metrics. I
mean that to collect streams metrics I need to connect to the jmx port on
machine where my streams is running right?

One issue I see is that the machines where both cluster and streams
application are running are not accessible from outside where I can run any
UI based application like jconsole to report on these metrics.

So what are the other possible options? Can I log the metric values to a log
file, or can I enable logging in general? If yes, where do I place my
log4j.properties? I tried making it part of the jar which has my main class,
but I don't see any logs getting generated.

Thanks
Sachin



On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax 
wrote:

> You should check out Kafka Streams Metrics (for upcoming 0.10.2 they are
> even more detailed).
>
> There is not a lot of documentation for 0.10.0 or 0.10.1, but it work
> the same way as for consumer/producer metric that are documented.
>
>
> -Matthias
>
> On 1/24/17 10:38 PM, Sachin Mittal wrote:
> > Hi All,
> > I am running a kafka streaming application with a simple pipeline of:
> > source topic -> group -> aggregate by key -> for each > save to a sink.
> >
> > I source topic gets message at rate of 5000 - 1 messages per second.
> > During peak load we see the delay reaching to 3 million messages.
> >
> > So I need to figure out where delay might be happening.
> >
> > 1.  Is there any mechanism in kafka streams to log time spent within each
> > pipeline stage.
> >
> > 2.  Also if I want to turn on custom logging to log some times how can I
> do
> > the same.
> >
> > I have a log4j.properties and I am packaging it inside a jar which has
> the
> > main class.
> > I place that jar in libs folder of kafka installation.
> >
> > However I see no logs generated under logs folder.
> >
> > So where are we suppose to add the log4j.properties.
> >
> > Thanks
> > Sachin
> >
>
>


Re: How to log/analyze the consumer lag in kafka streaming application

2017-01-26 Thread Matthias J. Sax
You should check out Kafka Streams Metrics (for upcoming 0.10.2 they are
even more detailed).

There is not a lot of documentation for 0.10.0 or 0.10.1, but it works
the same way as for the consumer/producer metrics that are documented.


-Matthias

On 1/24/17 10:38 PM, Sachin Mittal wrote:
> Hi All,
> I am running a kafka streaming application with a simple pipeline of:
> source topic -> group -> aggregate by key -> for each > save to a sink.
> 
> I source topic gets message at rate of 5000 - 1 messages per second.
> During peak load we see the delay reaching to 3 million messages.
> 
> So I need to figure out where delay might be happening.
> 
> 1.  Is there any mechanism in kafka streams to log time spent within each
> pipeline stage.
> 
> 2.  Also if I want to turn on custom logging to log some times how can I do
> the same.
> 
> I have a log4j.properties and I am packaging it inside a jar which has the
> main class.
> I place that jar in libs folder of kafka installation.
> 
> However I see no logs generated under logs folder.
> 
> So where are we suppose to add the log4j.properties.
> 
> Thanks
> Sachin
> 





How to log/analyze the consumer lag in kafka streaming application

2017-01-24 Thread Sachin Mittal
Hi All,
I am running a kafka streaming application with a simple pipeline of:
source topic -> group -> aggregate by key -> for each > save to a sink.

My source topic gets messages at a rate of 5000 - 1 messages per second.
During peak load we see the delay reaching to 3 million messages.

So I need to figure out where delay might be happening.

1.  Is there any mechanism in kafka streams to log time spent within each
pipeline stage.

2.  Also if I want to turn on custom logging to log some times how can I do
the same.

I have a log4j.properties and I am packaging it inside a jar which has the
main class.
I place that jar in libs folder of kafka installation.

However I see no logs generated under logs folder.

So where are we supposed to add the log4j.properties?

Thanks
Sachin


Re: What is difference between Kafka streaming and Samza?

2016-12-26 Thread Ofir Sharony
Here are some differences between the two:

   - KafkaStreams is a library, whereas Samza is a framework, which makes
   the learning curve of KafkaStreams a bit easier.
   - Sources - KafkaStreams works with Kafka alone, while Samza can also be
   configured with Kinesis, ElasticSearch, HDFS and others.
   - Deployment - Samza works closely with Yarn (although not a must),
   whereas KafkaStreams can be run and deployed as a simple Java library,
    where running more instances of it will cause an automatic load balance
    between the processes. A cluster is not required in KafkaStreams.
   - StateManagement - both have local state. In KafkaStreams there are
   common stateful operations (e.g. join, aggregation, map) that are made
   simpler: you just call the function and the state is managed behind the
   scenes, without needing to be defined explicitly
   - Configuration - In Samza there's a configuration file, whereas in KS
   it's all inside your class.
   - Code unification with batch jobs - Samza code can be written once for
   both ongoing stream processing and batch jobs, by allowing Samza
   jobs to run on a Hadoop cluster
   - Samza supports host-affinity, allocating the same machine (that has
   the local state stored) after a job restarts, preventing startup latency
   loading the state
   - Samza supports Async I/O model - significantly improve the performance
   of jobs bottlenecked on remote I/O.
   - Samza has Rest API to query its processing streams, start & stop jobs
   - Samza is a bit more mature (KafkaStreams is the new kid on the block)


*Ofir Sharony*
BackEnd Tech Lead

Mobile: +972-54-7560277 | ofir.shar...@myheritage.com | www.myheritage.com
MyHeritage Ltd., 3 Ariel Sharon St., Or Yehuda 60250, Israel

<http://www.myheritage.com/>

<https://www.facebook.com/myheritage>
<https://twitter.com/myheritage> <http://blog.myheritage.com/>
<https://www.youtube.com/user/MyHeritageLtd>


On Mon, Dec 26, 2016 at 8:07 AM, 황보동규  wrote:

> Hi there!
>
> I’m a newbie on Kafka.
> I have an interest in streaming services, especially Kafka streaming, but I
> have no idea what the difference is between Kafka streaming and Samza.
> Both have a similar architecture and functionality, I think.
> What is the main difference? What are the pros and cons? A kind explanation
> would be really helpful, and pointers to helpful documentation related to my
> question are also welcome.
>
> Thanks,
> Dongkyu


What is difference between Kafka streaming and Samza?

2016-12-25 Thread 황보동규
Hi there! 

I’m a newbie on Kafka.
I have an interest in streaming services, especially Kafka streaming, but I have
no idea what the difference is between Kafka streaming and Samza.
Both have a similar architecture and functionality, I think.
What is the main difference? What are the pros and cons? A kind explanation would
be really helpful, and pointers to helpful documentation related to my question
are also welcome.

Thanks,
Dongkyu 

Re: Is running kafka streaming application advisable on high latency WAN setup

2016-12-18 Thread Matthias J. Sax
Streams is designed to run in the same DC as the brokers. So you might want to
move the app, or replicate the topics you are interested in into the
second DC (using MirrorMaker or Confluent's proprietary replication tool).

Nevertheless, if you increase timeouts etc. you might still be able to
make it work -- but it is not recommended.
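
For reference, the timeouts in question are ordinary client configs that Streams forwards to its embedded consumer and producer. A small sketch with purely illustrative values (the right numbers depend on the link, and raising them only papers over the latency):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wan-streams-app");     // hypothetical
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "remote-dc:9092");   // hypothetical
// Forwarded to the embedded clients; values are illustrative, not recommendations.
props.put("request.timeout.ms", 120000);
props.put("session.timeout.ms", 60000);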


-Matthias


On 12/17/16 10:42 PM, Sachin Mittal wrote:
> Hi folks,
> I needed bit of feedback from you based on your experiences using kafka
> streaming application.
> 
> We have a replicated kafka cluster running in a data center in one city.
> We are running a kafka streaming application which reads from a source
> topic from that cluster and commits the output into local database in its
> own data center.
> 
> The distance between these two data center is about 1000 miles, with high
> latency(20 - 70 ms) 100 mbps connection between the two.
> 
> Our source topic receives 10,000 message per second and a message size is
> around 4 KB.
> 
> Since the streaming application receives lot of messages, aggregates them
> and again sends aggregated messages to a changelog topic, and then again
> reads from changelog topic and updates local store. This is a continuous
> process, with changelog topic message size may grow upto 100KB to 750KB.
> 
> So you get an idea that there is lot of network data exchange to and fro
> between 2 data centers.
> 
> In such a scenario is it advisable to run streaming application in a WAN
> kind of setup or it is better to move the streaming application within the
> LAN of kafka cluster.
> 
> We seem to be running into some request timeout issues when running the
> application on a WAN vs LAN and needed to know if network connection
> between the two could be the issue.
> 
> 
> Please let me know your thoughts.
> 
> Thanks
> Sachin
> 





Is running kafka streaming application advisable on high latency WAN setup

2016-12-17 Thread Sachin Mittal
Hi folks,
I need a bit of feedback from you based on your experiences using Kafka
streaming applications.

We have a replicated kafka cluster running in a data center in one city.
We are running a kafka streaming application which reads from a source
topic from that cluster and commits the output into local database in its
own data center.

The distance between these two data centers is about 1000 miles, with a
high-latency (20 - 70 ms), 100 Mbps connection between the two.

Our source topic receives 10,000 message per second and a message size is
around 4 KB.

The streaming application receives a lot of messages, aggregates them,
sends the aggregated messages to a changelog topic, and then reads from the
changelog topic again and updates the local store. This is a continuous
process, and the changelog topic message size may grow up to 100KB to 750KB.

So you get an idea that there is lot of network data exchange to and fro
between 2 data centers.

In such a scenario is it advisable to run streaming application in a WAN
kind of setup or it is better to move the streaming application within the
LAN of kafka cluster.

We seem to be running into some request timeout issues when running the
application on a WAN vs LAN and needed to know if network connection
between the two could be the issue.


Please let me know your thoughts.

Thanks
Sachin


Re: Kafka Streaming

2016-11-29 Thread Guozhang Wang
Hello Mohit,

I'm copy-pasting Mathieu's previous email on making Streams to work on
Windows, note it was for 0.10.0.1 but I think the process should be very
similar.



To any who follow in my footsteps, here is my trail:

   1. Upgrade to at least Kafka Streams 0.10.0.1 (currently only in RC).
  - This is necessary because .1 bumps the rocksdb dependency to 4.8.0,
  where the previous 4.4 dependency did not yet support loading a
Windows JNI
  library.
  2. Follow the instructions here (
   https://github.com/facebook/rocksdb/blob/v4.8/CMakeLists.txt) to build
   rocksdb.
  - That link is for the v4.8 tag; be sure to match this with the
  version of rocksdb that Kafka Streams depends upon in the
future.  0.10.0.1
  -> v4.8, but future releases of Kafka Streams will likely depend on
newer
  versions.
  - Be sure to include the "-DJNI=1" compile option to build the JNI
  wrapper.
  - None of the third-party dependencies (eg. snappy, jemalloc, etc.)
  seem to be required to get something functional, but, it
probably isn't the
  most efficient build.
  - Ensure that "javac" and "javah" are on your path when running
  cmake; if there are any errors in the cmake output your JNI wrapper
  probably won't build.
  - You may or may not need to make minor patches to make the project
  build in Windows.  It appears that the Windows build is often broken;
for
  example, I had to apply this patch:
  https://github.com/facebook/rocksdb/pull/1223/files
  - Phew, that should give you a build\java\Debug\rocksdbjni.dll.  So
  close to the summit... just a horrible hack left...
   3. Copy rocksdbjni.dll to librocksdbjni-win64.dll.
   4. Insert librocksdbjni-win64.dll into your rocksdbjni-4.8.0.jar.
  - Ugh, this is the horrible hack.  rocksdbjni seems to only look
  inside its own jar for its native libraries, so, this is where
it needs to
  be.
  - If you're a gradle user on Windows, you'd find this jar file
   in C:\Users\...your-windows-user...\.gradle\caches\modules-2\files-2.1\org.rocksdb\rocksdbjni\4.8.0\b543fc4ea5b52ad790730dee376ba0df06d9f5f7.

And there you go, almost a working Kafka Streams app in Windows.  One other
detail is that the default state storage directory doesn't seem to be
created on demand, so I had to mkdir C:\tmp\kafka-streams myself before my
app would work.



Currently we have not spent too much time running and maintaining Kafka
Streams in Windows yet, but we'd love to hear your experience and if there
is any issues come up, we would like to help debugging and fixing them.


Guozhang



On Mon, Nov 28, 2016 at 5:28 PM, Mohit Anchlia 
wrote:

> I just cloned 3.1x and tried to run a test. I am still seeing rocksdb
> error:
>
> Exception in thread "StreamThread-1" java.lang.UnsatisfiedLinkError:
> C:\Users\manchlia\AppData\Local\Temp\librocksdbjni108789031344273.dll:
> Can't find dependent libraries
>
>
> On Mon, Oct 24, 2016 at 11:26 AM, Matthias J. Sax 
> wrote:
>
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > It's a client issue... But CP 3.1 should be out in about 2 weeks...
> > Of course, you can use Kafka 0.10.1.0 for now. It was released last
> > week and does contain the fix.
> >
> > - -Matthias
> >
> > On 10/24/16 9:19 AM, Mohit Anchlia wrote:
> > > Would this be an issue if I connect to a remote Kafka instance
> > > running on the Linux box? Or is this a client issue. What's rockdb
> > > used for to keep state?
> > >
> > > On Mon, Oct 24, 2016 at 12:08 AM, Matthias J. Sax
> > >  wrote:
> > >
> > > Kafka 0.10.1.0 which was release last week does contain the fix
> > > already. The fix will be in CP 3.1 coming up soon!
> > >
> > > (sorry that I did mix up versions in a previous email)
> > >
> > > -Matthias
> > >
> > > On 10/23/16 12:10 PM, Mohit Anchlia wrote:
> >  So if I get it right I will not have this fix until 4
> >  months? Should I just create my own example with the next
> >  version of Kafka?
> > 
> >  On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax
> >   wrote:
> > 
> >  Current version is 3.0.1 CP 3.1 should be release the next
> >  weeks
> > 
> >  So CP 3.2 should be there is about 4 month (Kafka follows a
> >  time base release cycle of 4 month and CP usually aligns with
> >  Kafka releases)
> > 
> >  -Matthias
> > 
> > 
> >  On 10/20/16 5:10 PM, Mohit Anchlia wrote:
> > >>> Any idea of when 3.2 is coming?
> > >>>
> > >>> On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax
> > >>>  wrote:
> > >>>
> > >>> No problem. Asking questions is the purpose of mailing
> > >>> lists. :)
> > >>>
> > >>> The issue will be fixed in next version of examples
> > >>> branch.
> > >>>
> > >>> Examples branch is build with CP dependency and not
> > >>> with Kafka dependency. CP-3.2 is not available yet;
> > >>> only Kafka 0.1

Re: Kafka Streaming

2016-11-28 Thread Mohit Anchlia
I just cloned 3.1x and tried to run a test. I am still seeing rocksdb error:

Exception in thread "StreamThread-1" java.lang.UnsatisfiedLinkError:
C:\Users\manchlia\AppData\Local\Temp\librocksdbjni108789031344273.dll:
Can't find dependent libraries


On Mon, Oct 24, 2016 at 11:26 AM, Matthias J. Sax 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> It's a client issue... But CP 3.1 should be out in about 2 weeks...
> Of course, you can use Kafka 0.10.1.0 for now. It was released last
> week and does contain the fix.
>
> - -Matthias
>
> On 10/24/16 9:19 AM, Mohit Anchlia wrote:
> > Would this be an issue if I connect to a remote Kafka instance
> > running on the Linux box? Or is this a client issue. What's rockdb
> > used for to keep state?
> >
> > On Mon, Oct 24, 2016 at 12:08 AM, Matthias J. Sax
> >  wrote:
> >
> > Kafka 0.10.1.0 which was release last week does contain the fix
> > already. The fix will be in CP 3.1 coming up soon!
> >
> > (sorry that I did mix up versions in a previous email)
> >
> > -Matthias
> >
> > On 10/23/16 12:10 PM, Mohit Anchlia wrote:
>  So if I get it right I will not have this fix until 4
>  months? Should I just create my own example with the next
>  version of Kafka?
> 
>  On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax
>   wrote:
> 
>  Current version is 3.0.1 CP 3.1 should be release the next
>  weeks
> 
>  So CP 3.2 should be there is about 4 month (Kafka follows a
>  time base release cycle of 4 month and CP usually aligns with
>  Kafka releases)
> 
>  -Matthias
> 
> 
>  On 10/20/16 5:10 PM, Mohit Anchlia wrote:
> >>> Any idea of when 3.2 is coming?
> >>>
> >>> On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax
> >>>  wrote:
> >>>
> >>> No problem. Asking questions is the purpose of mailing
> >>> lists. :)
> >>>
> >>> The issue will be fixed in next version of examples
> >>> branch.
> >>>
> >>> Examples branch is build with CP dependency and not
> >>> with Kafka dependency. CP-3.2 is not available yet;
> >>> only Kafka 0.10.1.0. Nevertheless, they should work
> >>> with Kafka dependency, too. I never tried it, but you
> >>> should give it a shot...
> >>>
> >>> But you should use example master branch because of
> >>> API changes from 0.10.0.x to 0.10.1 (and thus, changing
> >>> CP-3.1 to 0.10.1.0 will not be compatible and not
> >>> compile, while changing CP-3.2-SNAPSHOT to 0.10.1.0
> >>> should work -- hopefully ;) )
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 10/20/16 4:02 PM, Mohit Anchlia wrote:
> >> So this issue I am seeing is fixed in the next
> >> version of example branch? Can I change my pom to
> >> point it the higher version of Kafka if that is
> >> the issue? Or do I need to wait until new branch
> >> is made available? Sorry lot of questions :)
> >>
> >> On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax
> >>  wrote:
> >>
> >> The branch is 0.10.0.1 and not 0.10.1.0 (sorry
> >> for so many zeros and ones -- super easy to mix
> >> up)
> >>
> >> However, examples master branch uses
> >> CP-3.1-SNAPSHOT (ie, Kafka 0.10.1.0) -- there
> >> will be a 0.10.1 examples branch, after CP-3.1
> >> was released
> >>
> >>
> >> -Matthias
> >>
> >> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> > I just now cloned this repo. It seems to be
> > using 10.1
> >
> > https://github.com/confluentinc/examples
> > and running examples in
> > https://github.com/confluentinc/examples/tree/kafka-0.10.0
> .1-
> >
> >
> cp-
> 
> >
> > 3.0
> >>>
> >
>  .1/
> >>
> >
> >>> kafka-streams
> >
> > On Thu, Oct 20, 2016 at 3:10 PM, Michael
> > Noll  wrote:
> >
> >> I suspect you are running Kafka 0.10.0.x
> >> on Windows? If so, this is a known issue
> >> that is fixed in Kafka 0.10.1 that was
> >> just released today.
> >>
> >> Also: which examples are you referring
> >> to? And, to confirm: which git branch /
> >> Kafka version / OS in case my guess above
> >> was wrong.
> >>
> >>
> >> On Thursday, October 20, 2016, Mohit
> >> Anchlia  wrote:
> >>
> >>> I am trying to run the examples from
> >>> git. While running the wordcount
> >>> example I see this error:
> >>>
> >>> Caused by:
> >>> *java.lang.RuntimeException*:
> >>> librocksdbjni-win64.dll was not found
> >>> inside JAR.
> >

Re: Kafka Streaming message loss

2016-11-21 Thread Michael Noll
Also:  Since your testing is purely local, feel free to share the code you
have been using so that we can try to reproduce what you're observing.

-Michael



On Mon, Nov 21, 2016 at 4:04 PM, Michael Noll  wrote:

> Please don't take this comment the wrong way, but have you double-checked
> whether your counting code is working correctly?  (I'm not implying this
> could be the only reason for what you're observing.)
>
> -Michael
>
>
> On Fri, Nov 18, 2016 at 4:52 PM, Eno Thereska 
> wrote:
>
>> Hi Ryan,
>>
>> Perhaps you could share some of your code so we can have a look? One
>> thing I'd check is if you are using compacted Kafka topics. If so, and if
>> you have non-unique keys, compaction happens automatically and you might
>> only see the latest value for a key.
>>
>> Thanks
>> Eno
>> > On 18 Nov 2016, at 13:49, Ryan Slade  wrote:
>> >
>> > Hi
>> >
>> > I'm trialling Kafka Streaming for a large stream processing job, however
>> > I'm seeing message loss even in the simplest scenarios.
>> >
>> > I've tried to boil it down to the simplest scenario where I see loss
>> which
>> > is the following:
>> > 1. Ingest messages from an input stream (String, String)
>> > 2. Decode message into a type from JSON
>> > 3. If successful, send to a second stream and update an atomic counter.
>> > (String, CustomType)
>> > 4. A foreach on the second stream that updates an AtomicCounter each
>> time a
>> > message arrives.
>> >
>> > I would expect that since we have at least once guarantees that the
>> second
>> > stream would see at least as many messages as were sent to it from the
>> > first, however I consistently see message loss.
>> >
>> > I've tested multiple times sending around 200k messages. I don't see
>> losses
>> > every time, maybe around 1 in 5 runs with the same data. The losses are
>> > small, around 100 messages, but I would expect none.
>> >
>> > I'm running version 0.10.1.0 with Zookeeper, Kafka and the Stream
>> Consumer
>> > all running on the same machine in order to mitigate packet loss.
>> >
>> > I'm running Ubuntu 16.04 with OpenJDK.
>> >
>> > Any advice would be greatly appreciated as I can't move forward with
>> Kafka
>> > Streams as a solution if messages are consistently lost between stream
>> on
>> > the same machine.
>> >
>> > Thanks
>> > Ryan
>>
>>
>


Re: Kafka Streaming message loss

2016-11-21 Thread Michael Noll
Please don't take this comment the wrong way, but have you double-checked
whether your counting code is working correctly?  (I'm not implying this
could be the only reason for what you're observing.)

-Michael


On Fri, Nov 18, 2016 at 4:52 PM, Eno Thereska 
wrote:

> Hi Ryan,
>
> Perhaps you could share some of your code so we can have a look? One thing
> I'd check is if you are using compacted Kafka topics. If so, and if you
> have non-unique keys, compaction happens automatically and you might only
> see the latest value for a key.
>
> Thanks
> Eno
> > On 18 Nov 2016, at 13:49, Ryan Slade  wrote:
> >
> > Hi
> >
> > I'm trialling Kafka Streaming for a large stream processing job, however
> > I'm seeing message loss even in the simplest scenarios.
> >
> > I've tried to boil it down to the simplest scenario where I see loss
> which
> > is the following:
> > 1. Ingest messages from an input stream (String, String)
> > 2. Decode message into a type from JSON
> > 3. If successful, send to a second stream and update an atomic counter.
> > (String, CustomType)
> > 4. A foreach on the second stream that updates an AtomicCounter each
> time a
> > message arrives.
> >
> > I would expect that since we have at least once guarantees that the
> second
> > stream would see at least as many messages as were sent to it from the
> > first, however I consistently see message loss.
> >
> > I've tested multiple times sending around 200k messages. I don't see
> losses
> > every time, maybe around 1 in 5 runs with the same data. The losses are
> > small, around 100 messages, but I would expect none.
> >
> > I'm running version 0.10.1.0 with Zookeeper, Kafka and the Stream
> Consumer
> > all running on the same machine in order to mitigate packet loss.
> >
> > I'm running Ubuntu 16.04 with OpenJDK.
> >
> > Any advice would be greatly appreciated as I can't move forward with
> Kafka
> > Streams as a solution if messages are consistently lost between stream on
> > the same machine.
> >
> > Thanks
> > Ryan
>
>


Re: Kafka Streaming message loss

2016-11-18 Thread Eno Thereska
Hi Ryan,

Perhaps you could share some of your code so we can have a look? One thing I'd 
check is if you are using compacted Kafka topics. If so, and if you have 
non-unique keys, compaction happens automatically and you might only see the 
latest value for a key.

Thanks
Eno
> On 18 Nov 2016, at 13:49, Ryan Slade  wrote:
> 
> Hi
> 
> I'm trialling Kafka Streaming for a large stream processing job, however
> I'm seeing message loss even in the simplest scenarios.
> 
> I've tried to boil it down to the simplest scenario where I see loss which
> is the following:
> 1. Ingest messages from an input stream (String, String)
> 2. Decode message into a type from JSON
> 3. If successful, send to a second stream and update an atomic counter.
> (String, CustomType)
> 4. A foreach on the second stream that updates an AtomicCounter each time a
> message arrives.
> 
> I would expect that since we have at least once guarantees that the second
> stream would see at least as many messages as were sent to it from the
> first, however I consistently see message loss.
> 
> I've tested multiple times sending around 200k messages. I don't see losses
> every time, maybe around 1 in 5 runs with the same data. The losses are
> small, around 100 messages, but I would expect none.
> 
> I'm running version 0.10.1.0 with Zookeeper, Kafka and the Stream Consumer
> all running on the same machine in order to mitigate packet loss.
> 
> I'm running Ubuntu 16.04 with OpenJDK.
> 
> Any advice would be greatly appreciated as I can't move forward with Kafka
> Streams as a solution if messages are consistently lost between stream on
> the same machine.
> 
> Thanks
> Ryan



Kafka Streaming message loss

2016-11-18 Thread Ryan Slade
Hi

I'm trialling Kafka Streaming for a large stream processing job, however
I'm seeing message loss even in the simplest scenarios.

I've tried to boil it down to the simplest scenario where I see loss which
is the following:
1. Ingest messages from an input stream (String, String)
2. Decode message into a type from JSON
3. If successful, send to a second stream and update an atomic counter.
(String, CustomType)
4. A foreach on the second stream that updates an AtomicCounter each time a
message arrives.

I would expect that since we have at least once guarantees that the second
stream would see at least as many messages as were sent to it from the
first, however I consistently see message loss.

I've tested multiple times sending around 200k messages. I don't see losses
every time, maybe around 1 in 5 runs with the same data. The losses are
small, around 100 messages, but I would expect none.

I'm running version 0.10.1.0 with Zookeeper, Kafka and the Stream Consumer
all running on the same machine in order to mitigate packet loss.

I'm running Ubuntu 16.04 with OpenJDK.

Any advice would be greatly appreciated as I can't move forward with Kafka
Streams as a solution if messages are consistently lost between stream on
the same machine.

Thanks
Ryan
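
For reference, a rough sketch of the scenario described above against the 0.10.x-era DSL; the topic names, the Event value type, the decode() helper and the serdes are placeholders rather than code from the original report:

import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.streams.kstream.KStreamBuilder;

KStreamBuilder builder = new KStreamBuilder();
final AtomicLong forwarded = new AtomicLong();   // decoded messages sent to the second stream
final AtomicLong arrived = new AtomicLong();     // messages seen on the second stream

builder.stream(stringSerde, stringSerde, "input-topic")
       .mapValues(json -> decode(json))                    // placeholder; returns null on failure
       .filter((key, event) -> {
           if (event == null) return false;
           forwarded.incrementAndGet();
           return true;
       })
       .through(stringSerde, eventSerde, "decoded-topic")  // second stream, backed by its own topic
       .foreach((key, event) -> arrived.incrementAndGet());

// With at-least-once semantics, `arrived` should end up >= `forwarded`, but only after the
// application has fully caught up and all in-flight records have been flushed and committed.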


Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-10 Thread saiprasad mishra
Hi Sachin

Here is a possible general approach which might work for you in the absence of
any broadcast variable, with everything being local state, and also if you can
adjust your process to do everything before doing the aggregation.

Basically, the idea here is to use a custom topology with a custom processor
with a persistent state store, and use it to change the data and send it to
a result topic so that the aggregation becomes easier. You can change the
incoming data based on the old data in this state store for a given key, or
process it however you want to, and then forward it to a final topic.

Below is the link for a sample custom processor.

https://apache.googlesource.com/kafka/+/0.10.0/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountProcessorDemo.java?autodive=0%2F%2F
Note that in this class it
accesses the store on init whose name should be "oldvalues" as per below
pseudo code.

It could be done something like below (if you use the TopologyBuilder api)

Some pseudo code which you might have already seen in the docs

builder.addProcessor("myprocessor", WordCountProcessorDemo::new, "sourcetopic");

StateStoreSupplier oldValueStore = Stores.create("oldvalues")
    .withKeys(stringSerde).withValues(stringSerde)
    .persistent().build();

builder.addStateStore(oldValueStore, "myprocessor");

builder.addSink(toAggregationTopic);

Now create the final KTable out of this toAggregationTopic which has the
cleaned data.


Hope this helps and ignore this if I have misunderstood your usecase

Regards

Sai








On Thu, Nov 10, 2016 at 8:40 PM, Sachin Mittal  wrote:

> Hi,
> On the subject of deleting values from list I have following toplogy
> aggregate(old list) = new list -> left join (another list) -> output sink.
>
> While processing the output list I know which values in the list are old
> and can be removed.
> Is there any way to pass that information from downstream back to upstream?
> Any thoughts around how can I pass this information.
>
> One thing I can think off is that I can set some global variable in the
> output sink.
>
> So next time aggregate function is run it can lookup the global variable
> and remove items from the list.
> So new list = old list + new value added - old values removed.
>
> In spark we have something like broadcast variables to do the same.
>
> Is there any such similar concept in kafka streaming.
>
> This way we can keep the changelog topic message from growing and prevent
> the max message bytes exception.
>
> Thanks
> Sachin
>
>
>
>
>
> On Fri, Nov 11, 2016 at 1:02 AM, Matthias J. Sax 
> wrote:
>
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > Sachin,
> >
> > my commend about deleting was about deleting values from the list, not
> > about deleting the whole key/value record.
> >
> > If you want to delete a whole key/value record it there is not update
> > for it for some time, you can combine compaction with retention. You
> > need to alter the configuration of the changelog topics and set
> >
> > cleanup.policy=compact,delete
> >
> > Than, retention.ms will be applied to the changelog, too.
> >
> >
> > - -Matthias
> >
> > On 11/10/16 3:17 AM, Sachin Mittal wrote:
> > > Hi, As per Eno suggestion I have pre-created internal changelog
> > > topics with increased max.message.bytes config to handle big
> > > messages that gets incremented over the time.
> > >
> > > As Matthias has pointed that we cannot use retention.ms setting to
> > > delete older message data after a given time, is there a way to
> > > purge older messages from my changelog topic.
> > >
> > > Remember my changelog topic is key=list of objects and this grows
> > > with time.
> > >
> > > So I would like these to be deleted from time to time because I
> > > would have already consumed the objects so that key/value can be
> > > deleted. Later if I get a new object for the same key then that's a
> > > new message and old data has no use for the streaming application.
> > >
> > > So how can I achieve the following. Would retention.bytes help
> > > here?
> > >
> > > Is there a way if i can set expire after or something like that at
> > > message level and some kafka thread would purge those messages.
> > >
> > > Thanks Sachin
> > >
> > >
> > >
> > > On Wed, Nov 9, 2016 at 1:42 AM, Matthias J. Sax
> > >  wrote:
> > >
> > > My two cents:
> > >
> > > Changelog topics are compacted topics, thus they do not have a
> &g

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-10 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

There is no built-in support for broadcast variables. Loops are also
not supported -- the dataflow graph must be a DAG.

You could use an additional topic, plus an extra (user-instantiated)
producer and consumer (using manual topic partition assignment!) with
your code to build your own feedback/broadcast loop.
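
A minimal sketch of such a hand-rolled feedback loop, assuming a hypothetical
compacted feedback topic named "old-values-feedback"; the topic name, the
partition mapping, and the String serdes are placeholders for illustration,
not part of any Kafka Streams API:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical side-channel: the output sink writes "values to drop" to a
// feedback topic; each streams instance polls it with a manually assigned
// consumer (no consumer group, so no rebalancing) and exposes the result to
// the aggregator, which removes those values from its list on the next update.
public class FeedbackLoop implements Runnable {

    private final Set<String> valuesToDrop =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    private final KafkaConsumer<String, String> consumer;

    public FeedbackLoop(String bootstrapServers, int partition) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
        // manual assignment: this instance reads only "its" partition
        consumer.assign(Collections.singletonList(
                new TopicPartition("old-values-feedback", partition)));
    }

    @Override
    public void run() {
        // runs on its own thread for the lifetime of the streams instance
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                valuesToDrop.add(record.value());
            }
        }
    }

    // called from the aggregator to decide which old values to remove
    public boolean shouldDrop(String value) {
        return valuesToDrop.contains(value);
    }
}

The aggregator would consult shouldDrop(...) when rebuilding the list, and the
output sink would produce the values to remove to the feedback topic with a
plain KafkaProducer.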


- -Matthias


On 11/10/16 8:40 PM, Sachin Mittal wrote:
> Hi, On the subject of deleting values from list I have following
> toplogy aggregate(old list) = new list -> left join (another list)
> -> output sink.
> 
> While processing the output list I know which values in the list
> are old and can be removed. Is there any way to pass that
> information from downstream back to upstream? Any thoughts around
> how can I pass this information.
> 
> One thing I can think off is that I can set some global variable in
> the output sink.
> 
> So next time aggregate function is run it can lookup the global
> variable and remove items from the list. So new list = old list +
> new value added - old values removed.
> 
> In spark we have something like broadcast variables to do the
> same.
> 
> Is there any such similar concept in kafka streaming.
> 
> This way we can keep the changelog topic message from growing and
> prevent the max message bytes exception.
> 
> Thanks Sachin
> 
> 
> 
> 
> 
> On Fri, Nov 11, 2016 at 1:02 AM, Matthias J. Sax
>  wrote:
> 
> Sachin,
> 
> my commend about deleting was about deleting values from the list,
> not about deleting the whole key/value record.
> 
> If you want to delete a whole key/value record it there is not
> update for it for some time, you can combine compaction with
> retention. You need to alter the configuration of the changelog
> topics and set
> 
> cleanup.policy=compact,delete
> 
> Than, retention.ms will be applied to the changelog, too.
> 
> 
> -Matthias
> 
> On 11/10/16 3:17 AM, Sachin Mittal wrote:
>>>> Hi, As per Eno suggestion I have pre-created internal
>>>> changelog topics with increased max.message.bytes config to
>>>> handle big messages that gets incremented over the time.
>>>> 
>>>> As Matthias has pointed that we cannot use retention.ms
>>>> setting to delete older message data after a given time, is
>>>> there a way to purge older messages from my changelog topic.
>>>> 
>>>> Remember my changelog topic is key=list of objects and this
>>>> grows with time.
>>>> 
>>>> So I would like these to be deleted from time to time because
>>>> I would have already consumed the objects so that key/value
>>>> can be deleted. Later if I get a new object for the same key
>>>> then that's a new message and old data has no use for the
>>>> streaming application.
>>>> 
>>>> So how can I achieve the following. Would retention.bytes
>>>> help here?
>>>> 
>>>> Is there a way if i can set expire after or something like
>>>> that at message level and some kafka thread would purge those
>>>> messages.
>>>> 
>>>> Thanks Sachin
>>>> 
>>>> 
>>>> 
>>>> On Wed, Nov 9, 2016 at 1:42 AM, Matthias J. Sax 
>>>>  wrote:
>>>> 
>>>> My two cents:
>>>> 
>>>> Changelog topics are compacted topics, thus they do not have
>>>> a retention time (there is an exception for windowed KTable
>>>> changlog topics that are compacted and do have a retention
>>>> time though).
>>>> 
>>>> However, I do not understand how changing retention time
>>>> should fix the issue. If your list of values grows and
>>>> exceed max.message.byte you will need to increase this
>>>> parameter (or shrink you value).
>>>> 
>>>> Besides this, Eno's answer is the way to go. In order to
>>>> figure out internal topic names, you can use
>>>> KafkaStreams#toString().
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> 
>>>> On 11/8/16 11:14 AM, Eno Thereska wrote:
>>>>>>> Hi Sachin,
>>>>>>> 
>>>>>>> One option right now would be to precreate all
>>>>>>> internal topics in Kafka, and only after that start the
>>>>>>> Kafka Streams application. This would require you
>>>>>>> knowing the internal name of the topics (in this case
>>>>>>> you probably alre

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-10 Thread Sachin Mittal
Hi,
On the subject of deleting values from the list, I have the following topology:
aggregate(old list) = new list -> left join (another list) -> output sink.

While processing the output list I know which values in the list are old
and can be removed.
Is there any way to pass that information from downstream back to upstream?
Any thoughts on how I can pass this information?

One thing I can think of is that I can set some global variable in the
output sink.

So the next time the aggregate function runs it can look up the global variable
and remove items from the list.
So new list = old list + new values added - old values removed.

In Spark we have something like broadcast variables to do the same.

Is there any similar concept in Kafka Streams?

This way we can keep the changelog topic messages from growing and prevent
the max.message.bytes exception.

Thanks
Sachin





On Fri, Nov 11, 2016 at 1:02 AM, Matthias J. Sax 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> Sachin,
>
> my commend about deleting was about deleting values from the list, not
> about deleting the whole key/value record.
>
> If you want to delete a whole key/value record it there is not update
> for it for some time, you can combine compaction with retention. You
> need to alter the configuration of the changelog topics and set
>
> cleanup.policy=compact,delete
>
> Than, retention.ms will be applied to the changelog, too.
>
>
> - -Matthias
>
> On 11/10/16 3:17 AM, Sachin Mittal wrote:
> > Hi, As per Eno suggestion I have pre-created internal changelog
> > topics with increased max.message.bytes config to handle big
> > messages that gets incremented over the time.
> >
> > As Matthias has pointed that we cannot use retention.ms setting to
> > delete older message data after a given time, is there a way to
> > purge older messages from my changelog topic.
> >
> > Remember my changelog topic is key=list of objects and this grows
> > with time.
> >
> > So I would like these to be deleted from time to time because I
> > would have already consumed the objects so that key/value can be
> > deleted. Later if I get a new object for the same key then that's a
> > new message and old data has no use for the streaming application.
> >
> > So how can I achieve the following. Would retention.bytes help
> > here?
> >
> > Is there a way if i can set expire after or something like that at
> > message level and some kafka thread would purge those messages.
> >
> > Thanks Sachin
> >
> >
> >
> > On Wed, Nov 9, 2016 at 1:42 AM, Matthias J. Sax
> >  wrote:
> >
> > My two cents:
> >
> > Changelog topics are compacted topics, thus they do not have a
> > retention time (there is an exception for windowed KTable changlog
> > topics that are compacted and do have a retention time though).
> >
> > However, I do not understand how changing retention time should
> > fix the issue. If your list of values grows and exceed
> > max.message.byte you will need to increase this parameter (or
> > shrink you value).
> >
> > Besides this, Eno's answer is the way to go. In order to figure
> > out internal topic names, you can use KafkaStreams#toString().
> >
> >
> > -Matthias
> >
> >
> >
> > On 11/8/16 11:14 AM, Eno Thereska wrote:
> >>>> Hi Sachin,
> >>>>
> >>>> One option right now would be to precreate all internal
> >>>> topics in Kafka, and only after that start the Kafka Streams
> >>>> application. This would require you knowing the internal name
> >>>> of the topics (in this case you probably already know it, but
> >>>> I agree that in general this is a bit cumbersome).
> >>>>
> >>>> Eno
> >>>>
> >>>>> On 8 Nov 2016, at 18:10, Sachin Mittal
> >>>>>  wrote:
> >>>>>
> >>>>> Per message payload size. The basic question is how can I
> >>>>> control the internal change log topics parameters so as to
> >>>>> avoid these errors.
> >>>>>
> >>>>>
> >>>>> On Tue, Nov 8, 2016 at 11:37 PM, R Krishna
> >>>>>  wrote:
> >>>>>
> >>>>>> Are you talking about total messages and therefore size
> >>>>>> or per message payload size.
> >>>>>>
> >>>>>> On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal
> >>>>>>  wrote:
> >>>>>>
> >>>>

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-10 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Sachin,

my comment about deleting was about deleting values from the list, not
about deleting the whole key/value record.

If you want to delete a whole key/value record when there is no update
for it for some time, you can combine compaction with retention. You
need to alter the configuration of the changelog topics and set

cleanup.policy=compact,delete

Then, retention.ms will be applied to the changelog, too.


- -Matthias

On 11/10/16 3:17 AM, Sachin Mittal wrote:
> Hi, As per Eno suggestion I have pre-created internal changelog
> topics with increased max.message.bytes config to handle big
> messages that gets incremented over the time.
> 
> As Matthias has pointed that we cannot use retention.ms setting to
> delete older message data after a given time, is there a way to
> purge older messages from my changelog topic.
> 
> Remember my changelog topic is key=list of objects and this grows
> with time.
> 
> So I would like these to be deleted from time to time because I
> would have already consumed the objects so that key/value can be
> deleted. Later if I get a new object for the same key then that's a
> new message and old data has no use for the streaming application.
> 
> So how can I achieve the following. Would retention.bytes help
> here?
> 
> Is there a way if i can set expire after or something like that at
> message level and some kafka thread would purge those messages.
> 
> Thanks Sachin
> 
> 
> 
> On Wed, Nov 9, 2016 at 1:42 AM, Matthias J. Sax
>  wrote:
> 
> My two cents:
> 
> Changelog topics are compacted topics, thus they do not have a 
> retention time (there is an exception for windowed KTable changlog 
> topics that are compacted and do have a retention time though).
> 
> However, I do not understand how changing retention time should
> fix the issue. If your list of values grows and exceed
> max.message.byte you will need to increase this parameter (or
> shrink you value).
> 
> Besides this, Eno's answer is the way to go. In order to figure
> out internal topic names, you can use KafkaStreams#toString().
> 
> 
> -Matthias
> 
> 
> 
> On 11/8/16 11:14 AM, Eno Thereska wrote:
 Hi Sachin,
 
 One option right now would be to precreate all internal
 topics in Kafka, and only after that start the Kafka Streams
 application. This would require you knowing the internal name
 of the topics (in this case you probably already know it, but
 I agree that in general this is a bit cumbersome).
 
 Eno
 
> On 8 Nov 2016, at 18:10, Sachin Mittal
>  wrote:
> 
> Per message payload size. The basic question is how can I
> control the internal change log topics parameters so as to
> avoid these errors.
> 
> 
> On Tue, Nov 8, 2016 at 11:37 PM, R Krishna
>  wrote:
> 
>> Are you talking about total messages and therefore size
>> or per message payload size.
>> 
>> On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal 
>>  wrote:
>> 
>>> Message size itself increases over the time.
>>> 
>>> Message is something like key=[list on objects]
>>> 
>>> This increases with time and then at a point kafka is
>>> not able to add any message to its topic because
>>> message size is greater than max.message.bytes. Since
>>> this is an internal topic based off a table I don't
>>> know how can I control this topic.
>>> 
>>> If I can set some retention.ms for this topic then I
>>> can purge old messages thereby ensuring that message
>>> size stays within limit.
>>> 
>>> Thanks Sachin
>>> 
>>> 
>>> 
>>> On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska 
>>>  wrote:
>>> 
 Hi Sachin,
 
 Could you clarify what you mean by "message size 
 increases"? Are
>> messages
 going to the changelog topic increasing in size? Or
 is the changelog
>>> topic
 getting full?
 
 Thanks Eno
 
> On 8 Nov 2016, at 16:49, Sachin Mittal 
>  wrote:
> 
> Hi, We are using aggregation by key on a kstream
> to create a ktable. As I read from 
> https://cwiki.apache.org/confluence/display/KAFKA/
 Kafka+Streams%3A+Internal+Data+Management
> it creates an internal changelog topic.
> 
> However over the time the streaming application is
> run message size increases and it starts throwing 
> max.message.bytes exception.
> 
> Is there a way to control the retention.ms time
> for internal
>> changelog
> topics so that messages are purged before they
> exceed this size.
> 
> If not is there a way to control or avoid such an
> error.
> 
> Thanks Sachin
 
 
>>> 
>> 
>> 
>> 
>> -- Radha Krishna, Proddaturi 253-234-5657
>> 
 
>> 
>

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-10 Thread Damian Guy
Hi Sachin,

You can achieve what you want by setting the correct cleanup.policy on
these topics.
In this case you want cleanup.policy=compact,delete - you'll also want to
set retention.ms and/or retention.bytes.

The topic will then be compacted, but it will also delete any segments
based on the retention settings.

You can look here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
for further details.
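
For example, these settings could be applied to an existing changelog topic
with the stock kafka-configs.sh tool; the topic name, ZooKeeper address and
retention values below are placeholders, and this assumes your version of the
tool accepts the square-bracket grouping for comma-separated config values:

bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics \
  --entity-name my-app-my-store-changelog \
  --add-config 'cleanup.policy=[compact,delete],retention.ms=604800000,retention.bytes=1073741824'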

HTH,
Damian

On Thu, 10 Nov 2016 at 11:17 Sachin Mittal  wrote:

> Hi,
> As per Eno suggestion I have pre-created internal changelog topics with
> increased max.message.bytes config to handle big messages that gets
> incremented over the time.
>
> As Matthias has pointed that we cannot use retention.ms setting to delete
> older message data after a given time, is there a way to purge older
> messages from my changelog topic.
>
> Remember my changelog topic is
> key=list of objects
> and this grows with time.
>
> So I would like these to be deleted from time to time because I would have
> already consumed the objects so that key/value can be deleted.
> Later if I get a new object for the same key then that's a new message and
> old data has no use for the streaming application.
>
> So how can I achieve the following.
> Would retention.bytes help here?
>
> Is there a way if i can set expire after or something like that at message
> level and some kafka thread would purge those messages.
>
> Thanks
> Sachin
>
>
>
> On Wed, Nov 9, 2016 at 1:42 AM, Matthias J. Sax 
> wrote:
>
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > My two cents:
> >
> > Changelog topics are compacted topics, thus they do not have a
> > retention time (there is an exception for windowed KTable changlog
> > topics that are compacted and do have a retention time though).
> >
> > However, I do not understand how changing retention time should fix
> > the issue. If your list of values grows and exceed max.message.byte
> > you will need to increase this parameter (or shrink you value).
> >
> > Besides this, Eno's answer is the way to go. In order to figure out
> > internal topic names, you can use KafkaStreams#toString().
> >
> >
> > - -Matthias
> >
> >
> >
> > On 11/8/16 11:14 AM, Eno Thereska wrote:
> > > Hi Sachin,
> > >
> > > One option right now would be to precreate all internal topics in
> > > Kafka, and only after that start the Kafka Streams application.
> > > This would require you knowing the internal name of the topics (in
> > > this case you probably already know it, but I agree that in general
> > > this is a bit cumbersome).
> > >
> > > Eno
> > >
> > >> On 8 Nov 2016, at 18:10, Sachin Mittal 
> > >> wrote:
> > >>
> > >> Per message payload size. The basic question is how can I control
> > >> the internal change log topics parameters so as to avoid these
> > >> errors.
> > >>
> > >>
> > >> On Tue, Nov 8, 2016 at 11:37 PM, R Krishna 
> > >> wrote:
> > >>
> > >>> Are you talking about total messages and therefore size or per
> > >>> message payload size.
> > >>>
> > >>> On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal
> > >>>  wrote:
> > >>>
> >  Message size itself increases over the time.
> > 
> >  Message is something like key=[list on objects]
> > 
> >  This increases with time and then at a point kafka is not
> >  able to add any message to its topic because message size is
> >  greater than max.message.bytes. Since this is an internal
> >  topic based off a table I don't know how can I control this
> >  topic.
> > 
> >  If I can set some retention.ms for this topic then I can
> >  purge old messages thereby ensuring that message size stays
> >  within limit.
> > 
> >  Thanks Sachin
> > 
> > 
> > 
> >  On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska
> >   wrote:
> > 
> > > Hi Sachin,
> > >
> > > Could you clarify what you mean by "message size
> > > increases"? Are
> > >>> messages
> > > going to the changelog topic increasing in size? Or is the
> > > changelog
> >  topic
> > > getting full?
> > >
> > > Thanks Eno
> > >
> > >> On 8 Nov 2016, at 16:49, Sachin Mittal
> > >>  wrote:
> > >>
> > >> Hi, We are using aggregation by key on a kstream to
> > >> create a ktable. As I read from
> > >> https://cwiki.apache.org/confluence/display/KAFKA/
> > > Kafka+Streams%3A+Internal+Data+Management
> > >> it creates an internal changelog topic.
> > >>
> > >> However over the time the streaming application is run
> > >> message size increases and it starts throwing
> > >> max.message.bytes exception.
> > >>
> > >> Is there a way to control the retention.ms time for
> > >> internal
> > >>> changelog
> > >> topics so that messages are purged before they exceed
> > >> this size.
> > >>
> > >> If not is there a way to control or avoid such an error.
> > >>
> > >> Th

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-10 Thread Sachin Mittal
Hi,
As per Eno's suggestion I have pre-created the internal changelog topics with
an increased max.message.bytes config to handle big messages that grow
over time.

As Matthias has pointed out, we cannot use the retention.ms setting to delete
older message data after a given time; is there a way to purge older
messages from my changelog topic?

Remember my changelog topic is
key=list of objects
and this grows with time.

So I would like these to be deleted from time to time, because I would have
already consumed the objects, so that the key/value can be deleted.
Later, if I get a new object for the same key then that's a new message and
the old data has no use for the streaming application.

So how can I achieve this?
Would retention.bytes help here?

Is there a way to set an expire-after (or something like that) at the message
level so that some Kafka thread would purge those messages?

Thanks
Sachin



On Wed, Nov 9, 2016 at 1:42 AM, Matthias J. Sax 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> My two cents:
>
> Changelog topics are compacted topics, thus they do not have a
> retention time (there is an exception for windowed KTable changlog
> topics that are compacted and do have a retention time though).
>
> However, I do not understand how changing retention time should fix
> the issue. If your list of values grows and exceed max.message.byte
> you will need to increase this parameter (or shrink you value).
>
> Besides this, Eno's answer is the way to go. In order to figure out
> internal topic names, you can use KafkaStreams#toString().
>
>
> - -Matthias
>
>
>
> On 11/8/16 11:14 AM, Eno Thereska wrote:
> > Hi Sachin,
> >
> > One option right now would be to precreate all internal topics in
> > Kafka, and only after that start the Kafka Streams application.
> > This would require you knowing the internal name of the topics (in
> > this case you probably already know it, but I agree that in general
> > this is a bit cumbersome).
> >
> > Eno
> >
> >> On 8 Nov 2016, at 18:10, Sachin Mittal 
> >> wrote:
> >>
> >> Per message payload size. The basic question is how can I control
> >> the internal change log topics parameters so as to avoid these
> >> errors.
> >>
> >>
> >> On Tue, Nov 8, 2016 at 11:37 PM, R Krishna 
> >> wrote:
> >>
> >>> Are you talking about total messages and therefore size or per
> >>> message payload size.
> >>>
> >>> On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal
> >>>  wrote:
> >>>
>  Message size itself increases over the time.
> 
>  Message is something like key=[list on objects]
> 
>  This increases with time and then at a point kafka is not
>  able to add any message to its topic because message size is
>  greater than max.message.bytes. Since this is an internal
>  topic based off a table I don't know how can I control this
>  topic.
> 
>  If I can set some retention.ms for this topic then I can
>  purge old messages thereby ensuring that message size stays
>  within limit.
> 
>  Thanks Sachin
> 
> 
> 
>  On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska
>   wrote:
> 
> > Hi Sachin,
> >
> > Could you clarify what you mean by "message size
> > increases"? Are
> >>> messages
> > going to the changelog topic increasing in size? Or is the
> > changelog
>  topic
> > getting full?
> >
> > Thanks Eno
> >
> >> On 8 Nov 2016, at 16:49, Sachin Mittal
> >>  wrote:
> >>
> >> Hi, We are using aggregation by key on a kstream to
> >> create a ktable. As I read from
> >> https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Streams%3A+Internal+Data+Management
> >> it creates an internal changelog topic.
> >>
> >> However over the time the streaming application is run
> >> message size increases and it starts throwing
> >> max.message.bytes exception.
> >>
> >> Is there a way to control the retention.ms time for
> >> internal
> >>> changelog
> >> topics so that messages are purged before they exceed
> >> this size.
> >>
> >> If not is there a way to control or avoid such an error.
> >>
> >> Thanks Sachin
> >
> >
> 
> >>>
> >>>
> >>>
> >>> -- Radha Krishna, Proddaturi 253-234-5657
> >>>
> >

Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

My two cents:

Changelog topics are compacted topics, thus they do not have a
retention time (there is an exception for windowed KTable changelog
topics that are compacted and do have a retention time though).

However, I do not understand how changing the retention time should fix
the issue. If your list of values grows and exceeds max.message.bytes
you will need to increase this parameter (or shrink your value).

Besides this, Eno's answer is the way to go. In order to figure out
internal topic names, you can use KafkaStreams#toString().
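
For example, a small helper (hypothetical wiring; the point is only that the
printed topology contains the generated internal topic names):

import org.apache.kafka.streams.KafkaStreams;

public class TopologyDebug {
    // Print the topology of an already-constructed KafkaStreams instance; the
    // output includes the generated internal changelog and repartition topic
    // names, which can then be pre-created or re-configured.
    public static void printInternalTopicNames(KafkaStreams streams) {
        System.out.println(streams.toString());
    }
}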


- -Matthias



On 11/8/16 11:14 AM, Eno Thereska wrote:
> Hi Sachin,
> 
> One option right now would be to precreate all internal topics in
> Kafka, and only after that start the Kafka Streams application.
> This would require you knowing the internal name of the topics (in
> this case you probably already know it, but I agree that in general
> this is a bit cumbersome).
> 
> Eno
> 
>> On 8 Nov 2016, at 18:10, Sachin Mittal 
>> wrote:
>> 
>> Per message payload size. The basic question is how can I control
>> the internal change log topics parameters so as to avoid these
>> errors.
>> 
>> 
>> On Tue, Nov 8, 2016 at 11:37 PM, R Krishna 
>> wrote:
>> 
>>> Are you talking about total messages and therefore size or per
>>> message payload size.
>>> 
>>> On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal
>>>  wrote:
>>> 
 Message size itself increases over the time.
 
 Message is something like key=[list on objects]
 
 This increases with time and then at a point kafka is not
 able to add any message to its topic because message size is
 greater than max.message.bytes. Since this is an internal
 topic based off a table I don't know how can I control this
 topic.
 
 If I can set some retention.ms for this topic then I can
 purge old messages thereby ensuring that message size stays
 within limit.
 
 Thanks Sachin
 
 
 
 On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska
  wrote:
 
> Hi Sachin,
> 
> Could you clarify what you mean by "message size
> increases"? Are
>>> messages
> going to the changelog topic increasing in size? Or is the
> changelog
 topic
> getting full?
> 
> Thanks Eno
> 
>> On 8 Nov 2016, at 16:49, Sachin Mittal
>>  wrote:
>> 
>> Hi, We are using aggregation by key on a kstream to
>> create a ktable. As I read from 
>> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams%3A+Internal+Data+Management
>> it creates an internal changelog topic.
>> 
>> However over the time the streaming application is run
>> message size increases and it starts throwing
>> max.message.bytes exception.
>> 
>> Is there a way to control the retention.ms time for
>> internal
>>> changelog
>> topics so that messages are purged before they exceed
>> this size.
>> 
>> If not is there a way to control or avoid such an error.
>> 
>> Thanks Sachin
> 
> 
 
>>> 
>>> 
>>> 
>>> -- Radha Krishna, Proddaturi 253-234-5657
>>> 
> 


Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Eno Thereska
Hi Sachin,

One option right now would be to precreate all internal topics in Kafka, and 
only after that start the Kafka Streams application. This would require you 
to know the internal names of the topics (in this case you probably already know 
them, but I agree that in general this is a bit cumbersome).
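
As a hypothetical example of pre-creating such a topic: internal changelog
topics are named <application.id>-<store name>-changelog, and the app id,
store name, ZooKeeper address and sizes below are placeholders (the partition
count should match the number of partitions of the input topic):

bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic my-app-my-store-changelog \
  --partitions 4 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config max.message.bytes=10485760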

Eno

> On 8 Nov 2016, at 18:10, Sachin Mittal  wrote:
> 
> Per message payload size. The basic question is how can I control the
> internal change log topics parameters so as to avoid these errors.
> 
> 
> On Tue, Nov 8, 2016 at 11:37 PM, R Krishna  wrote:
> 
>> Are you talking about total messages and therefore size or per message
>> payload size.
>> 
>> On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal  wrote:
>> 
>>> Message size itself increases over the time.
>>> 
>>> Message is something like
>>> key=[list on objects]
>>> 
>>> This increases with time and then at a point kafka is not able to add any
>>> message to its topic because message size is greater than
>>> max.message.bytes.
>>> Since this is an internal topic based off a table I don't know how can I
>>> control this topic.
>>> 
>>> If I can set some retention.ms for this topic then I can purge old
>>> messages
>>> thereby ensuring that message size stays within limit.
>>> 
>>> Thanks
>>> Sachin
>>> 
>>> 
>>> 
>>> On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska 
>>> wrote:
>>> 
 Hi Sachin,
 
 Could you clarify what you mean by "message size increases"? Are
>> messages
 going to the changelog topic increasing in size? Or is the changelog
>>> topic
 getting full?
 
 Thanks
 Eno
 
> On 8 Nov 2016, at 16:49, Sachin Mittal  wrote:
> 
> Hi,
> We are using aggregation by key on a kstream to create a ktable.
> As I read from
> https://cwiki.apache.org/confluence/display/KAFKA/
 Kafka+Streams%3A+Internal+Data+Management
> it creates an internal changelog topic.
> 
> However over the time the streaming application is run message size
> increases and it starts throwing max.message.bytes exception.
> 
> Is there a way to control the retention.ms time for internal
>> changelog
> topics so that messages are purged before they exceed this size.
> 
> If not is there a way to control or avoid such an error.
> 
> Thanks
> Sachin
 
 
>>> 
>> 
>> 
>> 
>> --
>> Radha Krishna, Proddaturi
>> 253-234-5657
>> 



Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Sachin Mittal
Per-message payload size. The basic question is how I can control the
internal changelog topics' parameters so as to avoid these errors.


On Tue, Nov 8, 2016 at 11:37 PM, R Krishna  wrote:

> Are you talking about total messages and therefore size or per message
> payload size.
>
> On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal  wrote:
>
> > Message size itself increases over the time.
> >
> > Message is something like
> > key=[list on objects]
> >
> > This increases with time and then at a point kafka is not able to add any
> > message to its topic because message size is greater than
> > max.message.bytes.
> > Since this is an internal topic based off a table I don't know how can I
> > control this topic.
> >
> > If I can set some retention.ms for this topic then I can purge old
> > messages
> > thereby ensuring that message size stays within limit.
> >
> > Thanks
> > Sachin
> >
> >
> >
> > On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska 
> > wrote:
> >
> > > Hi Sachin,
> > >
> > > Could you clarify what you mean by "message size increases"? Are
> messages
> > > going to the changelog topic increasing in size? Or is the changelog
> > topic
> > > getting full?
> > >
> > > Thanks
> > > Eno
> > >
> > > > On 8 Nov 2016, at 16:49, Sachin Mittal  wrote:
> > > >
> > > > Hi,
> > > > We are using aggregation by key on a kstream to create a ktable.
> > > > As I read from
> > > > https://cwiki.apache.org/confluence/display/KAFKA/
> > > Kafka+Streams%3A+Internal+Data+Management
> > > > it creates an internal changelog topic.
> > > >
> > > > However over the time the streaming application is run message size
> > > > increases and it starts throwing max.message.bytes exception.
> > > >
> > > > Is there a way to control the retention.ms time for internal
> changelog
> > > > topics so that messages are purged before they exceed this size.
> > > >
> > > > If not is there a way to control or avoid such an error.
> > > >
> > > > Thanks
> > > > Sachin
> > >
> > >
> >
>
>
>
> --
> Radha Krishna, Proddaturi
> 253-234-5657
>


Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread R Krishna
Are you talking about total messages (and therefore total size), or
per-message payload size?

On Tue, Nov 8, 2016 at 10:00 AM, Sachin Mittal  wrote:

> Message size itself increases over the time.
>
> Message is something like
> key=[list on objects]
>
> This increases with time and then at a point kafka is not able to add any
> message to its topic because message size is greater than
> max.message.bytes.
> Since this is an internal topic based off a table I don't know how can I
> control this topic.
>
> If I can set some retention.ms for this topic then I can purge old
> messages
> thereby ensuring that message size stays within limit.
>
> Thanks
> Sachin
>
>
>
> On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska 
> wrote:
>
> > Hi Sachin,
> >
> > Could you clarify what you mean by "message size increases"? Are messages
> > going to the changelog topic increasing in size? Or is the changelog
> topic
> > getting full?
> >
> > Thanks
> > Eno
> >
> > > On 8 Nov 2016, at 16:49, Sachin Mittal  wrote:
> > >
> > > Hi,
> > > We are using aggregation by key on a kstream to create a ktable.
> > > As I read from
> > > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Streams%3A+Internal+Data+Management
> > > it creates an internal changelog topic.
> > >
> > > However over the time the streaming application is run message size
> > > increases and it starts throwing max.message.bytes exception.
> > >
> > > Is there a way to control the retention.ms time for internal changelog
> > > topics so that messages are purged before they exceed this size.
> > >
> > > If not is there a way to control or avoid such an error.
> > >
> > > Thanks
> > > Sachin
> >
> >
>



-- 
Radha Krishna, Proddaturi
253-234-5657


Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Sachin Mittal
The message size itself increases over time.

A message is something like
key=[list of objects]

This grows with time, and at some point Kafka is not able to add the
message to its topic because the message size is greater than max.message.bytes.
Since this is an internal topic based off a table, I don't know how I can
control this topic.

If I can set some retention.ms for this topic then I can purge old messages,
thereby ensuring that the message size stays within the limit.

Thanks
Sachin



On Tue, Nov 8, 2016 at 11:22 PM, Eno Thereska 
wrote:

> Hi Sachin,
>
> Could you clarify what you mean by "message size increases"? Are messages
> going to the changelog topic increasing in size? Or is the changelog topic
> getting full?
>
> Thanks
> Eno
>
> > On 8 Nov 2016, at 16:49, Sachin Mittal  wrote:
> >
> > Hi,
> > We are using aggregation by key on a kstream to create a ktable.
> > As I read from
> > https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams%3A+Internal+Data+Management
> > it creates an internal changelog topic.
> >
> > However over the time the streaming application is run message size
> > increases and it starts throwing max.message.bytes exception.
> >
> > Is there a way to control the retention.ms time for internal changelog
> > topics so that messages are purged before they exceed this size.
> >
> > If not is there a way to control or avoid such an error.
> >
> > Thanks
> > Sachin
>
>


Re: Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Eno Thereska
Hi Sachin,

Could you clarify what you mean by "message size increases"? Are messages going 
to the changelog topic increasing in size? Or is the changelog topic getting 
full? 

Thanks
Eno

> On 8 Nov 2016, at 16:49, Sachin Mittal  wrote:
> 
> Hi,
> We are using aggregation by key on a kstream to create a ktable.
> As I read from
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams%3A+Internal+Data+Management
> it creates an internal changelog topic.
> 
> However over the time the streaming application is run message size
> increases and it starts throwing max.message.bytes exception.
> 
> Is there a way to control the retention.ms time for internal changelog
> topics so that messages are purged before they exceed this size.
> 
> If not is there a way to control or avoid such an error.
> 
> Thanks
> Sachin



Kafka streaming changelog topic max.message.bytes exception

2016-11-08 Thread Sachin Mittal
Hi,
We are using aggregation by key on a KStream to create a KTable.
As I read from
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams%3A+Internal+Data+Management
it creates an internal changelog topic.

However, the longer the streaming application runs, the larger the messages
get, and it starts throwing a max.message.bytes exception.

Is there a way to control the retention.ms time for internal changelog
topics so that messages are purged before they exceed this size.

If not is there a way to control or avoid such an error.

Thanks
Sachin


Re: kafka streaming rocks db lock bug?

2016-10-26 Thread Guozhang Wang
Hello Ara,

I have looked through the logs you sent to me (thanks!), and I suspect
the "file does not exist" issue is because your state directory is set to the
`/tmp/kafka-streams` folder with the default value. As for the "LOCK: No
locks available" and "Failed to lock the state directory" issue, it seems
you are rolling-bouncing the stream instances (from the observed "shutdown"
log entries), and I suspect that because the state directories have
been auto-deleted, the threads try to re-create them while
still holding the locks during the shutdown process.

Could you set the "state.dir" config to another directory and see if this
issue goes away?

http://docs.confluent.io/3.0.0/streams/developer-guide.html#optional-configuration-parameters
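
A minimal sketch of moving the state directory off /tmp; the application id
("argyle-streams", taken from the paths in the logs), the broker address and
the target directory are placeholders:

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {

    // Builds the Streams configuration with the state directory moved off /tmp,
    // so tmp cleaners or reboots do not delete the local RocksDB state.
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");
        return props;
    }
}

The resulting properties are then passed to the KafkaStreams constructor as
usual.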


As for a detailed failover description, I'd recommend this doc section:

http://docs.confluent.io/3.0.1/streams/architecture.html#fault-tolerance


As well as this blog post:

http://www.confluent.io/blog/elastic-scaling-in-kafka-streams/

Guozhang


On Tue, Oct 25, 2016 at 8:34 AM, Guozhang Wang  wrote:

> Logs would be very helpful, I can look further into it.
>
> Thanks Ara.
>
> On Mon, Oct 24, 2016 at 11:04 PM, Ara Ebrahimi <
> ara.ebrah...@argyledata.com> wrote:
>
>> This was in 10.1.0. What happened was that a kafka broker went down and
>> then this happened on the kafka streaming instance which had connection to
>> this broker. I can send you all logs I got.
>>
>> Ara.
>>
>> On Oct 24, 2016, at 10:41 PM, Guozhang Wang > wangg...@gmail.com>> wrote:
>>
>> Hello Ara,
>>
>> Your encountered issue seems to be KAFKA-3812
>> <https://issues.apache.org/jira/browse/KAFKA-3812>, and KAFKA-3938
>> <https://issues.apache.org/jira/browse/KAFKA-3938>. Could you try to
>> upgrade to the newly released 0.10.1.0 version and see if this issue goes
>> away? If not I would love to investigate this issue further with you.
>>
>>
>> Guozhang
>>
>>
>>
>> Guozhang
>>
>>
>> On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi <
>> ara.ebrah...@argyledata.com<mailto:ara.ebrah...@argyledata.com>>
>> wrote:
>>
>> And then this on a different node:
>>
>> 2016-10-23 13:43:57 INFO  StreamThread:286 - stream-thread
>> [StreamThread-3] Stream thread shutdown complete
>> 2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
>> org.apache.kafka.streams.errors.StreamsException: stream-thread
>> [StreamThread-3] Failed to rebalance
>> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
>> StreamThread.java:401)
>> at org.apache.kafka.streams.processor.internals.
>> StreamThread.run(StreamThread.java:235)
>> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
>> while creating the state manager
>> at org.apache.kafka.streams.processor.internals.AbstractTask.(
>> AbstractTask.java:72)
>> at org.apache.kafka.streams.processor.internals.
>> StreamTask.(StreamTask.java:90)
>> at org.apache.kafka.streams.processor.internals.
>> StreamThread.createStreamTask(StreamThread.java:622)
>> at org.apache.kafka.streams.processor.internals.
>> StreamThread.addStreamTasks(StreamThread.java:649)
>> at org.apache.kafka.streams.processor.internals.StreamThread.access$000(
>> StreamThread.java:69)
>> at org.apache.kafka.streams.processor.internals.StreamThread$1.
>> onPartitionsAssigned(StreamThread.java:120)
>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
>> onJoinComplete(ConsumerCoordinator.java:228)
>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
>> joinGroupIfNeeded(AbstractCoordinator.java:313)
>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
>> ensureActiveGroup(AbstractCoordinator.java:277)
>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
>> ConsumerCoordinator.java:259)
>> at org.apache.kafka.clients.consumer.KafkaConsumer.
>> pollOnce(KafkaConsumer.java:1013)
>> at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
>> KafkaConsumer.java:979)
>> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
>> StreamThread.java:398)
>> ... 1 more
>> Caused by: java.io.IOException: task [7_1] Failed to lock the state
>> directory: /tmp/kafka-streams/argyle-streams/7_1
>> at org.apache.kafka.streams.processor.internals.
>> ProcessorStateManager.(ProcessorStateManager.java:98)
>> at org.apache.kafka.streams.processor.internals.AbstractTask.(
>> AbstractTask.java:69)
&

Re: kafka streaming rocks db lock bug?

2016-10-25 Thread Guozhang Wang
Logs would be very helpful, I can look further into it.

Thanks Ara.

On Mon, Oct 24, 2016 at 11:04 PM, Ara Ebrahimi 
wrote:

> This was in 10.1.0. What happened was that a kafka broker went down and
> then this happened on the kafka streaming instance which had connection to
> this broker. I can send you all logs I got.
>
> Ara.
>
> On Oct 24, 2016, at 10:41 PM, Guozhang Wang mailto:wan
> gg...@gmail.com>> wrote:
>
> Hello Ara,
>
> Your encountered issue seems to be KAFKA-3812
> <https://issues.apache.org/jira/browse/KAFKA-3812>, and KAFKA-3938
> <https://issues.apache.org/jira/browse/KAFKA-3938>. Could you try to
> upgrade to the newly released 0.10.1.0 version and see if this issue goes
> away? If not I would love to investigate this issue further with you.
>
>
> Guozhang
>
>
>
> Guozhang
>
>
> On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi  <mailto:ara.ebrah...@argyledata.com>>
> wrote:
>
> And then this on a different node:
>
> 2016-10-23 13:43:57 INFO  StreamThread:286 - stream-thread
> [StreamThread-3] Stream thread shutdown complete
> 2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
> org.apache.kafka.streams.errors.StreamsException: stream-thread
> [StreamThread-3] Failed to rebalance
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> StreamThread.java:401)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.run(StreamThread.java:235)
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
> while creating the state manager
> at org.apache.kafka.streams.processor.internals.AbstractTask.(
> AbstractTask.java:72)
> at org.apache.kafka.streams.processor.internals.
> StreamTask.(StreamTask.java:90)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.createStreamTask(StreamThread.java:622)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.addStreamTasks(StreamThread.java:649)
> at org.apache.kafka.streams.processor.internals.StreamThread.access$000(
> StreamThread.java:69)
> at org.apache.kafka.streams.processor.internals.StreamThread$1.
> onPartitionsAssigned(StreamThread.java:120)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
> onJoinComplete(ConsumerCoordinator.java:228)
> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> joinGroupIfNeeded(AbstractCoordinator.java:313)
> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> ensureActiveGroup(AbstractCoordinator.java:277)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
> ConsumerCoordinator.java:259)
> at org.apache.kafka.clients.consumer.KafkaConsumer.
> pollOnce(KafkaConsumer.java:1013)
> at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
> KafkaConsumer.java:979)
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> StreamThread.java:398)
> ... 1 more
> Caused by: java.io.IOException: task [7_1] Failed to lock the state
> directory: /tmp/kafka-streams/argyle-streams/7_1
> at org.apache.kafka.streams.processor.internals.
> ProcessorStateManager.(ProcessorStateManager.java:98)
> at org.apache.kafka.streams.processor.internals.AbstractTask.(
> AbstractTask.java:69)
> ... 13 more
>
> Ara.
>
> On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi  mailto:ara.ebrah...@argyledata.com><
> mailto:ara.ebrah...@argyledata.com>> wrote:
>
> Hi,
>
> This happens when I hammer our 5 kafka streaming nodes (each with 4
> streaming threads) hard enough for an hour or so:
>
> 2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread
> [StreamThread-2] Failed to flush state for StreamTask 3_8:
> org.apache.kafka.streams.errors.ProcessorStateException: task [3_8]
> Failed to flush state store streams-data-record-stats-avro-br-store
> at org.apache.kafka.streams.processor.internals.
> ProcessorStateManager.flush(ProcessorStateManager.java:322)
> at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(
> AbstractTask.java:181)
> at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(
> StreamThread.java:360)
> at org.apache.kafka.streams.processor.internals.StreamThread.
> performOnAllTasks(StreamThread.java:322)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.flushAllState(StreamThread.java:357)
> at org.apache.kafka.streams.processor.internals.StreamThread.
> shutdownTasksAndState(StreamThread.java:295)
> at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(
> StreamThread.java:262)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.run(StreamThread.java:245)
> Caused by: org.apache.kafka.st

Re: kafka streaming rocks db lock bug?

2016-10-24 Thread Ara Ebrahimi
This was in 10.1.0. What happened was that a Kafka broker went down, and then 
this happened on the Kafka Streams instance that had a connection to this 
broker. I can send you all the logs I got.

Ara.

On Oct 24, 2016, at 10:41 PM, Guozhang Wang 
mailto:wangg...@gmail.com>> wrote:

Hello Ara,

Your encountered issue seems to be KAFKA-3812
<https://issues.apache.org/jira/browse/KAFKA-3812>, and KAFKA-3938
<https://issues.apache.org/jira/browse/KAFKA-3938>. Could you try to
upgrade to the newly released 0.10.1.0 version and see if this issue goes
away? If not I would love to investigate this issue further with you.


Guozhang



Guozhang


On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi 
mailto:ara.ebrah...@argyledata.com>>
wrote:

And then this on a different node:

2016-10-23 13:43:57 INFO  StreamThread:286 - stream-thread
[StreamThread-3] Stream thread shutdown complete
2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
org.apache.kafka.streams.errors.StreamsException: stream-thread
[StreamThread-3] Failed to rebalance
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
StreamThread.java:401)
at org.apache.kafka.streams.processor.internals.
StreamThread.run(StreamThread.java:235)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
while creating the state manager
at org.apache.kafka.streams.processor.internals.AbstractTask.(
AbstractTask.java:72)
at org.apache.kafka.streams.processor.internals.
StreamTask.(StreamTask.java:90)
at org.apache.kafka.streams.processor.internals.
StreamThread.createStreamTask(StreamThread.java:622)
at org.apache.kafka.streams.processor.internals.
StreamThread.addStreamTasks(StreamThread.java:649)
at org.apache.kafka.streams.processor.internals.StreamThread.access$000(
StreamThread.java:69)
at org.apache.kafka.streams.processor.internals.StreamThread$1.
onPartitionsAssigned(StreamThread.java:120)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
onJoinComplete(ConsumerCoordinator.java:228)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
joinGroupIfNeeded(AbstractCoordinator.java:313)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
ensureActiveGroup(AbstractCoordinator.java:277)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
ConsumerCoordinator.java:259)
at org.apache.kafka.clients.consumer.KafkaConsumer.
pollOnce(KafkaConsumer.java:1013)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
KafkaConsumer.java:979)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
StreamThread.java:398)
... 1 more
Caused by: java.io.IOException: task [7_1] Failed to lock the state
directory: /tmp/kafka-streams/argyle-streams/7_1
at org.apache.kafka.streams.processor.internals.
ProcessorStateManager.(ProcessorStateManager.java:98)
at org.apache.kafka.streams.processor.internals.AbstractTask.(
AbstractTask.java:69)
... 13 more

Ara.

On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi 
mailto:ara.ebrah...@argyledata.com><
mailto:ara.ebrah...@argyledata.com>> wrote:

Hi,

This happens when I hammer our 5 kafka streaming nodes (each with 4
streaming threads) hard enough for an hour or so:

2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread
[StreamThread-2] Failed to flush state for StreamTask 3_8:
org.apache.kafka.streams.errors.ProcessorStateException: task [3_8]
Failed to flush state store streams-data-record-stats-avro-br-store
at org.apache.kafka.streams.processor.internals.
ProcessorStateManager.flush(ProcessorStateManager.java:322)
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(
AbstractTask.java:181)
at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(
StreamThread.java:360)
at org.apache.kafka.streams.processor.internals.StreamThread.
performOnAllTasks(StreamThread.java:322)
at org.apache.kafka.streams.processor.internals.
StreamThread.flushAllState(StreamThread.java:357)
at org.apache.kafka.streams.processor.internals.StreamThread.
shutdownTasksAndState(StreamThread.java:295)
at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(
StreamThread.java:262)
at org.apache.kafka.streams.processor.internals.
StreamThread.run(StreamThread.java:245)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
opening store streams-data-record-stats-avro-br-store-20150516 at
location /tmp/kafka-streams/argyle-streams/3_8/streams-data-
record-stats-avro-br-store/streams-data-record-stats-
avro-br-store-20150516
at org.apache.kafka.streams.state.internals.RocksDBStore.
openDB(RocksDBStore.java:196)
at org.apache.kafka.streams.state.internals.RocksDBStore.
openDB(RocksDBStore.java:158)
at org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.
openDB(RocksDBWindowStore.java:72)
at org.apache.kafka.streams.state.internals.RocksDBWindowStore.
getOrCreateSegment(RocksDBWindowStore.java:402)
at org.apache.kafka.stream

Re: kafka streaming rocks db lock bug?

2016-10-24 Thread Guozhang Wang
Hello Ara,

The issue you encountered seems to be KAFKA-3812
<https://issues.apache.org/jira/browse/KAFKA-3812> and KAFKA-3938
<https://issues.apache.org/jira/browse/KAFKA-3938>. Could you try to
upgrade to the newly released 0.10.1.0 version and see if this issue goes
away? If not, I would love to investigate this issue further with you.


Guozhang



Guozhang


On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi 
wrote:

> And then this on a different node:
>
> 2016-10-23 13:43:57 INFO  StreamThread:286 - stream-thread
> [StreamThread-3] Stream thread shutdown complete
> 2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
> org.apache.kafka.streams.errors.StreamsException: stream-thread
> [StreamThread-3] Failed to rebalance
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> StreamThread.java:401)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.run(StreamThread.java:235)
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
> while creating the state manager
> at org.apache.kafka.streams.processor.internals.AbstractTask.(
> AbstractTask.java:72)
> at org.apache.kafka.streams.processor.internals.
> StreamTask.(StreamTask.java:90)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.createStreamTask(StreamThread.java:622)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.addStreamTasks(StreamThread.java:649)
> at org.apache.kafka.streams.processor.internals.StreamThread.access$000(
> StreamThread.java:69)
> at org.apache.kafka.streams.processor.internals.StreamThread$1.
> onPartitionsAssigned(StreamThread.java:120)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
> onJoinComplete(ConsumerCoordinator.java:228)
> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> joinGroupIfNeeded(AbstractCoordinator.java:313)
> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> ensureActiveGroup(AbstractCoordinator.java:277)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
> ConsumerCoordinator.java:259)
> at org.apache.kafka.clients.consumer.KafkaConsumer.
> pollOnce(KafkaConsumer.java:1013)
> at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
> KafkaConsumer.java:979)
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> StreamThread.java:398)
> ... 1 more
> Caused by: java.io.IOException: task [7_1] Failed to lock the state
> directory: /tmp/kafka-streams/argyle-streams/7_1
> at org.apache.kafka.streams.processor.internals.
> ProcessorStateManager.(ProcessorStateManager.java:98)
> at org.apache.kafka.streams.processor.internals.AbstractTask.(
> AbstractTask.java:69)
> ... 13 more
>
> Ara.
>
> On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi  mailto:ara.ebrah...@argyledata.com>> wrote:
>
> Hi,
>
> This happens when I hammer our 5 kafka streaming nodes (each with 4
> streaming threads) hard enough for an hour or so:
>
> 2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread
> [StreamThread-2] Failed to flush state for StreamTask 3_8:
> org.apache.kafka.streams.errors.ProcessorStateException: task [3_8]
> Failed to flush state store streams-data-record-stats-avro-br-store
> at org.apache.kafka.streams.processor.internals.
> ProcessorStateManager.flush(ProcessorStateManager.java:322)
> at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(
> AbstractTask.java:181)
> at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(
> StreamThread.java:360)
> at org.apache.kafka.streams.processor.internals.StreamThread.
> performOnAllTasks(StreamThread.java:322)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.flushAllState(StreamThread.java:357)
> at org.apache.kafka.streams.processor.internals.StreamThread.
> shutdownTasksAndState(StreamThread.java:295)
> at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(
> StreamThread.java:262)
> at org.apache.kafka.streams.processor.internals.
> StreamThread.run(StreamThread.java:245)
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
> opening store streams-data-record-stats-avro-br-store-20150516 at
> location /tmp/kafka-streams/argyle-streams/3_8/streams-data-
> record-stats-avro-br-store/streams-data-record-stats-
> avro-br-store-20150516
> at org.apache.kafka.streams.state.internals.RocksDBStore.
> openDB(RocksDBStore.java:196)
> at org.apache.kafka.streams.state.internals.RocksDBStore.
> openDB(RocksDBStore.java:158)
> at org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.
> openDB(RocksDBWindowStore.java:72)
> at org.apache.kafka.streams.state.internals.RocksDBWind

Re: Kafka Streaming

2016-10-24 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

It's a client issue... But CP 3.1 should be out in about 2 weeks...
Of course, you can use Kafka 0.10.1.0 for now. It was released last
week and does contain the fix.

- -Matthias

On 10/24/16 9:19 AM, Mohit Anchlia wrote:
> Would this be an issue if I connect to a remote Kafka instance
> running on the Linux box? Or is this a client issue. What's rockdb
> used for to keep state?
> 
> On Mon, Oct 24, 2016 at 12:08 AM, Matthias J. Sax
>  wrote:
> 
> Kafka 0.10.1.0 which was release last week does contain the fix 
> already. The fix will be in CP 3.1 coming up soon!
> 
> (sorry that I did mix up versions in a previous email)
> 
> -Matthias
> 
> On 10/23/16 12:10 PM, Mohit Anchlia wrote:
 So if I get it right I will not have this fix until 4
 months? Should I just create my own example with the next
 version of Kafka?
 
 On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax 
  wrote:
 
 Current version is 3.0.1 CP 3.1 should be release the next
 weeks
 
 So CP 3.2 should be there is about 4 month (Kafka follows a
 time base release cycle of 4 month and CP usually aligns with
 Kafka releases)
 
 -Matthias
 
 
 On 10/20/16 5:10 PM, Mohit Anchlia wrote:
>>> Any idea of when 3.2 is coming?
>>> 
>>> On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax 
>>>  wrote:
>>> 
>>> No problem. Asking questions is the purpose of mailing
>>> lists. :)
>>> 
>>> The issue will be fixed in next version of examples
>>> branch.
>>> 
>>> Examples branch is build with CP dependency and not
>>> with Kafka dependency. CP-3.2 is not available yet;
>>> only Kafka 0.10.1.0. Nevertheless, they should work
>>> with Kafka dependency, too. I never tried it, but you
>>> should give it a shot...
>>> 
>>> But you should use example master branch because of
>>> API changes from 0.10.0.x to 0.10.1 (and thus, changing
>>> CP-3.1 to 0.10.1.0 will not be compatible and not
>>> compile, while changing CP-3.2-SNAPSHOT to 0.10.1.0
>>> should work -- hopefully ;) )
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 10/20/16 4:02 PM, Mohit Anchlia wrote:
>> So this issue I am seeing is fixed in the next
>> version of example branch? Can I change my pom to
>> point it the higher version of Kafka if that is
>> the issue? Or do I need to wait until new branch
>> is made available? Sorry lot of questions :)
>> 
>> On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax 
>>  wrote:
>> 
>> The branch is 0.10.0.1 and not 0.10.1.0 (sorry
>> for so many zeros and ones -- super easy to mix
>> up)
>> 
>> However, examples master branch uses
>> CP-3.1-SNAPSHOT (ie, Kafka 0.10.1.0) -- there
>> will be a 0.10.1 examples branch, after CP-3.1
>> was released
>> 
>> 
>> -Matthias
>> 
>> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> I just now cloned this repo. It seems to be
> using 10.1
> 
> https://github.com/confluentinc/examples
> and running examples in 
> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> 
> On Thu, Oct 20, 2016 at 3:10 PM, Michael
> Noll  wrote:
> 
>> I suspect you are running Kafka 0.10.0.x
>> on Windows? If so, this is a known issue
>> that is fixed in Kafka 0.10.1 that was
>> just released today.
>> 
>> Also: which examples are you referring
>> to? And, to confirm: which git branch /
>> Kafka version / OS in case my guess above
>> was wrong.
>> 
>> 
>> On Thursday, October 20, 2016, Mohit
>> Anchlia  wrote:
>> 
>>> I am trying to run the examples from
>>> git. While running the wordcount
>>> example I see this error:
>>> 
>>> Caused by:
>>> *java.lang.RuntimeException*: 
>>> librocksdbjni-win64.dll was not found
>>> inside JAR.
>>> 
>>> 
>>> Am I expected to include this jar
>>> locally?
>>> 
>> 
>> 
>> -- *Michael G. Noll* Product Manager | 
>> Confluent +1 650 453 5860 | @miguno 
>>  Follow us:
>> Twitter 
>>  |
>> Blog 
>> 
> 
>>> 
>> 
 
>>> 
> 
 
>> 
> 

Re: Kafka Streaming

2016-10-24 Thread Mohit Anchlia
Would this be an issue if I connect to a remote Kafka instance running on
the Linux box? Or is this a client issue? What's RocksDB used for, to keep
state?

On Mon, Oct 24, 2016 at 12:08 AM, Matthias J. Sax 
wrote:

>
> Kafka 0.10.1.0 which was release last week does contain the fix
> already. The fix will be in CP 3.1 coming up soon!
>
> (sorry that I did mix up versions in a previous email)
>
> -Matthias
>
> On 10/23/16 12:10 PM, Mohit Anchlia wrote:
> > So if I get it right I will not have this fix until 4 months?
> > Should I just create my own example with the next version of
> > Kafka?
> >
> > On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax
> >  wrote:
> >
> > Current version is 3.0.1 CP 3.1 should be release the next weeks
> >
> > So CP 3.2 should be there is about 4 month (Kafka follows a time
> > base release cycle of 4 month and CP usually aligns with Kafka
> > releases)
> >
> > -Matthias
> >
> >
> > On 10/20/16 5:10 PM, Mohit Anchlia wrote:
>  Any idea of when 3.2 is coming?
> 
>  On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax
>   wrote:
> 
>  No problem. Asking questions is the purpose of mailing lists.
>  :)
> 
>  The issue will be fixed in next version of examples branch.
> 
>  Examples branch is build with CP dependency and not with
>  Kafka dependency. CP-3.2 is not available yet; only Kafka
>  0.10.1.0. Nevertheless, they should work with Kafka
>  dependency, too. I never tried it, but you should give it a
>  shot...
> 
>  But you should use example master branch because of API
>  changes from 0.10.0.x to 0.10.1 (and thus, changing CP-3.1 to
>  0.10.1.0 will not be compatible and not compile, while
>  changing CP-3.2-SNAPSHOT to 0.10.1.0 should work -- hopefully
>  ;) )
> 
> 
>  -Matthias
> 
>  On 10/20/16 4:02 PM, Mohit Anchlia wrote:
> >>> So this issue I am seeing is fixed in the next version
> >>> of example branch? Can I change my pom to point it the
> >>> higher version of Kafka if that is the issue? Or do I
> >>> need to wait until new branch is made available? Sorry
> >>> lot of questions :)
> >>>
> >>> On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax
> >>>  wrote:
> >>>
> >>> The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so
> >>> many zeros and ones -- super easy to mix up)
> >>>
> >>> However, examples master branch uses CP-3.1-SNAPSHOT
> >>> (ie, Kafka 0.10.1.0) -- there will be a 0.10.1 examples
> >>> branch, after CP-3.1 was released
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> >> I just now cloned this repo. It seems to be using
> >> 10.1
> >>
> >> https://github.com/confluentinc/examples and
> >> running examples in
> >> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> >>
> >> On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll
> >>  wrote:
> >>
> >>> I suspect you are running Kafka 0.10.0.x on
> >>> Windows? If so, this is a known issue that is
> >>> fixed in Kafka 0.10.1 that was just released
> >>> today.
> >>>
> >>> Also: which examples are you referring to?
> >>> And, to confirm: which git branch / Kafka
> >>> version / OS in case my guess above was wrong.
> >>>
> >>>
> >>> On Thursday, October 20, 2016, Mohit Anchlia
> >>>  wrote:
> >>>
>  I am trying to run the examples from git.
>  While running the wordcount example I see
>  this error:
> 
>  Caused by: *java.lang.RuntimeException*:
>  librocksdbjni-win64.dll was not found inside
>  JAR.
> 
> 
>  Am I expected to include this jar locally?
> 
> >>>
> >>>
> >>> -- *Michael G. Noll* Product Manager |
> >>> Confluent +1 650 453 5860 | @miguno
> >>>  Follow us: Twitter
> >>>  | Blog
> >>> 
> >>>
> >>
> 
> >>>
> >
> 
> >>
> >

Re: Kafka Streaming

2016-10-24 Thread Matthias J. Sax

Kafka 0.10.1.0, which was released last week, does contain the fix
already. The fix will be in CP 3.1, coming up soon!

(sorry that I did mix up versions in a previous email)

-Matthias

On 10/23/16 12:10 PM, Mohit Anchlia wrote:
> So if I get it right I will not have this fix until 4 months?
> Should I just create my own example with the next version of
> Kafka?
> 
> On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax
>  wrote:
> 
> Current version is 3.0.1 CP 3.1 should be release the next weeks
> 
> So CP 3.2 should be there is about 4 month (Kafka follows a time
> base release cycle of 4 month and CP usually aligns with Kafka
> releases)
> 
> -Matthias
> 
> 
> On 10/20/16 5:10 PM, Mohit Anchlia wrote:
 Any idea of when 3.2 is coming?
 
 On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax 
  wrote:
 
 No problem. Asking questions is the purpose of mailing lists.
 :)
 
 The issue will be fixed in next version of examples branch.
 
 Examples branch is build with CP dependency and not with
 Kafka dependency. CP-3.2 is not available yet; only Kafka
 0.10.1.0. Nevertheless, they should work with Kafka
 dependency, too. I never tried it, but you should give it a
 shot...
 
 But you should use example master branch because of API
 changes from 0.10.0.x to 0.10.1 (and thus, changing CP-3.1 to
 0.10.1.0 will not be compatible and not compile, while
 changing CP-3.2-SNAPSHOT to 0.10.1.0 should work -- hopefully
 ;) )
 
 
 -Matthias
 
 On 10/20/16 4:02 PM, Mohit Anchlia wrote:
>>> So this issue I am seeing is fixed in the next version
>>> of example branch? Can I change my pom to point it the
>>> higher version of Kafka if that is the issue? Or do I
>>> need to wait until new branch is made available? Sorry
>>> lot of questions :)
>>> 
>>> On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax 
>>>  wrote:
>>> 
>>> The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so
>>> many zeros and ones -- super easy to mix up)
>>> 
>>> However, examples master branch uses CP-3.1-SNAPSHOT
>>> (ie, Kafka 0.10.1.0) -- there will be a 0.10.1 examples
>>> branch, after CP-3.1 was released
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
>> I just now cloned this repo. It seems to be using
>> 10.1
>> 
>> https://github.com/confluentinc/examples and
>> running examples in 
>> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
>> 
>> On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll 
>>  wrote:
>> 
>>> I suspect you are running Kafka 0.10.0.x on
>>> Windows? If so, this is a known issue that is
>>> fixed in Kafka 0.10.1 that was just released
>>> today.
>>> 
>>> Also: which examples are you referring to?
>>> And, to confirm: which git branch / Kafka
>>> version / OS in case my guess above was wrong.
>>> 
>>> 
>>> On Thursday, October 20, 2016, Mohit Anchlia 
>>>  wrote:
>>> 
 I am trying to run the examples from git.
 While running the wordcount example I see
 this error:
 
 Caused by: *java.lang.RuntimeException*: 
 librocksdbjni-win64.dll was not found inside
 JAR.
 
 
 Am I expected to include this jar locally?
 
>>> 
>>> 
>>> -- *Michael G. Noll* Product Manager |
>>> Confluent +1 650 453 5860 | @miguno
>>>  Follow us: Twitter
>>>  | Blog
>>> 
>>> 
>> 
 
>>> 
> 
 
>> 
> 


Re: Kafka Streaming

2016-10-23 Thread Sachin Mittal
You can build librocksdbjni locally to fix it.
I did that in my case.

It is a bit tricky and you need MS Visual Studio 15.

I suggest use the following link:
http://mail-archives.apache.org/mod_mbox/kafka-users/201608.mbox/%3CCAHoiPjweo-xSj3TiodcDVf4wNnnJ8u6PcwWDPF7LT5ps%2BxQ3eA%40mail.gmail.com%3E

A few points to remember:
1. Download CMake for Windows.
2. Make sure to check out the 4.8.fb branch.
3. Make sure to merge this PR into your local branch:

https://github.com/facebook/rocksdb/pull/1223/files

Follow the instructions in that mail and this:

https://github.com/facebook/rocksdb/blob/v4.8/CMakeLists.txt

You are good to go. I suggest the Kafka binaries include this DLL in future
releases.

Thanks
Sachin
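
As a quick sanity check after a build like the one above, something along
these lines confirms that the freshly built native library at least loads from
Java before you repackage it or put it on java.library.path; the class name
and the DLL path are hypothetical placeholders for your own build output, not
anything from this thread:

import java.io.File;

public class RocksDbJniSmokeTest {
    public static void main(String[] args) {
        // Absolute path to the locally built JNI library; adjust to your CMake output dir.
        File dll = new File("C:\\dev\\rocksdb\\build\\java\\Release\\librocksdbjni-win64.dll");
        System.load(dll.getAbsolutePath());
        System.out.println("Loaded " + dll.getName() + " successfully");
    }
}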



On Fri, Oct 21, 2016 at 3:40 AM, Michael Noll  wrote:

> I suspect you are running Kafka 0.10.0.x on Windows?  If so, this is a
> known issue that is fixed in Kafka 0.10.1 that was just released today.
>
> Also: which examples are you referring to?  And, to confirm: which git
> branch / Kafka version / OS in case my guess above was wrong.
>
>
> On Thursday, October 20, 2016, Mohit Anchlia 
> wrote:
>
> > I am trying to run the examples from git. While running the wordcount
> > example I see this error:
> >
> > Caused by: *java.lang.RuntimeException*: librocksdbjni-win64.dll was not
> > found inside JAR.
> >
> >
> > Am I expected to include this jar locally?
> >
>
>
> --
> *Michael G. Noll*
> Product Manager | Confluent
> +1 650 453 5860 | @miguno 
> Follow us: Twitter  | Blog
> 
>


Re: kafka streaming rocks db lock bug?

2016-10-23 Thread Ara Ebrahimi
And then this on a different node:

2016-10-23 13:43:57 INFO  StreamThread:286 - stream-thread [StreamThread-3] 
Stream thread shutdown complete
2016-10-23 13:43:57 ERROR StreamPipeline:169 - An exception has occurred
org.apache.kafka.streams.errors.StreamsException: stream-thread 
[StreamThread-3] Failed to rebalance
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:401)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:235)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while 
creating the state manager
at 
org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:72)
at 
org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:90)
at 
org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:622)
at 
org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:649)
at 
org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:69)
at 
org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:120)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:398)
... 1 more
Caused by: java.io.IOException: task [7_1] Failed to lock the state directory: 
/tmp/kafka-streams/argyle-streams/7_1
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:98)
at 
org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:69)
... 13 more

Ara.

On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:

Hi,

This happens when I hammer our 5 kafka streaming nodes (each with 4 streaming 
threads) hard enough for an hour or so:

2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread [StreamThread-2] 
Failed to flush state for StreamTask 3_8:
org.apache.kafka.streams.errors.ProcessorStateException: task [3_8] Failed to 
flush state store streams-data-record-stats-avro-br-store
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:322)
at 
org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:181)
at 
org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:360)
at 
org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
at 
org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:357)
at 
org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:295)
at 
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error 
opening store streams-data-record-stats-avro-br-store-20150516 at location 
/tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-20150516
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:196)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:158)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.openDB(RocksDBWindowStore.java:72)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore.getOrCreateSegment(RocksDBWindowStore.java:402)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore.putAndReturnInternalKey(RocksDBWindowStore.java:310)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:292)
at 
org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:101)
at 
org.apache.kafka.streams.state.internals.CachingWindowStore$1.apply(CachingWindowStore.java:87)
at 
org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:117)
at 
org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
at 
org.apache.kafka.streams.state.internals.CachingWindowStore.flush(CachingWindowStore.java:118)
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:320)
... 7 more
Cau

kafka streaming rocks db lock bug?

2016-10-23 Thread Ara Ebrahimi
Hi,

This happens when I hammer our 5 kafka streaming nodes (each with 4 streaming 
threads) hard enough for an hour or so:

2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread [StreamThread-2] 
Failed to flush state for StreamTask 3_8:
org.apache.kafka.streams.errors.ProcessorStateException: task [3_8] Failed to 
flush state store streams-data-record-stats-avro-br-store
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:322)
at 
org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:181)
at 
org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:360)
at 
org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
at 
org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:357)
at 
org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:295)
at 
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error 
opening store streams-data-record-stats-avro-br-store-20150516 at location 
/tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-20150516
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:196)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:158)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.openDB(RocksDBWindowStore.java:72)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore.getOrCreateSegment(RocksDBWindowStore.java:402)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore.putAndReturnInternalKey(RocksDBWindowStore.java:310)
at 
org.apache.kafka.streams.state.internals.RocksDBWindowStore.put(RocksDBWindowStore.java:292)
at 
org.apache.kafka.streams.state.internals.MeteredWindowStore.put(MeteredWindowStore.java:101)
at 
org.apache.kafka.streams.state.internals.CachingWindowStore$1.apply(CachingWindowStore.java:87)
at 
org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:117)
at 
org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
at 
org.apache.kafka.streams.state.internals.CachingWindowStore.flush(CachingWindowStore.java:118)
at 
org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:320)
... 7 more
Caused by: org.rocksdb.RocksDBException: IO error: lock 
/tmp/kafka-streams/argyle-streams/3_8/streams-data-record-stats-avro-br-store/streams-data-record-stats-avro-br-store-20150516/LOCK:
 No locks available
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:184)
at 
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:189)
... 18 more

Some sort of a locking bug?

Note that when this happens, this node stops processing anything and the other 
nodes seem to want to pick up the load, which brings the whole streaming 
cluster to a standstill. That's very worrying. Is there a document somewhere 
describing *in detail* how failover for streaming works?

Ara.
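
For what it's worth, below is a sketch of one mitigation commonly suggested for
this particular symptom: "No locks available" on the RocksDB LOCK file often
points to a state directory on a shared or network-mounted filesystem, so
pointing state.dir at a node-local disk instead of the default
/tmp/kafka-streams can help. This is not necessarily the root cause of the
rebalance errors above; the broker address, directory, and topic names are
placeholders, and the topology is a trivial stand-in for the real one.

import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class LocalStateDirExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");    // placeholder broker
        // Keep RocksDB state on a node-local disk rather than the default /tmp/kafka-streams;
        // shared or network-mounted state directories are a common source of lock errors.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");  // placeholder path

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");                     // stand-in topology

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
    }
}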









Re: Kafka Streaming

2016-10-23 Thread Mohit Anchlia
So if I get it right, I will not have this fix for another 4 months? Should I
just create my own example with the next version of Kafka?

On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax 
wrote:

>
> Current version is 3.0.1
> CP 3.1 should be release the next weeks
>
> So CP 3.2 should be there is about 4 month (Kafka follows a time base
> release cycle of 4 month and CP usually aligns with Kafka releases)
>
> -Matthias
>
>
> On 10/20/16 5:10 PM, Mohit Anchlia wrote:
> > Any idea of when 3.2 is coming?
> >
> > On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax
> >  wrote:
> >
> > No problem. Asking questions is the purpose of mailing lists. :)
> >
> > The issue will be fixed in next version of examples branch.
> >
> > Examples branch is build with CP dependency and not with Kafka
> > dependency. CP-3.2 is not available yet; only Kafka 0.10.1.0.
> > Nevertheless, they should work with Kafka dependency, too. I never
> > tried it, but you should give it a shot...
> >
> > But you should use example master branch because of API changes
> > from 0.10.0.x to 0.10.1 (and thus, changing CP-3.1 to 0.10.1.0 will
> > not be compatible and not compile, while changing CP-3.2-SNAPSHOT
> > to 0.10.1.0 should work -- hopefully ;) )
> >
> >
> > -Matthias
> >
> > On 10/20/16 4:02 PM, Mohit Anchlia wrote:
>  So this issue I am seeing is fixed in the next version of
>  example branch? Can I change my pom to point it the higher
>  version of Kafka if that is the issue? Or do I need to wait
>  until new branch is made available? Sorry lot of questions
>  :)
> 
>  On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax
>   wrote:
> 
>  The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many
>  zeros and ones -- super easy to mix up)
> 
>  However, examples master branch uses CP-3.1-SNAPSHOT (ie,
>  Kafka 0.10.1.0) -- there will be a 0.10.1 examples branch,
>  after CP-3.1 was released
> 
> 
>  -Matthias
> 
>  On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> >>> I just now cloned this repo. It seems to be using 10.1
> >>>
> >>> https://github.com/confluentinc/examples and running
> >>> examples in
> >>> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> >>>
> >>> On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll
> >>>  wrote:
> >>>
>  I suspect you are running Kafka 0.10.0.x on Windows?
>  If so, this is a known issue that is fixed in Kafka
>  0.10.1 that was just released today.
> 
>  Also: which examples are you referring to?  And, to
>  confirm: which git branch / Kafka version / OS in
>  case my guess above was wrong.
> 
> 
>  On Thursday, October 20, 2016, Mohit Anchlia
>   wrote:
> 
> > I am trying to run the examples from git. While
> > running the wordcount example I see this error:
> >
> > Caused by: *java.lang.RuntimeException*:
> > librocksdbjni-win64.dll was not found inside JAR.
> >
> >
> > Am I expected to include this jar locally?
> >
> 
> 
>  -- *Michael G. Noll* Product Manager | Confluent +1
>  650 453 5860 | @miguno 
>  Follow us: Twitter 
>  | Blog 
> 
> >>>
> >
> 
> >>
> >
>


Re: Kafka Streaming

2016-10-22 Thread Matthias J. Sax

The current version is 3.0.1.
CP 3.1 should be released in the next weeks.

So CP 3.2 should be there in about 4 months (Kafka follows a time-based
release cycle of 4 months and CP usually aligns with Kafka releases)

-Matthias


On 10/20/16 5:10 PM, Mohit Anchlia wrote:
> Any idea of when 3.2 is coming?
> 
> On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax
>  wrote:
> 
> No problem. Asking questions is the purpose of mailing lists. :)
> 
> The issue will be fixed in next version of examples branch.
> 
> Examples branch is build with CP dependency and not with Kafka 
> dependency. CP-3.2 is not available yet; only Kafka 0.10.1.0. 
> Nevertheless, they should work with Kafka dependency, too. I never 
> tried it, but you should give it a shot...
> 
> But you should use example master branch because of API changes
> from 0.10.0.x to 0.10.1 (and thus, changing CP-3.1 to 0.10.1.0 will
> not be compatible and not compile, while changing CP-3.2-SNAPSHOT
> to 0.10.1.0 should work -- hopefully ;) )
> 
> 
> -Matthias
> 
> On 10/20/16 4:02 PM, Mohit Anchlia wrote:
 So this issue I am seeing is fixed in the next version of
 example branch? Can I change my pom to point it the higher
 version of Kafka if that is the issue? Or do I need to wait
 until new branch is made available? Sorry lot of questions
 :)
 
 On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax 
  wrote:
 
 The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many
 zeros and ones -- super easy to mix up)
 
 However, examples master branch uses CP-3.1-SNAPSHOT (ie,
 Kafka 0.10.1.0) -- there will be a 0.10.1 examples branch,
 after CP-3.1 was released
 
 
 -Matthias
 
 On 10/20/16 3:48 PM, Mohit Anchlia wrote:
>>> I just now cloned this repo. It seems to be using 10.1
>>> 
>>> https://github.com/confluentinc/examples and running
>>> examples in 
>>> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
>>> 
>>> On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll 
>>>  wrote:
>>> 
 I suspect you are running Kafka 0.10.0.x on Windows?
 If so, this is a known issue that is fixed in Kafka
 0.10.1 that was just released today.
 
 Also: which examples are you referring to?  And, to 
 confirm: which git branch / Kafka version / OS in
 case my guess above was wrong.
 
 
 On Thursday, October 20, 2016, Mohit Anchlia 
  wrote:
 
> I am trying to run the examples from git. While
> running the wordcount example I see this error:
> 
> Caused by: *java.lang.RuntimeException*: 
> librocksdbjni-win64.dll was not found inside JAR.
> 
> 
> Am I expected to include this jar locally?
> 
 
 
 -- *Michael G. Noll* Product Manager | Confluent +1
 650 453 5860 | @miguno 
 Follow us: Twitter 
 | Blog 
 
>>> 
> 
 
>> 
> 


Re: Kafka Streaming

2016-10-20 Thread Mohit Anchlia
Any idea of when 3.2 is coming?

On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax 
wrote:

>
> No problem. Asking questions is the purpose of mailing lists. :)
>
> The issue will be fixed in next version of examples branch.
>
> Examples branch is build with CP dependency and not with Kafka
> dependency. CP-3.2 is not available yet; only Kafka 0.10.1.0.
> Nevertheless, they should work with Kafka dependency, too. I never
> tried it, but you should give it a shot...
>
> But you should use example master branch because of API changes from
> 0.10.0.x to 0.10.1 (and thus, changing CP-3.1 to 0.10.1.0 will not be
> compatible and not compile, while changing CP-3.2-SNAPSHOT to 0.10.1.0
> should work -- hopefully ;) )
>
>
> -Matthias
>
> On 10/20/16 4:02 PM, Mohit Anchlia wrote:
> > So this issue I am seeing is fixed in the next version of example
> > branch? Can I change my pom to point it the higher version of Kafka
> > if that is the issue? Or do I need to wait until new branch is made
> > available? Sorry lot of questions :)
> >
> > On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax
> >  wrote:
> >
> > The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many zeros
> > and ones -- super easy to mix up)
> >
> > However, examples master branch uses CP-3.1-SNAPSHOT (ie, Kafka
> > 0.10.1.0) -- there will be a 0.10.1 examples branch, after CP-3.1
> > was released
> >
> >
> > -Matthias
> >
> > On 10/20/16 3:48 PM, Mohit Anchlia wrote:
>  I just now cloned this repo. It seems to be using 10.1
> 
>  https://github.com/confluentinc/examples and running examples
>  in
>  https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> 
>  On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll
>   wrote:
> 
> > I suspect you are running Kafka 0.10.0.x on Windows?  If
> > so, this is a known issue that is fixed in Kafka 0.10.1
> > that was just released today.
> >
> > Also: which examples are you referring to?  And, to
> > confirm: which git branch / Kafka version / OS in case my
> > guess above was wrong.
> >
> >
> > On Thursday, October 20, 2016, Mohit Anchlia
> >  wrote:
> >
> >> I am trying to run the examples from git. While running
> >> the wordcount example I see this error:
> >>
> >> Caused by: *java.lang.RuntimeException*:
> >> librocksdbjni-win64.dll was not found inside JAR.
> >>
> >>
> >> Am I expected to include this jar locally?
> >>
> >
> >
> > -- *Michael G. Noll* Product Manager | Confluent +1 650 453
> > 5860 | @miguno  Follow us:
> > Twitter  | Blog
> > 
> >
> 
> >>
> >
>


Re: Kafka Streaming

2016-10-20 Thread Matthias J. Sax

I mixed up version numbers... The current CP version is 3.0.1 and not 3.1.

I.e., 3.0.1 contains Kafka 0.10.0.1, and 3.1, which will be released soon, will
contain Kafka 0.10.1.0.

The examples master uses CP-3.1-SNAPSHOT.


Sorry for the confusion.



On 10/20/16 4:53 PM, Matthias J. Sax wrote:
> No problem. Asking questions is the purpose of mailing lists. :)
> 
> The issue will be fixed in next version of examples branch.
> 
> Examples branch is build with CP dependency and not with Kafka 
> dependency. CP-3.2 is not available yet; only Kafka 0.10.1.0. 
> Nevertheless, they should work with Kafka dependency, too. I never 
> tried it, but you should give it a shot...
> 
> But you should use example master branch because of API changes
> from 0.10.0.x to 0.10.1 (and thus, changing CP-3.1 to 0.10.1.0 will
> not be compatible and not compile, while changing CP-3.2-SNAPSHOT
> to 0.10.1.0 should work -- hopefully ;) )
> 
> 
> -Matthias
> 
> On 10/20/16 4:02 PM, Mohit Anchlia wrote:
>> So this issue I am seeing is fixed in the next version of
>> example branch? Can I change my pom to point it the higher
>> version of Kafka if that is the issue? Or do I need to wait until
>> new branch is made available? Sorry lot of questions :)
> 
>> On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax 
>>  wrote:
> 
>> The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many zeros 
>> and ones -- super easy to mix up)
> 
>> However, examples master branch uses CP-3.1-SNAPSHOT (ie, Kafka 
>> 0.10.1.0) -- there will be a 0.10.1 examples branch, after
>> CP-3.1 was released
> 
> 
>> -Matthias
> 
>> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> I just now cloned this repo. It seems to be using 10.1
> 
> https://github.com/confluentinc/examples and running
> examples in 
> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> 
> On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll 
>  wrote:
> 
>> I suspect you are running Kafka 0.10.0.x on Windows?  If 
>> so, this is a known issue that is fixed in Kafka 0.10.1 
>> that was just released today.
>> 
>> Also: which examples are you referring to?  And, to 
>> confirm: which git branch / Kafka version / OS in case
>> my guess above was wrong.
>> 
>> 
>> On Thursday, October 20, 2016, Mohit Anchlia 
>>  wrote:
>> 
>>> I am trying to run the examples from git. While
>>> running the wordcount example I see this error:
>>> 
>>> Caused by: *java.lang.RuntimeException*: 
>>> librocksdbjni-win64.dll was not found inside JAR.
>>> 
>>> 
>>> Am I expected to include this jar locally?
>>> 
>> 
>> 
>> -- *Michael G. Noll* Product Manager | Confluent +1 650
>> 453 5860 | @miguno  Follow
>> us: Twitter  | Blog 
>> 
>> 
> 
>>> 
> 
> 


Re: Kafka Streaming

2016-10-20 Thread Matthias J. Sax

No problem. Asking questions is the purpose of mailing lists. :)

The issue will be fixed in the next version of the examples branch.

The examples branch is built with the CP dependency and not with the Kafka
dependency. CP-3.2 is not available yet; only Kafka 0.10.1.0.
Nevertheless, they should work with the Kafka dependency, too. I never
tried it, but you should give it a shot...

But you should use the examples master branch because of API changes from
0.10.0.x to 0.10.1 (and thus, changing CP-3.1 to 0.10.1.0 will not be
compatible and will not compile, while changing CP-3.2-SNAPSHOT to 0.10.1.0
should work -- hopefully ;) )


-Matthias

On 10/20/16 4:02 PM, Mohit Anchlia wrote:
> So this issue I am seeing is fixed in the next version of example
> branch? Can I change my pom to point it the higher version of Kafka
> if that is the issue? Or do I need to wait until new branch is made
> available? Sorry lot of questions :)
> 
> On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax
>  wrote:
> 
> The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many zeros
> and ones -- super easy to mix up)
> 
> However, examples master branch uses CP-3.1-SNAPSHOT (ie, Kafka 
> 0.10.1.0) -- there will be a 0.10.1 examples branch, after CP-3.1
> was released
> 
> 
> -Matthias
> 
> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
 I just now cloned this repo. It seems to be using 10.1
 
 https://github.com/confluentinc/examples and running examples
 in 
 https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
 
 On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll 
  wrote:
 
> I suspect you are running Kafka 0.10.0.x on Windows?  If
> so, this is a known issue that is fixed in Kafka 0.10.1
> that was just released today.
> 
> Also: which examples are you referring to?  And, to
> confirm: which git branch / Kafka version / OS in case my
> guess above was wrong.
> 
> 
> On Thursday, October 20, 2016, Mohit Anchlia 
>  wrote:
> 
>> I am trying to run the examples from git. While running
>> the wordcount example I see this error:
>> 
>> Caused by: *java.lang.RuntimeException*: 
>> librocksdbjni-win64.dll was not found inside JAR.
>> 
>> 
>> Am I expected to include this jar locally?
>> 
> 
> 
> -- *Michael G. Noll* Product Manager | Confluent +1 650 453
> 5860 | @miguno  Follow us:
> Twitter  | Blog 
> 
> 
 
>> 
> 


Re: Kafka Streaming

2016-10-20 Thread Mohit Anchlia
So this issue I am seeing is fixed in the next version of the examples branch?
Can I change my pom to point to the higher version of Kafka if that is the
issue? Or do I need to wait until the new branch is made available? Sorry, lots
of questions :)

On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax 
wrote:

>
> The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many zeros and
> ones -- super easy to mix up)
>
> However, examples master branch uses CP-3.1-SNAPSHOT (ie, Kafka
> 0.10.1.0) -- there will be a 0.10.1 examples branch, after CP-3.1 was
> released
>
>
> -Matthias
>
> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> > I just now cloned this repo. It seems to be using 10.1
> >
> > https://github.com/confluentinc/examples and running examples in
> > https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> >
> >  On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll
> >  wrote:
> >
> >> I suspect you are running Kafka 0.10.0.x on Windows?  If so, this
> >> is a known issue that is fixed in Kafka 0.10.1 that was just
> >> released today.
> >>
> >> Also: which examples are you referring to?  And, to confirm:
> >> which git branch / Kafka version / OS in case my guess above was
> >> wrong.
> >>
> >>
> >> On Thursday, October 20, 2016, Mohit Anchlia
> >>  wrote:
> >>
> >>> I am trying to run the examples from git. While running the
> >>> wordcount example I see this error:
> >>>
> >>> Caused by: *java.lang.RuntimeException*:
> >>> librocksdbjni-win64.dll was not found inside JAR.
> >>>
> >>>
> >>> Am I expected to include this jar locally?
> >>>
> >>
> >>
> >> -- *Michael G. Noll* Product Manager | Confluent +1 650 453 5860
> >> | @miguno  Follow us: Twitter
> >>  | Blog
> >> 
> >>
> >
>


Re: Kafka Streaming

2016-10-20 Thread Matthias J. Sax

The branch is 0.10.0.1 and not 0.10.1.0 (sorry for so many zeros and
ones -- super easy to mix up)

However, the examples master branch uses CP-3.1-SNAPSHOT (i.e., Kafka
0.10.1.0) -- there will be a 0.10.1 examples branch after CP-3.1 is
released


-Matthias

On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> I just now cloned this repo. It seems to be using 10.1
> 
> https://github.com/confluentinc/examples and running examples in 
> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
>
>  On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll
>  wrote:
> 
>> I suspect you are running Kafka 0.10.0.x on Windows?  If so, this
>> is a known issue that is fixed in Kafka 0.10.1 that was just
>> released today.
>> 
>> Also: which examples are you referring to?  And, to confirm:
>> which git branch / Kafka version / OS in case my guess above was
>> wrong.
>> 
>> 
>> On Thursday, October 20, 2016, Mohit Anchlia
>>  wrote:
>> 
>>> I am trying to run the examples from git. While running the
>>> wordcount example I see this error:
>>> 
>>> Caused by: *java.lang.RuntimeException*:
>>> librocksdbjni-win64.dll was not found inside JAR.
>>> 
>>> 
>>> Am I expected to include this jar locally?
>>> 
>> 
>> 
>> -- *Michael G. Noll* Product Manager | Confluent +1 650 453 5860
>> | @miguno  Follow us: Twitter
>>  | Blog 
>> 
>> 
> 


Re: Kafka Streaming

2016-10-20 Thread Mohit Anchlia
I just now cloned this repo. It seems to be using 10.1

https://github.com/confluentinc/examples and running examples in
https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams

On Thu, Oct 20, 2016 at 3:10 PM, Michael Noll  wrote:

> I suspect you are running Kafka 0.10.0.x on Windows?  If so, this is a
> known issue that is fixed in Kafka 0.10.1 that was just released today.
>
> Also: which examples are you referring to?  And, to confirm: which git
> branch / Kafka version / OS in case my guess above was wrong.
>
>
> On Thursday, October 20, 2016, Mohit Anchlia 
> wrote:
>
> > I am trying to run the examples from git. While running the wordcount
> > example I see this error:
> >
> > Caused by: *java.lang.RuntimeException*: librocksdbjni-win64.dll was not
> > found inside JAR.
> >
> >
> > Am I expected to include this jar locally?
> >
>
>
> --
> *Michael G. Noll*
> Product Manager | Confluent
> +1 650 453 5860 | @miguno 
> Follow us: Twitter  | Blog
> 
>


Re: Kafka Streaming

2016-10-20 Thread Michael Noll
I suspect you are running Kafka 0.10.0.x on Windows?  If so, this is a
known issue that is fixed in Kafka 0.10.1 that was just released today.

Also: which examples are you referring to?  And, to confirm: which git
branch / Kafka version / OS in case my guess above was wrong.


On Thursday, October 20, 2016, Mohit Anchlia  wrote:

> I am trying to run the examples from git. While running the wordcount
> example I see this error:
>
> Caused by: *java.lang.RuntimeException*: librocksdbjni-win64.dll was not
> found inside JAR.
>
>
> Am I expected to include this jar locally?
>


-- 
*Michael G. Noll*
Product Manager | Confluent
+1 650 453 5860 | @miguno 
Follow us: Twitter  | Blog



Kafka Streaming

2016-10-20 Thread Mohit Anchlia
I am trying to run the examples from git. While running the wordcount
example I see this error:

Caused by: *java.lang.RuntimeException*: librocksdbjni-win64.dll was not
found inside JAR.


Am I expected to include this jar locally?


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-12 Thread Guozhang Wang
Haha, I feel the same pain with you man.

On Tue, Oct 11, 2016 at 8:59 PM, Ali Akhtar  wrote:

> Thanks. That filter() method is a good solution. But whenever I look at it,
> I feel an empty spot in my heart which can only be filled by:
> filter(Optional::isPresent)
>
> On Wed, Oct 12, 2016 at 12:15 AM, Guozhang Wang 
> wrote:
>
> > Ali,
> >
> > We are working on moving from Java7 to Java8 in Apache Kafka, and the
> > Streams client is one of the motivations doing so. Stay tuned on the
> > mailing list when it will come.
> >
> > Currently Streams won't automatically filter out null values for you
> since
> > in some other cases they may have semantic meanings and cannot be simply
> > ignored; you can, though, apply a simple filter such like "filter((key,
> > value) => value != null)" before your processor lambda operator, if it
> > looks clearer in your code.
> >
> > Guozhang
> >
> >
> > On Sun, Oct 9, 2016 at 3:14 PM, Ali Akhtar  wrote:
> >
> > > It isn't a fatal error. It should be logged as a warning, and then the
> > > stream should be continued w/ the next message.
> > >
> > > Checking for null is 'ok', in the sense that it gets the job done, but
> > > after java 8's release, we really should be using optionals.
> > >
> > > Hopefully we can break compatibility w/ the bad old days soon and move
> > into
> > > the future.
> > >
> > > (If there's a way to do the null check automatically, i.e before
> calling
> > > the lambda, please let me know).
> > >
> > > On Sun, Oct 9, 2016 at 11:14 PM, Guozhang Wang 
> > wrote:
> > >
> > > > Ali,
> > > >
> > > > In your scenario, if serde fails to parse the bytes should that be
> > > treated
> > > > as a fatal failure or it is expected?
> > > >
> > > > In the former case, instead of returning a null I think it is better
> to
> > > > throw a runtime exception in order to let the whole client to stop
> and
> > > > notify the error; in the latter case, returning and checking null
> looks
> > > > fine to me.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Oct 7, 2016 at 3:12 PM, Ali Akhtar 
> > wrote:
> > > >
> > > > > Hey G,
> > > > >
> > > > > Looks like the only difference is a valueSerde parameter.
> > > > >
> > > > > How does that prevent having to look for nulls in the consumer?
> > > > >
> > > > > E.g, I wrote a custom Serde which converts the messages (which are
> > json
> > > > > strings) into a Java class using Jackson.
> > > > >
> > > > > If the json parse fails, it sends back a null.
> > > > >
> > > > > When I'm reading this stream, in my callback, how would I prevent
> > > having
> > > > to
> > > > > check if the serialized value isn't null?
> > > > >
> > > > > On Sat, Oct 8, 2016 at 1:07 AM, Guozhang Wang 
> > > > wrote:
> > > > >
> > > > > > Hello Ali,
> > > > > >
> > > > > > We do have corresponding overloaded functions for most of
> KStream /
> > > > > KTable
> > > > > > operators to avoid enforcing users to specify "null"; in these
> > cases
> > > > the
> > > > > > default serdes specified in the configs are then used. For
> example:
> > > > > >
> > > > > >  KTable aggregate(Initializer initializer,
> > > > > >Aggregator adder,
> > > > > >Aggregator subtractor,
> > > > > >Serde aggValueSerde,
> > > > > >String storeName);
> > > > > >
> > > > > > /**
> > > > > >  * .. using default serializers and deserializers.
> > > > > >  */
> > > > > >  KTable aggregate(Initializer initializer,
> > > > > >Aggregator adder,
> > > > > >Aggregator subtractor,
> > > > > >String storeName);
> > > > > >
> > > > > >
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Fri, Oct 7, 2016 at 9:20 AM, Michael Noll <
> mich...@confluent.io
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Ali, the Apache Kafka project still targets Java 7, which means
> > we
> > > > > can't
> > > > > > > use Java 8 features just yet.
> > > > > > >
> > > > > > > FYI: There's on ongoing conversation about when Kafka would
> move
> > > from
> > > > > > Java
> > > > > > > 7 to Java 8.
> > > > > > >
> > > > > > > On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar <
> ali.rac...@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Since we're using Java 8 in most cases anyway, Serdes /
> > > > Serialiazers
> > > > > > > should
> > > > > > > > use options, to avoid having to deal with the lovely nulls.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-11 Thread Ali Akhtar
Thanks. That filter() method is a good solution. But whenever I look at it,
I feel an empty spot in my heart which can only be filled by:
filter(Optional::isPresent)

On Wed, Oct 12, 2016 at 12:15 AM, Guozhang Wang  wrote:

> Ali,
>
> We are working on moving from Java7 to Java8 in Apache Kafka, and the
> Streams client is one of the motivations doing so. Stay tuned on the
> mailing list when it will come.
>
> Currently Streams won't automatically filter out null values for you since
> in some other cases they may have semantic meanings and cannot be simply
> ignored; you can, though, apply a simple filter such like "filter((key,
> value) => value != null)" before your processor lambda operator, if it
> looks clearer in your code.
>
> Guozhang
>
>
> On Sun, Oct 9, 2016 at 3:14 PM, Ali Akhtar  wrote:
>
> > It isn't a fatal error. It should be logged as a warning, and then the
> > stream should be continued w/ the next message.
> >
> > Checking for null is 'ok', in the sense that it gets the job done, but
> > after java 8's release, we really should be using optionals.
> >
> > Hopefully we can break compatibility w/ the bad old days soon and move
> into
> > the future.
> >
> > (If there's a way to do the null check automatically, i.e before calling
> > the lambda, please let me know).
> >
> > On Sun, Oct 9, 2016 at 11:14 PM, Guozhang Wang 
> wrote:
> >
> > > Ali,
> > >
> > > In your scenario, if serde fails to parse the bytes should that be
> > treated
> > > as a fatal failure or it is expected?
> > >
> > > In the former case, instead of returning a null I think it is better to
> > > throw a runtime exception in order to let the whole client to stop and
> > > notify the error; in the latter case, returning and checking null looks
> > > fine to me.
> > >
> > >
> > > Guozhang
> > >
> > > On Fri, Oct 7, 2016 at 3:12 PM, Ali Akhtar 
> wrote:
> > >
> > > > Hey G,
> > > >
> > > > Looks like the only difference is a valueSerde parameter.
> > > >
> > > > How does that prevent having to look for nulls in the consumer?
> > > >
> > > > E.g, I wrote a custom Serde which converts the messages (which are
> json
> > > > strings) into a Java class using Jackson.
> > > >
> > > > If the json parse fails, it sends back a null.
> > > >
> > > > When I'm reading this stream, in my callback, how would I prevent
> > having
> > > to
> > > > check if the serialized value isn't null?
> > > >
> > > > On Sat, Oct 8, 2016 at 1:07 AM, Guozhang Wang 
> > > wrote:
> > > >
> > > > > Hello Ali,
> > > > >
> > > > > We do have corresponding overloaded functions for most of KStream /
> > > > KTable
> > > > > operators to avoid enforcing users to specify "null"; in these
> cases
> > > the
> > > > > default serdes specified in the configs are then used. For example:
> > > > >
> > > > >  KTable aggregate(Initializer initializer,
> > > > >Aggregator adder,
> > > > >Aggregator subtractor,
> > > > >Serde aggValueSerde,
> > > > >String storeName);
> > > > >
> > > > > /**
> > > > >  * .. using default serializers and deserializers.
> > > > >  */
> > > > >  KTable aggregate(Initializer initializer,
> > > > >Aggregator adder,
> > > > >Aggregator subtractor,
> > > > >String storeName);
> > > > >
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > > On Fri, Oct 7, 2016 at 9:20 AM, Michael Noll  >
> > > > wrote:
> > > > >
> > > > > > Ali, the Apache Kafka project still targets Java 7, which means
> we
> > > > can't
> > > > > > use Java 8 features just yet.
> > > > > >
> > > > > > FYI: There's on ongoing conversation about when Kafka would move
> > from
> > > > > Java
> > > > > > 7 to Java 8.
> > > > > >
> > > > > > On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar  >
> > > > wrote:
> > > > > >
> > > > > > > Since we're using Java 8 in most cases anyway, Serdes /
> > > Serialiazers
> > > > > > should
> > > > > > > use options, to avoid having to deal with the lovely nulls.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-11 Thread Guozhang Wang
Ali,

We are working on moving from Java 7 to Java 8 in Apache Kafka, and the
Streams client is one of the motivations for doing so. Stay tuned on the
mailing list for when it will come.

Currently Streams won't automatically filter out null values for you, since
in some other cases they may have semantic meaning and cannot simply be
ignored; you can, though, apply a simple filter such as "filter((key,
value) -> value != null)" before your processor lambda operator, if that
looks clearer in your code.

Guozhang
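
A rough sketch of how the two pieces discussed above fit together: a
Jackson-backed deserializer that returns null on a parse failure (as described
earlier in the thread), and a filter() applied right after the source so later
lambdas never see those nulls. MyEvent is a placeholder Jackson-mappable POJO,
and the topic name and serde variables are placeholders as well, not code from
this thread.

import java.io.IOException;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Deserializer;

public class MyEventDeserializer implements Deserializer<MyEvent> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public MyEvent deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        try {
            return mapper.readValue(data, MyEvent.class);
        } catch (IOException e) {
            // Malformed JSON: hand back null and let a downstream filter drop the record.
            return null;
        }
    }

    @Override
    public void close() { }
}

On the streams side, assuming stringSerde and a myEventSerde built around the
deserializer above:

KStream<String, MyEvent> events = builder.stream(stringSerde, myEventSerde, "events-topic");
KStream<String, MyEvent> valid = events.filter((key, value) -> value != null);
// 'valid' can now be aggregated or processed without a null check in every lambda.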


On Sun, Oct 9, 2016 at 3:14 PM, Ali Akhtar  wrote:

> It isn't a fatal error. It should be logged as a warning, and then the
> stream should be continued w/ the next message.
>
> Checking for null is 'ok', in the sense that it gets the job done, but
> after java 8's release, we really should be using optionals.
>
> Hopefully we can break compatibility w/ the bad old days soon and move into
> the future.
>
> (If there's a way to do the null check automatically, i.e before calling
> the lambda, please let me know).
>
> On Sun, Oct 9, 2016 at 11:14 PM, Guozhang Wang  wrote:
>
> > Ali,
> >
> > In your scenario, if serde fails to parse the bytes should that be
> treated
> > as a fatal failure or it is expected?
> >
> > In the former case, instead of returning a null I think it is better to
> > throw a runtime exception in order to let the whole client to stop and
> > notify the error; in the latter case, returning and checking null looks
> > fine to me.
> >
> >
> > Guozhang
> >
> > On Fri, Oct 7, 2016 at 3:12 PM, Ali Akhtar  wrote:
> >
> > > Hey G,
> > >
> > > Looks like the only difference is a valueSerde parameter.
> > >
> > > How does that prevent having to look for nulls in the consumer?
> > >
> > > E.g, I wrote a custom Serde which converts the messages (which are json
> > > strings) into a Java class using Jackson.
> > >
> > > If the json parse fails, it sends back a null.
> > >
> > > When I'm reading this stream, in my callback, how would I prevent
> having
> > to
> > > check if the serialized value isn't null?
> > >
> > > On Sat, Oct 8, 2016 at 1:07 AM, Guozhang Wang 
> > wrote:
> > >
> > > > Hello Ali,
> > > >
> > > > We do have corresponding overloaded functions for most of KStream /
> > > KTable
> > > > operators to avoid enforcing users to specify "null"; in these cases
> > the
> > > > default serdes specified in the configs are then used. For example:
> > > >
> > > >  KTable aggregate(Initializer initializer,
> > > >Aggregator adder,
> > > >Aggregator subtractor,
> > > >Serde aggValueSerde,
> > > >String storeName);
> > > >
> > > > /**
> > > >  * .. using default serializers and deserializers.
> > > >  */
> > > >  KTable aggregate(Initializer initializer,
> > > >Aggregator adder,
> > > >Aggregator subtractor,
> > > >String storeName);
> > > >
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Fri, Oct 7, 2016 at 9:20 AM, Michael Noll 
> > > wrote:
> > > >
> > > > > Ali, the Apache Kafka project still targets Java 7, which means we
> > > can't
> > > > > use Java 8 features just yet.
> > > > >
> > > > > FYI: There's on ongoing conversation about when Kafka would move
> from
> > > > Java
> > > > > 7 to Java 8.
> > > > >
> > > > > On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar 
> > > wrote:
> > > > >
> > > > > > Since we're using Java 8 in most cases anyway, Serdes /
> > Serialiazers
> > > > > should
> > > > > > use options, to avoid having to deal with the lovely nulls.
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-09 Thread Ali Akhtar
It isn't a fatal error. It should be logged as a warning, and then the
stream should be continued w/ the next message.

Checking for null is 'ok', in the sense that it gets the job done, but
after Java 8's release, we really should be using Optionals.

Hopefully we can break compatibility w/ the bad old days soon and move into
the future.

(If there's a way to do the null check automatically, i.e. before calling
the lambda, please let me know).

On Sun, Oct 9, 2016 at 11:14 PM, Guozhang Wang  wrote:

> Ali,
>
> In your scenario, if serde fails to parse the bytes should that be treated
> as a fatal failure or it is expected?
>
> In the former case, instead of returning a null I think it is better to
> throw a runtime exception in order to let the whole client to stop and
> notify the error; in the latter case, returning and checking null looks
> fine to me.
>
>
> Guozhang
>
> On Fri, Oct 7, 2016 at 3:12 PM, Ali Akhtar  wrote:
>
> > Hey G,
> >
> > Looks like the only difference is a valueSerde parameter.
> >
> > How does that prevent having to look for nulls in the consumer?
> >
> > E.g, I wrote a custom Serde which converts the messages (which are json
> > strings) into a Java class using Jackson.
> >
> > If the json parse fails, it sends back a null.
> >
> > When I'm reading this stream, in my callback, how would I prevent having
> to
> > check if the serialized value isn't null?
> >
> > On Sat, Oct 8, 2016 at 1:07 AM, Guozhang Wang 
> wrote:
> >
> > > Hello Ali,
> > >
> > > We do have corresponding overloaded functions for most of KStream /
> > KTable
> > > operators to avoid enforcing users to specify "null"; in these cases
> the
> > > default serdes specified in the configs are then used. For example:
> > >
> > >  KTable aggregate(Initializer initializer,
> > >Aggregator adder,
> > >Aggregator subtractor,
> > >Serde aggValueSerde,
> > >String storeName);
> > >
> > > /**
> > >  * .. using default serializers and deserializers.
> > >  */
> > >  KTable aggregate(Initializer initializer,
> > >Aggregator adder,
> > >Aggregator subtractor,
> > >String storeName);
> > >
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Fri, Oct 7, 2016 at 9:20 AM, Michael Noll 
> > wrote:
> > >
> > > > Ali, the Apache Kafka project still targets Java 7, which means we
> > can't
> > > > use Java 8 features just yet.
> > > >
> > > > FYI: There's on ongoing conversation about when Kafka would move from
> > > Java
> > > > 7 to Java 8.
> > > >
> > > > On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar 
> > wrote:
> > > >
> > > > > Since we're using Java 8 in most cases anyway, Serdes /
> Serialiazers
> > > > should
> > > > > use options, to avoid having to deal with the lovely nulls.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-09 Thread Guozhang Wang
Ali,

In your scenario, if the serde fails to parse the bytes, should that be treated
as a fatal failure, or is it expected?

In the former case, instead of returning a null I think it is better to
throw a runtime exception in order to let the whole client stop and
surface the error; in the latter case, returning and checking null looks
fine to me.


Guozhang

On Fri, Oct 7, 2016 at 3:12 PM, Ali Akhtar  wrote:

> Hey G,
>
> Looks like the only difference is a valueSerde parameter.
>
> How does that prevent having to look for nulls in the consumer?
>
> E.g, I wrote a custom Serde which converts the messages (which are json
> strings) into a Java class using Jackson.
>
> If the json parse fails, it sends back a null.
>
> When I'm reading this stream, in my callback, how would I prevent having to
> check if the serialized value isn't null?
>
> On Sat, Oct 8, 2016 at 1:07 AM, Guozhang Wang  wrote:
>
> > Hello Ali,
> >
> > We do have corresponding overloaded functions for most of KStream /
> KTable
> > operators to avoid enforcing users to specify "null"; in these cases the
> > default serdes specified in the configs are then used. For example:
> >
> >  KTable aggregate(Initializer initializer,
> >Aggregator adder,
> >Aggregator subtractor,
> >Serde aggValueSerde,
> >String storeName);
> >
> > /**
> >  * .. using default serializers and deserializers.
> >  */
> >  KTable aggregate(Initializer initializer,
> >Aggregator adder,
> >Aggregator subtractor,
> >String storeName);
> >
> >
> >
> > Guozhang
> >
> >
> > On Fri, Oct 7, 2016 at 9:20 AM, Michael Noll 
> wrote:
> >
> > > Ali, the Apache Kafka project still targets Java 7, which means we
> can't
> > > use Java 8 features just yet.
> > >
> > > FYI: There's on ongoing conversation about when Kafka would move from
> > Java
> > > 7 to Java 8.
> > >
> > > On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar 
> wrote:
> > >
> > > > Since we're using Java 8 in most cases anyway, Serdes / Serialiazers
> > > should
> > > > use options, to avoid having to deal with the lovely nulls.
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-07 Thread Ali Akhtar
Hey G,

Looks like the only difference is a valueSerde parameter.

How does that prevent having to look for nulls in the consumer?

E.g., I wrote a custom Serde which converts the messages (which are JSON
strings) into a Java class using Jackson.

If the json parse fails, it sends back a null.

When I'm reading this stream, in my callback, how would I prevent having to
check if the serialized value isn't null?
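
For reference, a hypothetical sketch of the kind of deserializer described above
(the Event POJO and the class name are made up for illustration):

import java.io.IOException;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Deserializer;

// Event is assumed to be a simple Jackson-compatible POJO defined elsewhere.
public class EventDeserializer implements Deserializer<Event> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {}

    @Override
    public Event deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        try {
            return mapper.readValue(data, Event.class);
        } catch (IOException e) {
            // parse failure: hand a null back to the caller instead of failing the stream
            return null;
        }
    }

    @Override
    public void close() {}
}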

On Sat, Oct 8, 2016 at 1:07 AM, Guozhang Wang  wrote:

> Hello Ali,
>
> We do have corresponding overloaded functions for most of KStream / KTable
> operators to avoid enforcing users to specify "null"; in these cases the
> default serdes specified in the configs are then used. For example:
>
>  <T> KTable<K, T> aggregate(Initializer<T> initializer,
>                             Aggregator<K, V, T> adder,
>                             Aggregator<K, V, T> subtractor,
>                             Serde<T> aggValueSerde,
>                             String storeName);
>
> /**
>  * .. using default serializers and deserializers.
>  */
>  <T> KTable<K, T> aggregate(Initializer<T> initializer,
>                             Aggregator<K, V, T> adder,
>                             Aggregator<K, V, T> subtractor,
>                             String storeName);
>
>
>
> Guozhang
>
>
> On Fri, Oct 7, 2016 at 9:20 AM, Michael Noll  wrote:
>
> > Ali, the Apache Kafka project still targets Java 7, which means we can't
> > use Java 8 features just yet.
> >
> > FYI: There's on ongoing conversation about when Kafka would move from
> Java
> > 7 to Java 8.
> >
> > On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar  wrote:
> >
> > > Since we're using Java 8 in most cases anyway, Serdes / Serialiazers
> > should
> > > use options, to avoid having to deal with the lovely nulls.
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-07 Thread Guozhang Wang
Hello Ali,

We do have corresponding overloaded functions for most of the KStream / KTable
operators to avoid forcing users to specify "null"; in these cases the
default serdes specified in the configs are used. For example:

<T> KTable<K, T> aggregate(Initializer<T> initializer,
                           Aggregator<K, V, T> adder,
                           Aggregator<K, V, T> subtractor,
                           Serde<T> aggValueSerde,
                           String storeName);

/**
 * .. using default serializers and deserializers.
 */
<T> KTable<K, T> aggregate(Initializer<T> initializer,
                           Aggregator<K, V, T> adder,
                           Aggregator<K, V, T> subtractor,
                           String storeName);
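
For reference, a small sketch of where those defaults come from (property names as in
the 0.10.x StreamsConfig; the String serdes are just an example choice):

// when the overload without a Serde parameter is used, Streams falls back to these
Properties props = new Properties();
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());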



Guozhang


On Fri, Oct 7, 2016 at 9:20 AM, Michael Noll  wrote:

> Ali, the Apache Kafka project still targets Java 7, which means we can't
> use Java 8 features just yet.
>
> FYI: There's an ongoing conversation about when Kafka would move from Java
> 7 to Java 8.
>
> On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar  wrote:
>
> > Since we're using Java 8 in most cases anyway, Serdes / Serialiazers
> should
> > use options, to avoid having to deal with the lovely nulls.
> >
>



-- 
-- Guozhang


Re: In Kafka Streaming, Serdes should use Optionals

2016-10-07 Thread Michael Noll
Ali, the Apache Kafka project still targets Java 7, which means we can't
use Java 8 features just yet.

FYI: There's an ongoing conversation about when Kafka would move from Java
7 to Java 8.

On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar  wrote:

> Since we're using Java 8 in most cases anyway, Serdes / Serializers should
> use options, to avoid having to deal with the lovely nulls.
>


In Kafka Streaming, Serdes should use Optionals

2016-10-07 Thread Ali Akhtar
Since we're using Java 8 in most cases anyway, Serdes / Serializers should
use Optionals, to avoid having to deal with the lovely nulls.


Re: Kafka streaming and topic filter whitelist

2016-10-03 Thread Damian Guy
They will be automatically added and removed.

On Mon, 3 Oct 2016 at 14:59 Gary Ogden  wrote:

> What if topics are created or deleted after the application has started?
> Will they be added/removed automatically, or do we need to restart the
> application to pick up the changes?
>
> On 1 October 2016 at 04:42, Damian Guy  wrote:
>
> > That is correct.
> >
> > On Fri, 30 Sep 2016 at 18:00 Gary Ogden  wrote:
> >
> > > So how exactly would that work? For example, I can currently do this:
> > >
> > > KStream
> > > textLines = builder.stream(stringSerde, stringSerde,
> > SYSTEM_TOPIC);
> > >
> > > Are you saying that I could put a regex in place of the SYSTEM_TOPIC
> and
> > > that one KStream would be streaming from multiple topics that match
> that
> > > regex?
> > >
> > > If so, that could be useful.
> > >
> > > Gary
> > >
> > >
> > > On 30 September 2016 at 13:35, Damian Guy 
> wrote:
> > >
> > > > Hi Gary,
> > > >
> > > > In the upcoming 0.10.1 release you can do regex subscription - will
> > that
> > > > help?
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Fri, 30 Sep 2016 at 14:57 Gary Ogden  wrote:
> > > >
> > > > > Is it possible to use the topic filter whitelist within a Kafka
> > > Streaming
> > > > > application? Or can it only be done in a consumer job?
> > > > >
> > > >
> > >
> >
>


Re: Kafka streaming and topic filter whitelist

2016-10-03 Thread Gary Ogden
What if topics are created or deleted after the application has started?
Will they be added/removed automatically, or do we need to restart the
application to pick up the changes?

On 1 October 2016 at 04:42, Damian Guy  wrote:

> That is correct.
>
> On Fri, 30 Sep 2016 at 18:00 Gary Ogden  wrote:
>
> > So how exactly would that work? For example, I can currently do this:
> >
> > KStream
> > textLines = builder.stream(stringSerde, stringSerde,
> SYSTEM_TOPIC);
> >
> > Are you saying that I could put a regex in place of the SYSTEM_TOPIC and
> > that one KStream would be streaming from multiple topics that match that
> > regex?
> >
> > If so, that could be useful.
> >
> > Gary
> >
> >
> > On 30 September 2016 at 13:35, Damian Guy  wrote:
> >
> > > Hi Gary,
> > >
> > > In the upcoming 0.10.1 release you can do regex subscription - will
> that
> > > help?
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Fri, 30 Sep 2016 at 14:57 Gary Ogden  wrote:
> > >
> > > > Is it possible to use the topic filter whitelist within a Kafka
> > Streaming
> > > > application? Or can it only be done in a consumer job?
> > > >
> > >
> >
>


Re: Kafka streaming and topic filter whitelist

2016-10-01 Thread Damian Guy
That is correct.

On Fri, 30 Sep 2016 at 18:00 Gary Ogden  wrote:

> So how exactly would that work? For example, I can currently do this:
>
> KStream<String, String> textLines =
>     builder.stream(stringSerde, stringSerde, SYSTEM_TOPIC);
>
> Are you saying that I could put a regex in place of the SYSTEM_TOPIC and
> that one KStream would be streaming from multiple topics that match that
> regex?
>
> If so, that could be useful.
>
> Gary
>
>
> On 30 September 2016 at 13:35, Damian Guy  wrote:
>
> > Hi Gary,
> >
> > In the upcoming 0.10.1 release you can do regex subscription - will that
> > help?
> >
> > Thanks,
> > Damian
> >
> > On Fri, 30 Sep 2016 at 14:57 Gary Ogden  wrote:
> >
> > > Is it possible to use the topic filter whitelist within a Kafka
> Streaming
> > > application? Or can it only be done in a consumer job?
> > >
> >
>


Re: Kafka streaming and topic filter whitelist

2016-09-30 Thread Gary Ogden
So how exactly would that work? For example, I can currently do this:

KStream<String, String> textLines =
    builder.stream(stringSerde, stringSerde, SYSTEM_TOPIC);

Are you saying that I could put a regex in place of the SYSTEM_TOPIC and
that one KStream would be streaming from multiple topics that match that
regex?

If so, that could be useful.
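
For illustration, a rough sketch of what that could look like with the upcoming 0.10.1
regex overload (the exact method signature and the pattern are assumptions):

import java.util.regex.Pattern;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class RegexSubscriptionExample {
    public static void main(String[] args) {
        KStreamBuilder builder = new KStreamBuilder();

        // one KStream backed by every topic whose name matches the pattern
        KStream<String, String> textLines =
                builder.stream(Serdes.String(), Serdes.String(), Pattern.compile("SYSTEM_.*"));
    }
}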

Gary


On 30 September 2016 at 13:35, Damian Guy  wrote:

> Hi Gary,
>
> In the upcoming 0.10.1 release you can do regex subscription - will that
> help?
>
> Thanks,
> Damian
>
> On Fri, 30 Sep 2016 at 14:57 Gary Ogden  wrote:
>
> > Is it possible to use the topic filter whitelist within a Kafka Streaming
> > application? Or can it only be done in a consumer job?
> >
>


Re: Kafka streaming and topic filter whitelist

2016-09-30 Thread Damian Guy
Hi Gary,

In the upcoming 0.10.1 release you can do regex subscription - will that
help?

Thanks,
Damian

On Fri, 30 Sep 2016 at 14:57 Gary Ogden  wrote:

> Is it possible to use the topic filter whitelist within a Kafka Streaming
> application? Or can it only be done in a consumer job?
>


Kafka streaming and topic filter whitelist

2016-09-30 Thread Gary Ogden
Is it possible to use the topic filter whitelist within a Kafka Streaming
application? Or can it only be done in a consumer job?


Re: Kafka Streaming Join for range of gps coordinates

2016-08-30 Thread Guozhang Wang
Hello Farhon,

I think your idea about KStream-KTable join is a good approach with some
tweaks, more specifically:

1. Model your rider request as a normal record stream with the combo key of
(latitude, longitude).

2. Model your driver location as an ever-updating table with the combo key
of (latitude, longitude), and value as the driver information. Whenever a
driver's location gets updated, it is captured as two updates: a delete
with the old location key and an insert with the new location key.

3. Then, upon receiving a new record from the rider stream, issue a range
query on the materialized driver table, using the rider's location key to bound the range.


Does this work for you?
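
For illustration, a very rough sketch of step 3: a transformer that range-scans a
materialized driver-location store for each rider request (the store name, the sortable
geo-key encoding, and all types are assumptions, not a definitive implementation):

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class NearbyDriverTransformer
        implements Transformer<String, String, KeyValue<String, String>> {

    private KeyValueStore<String, String> driverStore;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // the store is assumed to be keyed by a sortable geo key (e.g. a geohash)
        driverStore = (KeyValueStore<String, String>) context.getStateStore("driver-locations");
    }

    @Override
    public KeyValue<String, String> transform(String riderGeoKey, String riderInfo) {
        // scan every driver whose geo key shares the rider's geohash prefix
        String prefix = riderGeoKey.substring(0, Math.min(5, riderGeoKey.length()));
        String matchedDriver = null;
        try (KeyValueIterator<String, String> nearby =
                     driverStore.range(prefix, prefix + '\uffff')) {
            if (nearby.hasNext()) {
                // real code would rank all candidates and pick the closest; take the first here
                matchedDriver = nearby.next().value;
            }
        }
        return KeyValue.pair(riderInfo, matchedDriver); // driver is null when nobody is nearby
    }

    @Override
    public KeyValue<String, String> punctuate(long timestamp) { return null; }

    @Override
    public void close() {}
}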


Guozhang



On Mon, Aug 29, 2016 at 11:36 AM, Michael Noll  wrote:

> Quick reply only, since I am on my mobile.  Not an exact answer to your
> problem but still somewhat related:
> http://www.infolace.com/blog/2016/07/14/simple-spatial-
> windowing-with-kafka-streams/
> (perhaps you have seen this already).
>
> -Michael
>
>
>
> On Sun, Aug 28, 2016 at 4:55 AM, Farhon Zaharia 
> wrote:
>
> > Hello friends,
> >
> > I am designing a new streaming component and am looking at how to use
> Kafka
> > Streams.
> >
> > I need some guidance with the appropriate flow.
> >
> > *Problem to solve:*
> > The problem I am working on is I have a large group of riders and
> drivers.
> > I would like to match available riders to nearby drivers.  (For
> simplicity
> > think of Uber or Lyft)
> >
> > The goal is to have the drivers within a certain delta of gps coordinates
> > be notified of a potential rider.
> >
> > For example, a rider requests to be picked up at location:
> > 37.788517, -122.406812
> >
> > I would like to select the nearby drivers to send notifications of an
> > available match by selecting nearby drivers within a range
> >
> > latitude < 37.798517 && latitude > 37.778517 && longitude < -122.4106812
> &&
> > longitude > -122.3906812
> >
> > *Note this is a POC and would prefer to select the most nearby drivers,
> > then after lookup the address and use my own graph for streets and
> > calculate the shortest path on my own.
> >
> > I would like to have 3 initial topics:  riders, drivers, and
> paired-onride
> >
> > What is the best way to do this with Kafka Streams?
> >
> > *What I have tried or considered:*
> > I was considering storing drivers in a Ktable and having riders in a
> > KStream and joining them.  But I don't think this will work because the
> > join is dependent on the key, which in this case I was looking more for a
> > select statement to look for a range of gps coordinates as noted above.
> > The drivers location will be updated periodically.
> >
> > I was also thinking of filtering the KStream based on the gps range and
> > making a smaller subselection of available drivers within a certain
> > distance to a rider.
> >
> > At this point I am seeking some guidance and if this is not an ideal
> > use-case that is also ok.  Thanks for any information or direction you
> can
> > provide.
> >
> >
> > -Farhon
> >
>



-- 
-- Guozhang


Re: Kafka Streaming Join for range of gps coordinates

2016-08-29 Thread Michael Noll
Quick reply only, since I am on my mobile.  Not an exact answer to your
problem but still somewhat related:
http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
(perhaps you have seen this already).

-Michael



On Sun, Aug 28, 2016 at 4:55 AM, Farhon Zaharia 
wrote:

> Hello friends,
>
> I am designing a new streaming component and am looking at how to use Kafka
> Streams.
>
> I need some guidance with the appropriate flow.
>
> *Problem to solve:*
> The problem I am working on is I have a large group of riders and drivers.
> I would like to match available riders to nearby drivers.  (For simplicity
> think of Uber or Lyft)
>
> The goal is to have the drivers within a certain delta of gps coordinates
> be notified of a potential rider.
>
> For example, a rider requests to be picked up at location:
> 37.788517, -122.406812
>
> I would like to select the nearby drivers to send notifications of an
> available match by selecting nearby drivers within a range
>
> latitude < 37.798517 && latitude > 37.778517 && longitude < -122.4106812 &&
> longitude > -122.3906812
>
> *Note this is a POC and would prefer to select the most nearby drivers,
> then after lookup the address and use my own graph for streets and
> calculate the shortest path on my own.
>
> I would like to have 3 initial topics:  riders, drivers, and paired-onride
>
> What is the best way to do this with Kafka Streams?
>
> *What I have tried or considered:*
> I was considering storing drivers in a Ktable and having riders in a
> KStream and joining them.  But I don't think this will work because the
> join is dependent on the key, which in this case I was looking more for a
> select statement to look for a range of gps coordinates as noted above.
> The drivers location will be updated periodically.
>
> I was also thinking of filtering the KStream based on the gps range and
> making a smaller subselection of available drivers within a certain
> distance to a rider.
>
> At this point I am seeking some guidance and if this is not an ideal
> use-case that is also ok.  Thanks for any information or direction you can
> provide.
>
>
> -Farhon
>


Kafka Streaming Join for range of gps coordinates

2016-08-27 Thread Farhon Zaharia
Hello friends,

I am designing a new streaming component and am looking at how to use Kafka
Streams.

I need some guidance with the appropriate flow.

*Problem to solve:*
The problem I am working on is I have a large group of riders and drivers.
I would like to match available riders to nearby drivers.  (For simplicity
think of Uber or Lyft)

The goal is to have the drivers within a certain delta of gps coordinates
be notified of a potential rider.

For example, a rider requests to be picked up at location:
37.788517, -122.406812

I would like to notify nearby drivers of an available match by selecting
drivers within a range

latitude < 37.798517 && latitude > 37.778517 && longitude < -122.4106812 &&
longitude > -122.3906812

*Note: this is a POC, and I would prefer to select the nearest drivers,
then afterwards look up the address and use my own street graph to
calculate the shortest path on my own.

I would like to have 3 initial topics:  riders, drivers, and paired-onride

What is the best way to do this with Kafka Streams?

*What I have tried or considered:*
I was considering storing drivers in a KTable and having riders in a
KStream and joining them.  But I don't think this will work, because the
join is dependent on the key, whereas in this case I am looking more for a
select statement over a range of gps coordinates, as noted above.
The drivers' locations will be updated periodically.

I was also thinking of filtering the KStream based on the gps range and
making a smaller subselection of available drivers within a certain
distance to a rider.

At this point I am seeking some guidance and if this is not an ideal
use-case that is also ok.  Thanks for any information or direction you can
provide.


-Farhon


Re: Kafka Streaming Question : reset offset

2016-07-20 Thread Pariksheet Barapatre
Many Thanks Matthias for the update.

Regards
Pari

On 20 July 2016 at 17:48, Matthias J. Sax  wrote:

> Hi Pari,
>
> currently, changing the application ID is the best way to follow.
> Cleaning up the application state correctly, is a little bit tricky. We
> are currently working on an improvement for this -- should be available
> soon.
>
> See https://issues.apache.org/jira/browse/KAFKA-3185
>
>
> -Matthias
>
> On 07/20/2016 01:57 PM, Pariksheet Barapatre wrote:
> > Hi Experts,
> >
> > I have written Kafka Streaming app that just filters rows based on some
> > condition and load it to MongoDB.
> >
> > The streaming process is working fine but due to some flaw in my code, I
> > want to reprocess whole data again.
> >
> > One way is to do this -  kill streaming app , change consumer group id ,
> > delete data from mongo and rerun the app.
> >
> > How to achieve this scenario without changing consumer group id.
> >
> > Many Thanks
> > Pari
> >
>
>


Re: Kafka Streaming Question : reset offset

2016-07-20 Thread Matthias J. Sax
Hi Pari,

currently, changing the application ID is the best way to go.
Cleaning up the application state correctly is a little bit tricky. We
are currently working on an improvement for this -- it should be available
soon.

See https://issues.apache.org/jira/browse/KAFKA-3185
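
In the meantime, a minimal sketch of the workaround described above (the application.id
value and the broker address are placeholders):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class ReprocessingConfig {
    public static Properties buildProps() {
        Properties props = new Properties();
        // a new application.id means a fresh consumer group and fresh local/changelog state
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-filter-app-v2");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // start from the earliest offsets so the input topics are reprocessed from the beginning
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return props;
    }
}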


-Matthias

On 07/20/2016 01:57 PM, Pariksheet Barapatre wrote:
> Hi Experts,
> 
> I have written Kafka Streaming app that just filters rows based on some
> condition and load it to MongoDB.
> 
> The streaming process is working fine but due to some flaw in my code, I
> want to reprocess whole data again.
> 
> One way is to do this -  kill streaming app , change consumer group id ,
> delete data from mongo and rerun the app.
> 
> How to achieve this scenario without changing consumer group id.
> 
> Many Thanks
> Pari
> 





Kafka Streaming Question : reset offset

2016-07-20 Thread Pariksheet Barapatre
Hi Experts,

I have written a Kafka Streaming app that just filters rows based on some
condition and loads them into MongoDB.

The streaming process is working fine, but due to some flaw in my code, I
want to reprocess the whole data again.

One way to do this is to kill the streaming app, change the consumer group id,
delete the data from Mongo, and rerun the app.

How can I achieve this without changing the consumer group id?

Many Thanks
Pari


Re: Kafka Streaming - Window expiration

2016-06-13 Thread Pariksheet Barapatre
Many Thanks Eno.  I will try the .until method.

Cheers
Pari

On 13 June 2016 at 14:10, Eno Thereska  wrote:

> Hi Pari,
>
> Try the .until method like this:
>
> > (TimeWindows) TimeWindows.of("tumbling-window-example",
> windowSizeMs).until(60 * 1000L)
>
> Thanks
> Eno
>
>
> > On 13 Jun 2016, at 08:31, Pariksheet Barapatre 
> wrote:
> >
> > Hello Experts,
> >
> > As per documentation in kafka docs -
> > *Windowing* is a common prerequisite for stateful transformations which
> > group records in a stream, for example, by their timestamps. A local
> state
> > store is usually needed for a windowing operation to store recently
> > received records based on the window interval, while old records in the
> > store are purged after the specified window retention period
> > <http://docs.confluent.io/3.0.0/streams/concepts.html#streams-concepts-windowing>.
> >
> > But I am not finding any example to set window retention period example.
> >
> > Could somebody help me with example?
> >
> > long windowSizeMs = 60 * 1000L;
> > TimeWindows.of("tumbling-window-example", windowSizeMs);
> >
> > How to set expiration/purge time for above window example.
> >
> > Many Thanks
> >
> > Pari
>
>


Re: Kafka Streaming - Window expiration

2016-06-13 Thread Eno Thereska
Hi Pari,

Try the .until method like this:

> (TimeWindows) TimeWindows.of("tumbling-window-example", 
> windowSizeMs).until(60 * 1000L)
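
For example, a minimal sketch mirroring the snippet above, but separating the window size
from the retention period (the values are arbitrary):

long windowSizeMs = 60 * 1000L;            // 1-minute tumbling windows
long retentionMs  = 60 * 60 * 1000L;       // keep window state for 1 hour before purging

TimeWindows windows = (TimeWindows) TimeWindows
        .of("tumbling-window-example", windowSizeMs)
        .until(retentionMs);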

Thanks
Eno


> On 13 Jun 2016, at 08:31, Pariksheet Barapatre  wrote:
> 
> Hello Experts,
> 
> As per documentation in kafka docs -
> *Windowing* is a common prerequisite for stateful transformations which
> group records in a stream, for example, by their timestamps. A local state
> store is usually needed for a windowing operation to store recently
> received records based on the window interval, while old records in the
> store are purged after the specified window retention period.
> 
> But I am not finding any example to set window retention period example.
> 
> Could somebody help me with example?
> 
> long windowSizeMs = 60 * 1000L;
> TimeWindows.of("tumbling-window-example", windowSizeMs);
> 
> How to set expiration/purge time for above window example.
> 
> Many Thanks
> 
> Pari



Kafka Streaming - Window expiration

2016-06-13 Thread Pariksheet Barapatre
Hello Experts,

As per documentation in kafka docs -
*Windowing* is a common prerequisite for stateful transformations which
group records in a stream, for example, by their timestamps. A local state
store is usually needed for a windowing operation to store recently
received records based on the window interval, while old records in the
store are purged after the specified window retention period
<http://docs.confluent.io/3.0.0/streams/concepts.html#streams-concepts-windowing>.

But I am not finding any example of how to set the window retention period.

Could somebody help me with an example?

long windowSizeMs = 60 * 1000L;
TimeWindows.of("tumbling-window-example", windowSizeMs);

How do I set the expiration/purge time for the above window example?

Many Thanks

Pari