Question on Metadata

2017-03-14 Thread Syed Mudassir Ahmed
Hi guys,
  When we consume a JMS message, we get a Message object that has methods
to fetch implicit metadata provided by the JMS server
(http://docs.oracle.com/javaee/6/api/javax/jms/Message.html), such as
Expiration, Correlation ID, etc.

  Is there a way to fetch any such implicit metadata while consuming a
kafka message?  Any pointers to do that would be greatly appreciated.

  Thanks in advance.

-- 
Thanks,
Syed.


Re: Kafka Streams question

2017-03-14 Thread BYEONG-GI KIM
Dear Michael Noll,

I have a question: is it possible to convert JSON format to YAML format
using Kafka Streams?

Best Regards

KIM

2017-03-10 11:36 GMT+09:00 BYEONG-GI KIM :

> Thank you very much for the information!
>
>
> 2017-03-09 19:40 GMT+09:00 Michael Noll :
>
>> There's actually a demo application that demonstrates the simplest use
>> case
>> for Kafka's Streams API:  to read data from an input topic and then write
>> that data as-is to an output topic.
>>
>> https://github.com/confluentinc/examples/blob/3.2.x/kafka-
>> streams/src/test/java/io/confluent/examples/streams/Pas
>> sThroughIntegrationTest.java
>>
>> The code above is for Confluent 3.2 and Apache Kafka 0.10.2.
>>
>> The demo shows how to (1) write a message from a producer to the input
>> topic, (2) use a Kafka Streams app to process that data and write the
>> results back to Kafka, and (3) validating the results with a consumer that
>> reads from the output topic.
>>
>> The GitHub project above includes many more such examples, see
>> https://github.com/confluentinc/examples/tree/3.2.x/kafka-streams.
>> Again,
>> this is for Confluent 3.2 and Kafka 0.10.2.  There is a version
>> compatibility matrix that explains which branches you need to use for
>> older
>> versions of Confluent/Kafka as well as for the very latest development
>> version (aka Kafka's trunk):
>> https://github.com/confluentinc/examples/tree/3.2.x/kafka-
>> streams#version-compatibility
>>
>> Hope this helps!
>> Michael
>>
>>
>>
>>
>> On Thu, Mar 9, 2017 at 9:59 AM, BYEONG-GI KIM  wrote:
>>
>> > Hello.
>> >
>> > I'm a new who started learning the one of the new Kafka functionality,
>> aka
>> > Kafka Stream.
>> >
>> > As far as I know, the simplest usage of the Kafka Stream is to do
>> something
>> > like parsing, which forward incoming data from a topic to another topic,
>> > with a few changing.
>> >
>> > So... Here is what I'd want to do:
>> >
>> > 1. Produce a simple message, like 1, 2, 3, 4, 5, ... from a producer
>> > 2. Let Kafka Stream application consume the message and change the
>> message
>> > like [1], [2], [3], ...
>> > 3. Consume the changed message at a consumer
>> >
>> > I've read the documentation,
>> > https://kafka.apache.org/0102/javadoc/index.html?org/apache/
>> kafka/connect,
>> > but it's unclear for me how to implement it.
>> >
>> > Especially, I could not understand the the
>> > line, builder.stream("my-input-topic").mapValues(value ->
>> > value.length().toString()).to("my-output-topic"). Could someone
>> explain it
>> > and how to implement what I've mentioned?
>> >
>> > Thanks in advance.
>> >
>> > Best regards
>> >
>> > KIM
>> >
>>
>
>


Re: Kafka Streams question

2017-03-14 Thread Michael Noll
Yes, of course.  You can also re-use any existing JSON and/or YAML library
to help you with that.

Also, in general, an application that uses the Kafka Streams API/library is
a normal, standard Java application -- you can of course also use any other
Java/Scala/... library for the application's processing needs.
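
For example, something along these lines would work (an untested sketch, not
from this thread: it assumes Jackson's jackson-databind and
jackson-dataformat-yaml on the classpath, and the topic names are made up):

ObjectMapper jsonMapper = new ObjectMapper();
ObjectMapper yamlMapper = new ObjectMapper(new YAMLFactory());

KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> json = builder.stream("json-input-topic");
json.mapValues(value -> {
    try {
        // parse the JSON string, then re-serialize the same tree as YAML
        return yamlMapper.writeValueAsString(jsonMapper.readTree(value));
    } catch (Exception e) {
        throw new RuntimeException("JSON -> YAML conversion failed", e);
    }
}).to("yaml-output-topic");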

-Michael



On Tue, Mar 14, 2017 at 9:00 AM, BYEONG-GI KIM  wrote:

> Dear Michael Noll,
>
> I have a question; Is it possible converting JSON format to YAML format via
> using Kafka Streams?
>
> Best Regards
>
> KIM
>
> 2017-03-10 11:36 GMT+09:00 BYEONG-GI KIM :
>
> > Thank you very much for the information!
> >
> >
> > 2017-03-09 19:40 GMT+09:00 Michael Noll :
> >
> >> There's actually a demo application that demonstrates the simplest use
> >> case
> >> for Kafka's Streams API:  to read data from an input topic and then
> write
> >> that data as-is to an output topic.
> >>
> >> https://github.com/confluentinc/examples/blob/3.2.x/kafka-
> >> streams/src/test/java/io/confluent/examples/streams/Pas
> >> sThroughIntegrationTest.java
> >>
> >> The code above is for Confluent 3.2 and Apache Kafka 0.10.2.
> >>
> >> The demo shows how to (1) write a message from a producer to the input
> >> topic, (2) use a Kafka Streams app to process that data and write the
> >> results back to Kafka, and (3) validating the results with a consumer
> that
> >> reads from the output topic.
> >>
> >> The GitHub project above includes many more such examples, see
> >> https://github.com/confluentinc/examples/tree/3.2.x/kafka-streams.
> >> Again,
> >> this is for Confluent 3.2 and Kafka 0.10.2.  There is a version
> >> compatibility matrix that explains which branches you need to use for
> >> older
> >> versions of Confluent/Kafka as well as for the very latest development
> >> version (aka Kafka's trunk):
> >> https://github.com/confluentinc/examples/tree/3.2.x/kafka-
> >> streams#version-compatibility
> >>
> >> Hope this helps!
> >> Michael
> >>
> >>
> >>
> >>
> >> On Thu, Mar 9, 2017 at 9:59 AM, BYEONG-GI KIM 
> wrote:
> >>
> >> > Hello.
> >> >
> >> > I'm a new who started learning the one of the new Kafka functionality,
> >> aka
> >> > Kafka Stream.
> >> >
> >> > As far as I know, the simplest usage of the Kafka Stream is to do
> >> something
> >> > like parsing, which forward incoming data from a topic to another
> topic,
> >> > with a few changing.
> >> >
> >> > So... Here is what I'd want to do:
> >> >
> >> > 1. Produce a simple message, like 1, 2, 3, 4, 5, ... from a producer
> >> > 2. Let Kafka Stream application consume the message and change the
> >> message
> >> > like [1], [2], [3], ...
> >> > 3. Consume the changed message at a consumer
> >> >
> >> > I've read the documentation,
> >> > https://kafka.apache.org/0102/javadoc/index.html?org/apache/
> >> kafka/connect,
> >> > but it's unclear for me how to implement it.
> >> >
> >> > Especially, I could not understand the the
> >> > line, builder.stream("my-input-topic").mapValues(value ->
> >> > value.length().toString()).to("my-output-topic"). Could someone
> >> explain it
> >> > and how to implement what I've mentioned?
> >> >
> >> > Thanks in advance.
> >> >
> >> > Best regards
> >> >
> >> > KIM
> >> >
> >>
> >
> >
>



-- 
*Michael G. Noll*
Product Manager | Confluent
+1 650 453 5860 | @miguno 
Follow us: Twitter  | Blog



Re: Kafka Streams question

2017-03-14 Thread BYEONG-GI KIM
Thank you very much for the reply.

I'll try to implement it.

Best regards

KIM

2017-03-14 17:07 GMT+09:00 Michael Noll :

> Yes, of course.  You can also re-use any existing JSON and/or YAML library
> for helping you with that.
>
> Also, in general, an application that uses the Kafka Streams API/library is
> a normal, standard Java application -- you can of course also use any other
> Java/Scala/... library for the application's processing needs.
>
> -Michael
>
>
>
> On Tue, Mar 14, 2017 at 9:00 AM, BYEONG-GI KIM  wrote:
>
> > Dear Michael Noll,
> >
> > I have a question; Is it possible converting JSON format to YAML format
> via
> > using Kafka Streams?
> >
> > Best Regards
> >
> > KIM
> >
> > 2017-03-10 11:36 GMT+09:00 BYEONG-GI KIM :
> >
> > > Thank you very much for the information!
> > >
> > >
> > > 2017-03-09 19:40 GMT+09:00 Michael Noll :
> > >
> > >> There's actually a demo application that demonstrates the simplest use
> > >> case
> > >> for Kafka's Streams API:  to read data from an input topic and then
> > write
> > >> that data as-is to an output topic.
> > >>
> > >> https://github.com/confluentinc/examples/blob/3.2.x/kafka-
> > >> streams/src/test/java/io/confluent/examples/streams/Pas
> > >> sThroughIntegrationTest.java
> > >>
> > >> The code above is for Confluent 3.2 and Apache Kafka 0.10.2.
> > >>
> > >> The demo shows how to (1) write a message from a producer to the input
> > >> topic, (2) use a Kafka Streams app to process that data and write the
> > >> results back to Kafka, and (3) validating the results with a consumer
> > that
> > >> reads from the output topic.
> > >>
> > >> The GitHub project above includes many more such examples, see
> > >> https://github.com/confluentinc/examples/tree/3.2.x/kafka-streams.
> > >> Again,
> > >> this is for Confluent 3.2 and Kafka 0.10.2.  There is a version
> > >> compatibility matrix that explains which branches you need to use for
> > >> older
> > >> versions of Confluent/Kafka as well as for the very latest development
> > >> version (aka Kafka's trunk):
> > >> https://github.com/confluentinc/examples/tree/3.2.x/kafka-
> > >> streams#version-compatibility
> > >>
> > >> Hope this helps!
> > >> Michael
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Mar 9, 2017 at 9:59 AM, BYEONG-GI KIM 
> > wrote:
> > >>
> > >> > Hello.
> > >> >
> > >> > I'm a new who started learning the one of the new Kafka
> functionality,
> > >> aka
> > >> > Kafka Stream.
> > >> >
> > >> > As far as I know, the simplest usage of the Kafka Stream is to do
> > >> something
> > >> > like parsing, which forward incoming data from a topic to another
> > topic,
> > >> > with a few changing.
> > >> >
> > >> > So... Here is what I'd want to do:
> > >> >
> > >> > 1. Produce a simple message, like 1, 2, 3, 4, 5, ... from a producer
> > >> > 2. Let Kafka Stream application consume the message and change the
> > >> message
> > >> > like [1], [2], [3], ...
> > >> > 3. Consume the changed message at a consumer
> > >> >
> > >> > I've read the documentation,
> > >> > https://kafka.apache.org/0102/javadoc/index.html?org/apache/
> > >> kafka/connect,
> > >> > but it's unclear for me how to implement it.
> > >> >
> > >> > Especially, I could not understand the the
> > >> > line, builder.stream("my-input-topic").mapValues(value ->
> > >> > value.length().toString()).to("my-output-topic"). Could someone
> > >> explain it
> > >> > and how to implement what I've mentioned?
> > >> >
> > >> > Thanks in advance.
> > >> >
> > >> > Best regards
> > >> >
> > >> > KIM
> > >> >
> > >>
> > >
> > >
> >
>
>
>
> --
> *Michael G. Noll*
> Product Manager | Confluent
> +1 650 453 5860 | @miguno 
> Follow us: Twitter  | Blog
> 
>


Re: Lost ISR when upgrading kafka from 0.10.0.1 to any newer version like 0.10.1.0 or 0.10.2.0

2017-03-14 Thread Ismael Juma
Hi Thomas,

Did you follow the instructions:

https://kafka.apache.org/documentation/#upgrade

Ismael

On Mon, Mar 13, 2017 at 9:43 AM, Thomas KIEFFER <
thomas.kief...@olamobile.com.invalid> wrote:

> I'm trying to perform an upgrade of 2 Kafka clusters of 5 instances each. When
> I do the switch between 0.10.0.1 and 0.10.1.0 or 0.10.2.0, I saw that the
> ISR is lost when I upgrade one instance. I haven't found anything relevant
> about this problem yet; the logs seem just fine.
> eg.
>
> kafka-topics.sh --describe --zookeeper kazoo002.#.prv --topic redirects
> Topic:redirects    PartitionCount:6    ReplicationFactor:2    Configs:retention.bytes=10737418240
>     Topic: redirects    Partition: 0    Leader: 1    Replicas: 1,2    Isr: 1,2
>     Topic: redirects    Partition: 1    Leader: 2    Replicas: 2,0    Isr: 2
>     Topic: redirects    Partition: 2    Leader: 1    Replicas: 0,1    Isr: 1
>     Topic: redirects    Partition: 3    Leader: 1    Replicas: 1,0    Isr: 1
>     Topic: redirects    Partition: 4    Leader: 2    Replicas: 2,1    Isr: 2,1
>     Topic: redirects    Partition: 5    Leader: 2    Replicas: 0,2    Isr: 2
>
> It runs with Zookeeper 3.4.6.
>
> As those clusters are in production, I didn't try to migrate more than 1
> instance after spotting this ISR problem, and then rolled back to the original
> version 0.10.0.1.
>
> Any update about this would be greatly appreciated.
>
> --
> 
>
> Thomas Kieffer
>
> Senior Linux Systems Administrator
>
> Skype: thomas.kieffer.corporate | Phone: (+352) 691444263
> <+352%20691%20444%20263> | www.olamobile.com
>
> The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipient is prohibited. If you received
> this in error, please contact the sender and delete the material from any
> computer.


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Michael Noll
I see Jay's point, and I agree with much of it -- notably about being
careful which concepts we do and do not expose, depending on which user
group / user type is affected.  That said, I'm not sure yet whether or not
we should get rid of "Topology" (or a similar term) in the DSL.

For what it's worth, here's how related technologies define/name their
"topologies" and "builders".  Note that, in all cases, it's about
constructing a logical processing plan, which then is being executed/run.

- `Pipeline` (Google Dataflow/Apache Beam)
  - To add a source you first instantiate the Source (e.g.
    `TextIO.Read.from("gs://some/inputData.txt")`), then attach it to your
    processing plan via `Pipeline#apply()`. This setup is a bit different to
    our DSL because in our DSL the builder does both, i.e. instantiating +
    auto-attaching to itself.
  - To execute the processing plan you call `Pipeline#execute()`.
- `StreamingContext` (Spark): This setup is similar to our DSL.
  - To add a source you call e.g.
    `StreamingContext#socketTextStream("localhost", )`.
  - To execute the processing plan you call `StreamingContext#execute()`.
- `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
  - To add a source you call e.g.
    `StreamExecutionEnvironment#socketTextStream("localhost", )`.
  - To execute the processing plan you call
    `StreamExecutionEnvironment#execute()`.
- `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
  `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`) into Flows, which
  are [Runnable]Graphs.
  - You instantiate a Source directly, and then compose the Source with Sinks
    to create a RunnableGraph: see signature
    `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]): RunnableGraph[Mat]`.
  - To execute the processing plan you call `Flow#run()`.

In our DSL, in comparison, we do:

- `KStreamBuilder` (Kafka Streams API)
  - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
  - To execute the processing plan you create a `KafkaStreams` instance from
    `KStreamBuilder` (where the builder will instantiate the topology =
    processing plan to be executed), and then call `KafkaStreams#start()`.
    Think of `KafkaStreams` as our runner.
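
In code, the current pattern boils down to something like this (a minimal
sketch against the 0.10.2 API; the topic names and the `props` settings are
placeholders):

KStreamBuilder builder = new KStreamBuilder();
builder.stream("input-topic").to("output-topic");         // build the logical plan ("topology")
KafkaStreams streams = new KafkaStreams(builder, props);  // hand the plan to the runner
streams.start();                                          // execute the plan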

First, I agree with the sentiment that the current name of `KStreamBuilder`
isn't great (which is why we're having this discussion).  Also, that
finding a good name is tricky. ;-)

Second, even though I agree with many of Jay's points I'm not sure whether
I like the `StreamsBuilder` suggestion (i.e. any name that does not include
"topology" or a similar term) that much more.  It still doesn't describe
what that class actually does, and what the difference to `KafkaStreams`
is.  IMHO, the point of `KStreamBuilder` is that it lets you build a
logical plan (what we call "topology"), and `KafkaStreams` is the thing
that executes that plan.  I'm not yet convinced that abstracting these two
points away from the user is a good idea if the argument is that it's
potentially confusing to beginners (a claim which I am not sure is actually
true).

That said, if we rather favor "good-sounding but perhaps less technically
correct names", I'd argue we should not even use something like "Builder".
We could, for example, also pick the following names:

- KafkaStreams as the new name for the builder that creates the logical
plan, with e.g. `KafkaStreams.stream("input-topic")` and
`KafkaStreams.table("input-topic")`.
- KafkaStreamsRunner as the new name for the executioner of the plan, with
`KafkaStreamsRunner(KafkaStreams).run()`.



On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian 
wrote:

> StreamsBuilder would be my vote.
>
> > On Mar 13, 2017, at 9:42 PM, Jay Kreps  wrote:
> >
> > Hey Matthias,
> >
> > Make sense, I'm more advocating for removing the word topology than any
> > particular new replacement.
> >
> > -Jay
> >
> > On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax  >
> > wrote:
> >
> >> Jay,
> >>
> >> thanks for your feedback
> >>
> >>> What if instead we called it KStreamsBuilder?
> >>
> >> That's the current name and I personally think it's not the best one.
> >> The main reason why I don't like KStreamsBuilder is, that we have the
> >> concepts of KStreams and KTables, and the builder creates both. However,
> >> the name puts he focus on KStream and devalues KTable.
> >>
> >> I understand your argument, and I am personally open the remove the
> >> "Topology" part, and name it "StreamsBuilder". Not sure what others
> >> think about this.
> >>
> >>
> >> About Processor API: I like the idea in general, but I thinks it's out
> >> of scope for this KIP. KIP-120 has the focus on removing leaking
> >> internal APIs and do some cleanup how our API reflects some concepts.
> >>
> >> However, I added your idea to API discussion Wiki page and we take if
> >> from there:
> >> https://cwiki.apache.org/confluence/display/KAFKA/
> >> Kafka+Streams+Discussions
> >>
> >>
> >>
> >> -Matthias
> >>

Re: Lost ISR when upgrading kafka from 0.10.0.1 to any newer version like 0.10.1.0 or 0.10.2.0

2017-03-14 Thread Thomas KIEFFER

Hello Ismael,

Thank you for your feedback.

Yes, I made these changes on a previous upgrade and set them
accordingly to the new version when trying to do this upgrade.


inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2, 0.9.0, 
0.10.0 or 0.10.1).
log.message.format.version=CURRENT_KAFKA_VERSION (See potential 
performance impact following the upgrade for the details on what this 
configuration does.)


On 03/14/2017 11:26 AM, Ismael Juma wrote:

Hi Thomas,

Did you follow the instructions:

https://kafka.apache.org/documentation/#upgrade

Ismael

On Mon, Mar 13, 2017 at 9:43 AM, Thomas KIEFFER <
thomas.kief...@olamobile.com.invalid> wrote:


I'm trying to perform an upgrade of 2 kafka cluster of 5 instances, When
I'm doing the switch between 0.10.0.1 and 0.10.1.0 or 0.10.2.0, I saw that
ISR is lost when I upgrade one instance. I didn't find out yet anything
relevant about this problem, logs seems just fine.
eg.

kafka-topics.sh --describe --zookeeper kazoo002.#.prv --topic redirects
Topic:redirectsPartitionCount:6ReplicationFactor:2
Configs:retention.bytes=10737418240
 Topic: redirectsPartition: 0Leader: 1Replicas: 1,2Isr:
1,2
 Topic: redirectsPartition: 1Leader: 2Replicas: 2,0Isr:
2
 Topic: redirectsPartition: 2Leader: 1Replicas: 0,1Isr:
1
 Topic: redirectsPartition: 3Leader: 1Replicas: 1,0Isr:
1
 Topic: redirectsPartition: 4Leader: 2Replicas: 2,1Isr:
2,1
 Topic: redirectsPartition: 5Leader: 2Replicas: 0,2Isr:
2

It run with Zookeeper 3.4.6.

As those clusters are in production, I didn't try to migrate more than 1
instance after spotting this ISR problem, and then rollback to the original
version 0.10.0.1.

Any update about this would be greatly receive.

--


Thomas Kieffer

Senior Linux Systems Administrator

Skype: thomas.kieffer.corporate | Phone: (+352) 691444263
<+352%20691%20444%20263> | www.olamobile.com

The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipient is prohibited. If you received
this in error, please contact the sender and delete the material from any
computer.


--




Thomas Kieffer

Senior Linux Systems Administrator

Skype: thomas.kieffer.corporate | Phone: (+352) 691444263 | 
www.olamobile.com 



--
The information transmitted is intended only for the person or entity to 
which it is addressed and may contain confidential and/or privileged 
material. Any review, retransmission, dissemination or other use of, or 
taking of any action in reliance upon, this information by persons or 
entities other than the intended recipient is prohibited. If you received 
this in error, please contact the sender and delete the material from any 
computer.


Re: Lost ISR when upgrading kafka from 0.10.0.1 to any newer version like 0.10.1.0 or 0.10.2.0

2017-03-14 Thread Ismael Juma
So, to double-check, you set inter.broker.protocol.version=0.10.0 before
bouncing each broker?

On Tue, Mar 14, 2017 at 11:22 AM, Thomas KIEFFER <
thomas.kief...@olamobile.com.invalid> wrote:

> Hello Ismael,
>
> Thank you for your feedback.
>
> Yes I've done  this changes on a previous upgrade and set them accordingly
> with the new version when trying to do the upgrade.
>
> inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2, 0.9.0,
> 0.10.0 or 0.10.1).
> log.message.format.version=CURRENT_KAFKA_VERSION (See potential
> performance impact following the upgrade for the details on what this
> configuration does.)
> On 03/14/2017 11:26 AM, Ismael Juma wrote:
>
> Hi Thomas,
>
> Did you follow the instructions:
> https://kafka.apache.org/documentation/#upgrade
>
> Ismael
>
> On Mon, Mar 13, 2017 at 9:43 AM, Thomas KIEFFER 
>  wrote:
>
>
> I'm trying to perform an upgrade of 2 kafka cluster of 5 instances, When
> I'm doing the switch between 0.10.0.1 and 0.10.1.0 or 0.10.2.0, I saw that
> ISR is lost when I upgrade one instance. I didn't find out yet anything
> relevant about this problem, logs seems just fine.
> eg.
>
> kafka-topics.sh --describe --zookeeper kazoo002.#.prv --topic redirects
> Topic:redirectsPartitionCount:6ReplicationFactor:2
> Configs:retention.bytes=10737418240
> Topic: redirectsPartition: 0Leader: 1Replicas: 1,2Isr:
> 1,2
> Topic: redirectsPartition: 1Leader: 2Replicas: 2,0Isr:
> 2
> Topic: redirectsPartition: 2Leader: 1Replicas: 0,1Isr:
> 1
> Topic: redirectsPartition: 3Leader: 1Replicas: 1,0Isr:
> 1
> Topic: redirectsPartition: 4Leader: 2Replicas: 2,1Isr:
> 2,1
> Topic: redirectsPartition: 5Leader: 2Replicas: 0,2Isr:
> 2
>
> It run with Zookeeper 3.4.6.
>
> As those clusters are in production, I didn't try to migrate more than 1
> instance after spotting this ISR problem, and then rollback to the original
> version 0.10.0.1.
>
> Any update about this would be greatly receive.
>
> --
>  
> 
>
> Thomas Kieffer
>
> Senior Linux Systems Administrator
>
> Skype: thomas.kieffer.corporate | Phone: (+352) 691444263 
> <+352%20691%20444%20263>
> <+352%20691%20444%20263> | www.olamobile.com
>
> The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipient is prohibited. If you received
> this in error, please contact the sender and delete the material from any
> computer.
>
>
> --
> 
>
> Thomas Kieffer
>
> Senior Linux Systems Administrator
>
> Skype: thomas.kieffer.corporate | Phone: (+352) 691444263
> <+352%20691%20444%20263> | www.olamobile.com
>
> The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipient is prohibited. If you received
> this in error, please contact the sender and delete the material from any
> computer.
>


Re: Simple KafkaProducer to handle multiple requests or not

2017-03-14 Thread Amit K
Thanks Robert Quinlivan,

Got this one clear.
Have been going through additional docs, they seems to point to this though
bit subtly.

Thanks again for your help!
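
For my own notes, the approach Robert described looks roughly like this (an
untested sketch; the broker address, topic, and tuning values are illustrative
only, not recommendations from this thread):

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);  // bytes buffered per partition before a send
props.put(ProducerConfig.LINGER_MS_CONFIG, 5);       // wait up to 5 ms for a batch to fill

// KafkaProducer is thread-safe: create it once and share it across all request handlers.
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("my-topic", "some-key", "some-value"));
// ... and close it once, on application shutdown:
// producer.close();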

On Mon, Mar 13, 2017 at 10:20 PM, Robert Quinlivan 
wrote:

> There is no need to create a new producer instance for each write request.
> In doing so you lose the advantages of the buffering and batching that the
> producer offers. In your use case I would recommend having a single running
> producer and tuning the batch size and linger.ms settings if you find that
> the producer is using too much memory.
>
> On Mon, Mar 13, 2017 at 5:05 AM, Amit K  wrote:
>
> > Hi,
> >
> > I am using simple kafka producer (java based, version 0.9.0.0) in an
> > application where I receive lot of hits (about 50 per seconds, in much
> like
> > servlet way) on application that has kafka producer. Per request comes
> > different set of records.
> >
> > I am using only one instance of kafka producer to push the records to
> kafka
> > cluster. Is that good way to use kafka producer? As it is mentioned in
> > documentation, the kafkaproducer can be shared across multiple threads.
> >
> > Or should be there one kafka producer created to handle one request?
> >
> > Is there any best practice documents/guidelines to follow for using
> simple
> > java Kafka producer api?
> >
> > Thanks in advance for your responses.
> >
> > Thanks,
> > Amit
> >
>
>
>
> --
> Robert Quinlivan
> Software Engineer, Signal
>


Re: Lost ISR when upgrading kafka from 0.10.0.1 to any newer version like 0.10.1.0 or 0.10.2.0

2017-03-14 Thread Thomas KIEFFER
Yes, I've set the inter.broker.protocol.version=0.10.0 before restarting 
each broker on a previous update. Clusters currently run with this config.



On 03/14/2017 12:34 PM, Ismael Juma wrote:

So, to double-check, you set inter.broker.protocol.version=0.10.0 before
bouncing each broker?

On Tue, Mar 14, 2017 at 11:22 AM, Thomas KIEFFER <
thomas.kief...@olamobile.com.invalid> wrote:


Hello Ismael,

Thank you for your feedback.

Yes I've done  this changes on a previous upgrade and set them accordingly
with the new version when trying to do the upgrade.

inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2, 0.9.0,
0.10.0 or 0.10.1).
log.message.format.version=CURRENT_KAFKA_VERSION (See potential
performance impact following the upgrade for the details on what this
configuration does.)
On 03/14/2017 11:26 AM, Ismael Juma wrote:

Hi Thomas,

Did you follow the instructions:
https://kafka.apache.org/documentation/#upgrade

Ismael

On Mon, Mar 13, 2017 at 9:43 AM, Thomas KIEFFER 
 wrote:


I'm trying to perform an upgrade of 2 kafka cluster of 5 instances, When
I'm doing the switch between 0.10.0.1 and 0.10.1.0 or 0.10.2.0, I saw that
ISR is lost when I upgrade one instance. I didn't find out yet anything
relevant about this problem, logs seems just fine.
eg.

kafka-topics.sh --describe --zookeeper kazoo002.#.prv --topic redirects
Topic:redirectsPartitionCount:6ReplicationFactor:2
Configs:retention.bytes=10737418240
 Topic: redirectsPartition: 0Leader: 1Replicas: 1,2Isr:
1,2
 Topic: redirectsPartition: 1Leader: 2Replicas: 2,0Isr:
2
 Topic: redirectsPartition: 2Leader: 1Replicas: 0,1Isr:
1
 Topic: redirectsPartition: 3Leader: 1Replicas: 1,0Isr:
1
 Topic: redirectsPartition: 4Leader: 2Replicas: 2,1Isr:
2,1
 Topic: redirectsPartition: 5Leader: 2Replicas: 0,2Isr:
2

It run with Zookeeper 3.4.6.

As those clusters are in production, I didn't try to migrate more than 1
instance after spotting this ISR problem, and then rollback to the original
version 0.10.0.1.

Any update about this would be greatly receive.

--
 


Thomas Kieffer

Senior Linux Systems Administrator

Skype: thomas.kieffer.corporate | Phone: (+352) 691444263 
<+352%20691%20444%20263>
<+352%20691%20444%20263> | www.olamobile.com

The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipient is prohibited. If you received
this in error, please contact the sender and delete the material from any
computer.


--


Thomas Kieffer

Senior Linux Systems Administrator

Skype: thomas.kieffer.corporate | Phone: (+352) 691444263
<+352%20691%20444%20263> | www.olamobile.com

The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipient is prohibited. If you received
this in error, please contact the sender and delete the material from any
computer.



--




Thomas Kieffer

Senior Linux Systems Administrator

Skype: thomas.kieffer.corporate | Phone: (+352) 691444263 | 
www.olamobile.com 



--
The information transmitted is intended only for the person or entity to 
which it is addressed and may contain confidential and/or privileged 
material. Any review, retransmission, dissemination or other use of, or 
taking of any action in reliance upon, this information by persons or 
entities other than the intended recipient is prohibited. If you received 
this in error, please contact the sender and delete the material from any 
computer.


Re: Pattern to create Task with dependencies (DI)

2017-03-14 Thread Mathieu Fenniak
Hey Petr,

I have the same issue.  But I just cope with it; I wire up default
dependencies directly in the connector and task constructors, expose them
through properties, and modify them to refer to mocks in my unit tests.

It's not a great approach, but it is simple.
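
Concretely, the pattern looks something like this (a rough sketch; MySourceTask
and DbClient are made-up names standing in for your task and its external
dependency):

import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class MySourceTask extends SourceTask {
    // DbClient is a hypothetical wrapper around your external system.
    // Default wiring: Connect can still instantiate the task via its no-arg constructor.
    private DbClient dbClient = new DbClient();

    // Package-private hook so a unit test can swap in a mock before calling start()/poll().
    void setDbClient(DbClient dbClient) {
        this.dbClient = dbClient;
    }

    @Override
    public String version() {
        return "1.0";
    }

    @Override
    public void start(Map<String, String> props) {
        dbClient.connect(props);
    }

    @Override
    public List<SourceRecord> poll() {
        return dbClient.fetchRecords();
    }

    @Override
    public void stop() {
        dbClient.close();
    }
}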

Why KConnect does take control from me how tasks are created?  ... Or
> instead of getTaskClass() to have createTask() on Connector which returns
> task instance.


The problem is that in distributed mode, Kafka Connect will be creating
connectors on "arbitrary" nodes, and then very likely creating tasks on
nodes where the connector is not instantiated.  This is my understanding of
why Kafka Connect takes control away from you.

I'm not saying it can't be done better, but it's definitely not without
reason. :-)

Mathieu


On Mon, Mar 13, 2017 at 4:42 AM, Petr Novak  wrote:

> Hello,
>
> Nobody has experience with Kafka Connect tasks with external dependencies?
>
>
>
> Thanks,
>
> Petr
>
>
>
> From: Petr Novak [mailto:oss.mli...@gmail.com]
> Sent: 23. února 2017 14:48
> To: users@kafka.apache.org
> Subject: Pattern to create Task with dependencies (DI)
>
>
>
> Hello,
>
> it seems that KConnect take control over creating task instance and
> requires
> no-arg constructor. What is the recommended pattern when I need to create a
> task which has dependency e.g. on some db client and I want to be able to
> pass in mock in tests, preferable through constructor?
>
> In Java I probably have to divide actual task implementation into separate
> class and use method forwarding from task factory class. I assume that this
> factory, whose class would be passed to KConnect, requires to extend
> abstract class whose “this” I have to pass to implementing class as another
> dependency. I can’t figure out anything less ugly.
>
>
>
> In Scala it can be done elegantly through mixin composition which is
> invisible to KConnect.
>
>
>
> Why KConnect does take control from me how tasks are created? Why KConnect
> doesn’t accept factory class on which it would call no-arg create method
> which would return instance of my task so that I can control how it is
> created and which dependencies are created. Or instead of getTaskClass() to
> have createTask() on Connector which returns task instance.
>
>
>
> Many thanks for advice,
>
> Petr
>
>


Using Kafka connect BYTES_SCHEMA

2017-03-14 Thread Marc Magnin

Hi,

When sending data using SourceRecord, I want to send a byte array to Kafka, but
I receive a base64-encoded array of my object instead:

@Override
public SourceRecord[] getRecords(String kafkaTopic) {
    return new SourceRecord[]{new SourceRecord(null, null,
            topic.toString(), null,
            Schema.STRING_SCHEMA, "test key",
            Schema.BYTES_SCHEMA, mMessage.getPayload())};
}

getPayload returns a byte array.

Any help on this would be great !
Many thanks,
Marc


WordCount example does not output to OUTPUT topic

2017-03-14 Thread Mina Aslani
Hi,
I am using the code below to read from a topic, count words, and write to
another topic. The example is the one on GitHub.
My Kafka container is in the VM. I do not get any error, but I do not see
any result/output in my wordCount-output topic either. The program does not
stop, either!

Any idea?

Best regards,
Mina

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cloudtrail-events-streaming");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.99.102:29092");
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
Serdes.String().getClass().getName());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
Serdes.String().getClass().getName());

props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);

// setting offset reset to earliest so that we can re-run the demo
// code with the same pre-loaded data
// Note: To re-run the demo, you need to use the offset reset tool:
// https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Application+Reset+Tool
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

KStreamBuilder builder = new KStreamBuilder();

KStream<String, String> source = builder.stream("wordCount-input");

KTable<String, Long> counts = source
    .flatMapValues(new ValueMapper<String, Iterable<String>>() {
        @Override
        public Iterable<String> apply(String value) {
            return Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
        }
    })
    .map(new KeyValueMapper<String, String, KeyValue<String, String>>() {
        @Override
        public KeyValue<String, String> apply(String key, String value) {
            return new KeyValue<>(value, value);
        }
    })
    .groupByKey()
    .count("Counts");

// need to override value serde to Long type
counts.to(Serdes.String(), Serdes.Long(), "wordCount-output");

LOGGER.info("counts:" + counts);

KafkaStreams streams = new KafkaStreams(builder, props);

streams.cleanUp();
streams.start();

// usually the stream application would be running forever,
// in this example we just let it run for some time and stop since the
// input data is finite.
Thread.sleep(5000L);
streams.close();


Fwd: Question on Metadata

2017-03-14 Thread Syed Mudassir Ahmed
Can anyone help?

-- Forwarded message --
From: "Syed Mudassir Ahmed" 
Date: Mar 14, 2017 12:28 PM
Subject: Question on Metadata
To: 
Cc:

Hi guys,
  When we consume a JMS message, we get a Message object that has methods
to fetch implicit metadata provided by the JMS server
(http://docs.oracle.com/javaee/6/api/javax/jms/Message.html), such as
Expiration, Correlation ID, etc.

  Is there a way to fetch any such implicit metadata while consuming a
kafka message?  Any pointers to do that would be greatly appreciated.

  Thanks in advance.

-- 
Thanks,
Syed.


Re: Question on Metadata

2017-03-14 Thread Robert Quinlivan
Did you look at the ConsumerRecord class?
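
Each record the consumer returns exposes per-record metadata, for example (a
minimal sketch; the broker address, group id, and topic are made up):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "metadata-demo");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("some-topic"));
for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
    // implicit metadata carried with every record, roughly analogous to JMS message headers
    System.out.printf("topic=%s partition=%d offset=%d timestamp=%d key=%s%n",
            record.topic(), record.partition(), record.offset(),
            record.timestamp(), record.key());
}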

On Tue, Mar 14, 2017 at 11:09 AM, Syed Mudassir Ahmed <
smudas...@snaplogic.com> wrote:

> Can anyone help?
>
> -- Forwarded message --
> From: "Syed Mudassir Ahmed" 
> Date: Mar 14, 2017 12:28 PM
> Subject: Question on Metadata
> To: 
> Cc:
>
> Hi guys,
>   When we consume a JMS message, we get a Message object that has methods
> to fetch implicit metadata provided by JMS server.
> http://docs.oracle.com/javaee/6/api/javax/jms/Message.html.  There are
> methods to fetch that implicit metadata such as Expiration, Correlation ID,
> etc.
>
>   Is there a way to fetch any such implicit metadata while consuming a
> kafka message?  Any pointers to do that would be greatly appreciated.
>
>   Thanks in advance.
>
> --
> Thanks,
> Syed.
>



-- 
Robert Quinlivan
Software Engineer, Signal


Re: WordCount example does not output to OUTPUT topic

2017-03-14 Thread Matthias J. Sax
This seems to be the same question as "Trying to use Kafka Stream"?



On 3/14/17 9:05 AM, Mina Aslani wrote:
> Hi,
> I am using below code to read from a topic and count words and write to
> another topic. The example is the one in github.
> My kafka container is in the VM. I do not get any error but I do not see
> any result/output in my output ordCount-output topic either. The program
> also does not stop either!
> 
> Any idea?
> 
> Best regards,
> Mina
> 
> Properties props = new Properties();
> props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cloudtrail-events-streaming");
> props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.99.102:29092");
> props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
> Serdes.String().getClass().getName());
> props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
> Serdes.String().getClass().getName());
> 
> props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
> props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
> 
> // setting offset reset to earliest so that we can re-run the demo
> code with the same pre-loaded data
> // Note: To re-run the demo, you need to use the offset reset tool:
> // 
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Application+Reset+Tool
> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
> 
> KStreamBuilder builder = new KStreamBuilder();
> 
> KStream source = builder.stream("wordCount-input");
> 
> KTable counts = source
>   .flatMapValues(new ValueMapper>() {
>  @Override
>  public Iterable apply(String value) {
> return
> Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
>  }
>   }).map(new KeyValueMapper>() {
>  @Override
>  public KeyValue apply(String key, String value) {
> return new KeyValue<>(value, value);
>  }
>   })
>   .groupByKey()
>   .count("Counts");
> 
> // need to override value serde to Long type
> counts.to(Serdes.String(), Serdes.Long(), "wordCount-output");
> 
> LOGGER.info("counts:" + counts);
> 
> KafkaStreams streams = new KafkaStreams(builder, props);
> 
> streams.cleanUp();
> streams.start();
> 
> // usually the stream application would be running forever,
> // in this example we just let it run for some time and stop since the
> input data is finite.
> Thread.sleep(5000L);
> streams.close();
> 





Re: Trying to use Kafka Stream

2017-03-14 Thread Matthias J. Sax
>> So, when I check the number of messages in wordCount-input I see the same
>> messages. However, when I run below code I do not see any message/data in
>> wordCount-output.

Did you reset your application?

Each time you run your app and restart it, it will resume processing
where it left off. Thus, if something went wrong in your first run but
you got committed offsets, the app will not re-read the whole topic.

You can check committed offsets via bin/kafka-consumer-groups.sh. The
application-id from StreamsConfig is used as the group.id.
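
For example (a sketch -- substitute your broker address; the group below is
the application-id from your code):

  bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server <broker>:9092 \
      --describe --group wordCount-streaming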

Thus, resetting your app would be required to consume the input topic
from scratch. Or you could just write new data to your input topic.

>> Can I connect to kafka in VM/docker container using below code or do I need
>> to change/add other parameters? How can I submit the code to
>> kafka/kafka-connect? Do we have similar concept as SPARK to submit the
>> code(e.g. jar file)?

A Streams app is a regular Java application and can run anywhere --
there is no notion of a processing cluster and you don't "submit" your
code -- you just run your app.

Thus, if your console consumer can connect to the cluster, your Streams
app should also be able to connect to the cluster.


Maybe the short runtime of 5 seconds could be a problem (even if it
seems like enough time to process just a few records), and you might need
to take startup delay into account. I would recommend registering a
shutdown hook: see
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confluent/examples/streams/WordCountLambdaExample.java#L178-L181
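
In its simplest form that is just (assuming your KafkaStreams instance is
called "streams", as in your code):

  Runtime.getRuntime().addShutdownHook(new Thread(streams::close));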


Hope this helps.

-Matthias


On 3/13/17 7:30 PM, Mina Aslani wrote:
> Hi Matthias,
> 
> Thank you for the quick response, appreciate it!
> 
> I created the topics wordCount-input and wordCount-output. Pushed some data
> to wordCount-input using
> 
> docker exec -it $(docker ps -f "name=kafka\\." --format "{{.Names}}")
> /bin/kafka-console-producer --broker-list localhost:9092 --topic
> wordCount-input
> 
> test
> 
> new
> 
> word
> 
> count
> 
> wordcount
> 
> word count
> 
> So, when I check the number of messages in wordCount-input I see the same
> messages. However, when I run below code I do not see any message/data in
> wordCount-output.
> 
> Can I connect to kafka in VM/docker container using below code or do I need
> to change/add other parameters? How can I submit the code to
> kafka/kafka-connect? Do we have similar concept as SPARK to submit the
> code(e.g. jar file)?
> 
> I really appreciate your input as I am blocked and cannot run even below
> simple example.
> 
> Best regards,
> Mina
> 
> I changed the code to be as below:
> 
> Properties props = new Properties();
> props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordCount-streaming");
> props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, ":9092");
> props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
> Serdes.String().getClass().getName());
> props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
> Serdes.String().getClass().getName());
> 
> props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
> props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
> 
> // setting offset reset to earliest so that we can re-run the demo
> code with the same pre-loaded data
> // Note: To re-run the demo, you need to use the offset reset tool:
> // 
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Application+Reset+Tool
> props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
> 
> KStreamBuilder builder = new KStreamBuilder();
> 
> KStream source = builder.stream("wordCount-input");
> 
> KTable counts = source
>   .flatMapValues(new ValueMapper>() {
>  @Override
>  public Iterable apply(String value) {
> return
> Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
>  }
>   }).map(new KeyValueMapper>() {
>  @Override
>  public KeyValue apply(String key, String value) {
> return new KeyValue<>(value, value);
>  }
>   })
>   .groupByKey()
>   .count("Counts");
> 
> // need to override value serde to Long type
> counts.to(Serdes.String(), Serdes.Long(), "wordCount-output");
> 
> KafkaStreams streams = new KafkaStreams(builder, props);
> streams.start();
> 
> // usually the stream application would be running forever,
> // in this example we just let it run for some time and stop since the
> input data is finite.
> Thread.sleep(5000L);
> 
> streams.close();
> 
> 
> 
> 
> On Mon, Mar 13, 2017 at 4:09 PM, Matthias J. Sax 
> wrote:
> 
>> Maybe you need to reset your application using the reset tool:
>> http://docs.confluent.io/current/streams/developer-
>> guide.html#application-reset-tool
>>
>> Also keep in mind, that KTables buffer internally, and thus, you might
>> only see data on commit.
>>
>> Try to reduce commit interval or disable caching by setting
>> "cache.max.bytes.buffering" to zero in your StreamsConfig.
>>
>>
>> -Matthias
>>
>> On 3/13/17 12:29 PM, Mina Aslani wrote:
>>> Hi,
>>>

Re: Kafka Streams: ReadOnlyKeyValueStore range behavior

2017-03-14 Thread Matthias J. Sax
> However,
>> for keys that have been tombstoned, it does return null for me.

Sounds like a bug. Can you reliably reproduce this? Would you mind
opening a JIRA?

Can you check if this happens in both cases: caching enabled and
disabled? Or only in one case?


> "No ordering guarantees are provided."

That is correct. Internally, default stores are hash-based -- thus, we
don't give a sorted list/iterator back. You could replace RocksDB with a
custom store though.
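
For reference, a range scan looks roughly like this (untested sketch; I'm
assuming String keys for illustration, and the store name/type are taken from
your snippet below):

ReadOnlyKeyValueStore<String, Messages.UserLetter> store =
    streams.store("view-user-drafts", QueryableStoreTypes.keyValueStore());
KeyValueIterator<String, Messages.UserLetter> iter = store.range("a", "z");
try {
    while (iter.hasNext()) {
        KeyValue<String, Messages.UserLetter> entry = iter.next();
        if (entry.value == null) {
            continue;  // guard against the unexpected nulls discussed above
        }
        // process entry.key / entry.value here
    }
} finally {
    iter.close();  // release the underlying store resources
}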


-Matthias


On 3/13/17 3:56 PM, Dmitry Minkovsky wrote:
> I am using interactive streams to query tables:
> 
> ReadOnlyKeyValueStore Messages.UserLetter> store
>   = streams.store("view-user-drafts",
> QueryableStoreTypes.keyValueStore());
> 
> Documentation says that #range() should not return null values. However,
> for keys that have been tombstoned, it does return null for me.
> 
> Also, I noticed only just now that "No ordering guarantees are provided." I
> haven't done enough testing or looked at the code carefully enough yet and
> wonder if someone who knows could confirm: is this true? Is this common to
> all store implementations? I was hoping to use interactive streams like
> HBase to scan ranges. It appears this is not possible.
> 
> Thank you,
> Dmitry
> 





Kafka Retention Policy to Indefinite

2017-03-14 Thread Joe San
Dear Kafka Users,

What are the arguments against setting the retention policy on a Kafka
topic to infinite? I was in an interesting discussion with one of my
colleagues where he was suggesting setting the retention policy for a topic
to be indefinite.

So how does this play out when adding new broker partitions? Say I have
accumulated some gigabytes of data in my topic, and now I realize that I
have to scale up by adding another partition. Is this going to pose a
problem for me? The partition rebalance has to happen, and I'm not sure what
the implications are of rebalancing a partition that has gigabytes of data.

Any thoughts on this?

Thanks and Regards,
Jothi


Re: Question on Metadata

2017-03-14 Thread Hans Jespersen
You may also be interested in trying out the new Confluent JMS client for Kafka.
It implements the JMS 1.1 API along with all the JMS metadata fields and
access methods. It does this by putting/getting the JMS metadata into the body
of an underlying Kafka message, which is defined with a special JMS Avro schema
that includes both the JMS metadata and the JMS message body (which can
be any of the JMS message types).

-hans


> On Mar 14, 2017, at 9:26 AM, Robert Quinlivan  wrote:
> 
> Did you look at the ConsumerRecord
> 
> class?
> 
> On Tue, Mar 14, 2017 at 11:09 AM, Syed Mudassir Ahmed <
> smudas...@snaplogic.com> wrote:
> 
>> Can anyone help?
>> 
>> -- Forwarded message --
>> From: "Syed Mudassir Ahmed" 
>> Date: Mar 14, 2017 12:28 PM
>> Subject: Question on Metadata
>> To: 
>> Cc:
>> 
>> Hi guys,
>>  When we consume a JMS message, we get a Message object that has methods
>> to fetch implicit metadata provided by JMS server.
>> http://docs.oracle.com/javaee/6/api/javax/jms/Message.html.  There are
>> methods to fetch that implicit metadata such as Expiration, Correlation ID,
>> etc.
>> 
>>  Is there a way to fetch any such implicit metadata while consuming a
>> kafka message?  Any pointers to do that would be greatly appreciated.
>> 
>>  Thanks in advance.
>> 
>> --
>> Thanks,
>> Syed.
>> 
> 
> 
> 
> -- 
> Robert Quinlivan
> Software Engineer, Signal



Re: Kafka Retention Policy to Indefinite

2017-03-14 Thread Hans Jespersen
You might want to use the new replication quotas mechanism (i.e. network
throttling) to make sure that replication traffic doesn't negatively impact
your production traffic.

See for details:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas

This feature was added in 0.10.1
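
As an illustration only (from memory, so please double-check the exact option
names against the docs for your version), the throttle is applied when you
kick off a partition reassignment, e.g.:

  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --execute --throttle 50000000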

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Tue, Mar 14, 2017 at 10:09 AM, Joe San  wrote:

> Dear Kafka Users,
>
> What are the arguments against setting the retention plociy on a Kafka
> topic to infinite? I was in an interesting discussion with one of my
> colleagues where he was suggesting to set the retention policy for a topic
> to be indefinite.
>
> So how does this play up when adding new broker partitions? Say, I have
> accumulated in my topic some gigabytes of data and now I realize that I
> have to scale up by adding another partition. Now is this going to pose me
> a problem? The partition rebalance has to happen and I'm not sure what the
> implications are with rebalancing a partition that has gigabytes of data.
>
> Any thoughts on this?
>
> Thanks and Regards,
> Jothi
>


Re: Kafka Retention Policy to Indefinite

2017-03-14 Thread Joe San
So that means with replication quotas, I can set the retention policy to be
infinite?

On Tue, Mar 14, 2017 at 6:25 PM, Hans Jespersen  wrote:

> You might want to use the new replication quotas mechanism (i.e. network
> throttling) to make sure that replication traffic doesn't negatively impact
> your production traffic.
>
> See for details:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 73+Replication+Quotas
>
> This feature was added in 0.10.1
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670
>  */
>
> On Tue, Mar 14, 2017 at 10:09 AM, Joe San  wrote:
>
> > Dear Kafka Users,
> >
> > What are the arguments against setting the retention plociy on a Kafka
> > topic to infinite? I was in an interesting discussion with one of my
> > colleagues where he was suggesting to set the retention policy for a
> topic
> > to be indefinite.
> >
> > So how does this play up when adding new broker partitions? Say, I have
> > accumulated in my topic some gigabytes of data and now I realize that I
> > have to scale up by adding another partition. Now is this going to pose
> me
> > a problem? The partition rebalance has to happen and I'm not sure what
> the
> > implications are with rebalancing a partition that has gigabytes of data.
> >
> > Any thoughts on this?
> >
> > Thanks and Regards,
> > Jothi
> >
>


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Matthias J. Sax
Thanks for your input Michael.

>> - KafkaStreams as the new name for the builder that creates the logical
>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
>> `KafkaStreams.table("input-topic")`.

I don't think this is a good idea, for multiple reasons:

(1) We would reuse a name for a completely different purpose. The same
argument for not renaming KStreamBuilder to TopologyBuilder. The
confusion would just be too large.

So if we would start from scratch, it might be ok to do so, but now we
cannot make this move, IMHO.

Also, a clarification question: do you suggest having static methods
#stream and #table? I am not sure if this would work.
(Or was your code snippet just a simplification?)


(2) Kafka Streams is basically a "processing client" next to consumer
and producer client. Thus, the name KafkaStreams aligns to the naming
schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
a good choice to "break" this naming scheme.

Btw: this is also the reason, why we have KafkaStreams#close() -- and
not KafkaStreams#stop() -- because #close() aligns with consumer and
producer client.


(3) One more argument against using KafkaStreams as the DSL entry class is
that it would need to create a Topology that can be given to the
"runner/processing-client". Thus the pattern would be

> Topology topology = streams.build();
> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology)

(or of course as a one liner).



On the other hand, there was the idea (that we intentionally excluded
from the KIP), to change the "client instantiation" pattern.

Right now, a new client is actively instantiated (i.e., by calling "new")
and the topology is provided as a constructor argument. However,
especially for the DSL (not sure if it would make sense for the PAPI), the
DSL builder could create the client for the user.

Something like this:

> KStreamBuilder builder = new KStreamBuilder();
> builder.whatever() // use the builder
>
> StreamsConfig config = 
> KafkaStreams streams = builder.getKafkaStreams(config);

If we change the pattern like this, the notion of the "DSL builder" would
change, as it does not create a topology anymore, but rather creates the
"processing client". This would address Jay's concern about "not
exposing concepts users don't need to understand" and would not require
including the word "Topology" in the DSL builder class name, because
the builder does not build a Topology anymore.

I just put down the first names that came to mind -- I did not think hard
about good names. It's just to discuss the pattern.



-Matthias





On 3/14/17 3:36 AM, Michael Noll wrote:
> I see Jay's point, and I agree with much of it -- notably about being
> careful which concepts we do and do not expose, depending on which user
> group / user type is affected.  That said, I'm not sure yet whether or not
> we should get rid of "Topology" (or a similar term) in the DSL.
> 
> For what it's worth, here's how related technologies define/name their
> "topologies" and "builders".  Note that, in all cases, it's about
> constructing a logical processing plan, which then is being executed/run.
> 
> - `Pipeline` (Google Dataflow/Apache Beam)
> - To add a source you first instantiate the Source (e.g.
> `TextIO.Read.from("gs://some/inputData.txt")`),
>   then attach it to your processing plan via `Pipeline#apply()`.
>   This setup is a bit different to our DSL because in our DSL the
> builder does both, i.e.
>   instantiating + auto-attaching to itself.
> - To execute the processing plan you call `Pipeline#execute()`.
> - `StreamingContext`` (Spark): This setup is similar to our DSL.
> - To add a source you call e.g.
> `StreamingContext#socketTextStream("localhost", )`.
> - To execute the processing plan you call `StreamingContext#execute()`.
> - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
> - To add a source you call e.g.
> `StreamExecutionEnvironment#socketTextStream("localhost", )`.
> - To execute the processing plan you call
> `StreamExecutionEnvironment#execute()`.
> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
> `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>   into Flows, which are [Runnable]Graphs.
> - You instantiate a Source directly, and then compose the Source with
> Sinks to create a RunnableGraph:
>   see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
> RunnableGraph[Mat]`.
> - To execute the processing plan you call `Flow#run()`.
> 
> In our DSL, in comparison, we do:
> 
> - `KStreamBuilder` (Kafka Streams API)
> - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
> - To execute the processing plan you create a `KafkaStreams` instance
> from `KStreamBuilder`
>   (where the builder will instantiate the topology = processing plan to
> be executed), and then
>   call `KafkaStreams#start()`.  Think of `KafkaStreams` as our runner.
> 
> Fi

Re: Kafka Retention Policy to Indefinite

2017-03-14 Thread Hans Jespersen
I am saying that replication quotas will mitigate one of the potential
downsides of setting an infinite retention policy.

There is no clear set yes/no best practice rule for setting an extremely
large retention policy. It is clearly a valid configuration and there are
people who run this way.

The issues have more to do with the amount of data you expect to be stored
over the life of the system. If you have a Kafka cluster with petabytes of
data in it and a consumer comes along and blindly consumes from the
beginning, they will be getting a lot of data. So much so that this might
be considered an anti-pattern because their apps might not behave as they
expect and the network bandwidth used by lots of clients operating this way
may be considered bad practice.

Another way to avoid collecting too much data is to use compacted topics,
which are a special kind of topic that keeps the latest value for each key
forever, but removes older messages with the same key in order to
reduce the total amount of messages stored.
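
For example, a compacted topic can be created like this (the topic name,
partition count, and replication factor are illustrative only):

  kafka-topics.sh --zookeeper localhost:2181 --create --topic latest-values \
      --partitions 6 --replication-factor 3 --config cleanup.policy=compact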

How much data do you expect to store in your largest topic over the life of
the cluster?

-hans





/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Tue, Mar 14, 2017 at 10:36 AM, Joe San  wrote:

> So that means with replication quotas, I can set the retention policy to be
> infinite?
>
> On Tue, Mar 14, 2017 at 6:25 PM, Hans Jespersen  wrote:
>
> > You might want to use the new replication quotas mechanism (i.e. network
> > throttling) to make sure that replication traffic doesn't negatively
> impact
> > your production traffic.
> >
> > See for details:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 73+Replication+Quotas
> >
> > This feature was added in 0.10.1
> >
> > -hans
> >
> > /**
> >  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
> >  * h...@confluent.io (650)924-2670
> >  */
> >
> > On Tue, Mar 14, 2017 at 10:09 AM, Joe San 
> wrote:
> >
> > > Dear Kafka Users,
> > >
> > > What are the arguments against setting the retention plociy on a Kafka
> > > topic to infinite? I was in an interesting discussion with one of my
> > > colleagues where he was suggesting to set the retention policy for a
> > topic
> > > to be indefinite.
> > >
> > > So how does this play up when adding new broker partitions? Say, I have
> > > accumulated in my topic some gigabytes of data and now I realize that I
> > > have to scale up by adding another partition. Now is this going to pose
> > me
> > > a problem? The partition rebalance has to happen and I'm not sure what
> > the
> > > implications are with rebalancing a partition that has gigabytes of
> data.
> > >
> > > Any thoughts on this?
> > >
> > > Thanks and Regards,
> > > Jothi
> > >
> >
>


Re: Common Identity between brokers

2017-03-14 Thread Sumit Maheshwari
Can anyone answer the above query?

On Mon, Mar 13, 2017 at 3:41 PM, Sumit Maheshwari 
wrote:

> Hi,
>
> How can we identify if a set of brokers (nodes) belongs to the same cluster?
> I understand we can use zookeeper, where all the brokers pointing to the
> same zookeeper URLs belong to the same cluster.
> But is there a common identity between brokers which can help identify if
> brokers belong to the same cluster?
>
> I have seen that in a recent Kafka release there is a concept of clusterId, but that
> is available from 0.10.1.0. I am using a slightly older version of Kafka.
>
> Thanks,
> Sumit
>


Re: Trying to use Kafka Stream

2017-03-14 Thread Mina Aslani
I reset and it is still not working!

My env is setup using
http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html

I just tried using
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confluent/examples/streams/WordCountLambdaExample.java#L178-L181
with all the topics (e.g. TextLinesTopic and WordsWithCountsTopic) created
from scratch as I went through the steps as directed.

When I stopped the Java program and checked the topics, below is the data in
each topic.

docker run \

  --net=host \

  --rm \

  confluentinc/cp-kafka:3.2.0 \

  kafka-console-consumer --bootstrap-server localhost:29092 --topic
TextLinesTopic --new-consumer --from-beginning


SHOWS

hello kafka streams

all streams lead to kafka

join kafka summit

test1

test2

test3

test4

FOR WordsWithCountsTopic nothing is shown


I am new to Kafka/Kafka Streams and still do not understand why a simple
example does not work!

On Tue, Mar 14, 2017 at 1:03 PM, Matthias J. Sax 
wrote:

> >> So, when I check the number of messages in wordCount-input I see the
> same
> >> messages. However, when I run below code I do not see any message/data
> in
> >> wordCount-output.
>
> Did you reset your application?
>
> Each time you run your app and restart it, it will resume processing
> where it left off. Thus, if something went wrong in your first run but
> you got committed offsets, the app will not re-read the whole topic.
>
> You can check committed offsets via bin/kafka-consumer-groups.sh. The
> application-id from StreamsConfig is used as the group.id.
>
> Thus, resetting your app would be required to consume the input topic
> from scratch. Or you could just write new data to your input topic.
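As a rough sketch of both steps (the group name is the application.id used in the code further down in this thread, the broker address is a placeholder, and the exact reset-tool options may vary by version):

# show the committed offsets of the Streams app (its application.id is the consumer group.id)
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 \
  --describe --group wordCount-streaming

# reset the application so it re-reads its input topic from scratch
bin/kafka-streams-application-reset.sh --application-id wordCount-streaming \
  --input-topics wordCount-input --bootstrap-servers localhost:9092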
>
> >> Can I connect to kafka in VM/docker container using below code or do I
> need
> >> to change/add other parameters? How can I submit the code to
> >> kafka/kafka-connect? Do we have similar concept as SPARK to submit the
> >> code(e.g. jar file)?
>
> A Streams app is a regular Java application and can run anywhere --
> there is no notion of a processing cluster and you don't "submit" your
> code -- you just run your app.
>
> Thus, if your console consumer can connect to the cluster, your Streams
> app should also be able to connect to the cluster.
>
>
> Maybe, the short runtime of 5 seconds could be a problem (even if it
> seems long to process just a few records). But you might need to take
> startup delay into account. I would recommend registering a shutdown
> hook: see
> https://github.com/confluentinc/examples/blob/3.
> 2.x/kafka-streams/src/main/java/io/confluent/examples/streams/
> WordCountLambdaExample.java#L178-L181
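A minimal version of that hook looks roughly like this (builder and props stand for the topology and configuration built elsewhere in the app):

final KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
// close() flushes state and leaves the consumer group cleanly when the JVM shuts down
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));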
>
>
> Hope this helps.
>
> -Matthias
>
>
> On 3/13/17 7:30 PM, Mina Aslani wrote:
> > Hi Matthias,
> >
> > Thank you for the quick response, appreciate it!
> >
> > I created the topics wordCount-input and wordCount-output. Pushed some
> data
> > to wordCount-input using
> >
> > docker exec -it $(docker ps -f "name=kafka\\." --format "{{.Names}}")
> > /bin/kafka-console-producer --broker-list localhost:9092 --topic
> > wordCount-input
> >
> > test
> >
> > new
> >
> > word
> >
> > count
> >
> > wordcount
> >
> > word count
> >
> > So, when I check the number of messages in wordCount-input I see the same
> > messages. However, when I run below code I do not see any message/data in
> > wordCount-output.
> >
> > Can I connect to kafka in VM/docker container using below code or do I
> need
> > to change/add other parameters? How can I submit the code to
> > kafka/kafka-connect? Do we have similar concept as SPARK to submit the
> > code(e.g. jar file)?
> >
> > I really appreciate your input as I am blocked and cannot run even below
> > simple example.
> >
> > Best regards,
> > Mina
> >
> > I changed the code to be as below:
> >
> > Properties props = new Properties();
> > props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordCount-streaming");
> > props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, ":9092");
> > props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
> > Serdes.String().getClass().getName());
> > props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
> > Serdes.String().getClass().getName());
> >
> > props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
> > props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
> >
> > // setting offset reset to earliest so that we can re-run the demo
> > code with the same pre-loaded data
> > // Note: To re-run the demo, you need to use the offset reset tool:
> > // https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Application+Reset+Tool
> > props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
> >
> > KStreamBuilder builder = new KStreamBuilder();
> >
> > KStream<String, String> source = builder.stream("wordCount-input");
> >
> > KTable<String, Long> counts = source
> >   .flatMapValues(new ValueMapper<String, Iterable<String>>() {
> >  @Override
> >  public Iterable<String> apply(String value) {
> > return
> > Arrays.asList(value.toLowerCase(Locale.getDefault()).split(" "));
> >

Re: Trying to use Kafka Stream

2017-03-14 Thread Mina Aslani
Any book or document that provides information on how to use Kafka Streams?

On Tue, Mar 14, 2017 at 2:42 PM, Mina Aslani  wrote:

> I reset and still not working!
>
> My env is setup using http://docs.confluent.io/3.2.0/cp-docker-images/
> docs/quickstart.html
>
> I just tried using https://github.com/confluentinc/examples/blob/3.
> 2.x/kafka-streams/src/main/java/io/confluent/examples/streams/
> WordCountLambdaExample.java#L178-L181 with all the topics(e.g. TextLinesTopic
> and WordsWithCountsTopic) created from scratch as went through the steps
> as directed.
>
> When I stopped the java program and check the topics below are the data in
> each topic.
>
> docker run \
>
>   --net=host \
>
>   --rm \
>
>   confluentinc/cp-kafka:3.2.0 \
>
>   kafka-console-consumer --bootstrap-server localhost:29092 --topic
> TextLinesTopic --new-consumer --from-beginning
>
>
> SHOWS
>
> hello kafka streams
>
> all streams lead to kafka
>
> join kafka summit
>
> test1
>
> test2
>
> test3
>
> test4
>
> FOR WordsWithCountsTopic nothing is shown
>
>
> I am new to the Kafka/Kafka Stream and still do not understand why a
> simple example does not work!
>
> On Tue, Mar 14, 2017 at 1:03 PM, Matthias J. Sax 
> wrote:
>
>> >> So, when I check the number of messages in wordCount-input I see the
>> same
>> >> messages. However, when I run below code I do not see any message/data
>> in
>> >> wordCount-output.
>>
>> Did you reset your application?
>>
>> Each time you run you app and restart it, it will resume processing
>> where it left off. Thus, if something went wrong in you first run but
>> you got committed offsets, the app will not re-read the whole topic.
>>
>> You can check committed offset via bin/kafka-consumer-groups.sh. The
>> application-id from StreamConfig is used a group.id.
>>
>> Thus, resetting you app would be required to consumer the input topic
>> from scratch. Of you just write new data to you input topic.
>>
>> >> Can I connect to kafka in VM/docker container using below code or do I
>> need
>> >> to change/add other parameters? How can I submit the code to
>> >> kafka/kafka-connect? Do we have similar concept as SPARK to submit the
>> >> code(e.g. jar file)?
>>
>> A Streams app is a regular Java application and can run anywhere --
>> there is no notion of a processing cluster and you don't "submit" your
>> code -- you just run your app.
>>
>> Thus, if your console consumer can connect to the cluster, your Streams
>> app should also be able to connect to the cluster.
>>
>>
>> Maybe, the short runtime of 5 seconds could be a problem (even if it
>> seems log to process just a few records). But you might need to put
>> startup delay into account. I would recommend to register a shutdown
>> hook: see
>> https://github.com/confluentinc/examples/blob/3.2.x/kafka-
>> streams/src/main/java/io/confluent/examples/streams/Wor
>> dCountLambdaExample.java#L178-L181
>>
>>
>> Hope this helps.
>>
>> -Matthias
>>
>>
>> On 3/13/17 7:30 PM, Mina Aslani wrote:
>> > Hi Matthias,
>> >
>> > Thank you for the quick response, appreciate it!
>> >
>> > I created the topics wordCount-input and wordCount-output. Pushed some
>> data
>> > to wordCount-input using
>> >
>> > docker exec -it $(docker ps -f "name=kafka\\." --format "{{.Names}}")
>> > /bin/kafka-console-producer --broker-list localhost:9092 --topic
>> > wordCount-input
>> >
>> > test
>> >
>> > new
>> >
>> > word
>> >
>> > count
>> >
>> > wordcount
>> >
>> > word count
>> >
>> > So, when I check the number of messages in wordCount-input I see the
>> same
>> > messages. However, when I run below code I do not see any message/data
>> in
>> > wordCount-output.
>> >
>> > Can I connect to kafka in VM/docker container using below code or do I
>> need
>> > to change/add other parameters? How can I submit the code to
>> > kafka/kafka-connect? Do we have similar concept as SPARK to submit the
>> > code(e.g. jar file)?
>> >
>> > I really appreciate your input as I am blocked and cannot run even below
>> > simple example.
>> >
>> > Best regards,
>> > Mina
>> >
>> > I changed the code to be as below:
>> >
>> > Properties props = new Properties();
>> > props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordCount-streaming");
>> > props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, ":9092");
>> > props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
>> > Serdes.String().getClass().getName());
>> > props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
>> > Serdes.String().getClass().getName());
>> >
>> > props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
>> > props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
>> >
>> > // setting offset reset to earliest so that we can re-run the demo
>> > code with the same pre-loaded data
>> > // Note: To re-run the demo, you need to use the offset reset tool:
>> > // https://cwiki.apache.org/confluence/display/KAFKA/Kafka+
>> Streams+Application+Reset+Tool
>> > props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
>> >
>> > KStreamBuilder builder = new K

Re: Trying to use Kafka Stream

2017-03-14 Thread Eno Thereska
Hi there,

I noticed in your example that you are using localhost:9092 to produce but 
localhost:29092 to consume? Or is that a typo? Is zookeeper, kafka, and the 
Kafka Streams app all running within one docker container, or in different 
containers?

I just tested the WordCountLambdaExample and it works for me. This might not 
have anything to do with streams, but rather with the Kafka configuration and 
whether streams (that is just an app) can reach Kafka at all. If you provide 
the above information we can look further.



Thanks
Eno

> On 14 Mar 2017, at 18:42, Mina Aslani  wrote:
> 
> I reset and still not working!
> 
> My env is setup using
> http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html
> 
> I just tried using
> https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confluent/examples/streams/WordCountLambdaExample.java#L178-L181
> with all the topics(e.g. TextLinesTopic and WordsWithCountsTopic) created
> from scratch as went through the steps as directed.
> 
> When I stopped the java program and check the topics below are the data in
> each topic.
> 
> docker run \
> 
>  --net=host \
> 
>  --rm \
> 
>  confluentinc/cp-kafka:3.2.0 \
> 
>  kafka-console-consumer --bootstrap-server localhost:29092 --topic
> TextLinesTopic --new-consumer --from-beginning
> 
> 
> SHOWS
> 
> hello kafka streams
> 
> all streams lead to kafka
> 
> join kafka summit
> 
> test1
> 
> test2
> 
> test3
> 
> test4
> 
> FOR WordsWithCountsTopic nothing is shown
> 
> 
> I am new to the Kafka/Kafka Stream and still do not understand why a simple
> example does not work!
> 
> On Tue, Mar 14, 2017 at 1:03 PM, Matthias J. Sax 
> wrote:
> 
 So, when I check the number of messages in wordCount-input I see the
>> same
 messages. However, when I run below code I do not see any message/data
>> in
 wordCount-output.
>> 
>> Did you reset your application?
>> 
>> Each time you run you app and restart it, it will resume processing
>> where it left off. Thus, if something went wrong in you first run but
>> you got committed offsets, the app will not re-read the whole topic.
>> 
>> You can check committed offset via bin/kafka-consumer-groups.sh. The
>> application-id from StreamConfig is used a group.id.
>> 
>> Thus, resetting you app would be required to consumer the input topic
>> from scratch. Of you just write new data to you input topic.
>> 
 Can I connect to kafka in VM/docker container using below code or do I
>> need
 to change/add other parameters? How can I submit the code to
 kafka/kafka-connect? Do we have similar concept as SPARK to submit the
 code(e.g. jar file)?
>> 
>> A Streams app is a regular Java application and can run anywhere --
>> there is no notion of a processing cluster and you don't "submit" your
>> code -- you just run your app.
>> 
>> Thus, if your console consumer can connect to the cluster, your Streams
>> app should also be able to connect to the cluster.
>> 
>> 
>> Maybe, the short runtime of 5 seconds could be a problem (even if it
>> seems log to process just a few records). But you might need to put
>> startup delay into account. I would recommend to register a shutdown
>> hook: see
>> https://github.com/confluentinc/examples/blob/3.
>> 2.x/kafka-streams/src/main/java/io/confluent/examples/streams/
>> WordCountLambdaExample.java#L178-L181
>> 
>> 
>> Hope this helps.
>> 
>> -Matthias
>> 
>> 
>> On 3/13/17 7:30 PM, Mina Aslani wrote:
>>> Hi Matthias,
>>> 
>>> Thank you for the quick response, appreciate it!
>>> 
>>> I created the topics wordCount-input and wordCount-output. Pushed some
>> data
>>> to wordCount-input using
>>> 
>>> docker exec -it $(docker ps -f "name=kafka\\." --format "{{.Names}}")
>>> /bin/kafka-console-producer --broker-list localhost:9092 --topic
>>> wordCount-input
>>> 
>>> test
>>> 
>>> new
>>> 
>>> word
>>> 
>>> count
>>> 
>>> wordcount
>>> 
>>> word count
>>> 
>>> So, when I check the number of messages in wordCount-input I see the same
>>> messages. However, when I run below code I do not see any message/data in
>>> wordCount-output.
>>> 
>>> Can I connect to kafka in VM/docker container using below code or do I
>> need
>>> to change/add other parameters? How can I submit the code to
>>> kafka/kafka-connect? Do we have similar concept as SPARK to submit the
>>> code(e.g. jar file)?
>>> 
>>> I really appreciate your input as I am blocked and cannot run even below
>>> simple example.
>>> 
>>> Best regards,
>>> Mina
>>> 
>>> I changed the code to be as below:
>>> 
>>> Properties props = new Properties();
>>> props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordCount-streaming");
>>> props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, ":9092");
>>> props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
>>> Serdes.String().getClass().getName());
>>> props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
>>> Serdes.String().getClass().getName());
>>> 
>>> props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10);
>>> props.put(

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Guozhang Wang
I'd like to keep the term "Topology" inside the builder class since, as
Matthias mentioned, this builder#build() function returns a "Topology"
object, whose type is a public class anyways. Although you can argue to let
users always call

"new KafkaStreams(builder.build())"

I think there is still more benefit in exposing this concept.



Guozhang

On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax 
wrote:

> Thanks for your input Michael.
>
> >> - KafkaStreams as the new name for the builder that creates the logical
> >> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
> >> `KafkaStreams.table("input-topic")`.
>
> I don't think this is a good idea, for multiple reasons:
>
> (1) We would reuse a name for a completely different purpose. The same
> argument for not renaming KStreamBuilder to TopologyBuilder. The
> confusion would just be too large.
>
> So if we would start from scratch, it might be ok to do so, but now we
> cannot make this move, IMHO.
>
> Also a clarification question: do you suggest to have static methods
> #stream and #table -- I am not sure if this would work?
> (or was your code snippet just a simplification?)
>
>
> (2) Kafka Streams is basically a "processing client" next to consumer
> and producer client. Thus, the name KafkaStreams aligns to the naming
> schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
> a good choice to "break" this naming scheme.
>
> Btw: this is also the reason, why we have KafkaStreams#close() -- and
> not KafkaStreams#stop() -- because #close() aligns with consumer and
> producer client.
>
>
> (3) One more argument against using KafkaStreams as the DSL entry class would
> be that it would need to create a Topology that can be given to the
> "runner/processing-client". Thus the pattern would be
>
> > Topology topology = streams.build();
> > KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology)
>
> (or of course as a one liner).
>
>
>
> On the other hand, there was the idea (that we intentionally excluded
> from the KIP), to change the "client instantiation" pattern.
>
> Right now, a new client is actively instantiated (i.e., by calling "new")
> and the topology is provided as a constructor argument. However,
> especially for DSL (not sure if it would make sense for PAPI), the DSL
> builder could create the client for the user.
>
> Something like this:
>
> > KStreamBuilder builder = new KStreamBuilder();
> > builder.whatever() // use the builder
> >
> > StreamsConfig config = 
> > KafkaStreams streams = builder.getKafkaStreams(config);
>
> If we change the pattern like this, the notion of the "DSL builder" would
> change, as it does not create a topology anymore, but it creates the
> "processing client". This would address Jay's concern about "not
> exposing concepts users don't need to understand" and would not require
> including the word "Topology" in the DSL builder class name, because
> the builder does not build a Topology anymore.
>
> I just put some names that came to my mind first hand -- did not think
> about good names. It's just to discuss the pattern.
>
>
>
> -Matthias
>
>
>
>
>
> On 3/14/17 3:36 AM, Michael Noll wrote:
> > I see Jay's point, and I agree with much of it -- notably about being
> > careful which concepts we do and do not expose, depending on which user
> > group / user type is affected.  That said, I'm not sure yet whether or
> not
> > we should get rid of "Topology" (or a similar term) in the DSL.
> >
> > For what it's worth, here's how related technologies define/name their
> > "topologies" and "builders".  Note that, in all cases, it's about
> > constructing a logical processing plan, which then is being executed/run.
> >
> > - `Pipeline` (Google Dataflow/Apache Beam)
> > - To add a source you first instantiate the Source (e.g.
> > `TextIO.Read.from("gs://some/inputData.txt")`),
> >   then attach it to your processing plan via
> `Pipeline#apply()`.
> >   This setup is a bit different to our DSL because in our DSL the
> > builder does both, i.e.
> >   instantiating + auto-attaching to itself.
> > - To execute the processing plan you call `Pipeline#execute()`.
> > - `StreamingContext`` (Spark): This setup is similar to our DSL.
> > - To add a source you call e.g.
> > `StreamingContext#socketTextStream("localhost", )`.
> > - To execute the processing plan you call
> `StreamingContext#execute()`.
> > - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
> > - To add a source you call e.g.
> > `StreamExecutionEnvironment#socketTextStream("localhost", )`.
> > - To execute the processing plan you call
> > `StreamExecutionEnvironment#execute()`.
> > - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
> > `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
> >   into Flows, which are [Runnable]Graphs.
> > - You instantiate a Source directly, and then compose the Source with
> > Sinks to create a RunnableGraph:
> >   see 

Kafka connection to start from latest offset

2017-03-14 Thread Aaron Niskode-Dossett
Is it possible to start a kafka connect instance that reads from the
*latest* offset as opposed to the earliest?  I suppose this would be the
equivalent of passing auto.offset.reset=latest to a kafka consumer.

More generally, is this something that specific implementations of the
kafka connect API would have to be responsible for handling, or should it
be exposed through the connect API?

Thanks, Aaron


Re: Trying to use Kafka Stream

2017-03-14 Thread Mina Aslani
Hi Eno,

Sorry! That is a typo!

I have a docker-machine with different containers (setup as directed @
http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html)

docker ps --format "{{.Image}}: {{.Names}}"

confluentinc/cp-kafka-connect:3.2.0: kafka-connect

confluentinc/cp-enterprise-control-center:3.2.0: control-center

confluentinc/cp-kafka-rest:3.2.0: kafka-rest

confluentinc/cp-schema-registry:3.2.0: schema-registry

confluentinc/cp-kafka:3.2.0: kafka

confluentinc/cp-zookeeper:3.2.0: zookeeper

I used example @
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confluent/examples/streams/WordCountLambdaExample.java#L178-L181
and followed the same steps.

When I run below command in docker-machine, I see the messages in
TextLinesTopic.

docker run --net=host --rm confluentinc/cp-kafka:3.2.0 kafka-console-consumer
--bootstrap-server localhost:29092 --topic TextLinesTopic --new-consumer
--from-beginning

hello kafka streams

all streams lead to kafka

join kafka summit

test1

test2

test3

test4

Running the above command for WordsWithCountsTopic returns nothing.

My program runs outside the docker machine, and it does not return any error.

I checked the kafka logs and kafka-connect logs; no information is shown.
I am wondering what the log level is in kafka/kafka-connect.


Best regards,
Mina




On Tue, Mar 14, 2017 at 3:57 PM, Eno Thereska 
wrote:

> Hi there,
>
> I noticed in your example that you are using localhost:9092 to produce but
> localhost:29092 to consume? Or is that a typo? Is zookeeper, kafka, and the
> Kafka Streams app all running within one docker container, or in different
> containers?
>
> I just tested the WordCountLambdaExample and it works for me. This might
> not have anything to do with streams, but rather with the Kafka
> configuration and whether streams (that is just an app) can reach Kafka at
> all. If you provide the above information we can look further.
>
>
>
> Thanks
> Eno
>
> > On 14 Mar 2017, at 18:42, Mina Aslani  wrote:
> >
> > I reset and still not working!
> >
> > My env is setup using
> > http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html
> >
> > I just tried using
> > https://github.com/confluentinc/examples/blob/3.
> 2.x/kafka-streams/src/main/java/io/confluent/examples/streams/
> WordCountLambdaExample.java#L178-L181
> > with all the topics(e.g. TextLinesTopic and WordsWithCountsTopic) created
> > from scratch as went through the steps as directed.
> >
> > When I stopped the java program and check the topics below are the data
> in
> > each topic.
> >
> > docker run \
> >
> >  --net=host \
> >
> >  --rm \
> >
> >  confluentinc/cp-kafka:3.2.0 \
> >
> >  kafka-console-consumer --bootstrap-server localhost:29092 --topic
> > TextLinesTopic --new-consumer --from-beginning
> >
> >
> > SHOWS
> >
> > hello kafka streams
> >
> > all streams lead to kafka
> >
> > join kafka summit
> >
> > test1
> >
> > test2
> >
> > test3
> >
> > test4
> >
> > FOR WordsWithCountsTopic nothing is shown
> >
> >
> > I am new to the Kafka/Kafka Stream and still do not understand why a
> simple
> > example does not work!
> >
> > On Tue, Mar 14, 2017 at 1:03 PM, Matthias J. Sax 
> > wrote:
> >
>  So, when I check the number of messages in wordCount-input I see the
> >> same
>  messages. However, when I run below code I do not see any message/data
> >> in
>  wordCount-output.
> >>
> >> Did you reset your application?
> >>
> >> Each time you run you app and restart it, it will resume processing
> >> where it left off. Thus, if something went wrong in you first run but
> >> you got committed offsets, the app will not re-read the whole topic.
> >>
> >> You can check committed offset via bin/kafka-consumer-groups.sh. The
> >> application-id from StreamConfig is used a group.id.
> >>
> >> Thus, resetting you app would be required to consumer the input topic
> >> from scratch. Of you just write new data to you input topic.
> >>
>  Can I connect to kafka in VM/docker container using below code or do I
> >> need
>  to change/add other parameters? How can I submit the code to
>  kafka/kafka-connect? Do we have similar concept as SPARK to submit the
>  code(e.g. jar file)?
> >>
> >> A Streams app is a regular Java application and can run anywhere --
> >> there is no notion of a processing cluster and you don't "submit" your
> >> code -- you just run your app.
> >>
> >> Thus, if your console consumer can connect to the cluster, your Streams
> >> app should also be able to connect to the cluster.
> >>
> >>
> >> Maybe, the short runtime of 5 seconds could be a problem (even if it
> >> seems log to process just a few records). But you might need to put
> >> startup delay into account. I would recommend to register a shutdown
> >> hook: see
> >> https://github.com/confluentinc/examples/blob/3.
> >> 2.x/kafka-streams/src/main/java/io/confluent/examples/streams/
> >> WordCountLambdaExample.java#L178-L181
> >>
> >>
> >> Hope this helps

Re: Kafka connection to start from latest offset

2017-03-14 Thread Stephen Durfey
Producer and consumer configs used by the connect worker can be
overridden by prefixing the specific kafka config with either 'producer.'
or 'consumer.'. So, you should be able to set
'consumer.auto.offset.reset=latest' in your worker config to do that.

http://docs.confluent.io/3.0.0/connect/userguide.html?highlight=override%20configuration#overriding-producer-consumer-settings
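Concretely, that is just an extra line in the worker properties file; a sketch (file name and broker address are placeholders):

# connect-worker.properties
bootstrap.servers=localhost:9092
consumer.auto.offset.reset=latest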

On Tue, Mar 14, 2017 at 7:47 PM, Aaron Niskode-Dossett <
aniskodedoss...@etsy.com.invalid> wrote:

> Is it possible to start a kafka connect instance that reads from the
> *latest* offset as opposed to the earliest?  I suppose this would be the
> equivalent of passing auto.offset.reset=earliest to a kafka consumer.
>
> More generally, is this something that specific implementations of the
> kafka connect API would have to be responsible for handling or should it
> exposed through the connect API?
>
> Thanks, Aaron
>


Re: Common Identity between brokers

2017-03-14 Thread Hans Jespersen
This might be useful reading as it outlines why Cluster ID was added and lists
a few ways that clusters can be identified prior to that feature enhancement.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id 
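As a rough sketch, both pieces of information can be read with the ZooKeeper shell that ships with Kafka (the address is a placeholder); ls /brokers/ids lists the broker ids registered in that ensemble, and get /cluster/id shows the cluster id that 0.10.1.0+ brokers write per KIP-78:

bin/zookeeper-shell.sh localhost:2181
  ls /brokers/ids
  get /cluster/id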


-hans




> On Mar 14, 2017, at 11:20 AM, Sumit Maheshwari  wrote:
> 
> Can anyone answer the above query?
> 
> On Mon, Mar 13, 2017 at 3:41 PM, Sumit Maheshwari 
> wrote:
> 
>> Hi,
>> 
>> How can we identify if a set of brokers (nodes) belong to same cluster?
>> I understand we can use the zookeeper where all the brokers pointing to
>> same zookeeper URL's belong to same cluster.
>> But is there a common identity between brokers which can help identify if
>> brokers belong to same cluster?
>> 
>> I have seen in recent Kafka release there is concept of clusterId but that
>> is available from 0.10.1.0. I am using little older version of Kafka.
>> 
>> Thanks,
>> Sumit
>> 



Re: Trying to use Kafka Stream

2017-03-14 Thread Mina Aslani
And the port for kafka is 29092 and for zookeeper 32181.

On Tue, Mar 14, 2017 at 9:06 PM, Mina Aslani  wrote:

> Hi,
>
> I forgot to add in my previous email 2 questions.
>
> To setup my env, shall I use https://raw.githubusercontent.com/
> confluentinc/cp-docker-images/master/examples/kafka-single-
> node/docker-compose.yml instead or is there any other docker-compose.yml
> (version 2 or 3) which is suggested to setup env?
>
> How can I check "whether streams (that is just an app) can reach Kafka"?
>
> Regards,
> Mina
>
> On Tue, Mar 14, 2017 at 9:00 PM, Mina Aslani  wrote:
>
>> Hi Eno,
>>
>> Sorry! That is a typo!
>>
>> I have a docker-machine with different containers (setup as directed @
>> http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html)
>>
>> docker ps --format "{{.Image}}: {{.Names}}"
>>
>> confluentinc/cp-kafka-connect:3.2.0: kafka-connect
>>
>> confluentinc/cp-enterprise-control-center:3.2.0: control-center
>>
>> confluentinc/cp-kafka-rest:3.2.0: kafka-rest
>>
>> confluentinc/cp-schema-registry:3.2.0: schema-registry
>>
>> confluentinc/cp-kafka:3.2.0: kafka
>>
>> confluentinc/cp-zookeeper:3.2.0: zookeeper
>>
>> I used example @ https://github.com/confluent
>> inc/examples/blob/3.2.x/kafka-streams/src/main/java/io/
>> confluent/examples/streams/WordCountLambdaExample.java#L178-L181 and
>> followed the same steps.
>>
>> When I run below command in docker-machine, I see the messages in
>> TextLinesTopic.
>>
>> docker run --net=host --rm confluentinc/cp-kafka:3.2.0 kafka-console-consumer
>> --bootstrap-server localhost:29092 --topic TextLinesTopic --new-consumer
>> --from-beginning
>>
>> hello kafka streams
>>
>> all streams lead to kafka
>>
>> join kafka summit
>>
>> test1
>>
>> test2
>>
>> test3
>>
>> test4
>>
>> Running above command for WordsWithCountsTopic returns nothing*.*
>>
>> My program runs out of docker machine, and it does not return any error.
>>
>> I checked kafka logs and kafka-connect logs, no information is shown.
>> Wondering what is the log level in kafka/kafka-connect.
>>
>>
>> Best regards,
>> Mina
>>
>>
>>
>>
>> On Tue, Mar 14, 2017 at 3:57 PM, Eno Thereska 
>> wrote:
>>
>>> Hi there,
>>>
>>> I noticed in your example that you are using localhost:9092 to produce
>>> but localhost:29092 to consume? Or is that a typo? Is zookeeper, kafka, and
>>> the Kafka Streams app all running within one docker container, or in
>>> different containers?
>>>
>>> I just tested the WordCountLambdaExample and it works for me. This might
>>> not have anything to do with streams, but rather with the Kafka
>>> configuration and whether streams (that is just an app) can reach Kafka at
>>> all. If you provide the above information we can look further.
>>>
>>>
>>>
>>> Thanks
>>> Eno
>>>
>>> > On 14 Mar 2017, at 18:42, Mina Aslani  wrote:
>>> >
>>> > I reset and still not working!
>>> >
>>> > My env is setup using
>>> > http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html
>>> >
>>> > I just tried using
>>> > https://github.com/confluentinc/examples/blob/3.2.x/kafka-st
>>> reams/src/main/java/io/confluent/examples/streams/WordCountL
>>> ambdaExample.java#L178-L181
>>> > with all the topics(e.g. TextLinesTopic and WordsWithCountsTopic)
>>> created
>>> > from scratch as went through the steps as directed.
>>> >
>>> > When I stopped the java program and check the topics below are the
>>> data in
>>> > each topic.
>>> >
>>> > docker run \
>>> >
>>> >  --net=host \
>>> >
>>> >  --rm \
>>> >
>>> >  confluentinc/cp-kafka:3.2.0 \
>>> >
>>> >  kafka-console-consumer --bootstrap-server localhost:29092 --topic
>>> > TextLinesTopic --new-consumer --from-beginning
>>> >
>>> >
>>> > SHOWS
>>> >
>>> > hello kafka streams
>>> >
>>> > all streams lead to kafka
>>> >
>>> > join kafka summit
>>> >
>>> > test1
>>> >
>>> > test2
>>> >
>>> > test3
>>> >
>>> > test4
>>> >
>>> > FOR WordsWithCountsTopic nothing is shown
>>> >
>>> >
>>> > I am new to the Kafka/Kafka Stream and still do not understand why a
>>> simple
>>> > example does not work!
>>> >
>>> > On Tue, Mar 14, 2017 at 1:03 PM, Matthias J. Sax <
>>> matth...@confluent.io>
>>> > wrote:
>>> >
>>>  So, when I check the number of messages in wordCount-input I see the
>>> >> same
>>>  messages. However, when I run below code I do not see any
>>> message/data
>>> >> in
>>>  wordCount-output.
>>> >>
>>> >> Did you reset your application?
>>> >>
>>> >> Each time you run you app and restart it, it will resume processing
>>> >> where it left off. Thus, if something went wrong in you first run but
>>> >> you got committed offsets, the app will not re-read the whole topic.
>>> >>
>>> >> You can check committed offset via bin/kafka-consumer-groups.sh. The
>>> >> application-id from StreamConfig is used a group.id.
>>> >>
>>> >> Thus, resetting you app would be required to consumer the input topic
>>> >> from scratch. Of you just write new data to you input topic.
>>> >>
>>>  Can I connect to kafka in VM/docker container using be

Re: Trying to use Kafka Stream

2017-03-14 Thread Mina Aslani
Hi,

I forgot to add in my previous email 2 questions.

To setup my env, shall I use
https://raw.githubusercontent.com/confluentinc/cp-docker-images/master/examples/kafka-single-node/docker-compose.yml
instead or is there any other docker-compose.yml (version 2 or 3) which is
suggested to setup env?

How can I check "whether streams (that is just an app) can reach Kafka"?
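One rough check is to run the console consumer from the same place the Streams app runs, with the exact bootstrap.servers value the app is configured with; if that works, the app should be able to reach Kafka as well. A sketch, with the address and topic taken from this thread as placeholders:

kafka-console-consumer --bootstrap-server localhost:29092 --topic TextLinesTopic --new-consumer --from-beginning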

Regards,
Mina

On Tue, Mar 14, 2017 at 9:00 PM, Mina Aslani  wrote:

> Hi Eno,
>
> Sorry! That is a typo!
>
> I have a docker-machine with different containers (setup as directed @
> http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html)
>
> docker ps --format "{{.Image}}: {{.Names}}"
>
> confluentinc/cp-kafka-connect:3.2.0: kafka-connect
>
> confluentinc/cp-enterprise-control-center:3.2.0: control-center
>
> confluentinc/cp-kafka-rest:3.2.0: kafka-rest
>
> confluentinc/cp-schema-registry:3.2.0: schema-registry
>
> confluentinc/cp-kafka:3.2.0: kafka
>
> confluentinc/cp-zookeeper:3.2.0: zookeeper
>
> I used example @ https://github.com/confluentinc/examples/blob/3.
> 2.x/kafka-streams/src/main/java/io/confluent/examples/streams/
> WordCountLambdaExample.java#L178-L181 and followed the same steps.
>
> When I run below command in docker-machine, I see the messages in
> TextLinesTopic.
>
> docker run --net=host --rm confluentinc/cp-kafka:3.2.0 kafka-console-consumer
> --bootstrap-server localhost:29092 --topic TextLinesTopic --new-consumer
> --from-beginning
>
> hello kafka streams
>
> all streams lead to kafka
>
> join kafka summit
>
> test1
>
> test2
>
> test3
>
> test4
>
> Running above command for WordsWithCountsTopic returns nothing*.*
>
> My program runs out of docker machine, and it does not return any error.
>
> I checked kafka logs and kafka-connect logs, no information is shown.
> Wondering what is the log level in kafka/kafka-connect.
>
>
> Best regards,
> Mina
>
>
>
>
> On Tue, Mar 14, 2017 at 3:57 PM, Eno Thereska 
> wrote:
>
>> Hi there,
>>
>> I noticed in your example that you are using localhost:9092 to produce
>> but localhost:29092 to consume? Or is that a typo? Is zookeeper, kafka, and
>> the Kafka Streams app all running within one docker container, or in
>> different containers?
>>
>> I just tested the WordCountLambdaExample and it works for me. This might
>> not have anything to do with streams, but rather with the Kafka
>> configuration and whether streams (that is just an app) can reach Kafka at
>> all. If you provide the above information we can look further.
>>
>>
>>
>> Thanks
>> Eno
>>
>> > On 14 Mar 2017, at 18:42, Mina Aslani  wrote:
>> >
>> > I reset and still not working!
>> >
>> > My env is setup using
>> > http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html
>> >
>> > I just tried using
>> > https://github.com/confluentinc/examples/blob/3.2.x/kafka-
>> streams/src/main/java/io/confluent/examples/streams/Wor
>> dCountLambdaExample.java#L178-L181
>> > with all the topics(e.g. TextLinesTopic and WordsWithCountsTopic)
>> created
>> > from scratch as went through the steps as directed.
>> >
>> > When I stopped the java program and check the topics below are the data
>> in
>> > each topic.
>> >
>> > docker run \
>> >
>> >  --net=host \
>> >
>> >  --rm \
>> >
>> >  confluentinc/cp-kafka:3.2.0 \
>> >
>> >  kafka-console-consumer --bootstrap-server localhost:29092 --topic
>> > TextLinesTopic --new-consumer --from-beginning
>> >
>> >
>> > SHOWS
>> >
>> > hello kafka streams
>> >
>> > all streams lead to kafka
>> >
>> > join kafka summit
>> >
>> > test1
>> >
>> > test2
>> >
>> > test3
>> >
>> > test4
>> >
>> > FOR WordsWithCountsTopic nothing is shown
>> >
>> >
>> > I am new to the Kafka/Kafka Stream and still do not understand why a
>> simple
>> > example does not work!
>> >
>> > On Tue, Mar 14, 2017 at 1:03 PM, Matthias J. Sax > >
>> > wrote:
>> >
>>  So, when I check the number of messages in wordCount-input I see the
>> >> same
>>  messages. However, when I run below code I do not see any
>> message/data
>> >> in
>>  wordCount-output.
>> >>
>> >> Did you reset your application?
>> >>
>> >> Each time you run you app and restart it, it will resume processing
>> >> where it left off. Thus, if something went wrong in you first run but
>> >> you got committed offsets, the app will not re-read the whole topic.
>> >>
>> >> You can check committed offset via bin/kafka-consumer-groups.sh. The
>> >> application-id from StreamConfig is used a group.id.
>> >>
>> >> Thus, resetting you app would be required to consumer the input topic
>> >> from scratch. Of you just write new data to you input topic.
>> >>
>>  Can I connect to kafka in VM/docker container using below code or do
>> I
>> >> need
>>  to change/add other parameters? How can I submit the code to
>>  kafka/kafka-connect? Do we have similar concept as SPARK to submit
>> the
>>  code(e.g. jar file)?
>> >>
>> >> A Streams app is a regular Java application and can run anywhere --
>> >> there is no notion of a proc

Re: Trying to use Kafka Stream

2017-03-14 Thread Mina Aslani
I even tried
http://docs.confluent.io/3.2.0/streams/quickstart.html#goal-of-this-quickstart

and in docker-machine  ran /usr/bin/kafka-run-class
org.apache.kafka.streams.examples.wordcount.WordCountDemo

Running

docker run --net=host --rm confluentinc/cp-kafka:3.2.0
kafka-console-consumer --bootstrap-server localhost:9092 --topic
streams-wordcount-output --new-consumer --from-beginning

shows 8 blank messages

Is there any setting/configuration that should be done? Running the class in
the docker-machine and running the program outside the docker-machine do not
return the expected result!

On Tue, Mar 14, 2017 at 9:56 PM, Mina Aslani  wrote:

> And the port for kafka is 29092 and for zookeeper 32181.
>
> On Tue, Mar 14, 2017 at 9:06 PM, Mina Aslani  wrote:
>
>> Hi,
>>
>> I forgot to add in my previous email 2 questions.
>>
>> To setup my env, shall I use https://raw.githubusercont
>> ent.com/confluentinc/cp-docker-images/master/examples/
>> kafka-single-node/docker-compose.yml instead or is there any other
>> docker-compose.yml (version 2 or 3) which is suggested to setup env?
>>
>> How can I check "whether streams (that is just an app) can reach Kafka"?
>>
>> Regards,
>> Mina
>>
>> On Tue, Mar 14, 2017 at 9:00 PM, Mina Aslani 
>> wrote:
>>
>>> Hi Eno,
>>>
>>> Sorry! That is a typo!
>>>
>>> I have a docker-machine with different containers (setup as directed @
>>> http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html)
>>>
>>> docker ps --format "{{.Image}}: {{.Names}}"
>>>
>>> confluentinc/cp-kafka-connect:3.2.0: kafka-connect
>>>
>>> confluentinc/cp-enterprise-control-center:3.2.0: control-center
>>>
>>> confluentinc/cp-kafka-rest:3.2.0: kafka-rest
>>>
>>> confluentinc/cp-schema-registry:3.2.0: schema-registry
>>>
>>> confluentinc/cp-kafka:3.2.0: kafka
>>>
>>> confluentinc/cp-zookeeper:3.2.0: zookeeper
>>>
>>> I used example @ https://github.com/confluent
>>> inc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confl
>>> uent/examples/streams/WordCountLambdaExample.java#L178-L181 and
>>> followed the same steps.
>>>
>>> When I run below command in docker-machine, I see the messages in
>>> TextLinesTopic.
>>>
>>> docker run --net=host --rm confluentinc/cp-kafka:3.2.0 
>>> kafka-console-consumer
>>> --bootstrap-server localhost:29092 --topic TextLinesTopic --new-consumer
>>> --from-beginning
>>>
>>> hello kafka streams
>>>
>>> all streams lead to kafka
>>>
>>> join kafka summit
>>>
>>> test1
>>>
>>> test2
>>>
>>> test3
>>>
>>> test4
>>>
>>> Running above command for WordsWithCountsTopic returns nothing*.*
>>>
>>> My program runs out of docker machine, and it does not return any error.
>>>
>>> I checked kafka logs and kafka-connect logs, no information is shown.
>>> Wondering what is the log level in kafka/kafka-connect.
>>>
>>>
>>> Best regards,
>>> Mina
>>>
>>>
>>>
>>>
>>> On Tue, Mar 14, 2017 at 3:57 PM, Eno Thereska 
>>> wrote:
>>>
 Hi there,

 I noticed in your example that you are using localhost:9092 to produce
 but localhost:29092 to consume? Or is that a typo? Is zookeeper, kafka, and
 the Kafka Streams app all running within one docker container, or in
 different containers?

 I just tested the WordCountLambdaExample and it works for me. This
 might not have anything to do with streams, but rather with the Kafka
 configuration and whether streams (that is just an app) can reach Kafka at
 all. If you provide the above information we can look further.



 Thanks
 Eno

 > On 14 Mar 2017, at 18:42, Mina Aslani  wrote:
 >
 > I reset and still not working!
 >
 > My env is setup using
 > http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html
 >
 > I just tried using
 > https://github.com/confluentinc/examples/blob/3.2.x/kafka-st
 reams/src/main/java/io/confluent/examples/streams/WordCountL
 ambdaExample.java#L178-L181
 > with all the topics(e.g. TextLinesTopic and WordsWithCountsTopic)
 created
 > from scratch as went through the steps as directed.
 >
 > When I stopped the java program and check the topics below are the
 data in
 > each topic.
 >
 > docker run \
 >
 >  --net=host \
 >
 >  --rm \
 >
 >  confluentinc/cp-kafka:3.2.0 \
 >
 >  kafka-console-consumer --bootstrap-server localhost:29092 --topic
 > TextLinesTopic --new-consumer --from-beginning
 >
 >
 > SHOWS
 >
 > hello kafka streams
 >
 > all streams lead to kafka
 >
 > join kafka summit
 >
 > test1
 >
 > test2
 >
 > test3
 >
 > test4
 >
 > FOR WordsWithCountsTopic nothing is shown
 >
 >
 > I am new to the Kafka/Kafka Stream and still do not understand why a
 simple
 > example does not work!
 >
 > On Tue, Mar 14, 2017 at 1:03 PM, Matthias J. Sax <
 matth...@confluent.io>
 > wrote:
 >
  So, when I check the number of mess

Kafka Topics best practice for logging data pipeline use case

2017-03-14 Thread Ram Vittal
We are using the latest Kafka and Logstash versions for ingesting several
business apps' logs (now a few but eventually 100+) into ELK. We have a
standardized logging structure for business apps to log data into Kafka
topics and are able to ingest into ELK via the Kafka topics input plugin.

Currently, we are using one kafka topic for each business app for pushing
data into logstash. We have 3 logstash consumers with 3 partitions on each
topic.

I am wondering about the best practice for using kafka/logstash. Is the
above config a good approach, or is there a better approach?

For example, instead of having one kafka topic for each app, should we have
one kafka topic across all apps? What are the pros and cons?

If you are not familiar with Logstash it is part of Elastic stack and it is
just another consumer for Kafka.

Would appreciate your input!
-- 
Thanks,
Ram Vittal


Re: Trying to use Kafka Stream

2017-03-14 Thread Mina Aslani
Hi,
I just checked streams-wordcount-output topic using below command

docker run \

  --net=host \

  --rm \

  confluentinc/cp-kafka:3.2.0 \

  kafka-console-consumer --bootstrap-server localhost:9092 \

  --topic streams-wordcount-output \

  --from-beginning \

  --formatter kafka.tools.DefaultMessageFormatter \

  --property print.key=true \

  --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \

  --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer


and it returns

all 1
lead 1
to 1
hello 1
streams 2
join 1
kafka 3
summit 1

Please note the above result is from when I tried
http://docs.confluent.io/3.2.0/streams/quickstart.html#goal-of-this-quickstart and in the
docker-machine ran /usr/bin/kafka-run-class org.apache.kafka.streams.examples.wordcount.WordCountDemo.

How come running the same program outside the docker-machine does not output to the
output topic?
Should I package the program as a jar, deploy it to the docker-machine and run it
using ./bin/kafka-run-class?

Best regards,
Mina



On Tue, Mar 14, 2017 at 11:11 PM, Mina Aslani  wrote:

> I even tried http://docs.confluent.io/3.2.0/streams/quickstart.html#goal-
> of-this-quickstart
>
> and in docker-machine  ran /usr/bin/kafka-run-class
> org.apache.kafka.streams.examples.wordcount.WordCountDemo
>
> Running
>
> docker run --net=host --rm confluentinc/cp-kafka:3.2.0
> kafka-console-consumer --bootstrap-server localhost:9092 --topic
> streams-wordcount-output --new-consumer --from-beginning
>
> shows 8 blank messages
>
> Is there any setting/configuration should be done as running the class in
> the docker-machine and running program outside the docker-machine does not
> return expected result!
>
> On Tue, Mar 14, 2017 at 9:56 PM, Mina Aslani  wrote:
>
>> And the port for kafka is 29092 and for zookeeper 32181.
>>
>> On Tue, Mar 14, 2017 at 9:06 PM, Mina Aslani 
>> wrote:
>>
>>> Hi,
>>>
>>> I forgot to add in my previous email 2 questions.
>>>
>>> To setup my env, shall I use https://raw.githubusercont
>>> ent.com/confluentinc/cp-docker-images/master/examples/kafka-
>>> single-node/docker-compose.yml instead or is there any other
>>> docker-compose.yml (version 2 or 3) which is suggested to setup env?
>>>
>>> How can I check "whether streams (that is just an app) can reach Kafka"?
>>>
>>> Regards,
>>> Mina
>>>
>>> On Tue, Mar 14, 2017 at 9:00 PM, Mina Aslani 
>>> wrote:
>>>
 Hi Eno,

 Sorry! That is a typo!

 I have a docker-machine with different containers (setup as directed @
 http://docs.confluent.io/3.2.0/cp-docker-images/docs/quickstart.html)

 docker ps --format "{{.Image}}: {{.Names}}"

 confluentinc/cp-kafka-connect:3.2.0: kafka-connect

 confluentinc/cp-enterprise-control-center:3.2.0: control-center

 confluentinc/cp-kafka-rest:3.2.0: kafka-rest

 confluentinc/cp-schema-registry:3.2.0: schema-registry

 confluentinc/cp-kafka:3.2.0: kafka

 confluentinc/cp-zookeeper:3.2.0: zookeeper

 I used example @ https://github.com/confluent
 inc/examples/blob/3.2.x/kafka-streams/src/main/java/io/confl
 uent/examples/streams/WordCountLambdaExample.java#L178-L181 and
 followed the same steps.

 When I run below command in docker-machine, I see the messages in
 TextLinesTopic.

 docker run --net=host --rm confluentinc/cp-kafka:3.2.0 
 kafka-console-consumer
 --bootstrap-server localhost:29092 --topic TextLinesTopic --new-consumer
 --from-beginning

 hello kafka streams

 all streams lead to kafka

 join kafka summit

 test1

 test2

 test3

 test4

 Running above command for WordsWithCountsTopic returns nothing*.*

 My program runs out of docker machine, and it does not return any
 error.

 I checked kafka logs and kafka-connect logs, no information is shown.
 Wondering what is the log level in kafka/kafka-connect.


 Best regards,
 Mina




 On Tue, Mar 14, 2017 at 3:57 PM, Eno Thereska 
 wrote:

> Hi there,
>
> I noticed in your example that you are using localhost:9092 to produce
> but localhost:29092 to consume? Or is that a typo? Is zookeeper, kafka, 
> and
> the Kafka Streams app all running within one docker container, or in
> different containers?
>
> I just tested the WordCountLambdaExample and it works for me. This
> might not have anything to do with streams, but rather with the Kafka
> configuration and whether streams (that is just an app) can reach Kafka at
> all. If you provide the above information we can look further.
>
>
>
> Thanks
> Eno
>
> > On 14 Mar 2017, at 18:42, Mina Aslani  wrote:
> >
> > I reset and still not working!
> >
> > My env is setup using
> > http://docs.confluent.io/3