Re: Can anyone help me to send messages in their original order?

2018-05-26 Thread Damian Guy
Hi Raymond,
If you want all messages delivered in order then you should create the
topic with 1 partition. If you want ordering guarantees for messages with
the same key, then you need to produce the messages with a key.

Using the console producer you can do that by adding
--property "parse.key=true"
--property "key.separator=,"
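
As a rough illustration (this is not the console producer's actual code), the
Python sketch below mimics what those two properties appear to do: each input
line is split at the first separator, the left part becomes the key and the
rest becomes the value. The helper name split_line and the key "file1" are
made up for the example. Note that with the CSV from this thread every line
would get a different key ("1", "2", ...), so on a multi-partition topic the
lines could still land in different partitions. Giving every line the same
key avoids that.

```python
def split_line(line, separator=","):
    """Split one input line into (key, value) at the first separator."""
    key, sep, value = line.partition(separator)
    if not sep:
        # Assumption: a line without the separator is treated as an error.
        raise ValueError(f"no key separator {separator!r} in line: {line!r}")
    return key, value


# Lines as they appear in the CSV from the original question.
lines = ["1, abc", "2, def", "9, zzz"]

# Splitting them as-is yields a different key per line.
per_line_keys = [split_line(line)[0] for line in lines]

# Prefixing a constant (hypothetical) key gives every line the same key,
# and the whole original line survives as the value.
keyed_lines = [f"file1,{line}" for line in lines]
records = [split_line(line) for line in keyed_lines]

if __name__ == "__main__":
    print(per_line_keys)
    print(records)
```

One way to get the constant prefix without editing the file would be to add it
in the shell pipeline before the producer reads the lines.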

Regards,
Damian



Re: Can anyone help me to send messages in their original order?

2018-05-26 Thread Raymond Xie
Thank you so much, Hans, for the explanation. It is very helpful to me as a
beginner.

So for my case, what are the right options to put together when running the
producer and consumer commands?

Thanks.


*Sincerely yours,*


*Raymond*



Re: Can anyone help me to send messages in their original order?

2018-05-26 Thread Hans Jespersen
There are two concepts in Kafka that are not always familiar to people who have 
used other pub/sub systems. 

1) partitions: 

Kafka topics are partitioned, which means a single topic is sharded into
multiple pieces that are distributed across multiple brokers in the cluster for
parallel processing.

Order is guaranteed per partition (not per topic).

You can think of each Kafka topic partition as an exclusive queue in
traditional messaging systems. Order is not guaranteed when the data is
spread out across multiple queues in traditional messaging either.

2) keys

Kafka messages have keys in addition to the value (i.e. the body) and the
headers. When messages are published with the same key they will all be sent in
order to the same partition.

If messages are published with a “null” key then they will be spread out round 
robin across all partitions (which is what you have done).


Conclusion 

You will see ordered delivery if you either use the same key for every message
when you publish or create a topic with one partition.
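
To make the two points above concrete, here is a small Python simulation. It
is only a sketch under assumptions, not Kafka's real partitioner: crc32 stands
in for Kafka's key hash, null keys are assigned strictly round robin, and the
names produce, consume and "file1" are invented for the example. A consumer
that reads one partition after another sees each partition in order, but the
topic as a whole is only in order when everything went to a single partition.

```python
import zlib
from itertools import count


def choose_partition(key, num_partitions, round_robin):
    """Pick a partition: round robin for null keys, hash of the key otherwise."""
    if key is None:
        return next(round_robin) % num_partitions
    # Stand-in hash; the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages, num_partitions):
    """Append each (key, value) message to the partition chosen for it."""
    partitions = [[] for _ in range(num_partitions)]
    round_robin = count()
    for key, value in messages:
        index = choose_partition(key, num_partitions, round_robin)
        partitions[index].append(value)
    return partitions


def consume(partitions):
    """Read partition by partition: one interleaving a consumer might see."""
    return [value for partition in partitions for value in partition]


values = [f"line {i}" for i in range(1, 10)]

# Null keys on 3 partitions: ordered within each partition, not overall.
unkeyed = produce([(None, v) for v in values], num_partitions=3)

# Same key on 3 partitions: everything lands in one partition.
same_key = produce([("file1", v) for v in values], num_partitions=3)

# Null keys on a single partition: there is only one place to go.
single = produce([(None, v) for v in values], num_partitions=1)

if __name__ == "__main__":
    print(consume(unkeyed) == values)   # False
    print(consume(same_key) == values)  # True
    print(consume(single) == values)    # True
```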


-hans



Re: Can anyone help me to send messages in their original order?

2018-05-26 Thread Raymond Xie
Thanks. Can you explain to me why, by default, I received the messages in the
wrong order? Note there are only 9 lines, numbered 1 to 9, but on the consumer
side their original order gets mixed up.

~~~sent from my cell phone, sorry if there is any typo

Hans Jespersen wrote on Sat, May 26, 2018 at 12:16 AM:

> If you create a topic with one partition they will be in order.
>
> Alternatively if you publish with the same key for every message they will
> be in the same order even if your topic has more than 1 partition.
>
> Either way above will work for Kafka.
>
> -hans
>
> > On May 25, 2018, at 8:56 PM, Raymond Xie  wrote:
> >
> > Hello,
> >
> > I just started learning Kafka and have the environment setup on my
> > hortonworks sandbox at home vmware.
> >
> > test.csv is what I want the producer to send out:
> >
> > more test1.csv | ./kafka-console-producer.sh --broker-list
> > sandbox.hortonworks.com:6667 --topic kafka-topic2
> >
> > 1, abc
> > 2, def
> > ...
> > 8, vwx
> > 9, zzz
> >
> > What I received are all the content of test.csv, however, not in their
> > original order;
> >
> > kafka-console-consumer.sh --zookeeper 192.168.112.129:2181 --topic
> > kafka-topic2
> >
> > 2, def
> > 1, abc
> > ...
> > 9, zzz
> > 8, vwx
> >
> >
> > I read from google that partition could be the feasible solution, however,
> > my questions are:
> >
> > 1. for small files like this one, shall I really do the partitioning? how
> > small a partition would be acceptable to ensure the sequence?
> > 2. for big files, each partition could still contain multiple lines, how to
> > ensure all the lines in each partition won't get messed up on consumer
> > side?
> >
> >
> > I also want to know what is the best practice to process large volume of
> > data through kafka? There should be better way other than console command.
> >
> > Thank you very much.
> >
> >
> >
> > *Sincerely yours,*
> >
> >
> > *Raymond*
>