Re: Kafka Producer - Producing to Multiple Topics

2020-08-21 Thread SenthilKumar K
It would be great if someone could provide inputs or hints :) Thanks!

--Senthil

On Fri, Aug 21, 2020 at 3:28 PM SenthilKumar K 
wrote:

> Updating the Kafka broker version:
>
> Kafka Version: 2.4.1
>
> On Fri, Aug 21, 2020 at 3:21 PM SenthilKumar K 
> wrote:
>
>> Hi Team, we have deployed a 150-node Kafka cluster in production for our
>> use case. Recently I have seen issues in the Kafka Producer client.
>>
>> Use Case:
>> Buffer Topic --> (Consume) Stream App (Multiple Topologies) (Transform) -->
>> Kafka Producer Topology (Produce to Multiple Topics)
>>
>> Initially, the data is written to a common buffer topic. The stream
>> topologies consume from the buffer topic and transform the data from one
>> form to JSON. The transformed output then goes to the Kafka Producer
>> Topology, whose job is to take input from the Transform Topology and write
>> it to stream-specific topics.
>>
>> No of Transformation Topologies: 200
>> No Of Kafka Producer Topologies: 100
>>
>> Each Kafka Producer Topology produces to multiple topics (approx. 2000).
>> Initially it produced to 1000 topics; the number of topics has since
>> increased, and that is when the problem was observed.
>>
>> What is the recommended approach to producing data to multiple topics from
>> multiple clients?
>>
>> Looking forward to your inputs to optimize the Kafka producer topology.
>> Thanks in advance!
>>
>> Client Version:
>>
>> 2.3.0
>>
>>
>> Kafka Producer Configuration:
>>
>>
>> retries=3
>> linger.ms=5
>> buffer.memory=67108864
>> batch.size=32768
>> compression.type=snappy
>>
>>
>> --Senthil
>>
>
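
For reference, a minimal sketch of one way to structure the producer side: a
single shared KafkaProducer instance (producers are thread-safe) used for all
stream-specific topics, with the configuration listed in the thread above.
Class, method, and topic names here are illustrative, not taken from the
actual setup.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MultiTopicProducer {
    private final KafkaProducer<String, String> producer;

    public MultiTopicProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // configuration from the thread above
        props.put("retries", "3");
        props.put("linger.ms", "5");
        props.put("buffer.memory", "67108864");
        props.put("batch.size", "32768");
        props.put("compression.type", "snappy");
        this.producer = new KafkaProducer<>(props);
    }

    // One thread-safe producer serves every stream-specific topic; batching
    // and compression are applied per topic-partition internally.
    public void send(String streamTopic, String key, String jsonValue) {
        producer.send(new ProducerRecord<>(streamTopic, key, jsonValue),
                (metadata, exception) -> {
                    if (exception != null) {
                        System.err.println("Send to " + streamTopic
                                + " failed: " + exception);
                    }
                });
    }

    public void close() {
        producer.close();
    }
}

With thousands of destination topics the main things to watch are
buffer.memory and the amount of topic metadata the client has to maintain;
sharing one producer avoids multiplying those costs per topology.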


Re: kafka.common.StateChangeFailedException: Failed to elect leader for partition XXX under strategy PreferredReplicaPartitionLeaderElectionStrategy

2018-11-15 Thread SenthilKumar K
Adding Kafka Controller Log.
[2018-11-15 11:19:23,985] ERROR [Controller id=4 epoch=8] Controller 4
epoch 8 failed to change state for partition XYXY-24 from OnlinePartition
to OnlinePartition (state.change.logger)

On Thu, Nov 15, 2018 at 5:12 PM SenthilKumar K 
wrote:

> Hello Kafka Experts,
>  We are facing a StateChangeFailedException on one of the brokers.
> Out of 4 brokers, 3 are running fine and only one broker is throwing the
> state change error. I don't find any errors in the ZooKeeper logs related to
> this.
>
> Kafka Version : kafka_2.11-1.1.0
>
> Any input would help me debug this issue further. Thanks in advance!
>
> --Senthil
>


kafka.common.StateChangeFailedException: Failed to elect leader for partition XXX under strategy PreferredReplicaPartitionLeaderElectionStrategy

2018-11-15 Thread SenthilKumar K
Hello Kafka Experts,
 We are facing a StateChangeFailedException on one of the brokers.
Out of 4 brokers, 3 are running fine and only one broker is throwing the state
change error. I don't find any errors in the ZooKeeper logs related to this.

Kafka Version : kafka_2.11-1.1.0

Any input would help me debug this issue further. Thanks in advance!

--Senthil


Re: Kafka Producer Partition Key Selection

2018-08-29 Thread SenthilKumar K
Thanks Gaurav. Did you notice the side effect mentioned on this page:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
?


--Senthil

On Wed, Aug 29, 2018 at 2:02 PM Gaurav Bajaj  wrote:

> Hello Senthil,
>
> In our case we use NULL as the message key to achieve even distribution in
> the producer, and with that we got a very even spread across partitions.
> Our Kafka client version is 0.10.1.0 and the Kafka broker version is 1.1
>
>
> Thanks,
> Gaurav
>
> On Wed, Aug 29, 2018 at 9:15 AM, SenthilKumar K 
> wrote:
>
>> Hello Experts, we want to distribute data across partitions in a Kafka
>> cluster.
>>  Option 1: use a null partition key, which can distribute data across
>> partitions.
>>  Option 2: choose a key (random UUID?), which can distribute the data
>> about 70-80% evenly.
>>
>> I have seen the below side effect described on the Confluence FAQ page
>> about sending null keys to the producer. Is this still valid in newer
>> versions of the Kafka producer library?
>> "Why is data not evenly distributed among partitions when a partitioning
>> key is not specified?"
>>
>> In Kafka producer, a partition key can be specified to indicate the
>> destination partition of the message. By default, a hashing-based
>> partitioner is used to determine the partition id given the key, and
>> people
>> can use customized partitioners also.
>>
>> To reduce # of open sockets, in 0.8.0 (
>> https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
>> key is not specified or null, a producer will pick a random partition and
>> stick to it for some time (default is 10 mins) before switching to another
>> one. So, if there are fewer producers than partitions, at a given point of
>> time, some partitions may not receive any data. To alleviate this problem,
>> one can either reduce the metadata refresh interval or specify a message
>> key and a customized random partitioner. For more detail see this thread
>>
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E
>>
>> Please advise on choosing a partition key that does not have these side effects.
>>
>> --Senthil
>>
>
>
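
A minimal sketch of the null-key approach Gaurav describes, for the modern
Java producer (the broker address and topic name are illustrative). With no
key, the default partitioner chooses the partition itself (round-robin over
available partitions in clients of that era; newer 2.4+ clients use a sticky
per-batch choice) instead of hashing a key:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class NullKeyProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                // key is null, so no hash-based partition assignment happens
                producer.send(new ProducerRecord<>("my-topic", null, "event-" + i));
            }
        }
    }
}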


Kafka Producer Partition Key Selection

2018-08-29 Thread SenthilKumar K
Hello Experts, we want to distribute data across partitions in a Kafka
cluster.
 Option 1: use a null partition key, which can distribute data across
partitions.
 Option 2: choose a key (random UUID?), which can distribute the data about
70-80% evenly.

I have seen the below side effect described on the Confluence FAQ page about
sending null keys to the producer. Is this still valid in newer versions of
the Kafka producer library?
"Why is data not evenly distributed among partitions when a partitioning key
is not specified?"

In Kafka producer, a partition key can be specified to indicate the
destination partition of the message. By default, a hashing-based
partitioner is used to determine the partition id given the key, and people
can use customized partitioners also.

To reduce # of open sockets, in 0.8.0 (
https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning
key is not specified or null, a producer will pick a random partition and
stick to it for some time (default is 10 mins) before switching to another
one. So, if there are fewer producers than partitions, at a given point of
time, some partitions may not receive any data. To alleviate this problem,
one can either reduce the metadata refresh interval or specify a message
key and a customized random partitioner. For more detail see this thread
http://mail-archives.apache.org/mod_mbox/kafka-dev/201310.mbox/%3CCAFbh0Q0aVh%2Bvqxfy7H-%2BMnRFBt6BnyoZk1LWBoMspwSmTqUKMg%40mail.gmail.com%3E

Please advise on choosing a partition key that does not have these side effects.

--Senthil
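
The FAQ text above suggests a "customized random partitioner" as one way to
avoid the old sticky behaviour. A minimal sketch of such a partitioner for the
modern Java producer is below; it is an illustration of the idea, not a
partitioner shipped with any Kafka version, and the class name is made up.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

// Spreads every record over the currently available partitions at random,
// whether or not a key is present.
public class RandomPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> available = cluster.availablePartitionsForTopic(topic);
        if (available.isEmpty()) {
            // fall back to any known partition if none currently has a leader
            return ThreadLocalRandom.current()
                    .nextInt(cluster.partitionCountForTopic(topic));
        }
        int i = ThreadLocalRandom.current().nextInt(available.size());
        return available.get(i).partition();
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

It is registered on the producer with
props.put("partitioner.class", RandomPartitioner.class.getName());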


Kafka Log deletion Problem

2018-02-02 Thread SenthilKumar K
Hello Experts, we have a Kafka setup running for our analytics pipeline.
Below is the broker config:

max.message.bytes = 67108864
replica.fetch.max.bytes = 67108864
zookeeper.session.timeout.ms = 7000
replica.socket.timeout.ms = 3
offsets.commit.timeout.ms = 5000
request.timeout.ms = 4
zookeeper.connection.timeout.ms = 7000
controller.socket.timeout.ms = 3
num.partitions = 24
listeners = SSL://23.212.237.10:9093
broker.id = 1
socket.receive.buffer.bytes = 102400
message.max.bytes = 2621440
auto.create.topics.enable = true
auto.leader.rebalance.enable = true
zookeeper.connect = zk1:2181,zk2:2181
log.retention.ms = 86400000
#log.retention.hours = 24
socket.request.max.bytes = 104857600
default.replication.factor = 2
log.dirs = /data/kafka_logs
compression.codec = 3

Kafka Version : *0.11*

The retention period is set to 24 hours, but I can still see the data on disk
after 24 hours. *What could be the problem here?*

Note: there is no topic-specific configuration.

--Senthil
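
Two things may be worth checking here. First, deletion happens at log segment
granularity: the active segment is never deleted and the retention check only
runs periodically, so data can remain on disk somewhat past the retention
window. Second, a topic-level retention.ms override takes precedence over the
broker default. A hedged sketch for setting and reading back the topic-level
value with the Java AdminClient (available from 0.11 clients; the bootstrap
address and topic name are illustrative):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionOverride {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // illustrative

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");

            // explicit 24h retention override on the topic itself
            Config retention = new Config(Collections.singletonList(
                    new ConfigEntry("retention.ms", "86400000")));
            admin.alterConfigs(Collections.singletonMap(topic, retention))
                    .all().get();

            // read back the effective topic configuration to verify
            Config effective = admin.describeConfigs(Collections.singleton(topic))
                    .all().get().get(topic);
            effective.entries().forEach(e ->
                    System.out.println(e.name() + " = " + e.value()));
        }
    }
}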


Kafka Consumer - org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 305000 ms

2017-10-11 Thread SenthilKumar K
Hi All, recently we started seeing a Kafka consumer timeout error.
What could be the cause here?

Version : kafka_2.11-0.11.0.0

Consumer Properties:

bootstrap.servers, enable.auto.commit, auto.commit.interval.ms,
session.timeout.ms, group.id, key.deserializer, value.deserializer,
max.poll.records

--Senthil
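
For context: 305000 ms matches the consumer's default request.timeout.ms in
this client line, so the exception usually means a broker did not answer the
offset lookup in time rather than a misconfigured consumer. A minimal sketch
of the property list above with that timeout made explicit (all values are
illustrative, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "my-group");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "5000");
        props.put("session.timeout.ms", "10000");
        props.put("max.poll.records", "500");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        // 300000 (max.poll.interval.ms) + 5000 is the default; raising it gives
        // offsetsForTimes()/fetch requests more time before the client gives up
        props.put("request.timeout.ms", "405000");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}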


Re: Different Data Types under same topic

2017-08-18 Thread SenthilKumar K
+ dev experts for inputs.


--Senthil

On Fri, Aug 18, 2017 at 9:15 PM, SenthilKumar K <senthilec...@gmail.com>
wrote:

> Hi Users, we have planned to use Kafka for one of our use cases: to collect
> data from different servers and persist it into a message bus.
>
> Flow Would Be :
> Source --> Kafka  -->  Streaming Engine --> Reports
>
> We would like to store different types of data in the same topic; at the
> same time, the data should be easy to access.
>
> Here is sample data:
> {"code" :"100" , "type" : "a" , "data" : " hello"}
> {"code" :"100" , "type" : "b" , "data" : " hello"}
> {"code" :"100" , "type" : "c" , "data" : " hello"}
>
> In this case we want to create a topic called *topic_100* and store all the
> data there, but the access pattern is by type.
>
> Example :
>  1) Read only *Type : "a"* data
>
>
> There is an option to partition the data by type so that each type goes to
> the same partition. The problem is that the data is then not distributed
> across the cluster.
>
> What is the preferred approach to using the same topic for different types
> of data?
>
>
> --Senthil
>
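
One common pattern for this layout is to keep topic_100 spread evenly across
partitions (null or random key) and filter by type on the consumer side. A
minimal sketch under that assumption; the broker address, group id, and the
crude string filter are illustrative (a real consumer would parse the JSON or
read a record header instead):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TypeFilterConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "type-a-readers");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic_100"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    // keep only type "a"; everything else is skipped client-side
                    if (record.value().contains("\"type\" : \"a\"")) {
                        System.out.println(record.value());
                    }
                }
            }
        }
    }
}

The trade-off is that every consumer reads the whole topic and discards what
it does not need; if a type is large enough to justify its own stream, a
separate topic per type avoids that cost.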


Re: Handling 2 to 3 Million Events before Kafka

2017-06-22 Thread SenthilKumar K
Hi Barton - I think we can use the async producer with the callback API to
keep track of which events failed.

--Senthil
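
A minimal sketch of that async-send-with-callback idea, using the standard
producer Callback; the topic name, payload, and what to do on failure are
illustrative:

import java.util.Properties;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class AsyncSendWithCallback {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // illustrative
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String event = "{\"host\":\"web-01\",\"msg\":\"hello\"}";
            // send() returns immediately; the callback fires once the broker
            // acknowledges (or the send fails), so failed events can be logged
            // or re-queued without blocking the ingest path
            producer.send(new ProducerRecord<>("events", event), new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception e) {
                    if (e != null) {
                        System.err.println("Event failed, handle it: " + e);
                    }
                }
            });
        }
    }
}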

On Thu, Jun 22, 2017 at 4:58 PM, SenthilKumar K <senthilec...@gmail.com>
wrote:

> Thanks Barton.. I'll look into these ..
>
> On Thu, Jun 22, 2017 at 7:12 AM, Garrett Barton <garrett.bar...@gmail.com>
> wrote:
>
>> Getting good concurrency in a webapp is more than doable.  Check out
>> these benchmarks:
>> https://www.techempower.com/benchmarks/#section=data-r14=ph=db
>> I linked to the single query one because thats closest to a single
>> operation like you will be doing.
>>
>> I'd also note if the data delivery does not need to be guaranteed you
>> could go faster switching the web servers over to UDP and using async mode
>> on the kafka producers.
>>
>> On Wed, Jun 21, 2017 at 2:23 PM, Tauzell, Dave <
>> dave.tauz...@surescripts.com> wrote:
>>
>>> I’m not really familiar with Netty so I won’t be of much help.   Maybe
>>> try posting on a Netty forum to see what they think?
>>> -Dave
>>>
>>> From: SenthilKumar K [mailto:senthilec...@gmail.com]
>>> Sent: Wednesday, June 21, 2017 10:28 AM
>>> To: Tauzell, Dave
>>> Cc: us...@kafka.apache.org; senthilec...@apache.org;
>>> dev@kafka.apache.org
>>> Subject: Re: Handling 2 to 3 Million Events before Kafka
>>>
>>> So netty would work for this case ?  I do have netty server and seems to
>>> be i'm not getting the expected results .. here is the git
>>> https://github.com/senthilec566/netty4-server , is this right
>>> implementation ?
>>>
>>> Cheers,
>>> Senthil
>>>
>>> On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave <
>>> dave.tauz...@surescripts.com<mailto:dave.tauz...@surescripts.com>>
>>> wrote:
>>> I see.
>>>
>>> 1.   You don’t want the 100k machines sending directly to kafka.
>>>
>>> 2.   You can only have a small number of web servers
>>>
>>> People certainly have web-servers handling over 100k concurrent
>>> connections.  See this for some examples:
>>> https://github.com/smallnest/C1000K-Servers .
>>>
>>> It seems possible with the right sort of kafka producer tuning.
>>>
>>> -Dave
>>>
>>> From: SenthilKumar K [mailto:senthilec...@gmail.com]
>>> Sent: Wednesday, June 21, 2017 8:55 AM
>>> To: Tauzell, Dave
>>> Cc: us...@kafka.apache.org<mailto:us...@kafka.apache.org>;
>>> senthilec...@apache.org<mailto:senthilec...@apache.org>;
>>> dev@kafka.apache.org<mailto:dev@kafka.apache.org>; Senthil kumar
>>> Subject: Re: Handling 2 to 3 Million Events before Kafka
>>>
>>> Thanks Jeyhun. Yes http server would be problematic here w.r.t network ,
>>> memory ..
>>>
>>> Hi Dave ,  The problem is not with Kafka , it's all about how do you
>>> handle huge data before kafka.  I did a simple test with 5 node Kafka
>>> Cluster which gives good result ( ~950 MB/s ) ..So Kafka side i dont see a
>>> scaling issue ...
>>>
>>> All we are trying is before kafka how do we handle messages from
>>> different servers ...  Webservers can send fast to kafka but still i can
>>> handle only 50k events per second which is less for my use case.. also i
>>> can't deploy 20 webservers to handle this load. I'm looking for an option
>>> what could be the best candidate before kafka , it should be super fast in
>>> getting all and send it to kafka producer ..
>>>
>>>
>>> --Senthil
>>>
>>> On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <
>>> dave.tauz...@surescripts.com<mailto:dave.tauz...@surescripts.com>>
>>> wrote:
>>> What are your configurations?
>>>
>>> - production
>>> - brokers
>>> - consumers
>>>
>>> Is the problem that web servers cannot send to Kafka fast enough or your
>>> consumers cannot process messages off of kafka fast enough?
>>> What is the average size of these messages?
>>>
>>> -Dave
>>>
>>> -Original Message-
>>> From: SenthilKumar K [mailto:senthilec...@gmail.com]
>>> Sent: Wednesday, June 21, 2017 7:58 AM
>>> To: us...@kafka.apache.org<mailto:us...@kafka.apache.org>
>>> Cc: senthilec...@apache.org<mailto:senthilec...@apache.org>; Senthil
>>> kumar; dev@kafka.apache.org

Re: Handling 2 to 3 Million Events before Kafka

2017-06-22 Thread SenthilKumar K
Thanks Barton, I'll look into these.

On Thu, Jun 22, 2017 at 7:12 AM, Garrett Barton <garrett.bar...@gmail.com>
wrote:

> Getting good concurrency in a webapp is more than doable.  Check out these
> benchmarks:
> https://www.techempower.com/benchmarks/#section=data-r14=ph=db
> I linked to the single query one because thats closest to a single
> operation like you will be doing.
>
> I'd also note if the data delivery does not need to be guaranteed you
> could go faster switching the web servers over to UDP and using async mode
> on the kafka producers.
>
> On Wed, Jun 21, 2017 at 2:23 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
>
>> I’m not really familiar with Netty so I won’t be of much help.   Maybe
>> try posting on a Netty forum to see what they think?
>> -Dave
>>
>> From: SenthilKumar K [mailto:senthilec...@gmail.com]
>> Sent: Wednesday, June 21, 2017 10:28 AM
>> To: Tauzell, Dave
>> Cc: us...@kafka.apache.org; senthilec...@apache.org; dev@kafka.apache.org
>> Subject: Re: Handling 2 to 3 Million Events before Kafka
>>
>> So netty would work for this case ?  I do have netty server and seems to
>> be i'm not getting the expected results .. here is the git
>> https://github.com/senthilec566/netty4-server , is this right
>> implementation ?
>>
>> Cheers,
>> Senthil
>>
>> On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave <
>> dave.tauz...@surescripts.com<mailto:dave.tauz...@surescripts.com>> wrote:
>> I see.
>>
>> 1.   You don’t want the 100k machines sending directly to kafka.
>>
>> 2.   You can only have a small number of web servers
>>
>> People certainly have web-servers handling over 100k concurrent
>> connections.  See this for some examples:  https://github.com/smallnest/C
>> 1000K-Servers .
>>
>> It seems possible with the right sort of kafka producer tuning.
>>
>> -Dave
>>
>> From: SenthilKumar K [mailto:senthilec...@gmail.com]
>> Sent: Wednesday, June 21, 2017 8:55 AM
>> To: Tauzell, Dave
>> Cc: us...@kafka.apache.org<mailto:us...@kafka.apache.org>;
>> senthilec...@apache.org<mailto:senthilec...@apache.org>;
>> dev@kafka.apache.org<mailto:dev@kafka.apache.org>; Senthil kumar
>> Subject: Re: Handling 2 to 3 Million Events before Kafka
>>
>> Thanks Jeyhun. Yes http server would be problematic here w.r.t network ,
>> memory ..
>>
>> Hi Dave ,  The problem is not with Kafka , it's all about how do you
>> handle huge data before kafka.  I did a simple test with 5 node Kafka
>> Cluster which gives good result ( ~950 MB/s ) ..So Kafka side i dont see a
>> scaling issue ...
>>
>> All we are trying is before kafka how do we handle messages from
>> different servers ...  Webservers can send fast to kafka but still i can
>> handle only 50k events per second which is less for my use case.. also i
>> can't deploy 20 webservers to handle this load. I'm looking for an option
>> what could be the best candidate before kafka , it should be super fast in
>> getting all and send it to kafka producer ..
>>
>>
>> --Senthil
>>
>> On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <
>> dave.tauz...@surescripts.com<mailto:dave.tauz...@surescripts.com>> wrote:
>> What are your configurations?
>>
>> - production
>> - brokers
>> - consumers
>>
>> Is the problem that web servers cannot send to Kafka fast enough or your
>> consumers cannot process messages off of kafka fast enough?
>> What is the average size of these messages?
>>
>> -Dave
>>
>> -Original Message-
>> From: SenthilKumar K [mailto:senthilec...@gmail.com]
>> Sent: Wednesday, June 21, 2017 7:58 AM
>> To: us...@kafka.apache.org<mailto:us...@kafka.apache.org>
>> Cc: senthilec...@apache.org<mailto:senthilec...@apache.org>; Senthil
>> kumar; dev@kafka.apache.org<mailto:dev@kafka.apache.org>
>> Subject: Handling 2 to 3 Million Events before Kafka
>>
>> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>>
>> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
>> is really good candidate for us to handle this ingestion rate ..
>>
>>
>> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>>
>> I see the problem in Http Server where it can't handle beyond 50K events
>> per instance ..  I'm thinking some other solution would be right choice
>> before Kafka ..
>>
>> Anyone worked on similar use case and similar load ? Suggestions/Thoughts
>> ?
>>
>> --Senthil
>> This e-mail and any files transmitted with it are confidential, may
>> contain sensitive information, and are intended solely for the use of the
>> individual or entity to whom they are addressed. If you have received this
>> e-mail in error, please notify the sender by reply e-mail immediately and
>> destroy all copies of the e-mail and any attachments.
>>
>>
>>
>


Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
So Netty would work for this case? I do have a Netty server, but it seems I'm
not getting the expected results. Here is the Git repo:
https://github.com/senthilec566/netty4-server - is this the right
implementation?

Cheers,
Senthil

On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave <dave.tauz...@surescripts.com
> wrote:

> I see.
>
> 1.   You don’t want the 100k machines sending directly to kafka.
>
> 2.   You can only have a small number of web servers
>
>
>
> People certainly have web-servers handling over 100k concurrent
> connections.  See this for some examples:  https://github.com/smallnest/
> C1000K-Servers .
>
>
>
> It seems possible with the right sort of kafka producer tuning.
>
>
>
> -Dave
>
>
>
> *From:* SenthilKumar K [mailto:senthilec...@gmail.com]
> *Sent:* Wednesday, June 21, 2017 8:55 AM
> *To:* Tauzell, Dave
> *Cc:* us...@kafka.apache.org; senthilec...@apache.org;
> dev@kafka.apache.org; Senthil kumar
> *Subject:* Re: Handling 2 to 3 Million Events before Kafka
>
>
>
> Thanks Jeyhun. Yes http server would be problematic here w.r.t network ,
> memory ..
>
>
>
> Hi Dave ,  The problem is not with Kafka , it's all about how do you
> handle huge data before kafka.  I did a simple test with 5 node Kafka
> Cluster which gives good result ( ~950 MB/s ) ..So Kafka side i dont see a
> scaling issue ...
>
>
>
> All we are trying is before kafka how do we handle messages from different
> servers ...  Webservers can send fast to kafka but still i can handle only
> 50k events per second which is less for my use case.. also i can't deploy
> 20 webservers to handle this load. I'm looking for an option what could be
> the best candidate before kafka , it should be super fast in getting all
> and send it to kafka producer ..
>
>
>
>
>
> --Senthil
>
>
>
> On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
>
> What are your configurations?
>
> - production
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
>
> -Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: us...@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil kumar; dev@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>
>
>


Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
Thanks Jeyhun. Yes, an HTTP server would be problematic here w.r.t. network
and memory.

Hi Dave, the problem is not with Kafka; it's all about how you handle the huge
volume of data before Kafka. I did a simple test with a 5-node Kafka cluster
which gives a good result (~950 MB/s), so on the Kafka side I don't see a
scaling issue.

All we are trying to figure out is how to handle messages from the different
servers before Kafka. The web servers can send to Kafka fast, but I can still
handle only 50k events per second per instance, which is too little for my use
case, and I can't deploy 20 web servers to handle this load. I'm looking for
the best candidate to put in front of Kafka; it should be very fast at
receiving everything and handing it to the Kafka producer.


--Senthil

On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <dave.tauz...@surescripts.com
> wrote:

> What are your configurations?
>
> - production
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
> -----Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: us...@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil kumar; dev@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>


Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
Hi Team, sorry if this question is irrelevant to the Kafka group...

I have been trying to solve the problem of handling 5 GB/sec ingestion. Kafka
is a really good candidate for us to handle this ingestion rate.


100K machines --> { HTTP Server (Jetty/Netty) } --> Kafka Cluster

I see the problem in the HTTP server, which can't handle beyond 50K events
per instance. I'm thinking some other solution would be the right choice in
front of Kafka.

Has anyone worked on a similar use case with a similar load? Suggestions/thoughts?

--Senthil
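
On the producer side of such a front end, the usual throughput levers are
batching, lingering, and compression. A hedged sketch of a throughput-oriented
configuration; every value here is an illustrative starting point to
benchmark, not a tested recommendation:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class HighThroughputProducerConfig {
    public static KafkaProducer<byte[], byte[]> create(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        // larger batches plus a small linger let the producer pack many small
        // events into each request
        props.put("batch.size", "262144");
        props.put("linger.ms", "10");
        props.put("compression.type", "snappy");
        // large buffer so bursts from the HTTP front end do not block sends
        props.put("buffer.memory", "134217728");
        // acks=1 trades some durability for latency and throughput
        props.put("acks", "1");
        return new KafkaProducer<>(props);
    }
}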


Kafka Time Based Index - Server Property ?

2017-05-30 Thread SenthilKumar K
Hi All ,  I've started exploring SearchMessagesByTimestamp
https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index#KIP-33-Addatimebasedlogindex-Searchmessagebytimestamp
.

The Kafka producer produces records with a timestamp. When I try to search by
timestamp, in some cases it works, but in some cases it does not.

Working case: if I ask Kafka to fetch from a timestamp in the last 15 mins, I
can see consumer.offsetsForTimes return results.

Not working case: if I ask Kafka to fetch the last 2 mins or 1 min, I can see
consumer.offsetsForTimes return nothing.

My Kafka cluster is running with the default server.properties. Am I missing
anything here? Is there any setting to make it work for cases like the last
30 seconds?

Please advise.

Cheers,
Senthil


Re: Efficient way of Searching Messages By Timestamp - Kafka

2017-05-28 Thread SenthilKumar K
Hi Dev, it would be great if anybody could share their experience with
searching messages by timestamp.

Cheers,
Senthil

On May 28, 2017 2:08 AM, "SenthilKumar K" <senthilec...@gmail.com> wrote:

> Hi Team , Any help here Pls ?
>
> Cheers,
> Senthil
>
> On Sat, May 27, 2017 at 8:25 PM, SenthilKumar K <senthilec...@gmail.com>
> wrote:
>
>> Hello Kafka Developers , Users ,
>>
>> We are exploring the SearchMessageByTimestamp feature in Kafka for
>> our use case .
>>
>> Use Case : Kafka will be realtime message bus , users should be able
>> to pull Logs by specifying start_date and end_date or  Pull me last five
>> minutes data etc ...
>>
>> I did POC on SearchMessageByTimestamp , here is the code
>> https://gist.github.com/senthilec566/16e8e28b32834666fea132afc3a4e2f9 .
>> And i observed that Searching Messages is slow ..
>>
>> Here is small test i did :
>> Query :Fetch Logs of Last *5 minutes*:
>> Result:
>> No of Records fetched : *30*
>> Fetch Time *6210* ms
>>
>> Above test performed in a topic which has 4 partitions. In each partition
>> search & query processing happened .. in other words
>> consumer.offsetsForTimes()
>> consumer.assign(Arrays.asList(partition))
>> consumer.seek(this.partition, offsetTimestamp.offset())
>> consumer.poll(100)
>>
>> are the API calls of each partition.. I realized that , this was the
>> reason for Kafka taking more time..
>>
>> What is efficient way of implementing SerachMessageByTimeStamp ?  Is
>> Kafka right candidate for our Use Case ?
>>
>> Pls add your thoughts here ...
>>
>>
>> Cheers,
>> Senthil
>>
>
>


Re: Efficient way of Searching Messages By Timestamp - Kafka

2017-05-27 Thread SenthilKumar K
Hi Team, any help here please?

Cheers,
Senthil

On Sat, May 27, 2017 at 8:25 PM, SenthilKumar K <senthilec...@gmail.com>
wrote:

> Hello Kafka Developers , Users ,
>
> We are exploring the SearchMessageByTimestamp feature in Kafka for our
> use case .
>
> Use Case : Kafka will be realtime message bus , users should be able
> to pull Logs by specifying start_date and end_date or  Pull me last five
> minutes data etc ...
>
> I did POC on SearchMessageByTimestamp , here is the code
> https://gist.github.com/senthilec566/16e8e28b32834666fea132afc3a4e2f9 .
> And i observed that Searching Messages is slow ..
>
> Here is small test i did :
> Query :Fetch Logs of Last *5 minutes*:
> Result:
> No of Records fetched : *30*
> Fetch Time *6210* ms
>
> Above test performed in a topic which has 4 partitions. In each partition
> search & query processing happened .. in other words
> consumer.offsetsForTimes()
> consumer.assign(Arrays.asList(partition))
> consumer.seek(this.partition, offsetTimestamp.offset())
> consumer.poll(100)
>
> are the API calls of each partition.. I realized that , this was the
> reason for Kafka taking more time..
>
> What is efficient way of implementing SerachMessageByTimeStamp ?  Is Kafka
> right candidate for our Use Case ?
>
> Pls add your thoughts here ...
>
>
> Cheers,
> Senthil
>


Efficient way of Searching Messages By Timestamp - Kafka

2017-05-27 Thread SenthilKumar K
Hello Kafka Developers , Users ,

We are exploring the SearchMessageByTimestamp feature in Kafka for our
use case .

Use case: Kafka will be a realtime message bus; users should be able to pull
logs by specifying a start_date and end_date, or "pull the last five minutes
of data", etc.

I did a POC on SearchMessageByTimestamp; here is the code:
https://gist.github.com/senthilec566/16e8e28b32834666fea132afc3a4e2f9 . And
I observed that searching messages is slow.

Here is a small test I did:
Query: fetch logs of the last *5 minutes*
Result:
No. of records fetched: *30*
Fetch time: *6210* ms

The above test was performed on a topic which has 4 partitions. The search &
query processing happened in each partition; in other words,
consumer.offsetsForTimes()
consumer.assign(Arrays.asList(partition))
consumer.seek(this.partition, offsetTimestamp.offset())
consumer.poll(100)

are the API calls made for each partition. I realized that this was the reason
Kafka was taking more time.

What is an efficient way of implementing SearchMessageByTimestamp? Is Kafka
the right candidate for our use case?

Please add your thoughts here...


Cheers,
Senthil


Re: Kafka Read Data from All Partition Using Key or Timestamp

2017-05-25 Thread SenthilKumar K
Thanks a lot Hans. By using the KafkaConsumer API
(https://gist.github.com/senthilec566/16e8e28b32834666fea132afc3a4e2f9) I can
query the data using a timestamp.

It worked !


Now another question, about achieving parallelism when reading data.

Example :  topic : test
  partitions : 4

KafkaConsumer allows searching for messages by timestamp, searching in each
partition. Right now the way I coded it is:
1) Fetch the number of partitions
2) Use a ForkJoinPool
3) Submit a task per partition to the ForkJoinPool
4) Combine the results

Each task creates its own consumer and reads the data, for example 4 consumers
in total, and the search time is a little high. It is the same if I use a
single consumer, since it has to read 4 times and join the results.

How can I implement this efficiently, i.e. read data from all partitions in a
single request? Whichever way gives good performance, I would opt for it :-)

Please suggest!

Cheers,
Senthil
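
A sketch of the single-request direction: one consumer is assigned all
partitions, so offsetsForTimes() is issued once for the whole topic instead of
once per partition, then each partition is seeked and a single poll loop reads
the data. Method and variable names are illustrative, and a real reader would
keep polling until it has covered the desired range:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class SearchAllPartitionsByTimestamp {

    public static void readSince(KafkaConsumer<String, String> consumer,
                                 String topic, long startTimestampMs) {
        List<PartitionInfo> partitions = consumer.partitionsFor(topic);
        List<TopicPartition> tps = partitions.stream()
                .map(p -> new TopicPartition(topic, p.partition()))
                .collect(Collectors.toList());

        // one offsetsForTimes() call covering every partition of the topic
        Map<TopicPartition, Long> query = new HashMap<>();
        for (TopicPartition tp : tps) {
            query.put(tp, startTimestampMs);
        }
        Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(query);

        consumer.assign(tps);
        for (Map.Entry<TopicPartition, OffsetAndTimestamp> e : offsets.entrySet()) {
            if (e.getValue() != null) { // null = no message at/after the timestamp
                consumer.seek(e.getKey(), e.getValue().offset());
            }
        }

        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.partition() + " @ " + record.offset()
                    + " : " + record.value());
        }
    }
}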


On Thu, May 25, 2017 at 8:30 PM, Hans Jespersen <h...@confluent.io> wrote:

> The timeindex was added in 0.10 so I think you need to use the new
> Consumer API to access this functionality. Specifically you should call
> offsetsForTimes()
>
> https://kafka.apache.org/0102/javadoc/org/apache/kafka/
> clients/consumer/Consumer.html#offsetsForTimes(java.util.Map)
>
> -hans
>
> > On May 25, 2017, at 6:39 AM, SenthilKumar K <senthilec...@gmail.com>
> wrote:
> >
> > I did an experiment on searching messages using timestamps ..
> >
> > Step 1: Used Producer with Create Time ( CT )
> > Step 2 : Verify whether it reflects in Kafka or not
> >  .index  .log
> >  .timeindex
> >These three files in disk and seems to be time_index working .
> >
> > Step 3: Let's look into data
> >offset: 121 position: 149556 *CreateTime*: 1495718896912 isvalid:
> > true payloadsize: 1194 magic: 1 compresscodec: NONE crc: 1053048980
> > keysize: 8
> >
> >  Looks good ..
> > Step 4 :  Check .timeindex file .
> >  timestamp: 1495718846912 offset: 116
> >  timestamp: 1495718886912 offset: 120
> >  timestamp: 1495718926912 offset: 124
> >  timestamp: 1495718966912 offset: 128
> >
> > So all set for Querying data using timestamp ?
> >
> > Kafka version : kafka_2.11-0.10.2.1
> >
> > Here is the code i'm using to search query -->
> > https://gist.github.com/senthilec566/bc8ed1dfcf493f0bb5c473c50854dff9
> >
> > requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(
> queryTime,
> > 1));
> > If i pass my own timestamp , always getting zero result ..
> > *Same question asked here too
> > **https://stackoverflow.com/questions/31917134/how-to-use-
> unix-timestamp-to-get-offset-using-simpleconsumer-api
> > <https://stackoverflow.com/questions/31917134/how-to-use-
> unix-timestamp-to-get-offset-using-simpleconsumer-api>*
> > .
> >
> >
> > Also i could notice below error in index file:
> >
> > *Found timestamp mismatch* in
> > :/home/user/kafka-logs/topic-0/.timeindex
> >
> >  Index timestamp: 0, log timestamp: 1495717686913
> >
> > *Found out of order timestamp* in
> > :/home/user/kafka-logs/topic-0/.timeindex
> >
> >  Index timestamp: 0, Previously indexed timestamp: 1495719406912
> >
> > Not sure what is missing here :-( ... Pls advise me here!
> >
> >
> > Cheers,
> > Senthil
> >
> > On Thu, May 25, 2017 at 3:36 PM, SenthilKumar K <senthilec...@gmail.com>
> > wrote:
> >
> >> Thanks a lot Mayuresh. I will look into SearchMessageByTimestamp feature
> >> in Kafka ..
> >>
> >> Cheers,
> >> Senthil
> >>
> >> On Thu, May 25, 2017 at 1:12 PM, Mayuresh Gharat <
> >> gharatmayures...@gmail.com> wrote:
> >>
> >>> Hi Senthil,
> >>>
> >>> Kafka does allow search message by timestamp after KIP-33 :
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+
> >>> Add+a+time+based+log+index#KIP-33-Addatimebasedlogindex-S
> >>> earchmessagebytimestamp
> >>>
> >>> The new consumer does provide you a way to get offsets by timestamp.
> You
> >>> can use these offsets to seek to that offset and consume from there.
> So if
> >>> you want to consume between a range you can get the start and end
> offset
> >>> based on the timestamps, seek to the start offset and consume and
> process
> 

Re: Kafka Read Data from All Partition Using Key or Timestamp

2017-05-25 Thread SenthilKumar K
I did an experiment on searching messages using timestamps ..

Step 1: Used the producer with Create Time (CT)
Step 2: Verified whether it is reflected in Kafka or not
  .index  .log
  .timeindex
These three files are on disk, so the time index seems to be working.

Step 3: Let's look into data
offset: 121 position: 149556 *CreateTime*: 1495718896912 isvalid:
true payloadsize: 1194 magic: 1 compresscodec: NONE crc: 1053048980
keysize: 8

  Looks good ..
Step 4 :  Check .timeindex file .
  timestamp: 1495718846912 offset: 116
  timestamp: 1495718886912 offset: 120
  timestamp: 1495718926912 offset: 124
  timestamp: 1495718966912 offset: 128

So, all set for querying data using a timestamp?

Kafka version : kafka_2.11-0.10.2.1

Here is the code I'm using to run the search query -->
https://gist.github.com/senthilec566/bc8ed1dfcf493f0bb5c473c50854dff9

requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(queryTime,
1));

If I pass my own timestamp, I always get zero results.
*The same question was asked here too:*
https://stackoverflow.com/questions/31917134/how-to-use-unix-timestamp-to-get-offset-using-simpleconsumer-api


Also, I noticed the below errors in the index file:

*Found timestamp mismatch* in
:/home/user/kafka-logs/topic-0/.timeindex

  Index timestamp: 0, log timestamp: 1495717686913

*Found out of order timestamp* in
:/home/user/kafka-logs/topic-0/.timeindex

  Index timestamp: 0, Previously indexed timestamp: 1495719406912

Not sure what is missing here :-( ... Please advise!


Cheers,
Senthil

On Thu, May 25, 2017 at 3:36 PM, SenthilKumar K <senthilec...@gmail.com>
wrote:

> Thanks a lot Mayuresh. I will look into SearchMessageByTimestamp feature
> in Kafka ..
>
> Cheers,
> Senthil
>
> On Thu, May 25, 2017 at 1:12 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> Hi Senthil,
>>
>> Kafka does allow search message by timestamp after KIP-33 :
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+
>> Add+a+time+based+log+index#KIP-33-Addatimebasedlogindex-S
>> earchmessagebytimestamp
>>
>> The new consumer does provide you a way to get offsets by timestamp. You
>> can use these offsets to seek to that offset and consume from there. So if
>> you want to consume between a range you can get the start and end offset
>> based on the timestamps, seek to the start offset and consume and process
>> the data till you reach the end offset.
>>
>> But these timestamps are either CreateTime(when the message was created
>> and you will have to specify this when you do the send()) or
>> LogAppendTime(when the message was appended to the log on the kafka broker)
>> : https://kafka.apache.org/0101/javadoc/org/apache/kafka/clien
>> ts/producer/ProducerRecord.html
>>
>> Kafka does not look at the fields in your data (key/value) for giving
>> back you the data. What I meant was it will not look at the timestamp
>> specified by you in the actual data payload.
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Thu, May 25, 2017 at 12:43 PM, SenthilKumar K <senthilec...@gmail.com>
>> wrote:
>>
>>> Hello Dev Team, Pls let me know if any option to read data from Kafka
>>> (all
>>> partition ) using timestamp . Also can we set custom offset value to
>>> messages ?
>>>
>>> Cheers,
>>> Senthil
>>>
>>> On Wed, May 24, 2017 at 7:33 PM, SenthilKumar K <senthilec...@gmail.com>
>>> wrote:
>>>
>>> > Hi All ,  We have been using Kafka for our Use Case which helps in
>>> > delivering real time raw logs.. I have a requirement to fetch data from
>>> > Kafka by using offset ..
>>> >
>>> > DataSet Example :
>>> > {"access_date":"2017-05-24 13:57:45.044","format":"json",
>>> > "start":"1490296463.031"}
>>> > {"access_date":"2017-05-24 13:57:46.044","format":"json",
>>> > "start":"1490296463.031"}
>>> > {"access_date":"2017-05-24 13:57:47.044","format":"json",
>>> > "start":"1490296463.031"}
>>> > {"access_date":"2017-05-24 13:58:02.042","format":"json",
>>> > "start":"1490296463.031"}
>>> >
>>> > Above JSON data will be stored in Kafka..
>>> >
>>> > Key --> acces_date in epoch format
>>> > Value --> whole JSON.
>>> >
>>> > Data Access Pattern:
>>> >   1) Get me last 2 minz data ?
>>> >2) Get me records between 2017-05-24 13:57:42:00 to 2017-05-24
>>> > 13:57:44:00 ?
>>> >
>>> > How to achieve this in Kafka ?
>>> >
>>> > I tried using SimpleConsumer , but it expects partition and not sure
>>> > SimpleConsumer would match our requirement...
>>> >
>>> > Appreciate you help !
>>> >
>>> > Cheers,
>>> > Senthil
>>> >
>>>
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>


Re: Kafka Read Data from All Partition Using Key or Timestamp

2017-05-25 Thread SenthilKumar K
Thanks a lot Mayuresh. I will look into the SearchMessageByTimestamp feature
in Kafka.

Cheers,
Senthil

On Thu, May 25, 2017 at 1:12 PM, Mayuresh Gharat <gharatmayures...@gmail.com
> wrote:

> Hi Senthil,
>
> Kafka does allow search message by timestamp after KIP-33 :
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 33+-+Add+a+time+based+log+index#KIP-33-Addatimebasedlogindex-
> Searchmessagebytimestamp
>
> The new consumer does provide you a way to get offsets by timestamp. You
> can use these offsets to seek to that offset and consume from there. So if
> you want to consume between a range you can get the start and end offset
> based on the timestamps, seek to the start offset and consume and process
> the data till you reach the end offset.
>
> But these timestamps are either CreateTime(when the message was created
> and you will have to specify this when you do the send()) or
> LogAppendTime(when the message was appended to the log on the kafka broker)
> : https://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/producer/
> ProducerRecord.html
>
> Kafka does not look at the fields in your data (key/value) for giving back
> you the data. What I meant was it will not look at the timestamp specified
> by you in the actual data payload.
>
> Thanks,
>
> Mayuresh
>
> On Thu, May 25, 2017 at 12:43 PM, SenthilKumar K <senthilec...@gmail.com>
> wrote:
>
>> Hello Dev Team, Pls let me know if any option to read data from Kafka (all
>> partition ) using timestamp . Also can we set custom offset value to
>> messages ?
>>
>> Cheers,
>> Senthil
>>
>> On Wed, May 24, 2017 at 7:33 PM, SenthilKumar K <senthilec...@gmail.com>
>> wrote:
>>
>> > Hi All ,  We have been using Kafka for our Use Case which helps in
>> > delivering real time raw logs.. I have a requirement to fetch data from
>> > Kafka by using offset ..
>> >
>> > DataSet Example :
>> > {"access_date":"2017-05-24 13:57:45.044","format":"json",
>> > "start":"1490296463.031"}
>> > {"access_date":"2017-05-24 13:57:46.044","format":"json",
>> > "start":"1490296463.031"}
>> > {"access_date":"2017-05-24 13:57:47.044","format":"json",
>> > "start":"1490296463.031"}
>> > {"access_date":"2017-05-24 13:58:02.042","format":"json",
>> > "start":"1490296463.031"}
>> >
>> > Above JSON data will be stored in Kafka..
>> >
>> > Key --> acces_date in epoch format
>> > Value --> whole JSON.
>> >
>> > Data Access Pattern:
>> >   1) Get me last 2 minz data ?
>> >2) Get me records between 2017-05-24 13:57:42:00 to 2017-05-24
>> > 13:57:44:00 ?
>> >
>> > How to achieve this in Kafka ?
>> >
>> > I tried using SimpleConsumer , but it expects partition and not sure
>> > SimpleConsumer would match our requirement...
>> >
>> > Appreciate you help !
>> >
>> > Cheers,
>> > Senthil
>> >
>>
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>


Re: Kafka Read Data from All Partition Using Key or Timestamp

2017-05-25 Thread SenthilKumar K
Hello Dev Team, please let me know if there is any option to read data from
Kafka (all partitions) using a timestamp. Also, can we set a custom offset
value on messages?

Cheers,
Senthil

On Wed, May 24, 2017 at 7:33 PM, SenthilKumar K <senthilec...@gmail.com>
wrote:

> Hi All ,  We have been using Kafka for our Use Case which helps in
> delivering real time raw logs.. I have a requirement to fetch data from
> Kafka by using offset ..
>
> DataSet Example :
> {"access_date":"2017-05-24 13:57:45.044","format":"json",
> "start":"1490296463.031"}
> {"access_date":"2017-05-24 13:57:46.044","format":"json",
> "start":"1490296463.031"}
> {"access_date":"2017-05-24 13:57:47.044","format":"json",
> "start":"1490296463.031"}
> {"access_date":"2017-05-24 13:58:02.042","format":"json",
> "start":"1490296463.031"}
>
> Above JSON data will be stored in Kafka..
>
> Key --> acces_date in epoch format
> Value --> whole JSON.
>
> Data Access Pattern:
>   1) Get me last 2 minz data ?
>2) Get me records between 2017-05-24 13:57:42:00 to 2017-05-24
> 13:57:44:00 ?
>
> How to achieve this in Kafka ?
>
> I tried using SimpleConsumer , but it expects partition and not sure
> SimpleConsumer would match our requirement...
>
> Appreciate you help !
>
> Cheers,
> Senthil
>


Kafka Read Data from All Partition Using Key or Timestamp

2017-05-24 Thread SenthilKumar K
Hi All, we have been using Kafka for our use case, which involves delivering
real-time raw logs. I have a requirement to fetch data from Kafka by offset.

DataSet Example :
{"access_date":"2017-05-24
13:57:45.044","format":"json","start":"1490296463.031"}
{"access_date":"2017-05-24
13:57:46.044","format":"json","start":"1490296463.031"}
{"access_date":"2017-05-24
13:57:47.044","format":"json","start":"1490296463.031"}
{"access_date":"2017-05-24
13:58:02.042","format":"json","start":"1490296463.031"}

Above JSON data will be stored in Kafka..

Key --> access_date in epoch format
Value --> whole JSON.

Data Access Pattern:
  1) Get me the last 2 mins of data?
  2) Get me records between 2017-05-24 13:57:42:00 and 2017-05-24
13:57:44:00?

How to achieve this in Kafka ?

I tried using SimpleConsumer, but it expects a partition, and I am not sure
SimpleConsumer matches our requirement...

Appreciate your help!

Cheers,
Senthil
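
For the "records between two timestamps" access pattern asked about above, one
approach is to resolve both ends of the range with offsetsForTimes() and stop
at the offset belonging to the end timestamp. A minimal single-partition
sketch, assuming the records carry CreateTime or LogAppendTime timestamps
(method names are illustrative):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class TimeRangeReader {

    // Reads records of one partition whose offsets fall in the range
    // [offset at startMs, offset at endMs).
    public static void readRange(KafkaConsumer<String, String> consumer,
                                 TopicPartition tp, long startMs, long endMs) {
        Map<TopicPartition, Long> startQuery = new HashMap<>();
        startQuery.put(tp, startMs);
        Map<TopicPartition, Long> endQuery = new HashMap<>();
        endQuery.put(tp, endMs);

        OffsetAndTimestamp start = consumer.offsetsForTimes(startQuery).get(tp);
        OffsetAndTimestamp end = consumer.offsetsForTimes(endQuery).get(tp);
        if (start == null) {
            return; // nothing at or after startMs
        }
        long stopOffset = (end != null) ? end.offset() : Long.MAX_VALUE;

        consumer.assign(Collections.singletonList(tp));
        consumer.seek(tp, start.offset());

        boolean done = false;
        while (!done) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            if (records.isEmpty()) {
                break; // caught up with the end of the log
            }
            for (ConsumerRecord<String, String> record : records) {
                if (record.offset() >= stopOffset) {
                    done = true;
                    break;
                }
                System.out.println(record.timestamp() + " : " + record.value());
            }
        }
    }
}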