kafka connector for mongodb as a source

2017-03-27 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

I am creating a Kafka connector for MongoDB as a source. My connector
starts and connects to Kafka, but it is not committing any offsets.

This is the output after starting the connector.

[root@localhost kafka_2.11-0.10.1.1]# bin/connect-standalone.sh
config/connect-standalone.properties config/mongodb.properties
[2017-03-27 18:32:58,019] INFO StandaloneConfig values:
rest.advertised.host.name = null
task.shutdown.graceful.timeout.ms = 5000
rest.host.name = null
rest.advertised.port = null
bootstrap.servers = [localhost:9092]
offset.flush.timeout.ms = 5000
offset.flush.interval.ms = 1
rest.port = 8083
internal.key.converter = class
org.apache.kafka.connect.json.JsonConverter
access.control.allow.methods =
access.control.allow.origin =
offset.storage.file.filename = /tmp/connect.offsets
internal.value.converter = class
org.apache.kafka.connect.json.JsonConverter
value.converter = class org.apache.kafka.connect.json.JsonConverter
key.converter = class org.apache.kafka.connect.json.JsonConverter
 (org.apache.kafka.connect.runtime.standalone.StandaloneConfig:178)
[2017-03-27 18:32:58,162] INFO Logging initialized @609ms
(org.eclipse.jetty.util.log:186)
[2017-03-27 18:32:58,392] INFO Kafka Connect starting
(org.apache.kafka.connect.runtime.Connect:52)
[2017-03-27 18:32:58,392] INFO Herder starting
(org.apache.kafka.connect.runtime.standalone.StandaloneHerder:70)
[2017-03-27 18:32:58,393] INFO Worker starting
(org.apache.kafka.connect.runtime.Worker:113)
[2017-03-27 18:32:58,393] INFO Starting FileOffsetBackingStore with file
/tmp/connect.offsets
(org.apache.kafka.connect.storage.FileOffsetBackingStore:60)
[2017-03-27 18:32:58,398] INFO Worker started
(org.apache.kafka.connect.runtime.Worker:118)
[2017-03-27 18:32:58,398] INFO Herder started
(org.apache.kafka.connect.runtime.standalone.StandaloneHerder:72)
[2017-03-27 18:32:58,398] INFO Starting REST server
(org.apache.kafka.connect.runtime.rest.RestServer:98)
[2017-03-27 18:32:58,493] INFO jetty-9.2.15.v20160210
(org.eclipse.jetty.server.Server:327)
[2017-03-27 18:32:59,621] INFO HV01: Hibernate Validator 5.1.2.Final
(org.hibernate.validator.internal.util.Version:27)
Mar 27, 2017 6:32:59 PM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: The
(sub)resource method listConnectors in
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains
empty path annotation.
WARNING: The (sub)resource method createConnector in
org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains
empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in
org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource
contains empty path annotation.
WARNING: The (sub)resource method serverInfo in
org.apache.kafka.connect.runtime.rest.resources.RootResource contains empty
path annotation.

[2017-03-27 18:33:00,015] INFO Started
o.e.j.s.ServletContextHandler@44e3760b{/,null,AVAILABLE}
(org.eclipse.jetty.server.handler.ContextHandler:744)
[2017-03-27 18:33:00,042] INFO Started ServerConnector@7f58ad44{HTTP/1.1}{
0.0.0.0:8083} (org.eclipse.jetty.server.ServerConnector:266)
[2017-03-27 18:33:00,043] INFO Started @2492ms
(org.eclipse.jetty.server.Server:379)
[2017-03-27 18:33:00,043] INFO REST server listening at
http://127.0.0.1:8083/, advertising URL http://127.0.0.1:8083/
(org.apache.kafka.connect.runtime.rest.RestServer:150)
[2017-03-27 18:33:00,043] INFO Kafka Connect started
(org.apache.kafka.connect.runtime.Connect:58)
[2017-03-27 18:33:00,048] INFO ConnectorConfig values:
connector.class =
org.apache.kafka.connect.mongodb.MongodbSourceConnector
tasks.max = 1
name = mongodb
value.converter = null
key.converter = null
 (org.apache.kafka.connect.runtime.ConnectorConfig:178)
[2017-03-27 18:33:00,048] INFO Creating connector mongodb of type
org.apache.kafka.connect.mongodb.MongodbSourceConnector
(org.apache.kafka.connect.runtime.Worker:159)
[2017-03-27 18:33:00,051] INFO Instantiated connector mongodb with version
0.10.0.1 of type class
org.apache.kafka.connect.mongodb.MongodbSourceConnector
(org.apache.kafka.connect.runtime.Worker:162)
[2017-03-27 18:33:00,053] INFO Finished creating connector mongodb
(org.apache.kafka.connect.runtime.Worker:173)
[2017-03-27 18:33:00,053] INFO SourceConnectorConfig values:
connector.class =
org.apache.kafka.connect.mongodb.MongodbSourceConnector
tasks.max = 1
name = mongodb
value.converter = null
key.converter = null
 (org.apache.kafka.connect.runtime.SourceConnectorConfig:178)
[2017-03-27 18:33:00,056] INFO Creating task mongodb-0
(org.apache.kafka.connect.runtime.Worker:252)
[2017-03-27 18:33:00,056] INFO ConnectorConfig values:
connector.class =
org.apache.kafka.connect.mongodb.MongodbSourceConnector
task
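
For context on the missing offsets: Kafka Connect only commits source offsets
for records that the task's poll() actually returns, and each SourceRecord has
to carry a source partition and a source offset. Below is a minimal sketch of
what a MongoDB-style SourceTask would need to return; the class name, config
keys, and oplog bookkeeping are hypothetical, not taken from the connector above.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Hypothetical task class; the real kafka-connect-mongodb classes may differ.
public class MongodbSourceTaskSketch extends SourceTask {

    private String topic;
    private String collection;
    private long lastOplogTimestamp = 0L;

    @Override
    public void start(Map<String, String> props) {
        // Hypothetical config keys, for illustration only.
        topic = props.get("topic");
        collection = props.get("collection");
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        List<SourceRecord> records = new ArrayList<>();

        // Placeholder for reading one document from the MongoDB oplog.
        String documentJson = "{\"placeholder\": true}";
        lastOplogTimestamp++;

        // sourcePartition identifies the stream (e.g. the collection);
        // sourceOffset is what the worker persists to offset.storage.file.filename.
        Map<String, String> sourcePartition = Collections.singletonMap("collection", collection);
        Map<String, Long> sourceOffset = Collections.singletonMap("timestamp", lastOplogTimestamp);

        // Offsets are only committed for records returned here; if poll() keeps
        // returning null or an empty list, nothing is ever committed.
        records.add(new SourceRecord(sourcePartition, sourceOffset, topic,
                Schema.STRING_SCHEMA, documentJson));
        return records;
    }

    @Override
    public void stop() {
    }

    @Override
    public String version() {
        return "0.0.1";
    }
}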

Creating-kafka-connect

2017-03-03 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

I want to create my own Kafka connector that will connect to multiple data
sources.
Could anyone please help me do so?
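
For the overall shape: a source connector is a SourceConnector class that hands
per-task configuration to a SourceTask implementation, and the tasks do the
actual reading. A bare-bones sketch follows; all class names and the task stub
are placeholders, not an existing connector.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

// Placeholder connector class, for illustration only.
public class MultiSourceConnectorSketch extends SourceConnector {

    private Map<String, String> configProps;

    @Override
    public void start(Map<String, String> props) {
        configProps = new HashMap<>(props);
    }

    @Override
    public Class<? extends Task> taskClass() {
        return MySourceTask.class;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Each map becomes the props passed to one task's start(); split the
        // work (e.g. one data source per task) across up to maxTasks entries.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(new HashMap<>(configProps));
        }
        return configs;
    }

    @Override
    public void stop() {
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public String version() {
        return "0.0.1";
    }

    // Stub task so the sketch compiles; a real task reads the source in poll().
    public static class MySourceTask extends SourceTask {
        @Override
        public String version() { return "0.0.1"; }
        @Override
        public void start(Map<String, String> props) { }
        @Override
        public List<SourceRecord> poll() throws InterruptedException { return null; }
        @Override
        public void stop() { }
    }
}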


kafka-connect-salesforce

2017-03-01 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

I am trying to use kafka-connect-salesforce. The topic is created, but there is
no data in it. Do I also have to start a producer to send data?

Thank you.
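
For what it's worth, a Connect source connector produces into the topic itself,
so no separate producer should be needed. One quick way to check whether
anything is arriving is the console consumer, e.g. (the topic name is a
placeholder):

bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic <salesforce-topic> --from-beginning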


kafka-connect-salesforce

2017-02-28 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

I want to use kafka-connect-salesforce but am not able to get it working.
Can anyone provide the steps for how to use it?

Thank you.
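
The general steps are the same as for any Connect source plugin: put the
connector jar on the Connect worker's classpath, write a connector properties
file, and start it with the standalone worker. A rough sketch; the exact config
keys depend on which kafka-connect-salesforce implementation is used, so
everything below except the standard name/connector.class/tasks.max keys is a
placeholder.

# config/salesforce-source.properties
name=salesforce-source
connector.class=<connector class from the kafka-connect-salesforce README>
tasks.max=1
# ...plus whatever Salesforce credential and topic settings the plugin defines

# then start it with the standalone worker:
# bin/connect-standalone.sh config/connect-standalone.properties config/salesforce-source.properties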


Re: Kafkaconnect

2017-02-28 Thread VIVEK KUMAR MISHRA 13BIT0066
Actually, my data sources are Salesforce and Mailchimp. I have developed an
API that fetches data from them, but now I want any changes happening in the
Salesforce and Mailchimp data to be reflected in my topic data.

On Tue, Feb 28, 2017 at 5:53 PM, Philippe Derome  wrote:

> watch some videos from Ewan Cheslack-Postava.
>
> https://www.google.ca/webhp?sourceid=chrome-instant&rlz=
> 1C5CHFA_enCA523CA566&ion=1&espv=2&ie=UTF-8#q=youtube+
> ewan+Cheslack-Postava&*
>
> On Tue, Feb 28, 2017 at 6:55 AM, VIVEK KUMAR MISHRA 13BIT0066 <
> vivekkumar.mishra2...@vit.ac.in> wrote:
>
> > Hi All,
> >
> > What is the use of kafka-connect.
> >
>


Kafkaconnect

2017-02-28 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

What is the use of Kafka Connect?


Re: Kafka Connect

2017-02-27 Thread VIVEK KUMAR MISHRA 13BIT0066
Actually, my data sources are Salesforce and Mailchimp. I have developed an
API that fetches data from them, but now I want any changes happening in the
Salesforce and Mailchimp data to be reflected in my topic data.

On Mon, Feb 27, 2017 at 8:54 PM, Tauzell, Dave  wrote:

> Also, see this article on streaming changes from MySQL to kafka:
> https://wecode.wepay.com/posts/streaming-databases-in-
> realtime-with-mysql-debezium-kafka
>
> -Original Message-
> From: Tauzell, Dave
> Sent: Monday, February 27, 2017 9:07 AM
> To: users@kafka.apache.org
> Subject: RE: Kafka Connect
>
> Are you specifically talking about relational databases?Kafka Connect
> has  a JDBC source (http://docs.confluent.io/3.1.
> 1/connect/connect-jdbc/docs/source_connector.html) which can push data
> changes to kafka.  It can only run sql queries, though, so out of the box
> it will just get you updated and new rows.   If you want to get a list of
> changes you'll either need to build that into your schema or use something
> else that does CDC (Change Data Capture) on your source.
>
> -Dave
>
> -Original Message-
> From: VIVEK KUMAR MISHRA 13BIT0066 [mailto:vivekkumar.mishra2...@vit.ac.in
> ]
> Sent: Monday, February 27, 2017 7:35 AM
> To: users@kafka.apache.org
> Subject: Kafka Connect
>
> How to use kafka connect using python to get information  about
> update,delete and insertion  of data at various data sources?
>
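
To make the JDBC source suggestion above concrete, a typical properties file
for the Confluent JDBC source connector looks roughly like this (a sketch
assuming a MySQL database with an auto-incrementing id column; see the
connector docs linked above for the authoritative options):

name=jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://localhost:3306/mydb?user=kafka&password=secret
# incrementing mode only picks up new rows; timestamp+incrementing also catches
# updates, and deletes are not captured at all without a real CDC tool.
mode=incrementing
incrementing.column.name=id
topic.prefix=mysql-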


Updation of data in kafka topic based on changes in data sources.

2017-02-27 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

Is it possible to update Kafka topic data based on changes in data sources
using Python?



Kafka Connect

2017-02-27 Thread VIVEK KUMAR MISHRA 13BIT0066
How do I use Kafka Connect with Python to get information about updates,
deletes, and insertions of data at various data sources?


partition creation using python api

2017-02-26 Thread VIVEK KUMAR MISHRA 13BIT0066
There is a Partition class in the pykafka package, in partition.py. Could you
please tell me how to use that class?

Thank you.


Re: Creating topic partitions automatically using python

2017-02-26 Thread VIVEK KUMAR MISHRA 13BIT0066
There is a Partition class in the pykafka package, in partition.py. Could you
please tell me how to use that class?

On Fri, Feb 24, 2017 at 12:36 AM, Jeff Widman  wrote:

> This is probably a better fit for the pykafka issue tracker.
>
> AFAIK, there's no public kafka API for creating partitions right now, so
> pykafka is likely hacking around this by calling out to internal Java APIs,
> so it will be brittle
>
> On Thu, Feb 23, 2017 at 5:13 AM, VIVEK KUMAR MISHRA 13BIT0066 <
> vivekkumar.mishra2...@vit.ac.in> wrote:
>
> > Hi All,
> >
> > I am trying to create partitions automatically using python script
> instead
> > of doing the same through terminal.
> >
> > I am using partition class of pykafka package to do so and
> > kafka_2.11-0.9.0.0,but I am not getting the desired output, only one
> > partition is getting created.
> > Please do suggest me any solution through which I can do this using
> > python.
> >
> > Thank You!
> >
> > Regards,
> > Vivek Mishra
> >
>


Re: creating partitions programmatically

2017-02-26 Thread VIVEK KUMAR MISHRA 13BIT0066
There is a Partition class in the pykafka package, in partition.py. Could you
please tell me how to use that class?

On Sun, Feb 26, 2017 at 11:44 PM, Hans Jespersen  wrote:

> The Confluent python client does not support this today. If you find a
> python api that does create topic partitions don't expect it to work with a
> secure Kafka cluster and expect it to have to be completely re-written in
> the near future when the new Kafka Admin API is implemented under the
> covers.
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670
>  */
>
> On Sun, Feb 26, 2017 at 5:44 PM, VIVEK KUMAR MISHRA 13BIT0066 <
> vivekkumar.mishra2...@vit.ac.in> wrote:
>
> > My question is can we create partitions in topic using any pythonic API?
> >
> > On Sun, Feb 26, 2017 at 8:24 PM, Hans Jespersen 
> wrote:
> >
> > > The current Java AdminUtils are older code that talks to zookeeper and
> > > does not support a secured Kafka cluster.
> > > There will be a new Java Admin API in the future that talks to the
> Kafka
> > > brokers directly using the admin extensions to the Kafka protocol which
> > are
> > > already in the 0.10.2 brokers.
> > > I would expect that once there is a reference implementation of the new
> > > Java Admin client that other client libraries would implement similar
> > > interfaces.
> > > In the meantime I’m afraid we just need to use the existing Java admin
> > > utils or the CLI tools for just a little while longer.
> > >
> > > -hans
> > >
> > >
> > >
> > >
> > > > On Feb 26, 2017, at 2:42 PM, VIVEK KUMAR MISHRA 13BIT0066 <
> > > vivekkumar.mishra2...@vit.ac.in> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > In kafka java driver, we have kafka.admin.AdminUtils class which has
> > > > methods like createTopic and addPartitions(). Do we have these type
> of
> > > > class and methods in any of kafka python drivers.If there is please
> do
> > > > suggest.
> > > >
> > > > Thank you .
> > >
> > >
> >
>


Re: creating partitions programmatically

2017-02-26 Thread VIVEK KUMAR MISHRA 13BIT0066
My question is: can we create partitions in a topic using any Python API?

On Sun, Feb 26, 2017 at 8:24 PM, Hans Jespersen  wrote:

> The current Java AdminUtils are older code that talks to zookeeper and
> does not support a secured Kafka cluster.
> There will be a new Java Admin API in the future that talks to the Kafka
> brokers directly using the admin extensions to the Kafka protocol which are
> already in the 0.10.2 brokers.
> I would expect that once there is a reference implementation of the new
> Java Admin client that other client libraries would implement similar
> interfaces.
> In the meantime I’m afraid we just need to use the existing Java admin
> utils or the CLI tools for just a little while longer.
>
> -hans
>
>
>
>
> > On Feb 26, 2017, at 2:42 PM, VIVEK KUMAR MISHRA 13BIT0066 <
> vivekkumar.mishra2...@vit.ac.in> wrote:
> >
> > Hi All,
> >
> > In kafka java driver, we have kafka.admin.AdminUtils class which has
> > methods like createTopic and addPartitions(). Do we have these type of
> > class and methods in any of kafka python drivers.If there is please do
> > suggest.
> >
> > Thank you .
>
>


creating partitions programmatically

2017-02-26 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

In the Kafka Java driver, we have the kafka.admin.AdminUtils class, which has
methods like createTopic() and addPartitions(). Do we have this type of class
and these methods in any of the Kafka Python drivers? If so, please do
suggest.

Thank you .
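
For reference, this is roughly what those calls look like from Java against the
0.9.x AdminUtils (a sketch only; AdminUtils is an internal utility that talks
to ZooKeeper directly, and its method signatures have changed between Kafka
versions):

import java.util.Properties;

import kafka.admin.AdminUtils;
import kafka.utils.ZkUtils;

public class AdminUtilsSketch {
    public static void main(String[] args) {
        // Connect to ZooKeeper (not to the brokers).
        ZkUtils zkUtils = ZkUtils.apply("localhost:2181", 30000, 30000, false);
        try {
            // Create a topic with 4 partitions and replication factor 1.
            AdminUtils.createTopic(zkUtils, "my-topic", 4, 1, new Properties());

            // Later, grow the same topic to 8 partitions.
            AdminUtils.addPartitions(zkUtils, "my-topic", 8, "", true);
        } finally {
            zkUtils.close();
        }
    }
}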


Creating topic partitions using python

2017-02-24 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

I want to create Kafka partitions for a topic automatically using pykafka or
kafka-python.
Please suggest a solution through which I can do this using
Python.

Thank You!

Regards,
Vivek Mishra


Creating topic partitions automatically using python

2017-02-23 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All,

I am trying to create partitions automatically using a Python script instead
of doing the same through the terminal.

I am using the Partition class of the pykafka package to do so, with
kafka_2.11-0.9.0.0, but I am not getting the desired output: only one
partition is getting created.
Please suggest a solution through which I can do this using
Python.

Thank You!

Regards,
Vivek Mishra


sending mailchimp data to kafka cluster using producer api

2017-02-11 Thread VIVEK KUMAR MISHRA 13BIT0066
Hello sir,

I want to send Mailchimp data to a Kafka broker (topic) using the producer API.
Could you please help me?
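
A minimal sketch of sending records with the Java producer API; the topic name
and the JSON payload (standing in for a Mailchimp API response) are
placeholders:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MailchimpProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        try {
            // In practice this JSON would come from the Mailchimp API response.
            String payload = "{\"list\": \"newsletter\", \"subscribers\": 42}";
            producer.send(new ProducerRecord<>("mailchimp-data", "newsletter", payload));
        } finally {
            producer.close();
        }
    }
}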


about producer and consumer api

2017-02-10 Thread VIVEK KUMAR MISHRA 13BIT0066
Hello sir,

I am learning Kafka. I know how to run Kafka producers and consumers from the
terminal, but now I want to run my own code: I have written sample producer
and consumer code in Java, but I am not able to run it. Would you please help
me out?

Kindly look into this matter.

Thanks and regards.
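
If the code itself compiles, the usual stumbling block is the classpath.
Roughly, assuming the Kafka client jars live under the distribution's libs/
directory and the class is named SampleProducer (both are placeholders; adjust
to your layout):

javac -cp "/path/to/kafka_2.11-0.10.1.1/libs/*" SampleProducer.java
java -cp "/path/to/kafka_2.11-0.10.1.1/libs/*:." SampleProducer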


Zookeeper fails to see all the brokers at once

2016-10-28 Thread vivek thakre
Hello All,

I have a Kafka Cluster deployed on AWS.
I am noticing this issue randomly when none of the brokers are registered
with zookeeper ( I have set up a monitor on this by using zk-shell util)

During this issue, the cluster continues to operate i.e events can be
produced and consumed. But the topics can not be created or listed

When I query for brokers using Zk Shell (ls /brokers/ids) , its all empty.

I have noticed this issue in both of the following setups:
1. 3 ZooKeepers and 12 brokers deployed in a single AWS zone
2. Multi-zone deployment, with 3 ZooKeepers, one in each AWS zone, and 12
brokers, 4 in each AWS zone.

Not sure if this is related to AWS or Kafka.
I looked through the logs, but do not see any errors (currently the log level
is set to INFO).

The Kafka version is 0.9.0.1. (I have seen this issue happen with 0.8.2.1 too.)

Has anyone encountered this issue?
I am looking for some pointers as to what could be wrong in my setup and how I
should go about debugging it.

Summary of broker's server.properties

log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.flush.scheduler.interval.ms=2000
log.flush.interval.ms=1
log.flush.interval.messages=2
log.cleanup.interval.mins=30
log.segment.bytes=5
log.retention.check.interval.ms=18
message.max.bytes=900
num.network.threads=3
num.io.threads=8
num.replica.fetchers=4
num.partitions=2
num.recovery.threads.per.data.dir=1
queued.max.requests=500
replica.fetch.max.bytes=112
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
zookeeper.connection.timeout.ms=6
zookeeper.session.timeout.ms=1

Thanks,
Vivek


Re: KStream-to-KStream Join Example

2016-07-13 Thread vivek thakre
Yes, there are the same number of partitions for both topics, and the same
partition key, i.e. userId.
If I just join the streams without applying the map functions (in this case
userClickStream and userEventStream), it works.

Thanks,
Vivek


On Wed, Jul 13, 2016 at 4:53 PM, Philippe Derome  wrote:

> Did you specify same number of partitions for the two input topics you are
> joining? I think that this is usually the first thing people ask to verify
> with errors similar to yours.
>
> If you are experimenting with learning some concepts, it is simpler to
> always use one partition for your topics.
> On 13 Jul 2016 7:40 p.m., "vivek thakre"  wrote:
>
> > Hello,
> >
> > I want to join 2 Topics (KStreams)
> >
> >
> > Stream 1
> > Topic :  userIdClicks
> > Key : userId
> > Value : JSON String with event details
> >
> > Stream 2
> > Topic :  userIdChannel
> > Key : userId
> > Value : JSON String  with event details and has channel value
> >
> > I could not find any examples with KStream-to-KStream Join.
> >
> > Here is my code
> >
> > //build stream userIdClicks
> > > KStream userClickStream = builder.stream(stringSerde,
> > stringSerde,
> > > "userClicks");
> > >
> >
> >
> > > //create stream -> < userId, 1 (count) >
> > > KStream *userClickCountStream* = userClickStream.filter((
> > > userId,record)-> userId != null) .map((userId,record) -> new
> KeyValue<>(
> > > userId,1l));
> > >
> >
> >
> > > //build stream userChannelStream
> > > KStream userEventStream = builder.stream(stringSerde,
> > > stringSerde, "userEvents");
> > >
> >
> >
> > > //create stream  : extract channel value from json
> > string
> > > KStream *userChannelStream* =  userEventStream
> > > .filter((userId,record)-> userId != null)
> > > .map((userId,record) -> new KeyValue<>(userId
> > > ,JsonPath.read(record, "$.event.page.channel").toString()));
> > >
> >
> >
> > > //join *userClickCountStream* with
> > > *userChannelStream*KTable clicksPerChannel =
> > > userClickCountStream
> > > .join(userChannelStream, new ValueJoiner > > ChannelWithClicks>() {
> > >  @Override
> > >  public ChannelWithClicks apply(Long clicks, String
> channel)
> > {
> > > return new ChannelWithClicks(channel == null ?
> "UNKNOWN"
> > > : channel, clicks);
> > >  }
> > >  },
> > JoinWindows.of("ClicksPerChannelwindowed").after(3).before(3))
> > > //30 secs before and after
> > > .map((user, channelWithClicks) -> new
> > KeyValue<>(channelWithClicks
> > > .getChannel(), channelWithClicks.getClicks()))
> > > .reduceByKey(
> > > (firstClicks, secondClicks) -> firstClicks +
> > > secondClicks,
> > >  stringSerde, longSerde,
> "ClicksPerChannelUnwindowed"
> > > );
> >
> > When I run this topology, I get an exception
> >
> > Invalid topology building: KSTREAM-MAP-03 and
> > KSTREAM-MAP-06 are not joinable
> >
> > I looking for a way to join 2 KStreams.
> >
> > Thanks,
> >
> > Vivek
> >
>


KStream-to-KStream Join Example

2016-07-13 Thread vivek thakre
Hello,

I want to join 2 Topics (KStreams)


Stream 1
Topic :  userIdClicks
Key : userId
Value : JSON String with event details

Stream 2
Topic :  userIdChannel
Key : userId
Value : JSON String with event details, including a channel value

I could not find any examples with KStream-to-KStream Join.

Here is my code

//build stream userIdClicks
> KStream<String, String> userClickStream = builder.stream(stringSerde,
> stringSerde, "userClicks");
>

> //create stream -> < userId, 1 (count) >
> KStream<String, Long> userClickCountStream = userClickStream
> .filter((userId, record) -> userId != null)
> .map((userId, record) -> new KeyValue<>(userId, 1L));
>

> //build stream userChannelStream
> KStream<String, String> userEventStream = builder.stream(stringSerde,
> stringSerde, "userEvents");
>

> //create stream <userId, channel> : extract channel value from json string
> KStream<String, String> userChannelStream = userEventStream
> .filter((userId, record) -> userId != null)
> .map((userId, record) -> new KeyValue<>(userId,
> JsonPath.read(record, "$.event.page.channel").toString()));
>

> //join userClickCountStream with userChannelStream
> KTable<String, Long> clicksPerChannel = userClickCountStream
> .join(userChannelStream, new ValueJoiner<Long, String, ChannelWithClicks>() {
>     @Override
>     public ChannelWithClicks apply(Long clicks, String channel) {
>         return new ChannelWithClicks(channel == null ? "UNKNOWN" : channel, clicks);
>     }
> }, JoinWindows.of("ClicksPerChannelwindowed").after(3).before(3))
> //30 secs before and after
> .map((user, channelWithClicks) -> new KeyValue<>(
>     channelWithClicks.getChannel(), channelWithClicks.getClicks()))
> .reduceByKey(
>     (firstClicks, secondClicks) -> firstClicks + secondClicks,
>     stringSerde, longSerde, "ClicksPerChannelUnwindowed");

When I run this topology, I get an exception

Invalid topology building: KSTREAM-MAP-03 and
KSTREAM-MAP-06 are not joinable

I am looking for a way to join 2 KStreams.

Thanks,

Vivek


Re: Kafka Streams : Old Producer

2016-07-11 Thread Vivek
Thanks a lot Micheal.
I used WallClockTimeStampExtractor for now.

Thanks,
Vivek

> On Jul 8, 2016, at 1:25 AM, Michael Noll  wrote:
> 
> Vivek,
> 
> in this case you should manually embed a timestamp within the payload of
> the produced messages (e.g. as a Long field in an Avro-encoded message
> value).  This would need to be done by the producer.
> 
> Then, in Kafka Streams, you'd need to implement a custom
> TimestampExtractor that can retrieve this timestamp from the message
> payload. And you need to configure your StreamsConfig to use that custom
> timestamp.
> 
> Hope this helps,
> Michael
> 
> 
> 
>> On Thursday, July 7, 2016, vivek thakre  wrote:
>> 
>> Thats right Ismael, I am looking for work arounds either on 0.9.0.1
>> Producer side or on the Kafka Streams side so that I can process messages
>> produced by 0.9.0.1 producer using Kafka Streams Library.
>> 
>> Thanks,
>> Vivek
>> 
>> On Thu, Jul 7, 2016 at 9:05 AM, Ismael Juma > > wrote:
>> 
>>> Hi,
>>> 
>>> Matthias, I think Vivek's question is not whether Kafka Streams can be
>> used
>>> with a Kafka 0.9 broker (which it cannot). The question is whether Kafka
>>> Streams can process messages produced with a 0.9.0.1 producer into a
>>> 0.10.0.0 broker. Is that right? If so, would a custom TimestampExtractor
>>> work?
>>> 
>>> Ismael
>>> 
>>> On Thu, Jul 7, 2016 at 12:29 PM, Matthias J. Sax > >
>>> wrote:
>>> 
>>>> Hi Vivek,
>>>> 
>>>> Kafka Streams works only with Kafka 0.10 (but not with 0.9).
>>>> 
>>>> I am not aware of any work around to allow for 0.9 usage.
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>>> On 07/07/2016 05:37 AM, vivek thakre wrote:
>>>>> Can kafka streams library work with the messages produced by 0.9.0.1
>>>>> producer?
>>>>> I guess not since the old producer would not add timestamp. ( I am
>>>> getting
>>>>> invalid timestamp exception)
>>>>> 
>>>>> As I cannot change our producer application setup, I have to use
>>> 0.9.0.1
>>>>> producer.
>>>>> Is there a workaround that I can try to use Kafka Streams?
>>>>> 
>>>>> Thanks,
>>>>> Vivek
> 
> 
> -- 
> Best regards,
> Michael Noll
> 
> 
> 
> *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860Download
> Apache Kafka and Confluent Platform: www.confluent.io/download
> <http://www.confluent.io/download>*
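
For completeness, the approach Michael describes would look roughly like the
sketch below against the 0.10.0 Streams API (the payload format and field
parsing are assumptions, and the TimestampExtractor signature changed in later
releases):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Pulls the event time out of the message value, because 0.9 producers
// do not set a record timestamp.
public class PayloadTimestampExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        // Assumption: the producer embeds "eventTime=<epoch millis>;" at the
        // start of each value; parse whatever field your payload actually uses.
        String value = (String) record.value();
        String field = value.substring("eventTime=".length(), value.indexOf(';'));
        return Long.parseLong(field);
    }
}

// In the Streams application config (0.10.0 constant name):
// props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
//           PayloadTimestampExtractor.class.getName());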


Re: Kafka Streams : Old Producer

2016-07-07 Thread vivek thakre
That's right, Ismael. I am looking for workarounds, either on the 0.9.0.1
producer side or on the Kafka Streams side, so that I can process messages
produced by a 0.9.0.1 producer using the Kafka Streams library.

Thanks,
Vivek

On Thu, Jul 7, 2016 at 9:05 AM, Ismael Juma  wrote:

> Hi,
>
> Matthias, I think Vivek's question is not whether Kafka Streams can be used
> with a Kafka 0.9 broker (which it cannot). The question is whether Kafka
> Streams can process messages produced with a 0.9.0.1 producer into a
> 0.10.0.0 broker. Is that right? If so, would a custom TimestampExtractor
> work?
>
> Ismael
>
> On Thu, Jul 7, 2016 at 12:29 PM, Matthias J. Sax 
> wrote:
>
> > Hi Vivek,
> >
> > Kafka Streams works only with Kafka 0.10 (but not with 0.9).
> >
> > I am not aware of any work around to allow for 0.9 usage.
> >
> >
> > -Matthias
> >
> > On 07/07/2016 05:37 AM, vivek thakre wrote:
> > > Can kafka streams library work with the messages produced by 0.9.0.1
> > > producer?
> > > I guess not since the old producer would not add timestamp. ( I am
> > getting
> > > invalid timestamp exception)
> > >
> > > As I cannot change our producer application setup, I have to use
> 0.9.0.1
> > > producer.
> > > Is there a workaround that I can try to use Kafka Streams?
> > >
> > > Thanks,
> > > Vivek
> > >
> >
> >
>


Kafka Streams : Old Producer

2016-07-06 Thread vivek thakre
Can the Kafka Streams library work with messages produced by a 0.9.0.1
producer?
I guess not, since the old producer does not add a timestamp (I am getting an
invalid timestamp exception).

As I cannot change our producer application setup, I have to use the 0.9.0.1
producer.
Is there a workaround I can try so that I can use Kafka Streams?

Thanks,
Vivek


Re: Kafka Producer connection issue on AWS private EC2 instance

2016-06-29 Thread vivek thakre
There are no errors in the broker logs.
The Kafka Cluster in itself is functional. I have other producers and
consumers working which are in public subnet (same as kafka cluster).



On Wed, Jun 29, 2016 at 7:15 PM, Kamesh Kompella  wrote:

> For what it's worth, I used to get similar messages with docker instances
> on centos.
>
> The way I debugged the problem was by looking at Kafka logs. In that case,
> it turned out that brokers could not reach zk and this info was in the
> logs. The logs will list the parameters the broker used at start up and any
> errors.
>
> In my case, the problem was the firewall that blocked access to zk from
> Kafka.
>
> > On Jun 29, 2016, at 6:56 PM, vivek thakre 
> wrote:
> >
> > I have Kafka Cluster setup on AWS Public Subnet with all brokers having
> > elastic IPs
> > My producers are on private subnet and not able to produce to the kafka
> on
> > public subnet.
> > Both subnets are in same VPC
> >
> > I added the private ip/cidr of producer ec2 instance to Public Kafka's
> > security group.
> > (I can telnet from private ec2 instance to brokers private ip on 9092
> port)
> >
> > From the ec2 instance on private subnet, I can list the topics using ZK's
> > private ip
> >
> > [ec2-user@ip-x-x-x-x kafka_2.10-0.9.0.1]$ bin/kafka-topics.sh
> --zookeeper
> > :2181 --list
> > test
> >
> > When I try to produce from private ec2 instance using broker's private
> IP,
> > I get following error
> >
> > [ec2-user@ip-x-x-x-x kafka_2.10-0.9.0.1]$ bin/kafka-console-producer.sh
> > --broker-list :9092 --topic test
> >
> > [2016-06-29 18:47:38,328] ERROR Error when sending message to topic test
> > with key: null, value: 3 bytes with error: Batch Expired
> > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> >
> > When I try to produce from private ec2 instance using broker's public
> IP, I
> > get following error.
> >
> > [2016-06-29 18:53:15,918] ERROR Error when sending message to topic test
> > with key: null, value: 3 bytes with error: Failed to update metadata
> after
> > 6 ms.
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> >
> > Few settings from server.properties
> > # The id of the broker. This must be set to a unique integer for each
> > broker.
> > broker.id=0
> >
> > # Socket Server Settings
> > #
> >
> > listeners=PLAINTEXT://:9092
> >
> > # The port the socket server listens on
> > #port=9092
> >
> > # Hostname the broker will bind to. If not set, the server will bind to
> all
> > interfaces
> > host.name=
> >
> > # Hostname the broker will advertise to producers and consumers. If not
> > set, it uses the
> > # value for "host.name" if configured.  Otherwise, it will use the value
> > returned from
> > # java.net.InetAddress.getCanonicalHostName().
> > advertised.host.name=
> >
> > Please let me know if I am doing something wrong.
> >
> > Thank you
> >
> > Vivek
>


Kafka Producer connection issue on AWS private EC2 instance

2016-06-29 Thread vivek thakre
I have a Kafka cluster set up on an AWS public subnet, with all brokers having
elastic IPs.
My producers are on a private subnet and are not able to produce to the Kafka
cluster on the public subnet.
Both subnets are in the same VPC.

I added the private IP/CIDR of the producer EC2 instance to the public Kafka
cluster's security group.
(I can telnet from the private EC2 instance to the brokers' private IPs on port 9092.)

From the EC2 instance on the private subnet, I can list the topics using ZK's
private IP:

[ec2-user@ip-x-x-x-x kafka_2.10-0.9.0.1]$ bin/kafka-topics.sh --zookeeper
:2181 --list
test

When I try to produce from the private EC2 instance using the broker's private
IP, I get the following error:

[ec2-user@ip-x-x-x-x kafka_2.10-0.9.0.1]$ bin/kafka-console-producer.sh
--broker-list :9092 --topic test

[2016-06-29 18:47:38,328] ERROR Error when sending message to topic test
with key: null, value: 3 bytes with error: Batch Expired
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)

When I try to produce from the private EC2 instance using the broker's public
IP, I get the following error:

[2016-06-29 18:53:15,918] ERROR Error when sending message to topic test
with key: null, value: 3 bytes with error: Failed to update metadata after
6 ms. (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)

Few settings from server.properties
# The id of the broker. This must be set to a unique integer for each
broker.
broker.id=0

# Socket Server Settings
#

listeners=PLAINTEXT://:9092

# The port the socket server listens on
#port=9092

# Hostname the broker will bind to. If not set, the server will bind to all
interfaces
host.name=

# Hostname the broker will advertise to producers and consumers. If not
set, it uses the
# value for "host.name" if configured.  Otherwise, it will use the value
returned from
# java.net.InetAddress.getCanonicalHostName().
advertised.host.name=

Please let me know if I am doing something wrong.

Thank you

Vivek


Kafka Consumer - Java

2016-02-23 Thread vivek shankar
Hello All,

Can you please help with the following:

I was reading up on the Kafka 0.9 API version and came across the following:

The following is a draft design that uses a high-available consumer
coordinator at the broker side to handle consumer rebalance. By migrating
the rebalance logic from the consumer to the coordinator we can resolve the
consumer split brain problem and help thinner the consumer client.

Can you please help with the following questions:

I have a single topic with a single consumer group ID. There are 20
partitions. The consumer client is running on a single node which has 20
threads. We are using ZooKeeper to fetch messages from Kafka.

1) When using the Kafka 0.8.2 API, how can rebalancing be done? Can the same
consumer project be deployed onto multiple nodes, and will ZooKeeper
automatically do the rebalancing?

2) What changes do we need to take care of when upgrading to the 0.9 API
version? Does the underlying Kafka installation itself have to be changed, or
does it suffice to just use the 0.9 consumer API?

3) Can ZooKeeper be completely eliminated when upgrading to the 0.9 version?

Thank You!

Regards
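
Regarding question 3 above: with the new consumer API introduced in 0.9, the
client talks only to the brokers, and group rebalancing is handled by the
broker-side group coordinator, so the consumer itself no longer needs ZooKeeper
(the brokers still do). A minimal sketch of the 0.9 consumer; the topic and
group names are placeholders:

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // brokers, not ZooKeeper
        props.put("group.id", "my-consumer-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        try {
            while (true) {
                // Partitions are assigned and rebalanced by the group coordinator;
                // running more instances with the same group.id spreads the 20
                // partitions across them automatically.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        } finally {
            consumer.close();
        }
    }
}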


Re: future of Camus?

2015-10-22 Thread vivek thakre
We are using Apache Flume as a router to consume data from Kafka and push
it to HDFS.
With Flume 1.6, the Kafka channel, source, and sink are available out of the box.

Here is the blog post from Cloudera
http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/

Thanks,

Vivek Thakre



On Thu, Oct 22, 2015 at 2:29 PM, Hawin Jiang  wrote:

> Very useful information for us.
> Thanks Guozhang.
> On Oct 22, 2015 2:02 PM, "Guozhang Wang"  wrote:
>
> > Hi Adrian,
> >
> > Another alternative approach is to use Kafka's own Copycat framework for
> > data ingressing / egressing. It will be released in our 0.9.0 version
> > expected in Nov.
> >
> > Under Copycat users can write different "connector" instantiated for
> > different source / sink systems, while for your case there is a in-built
> > HDFS connector coming along with the framework itself. You can find more
> > details in these Kafka wikis / java docs:
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
> >
> >
> >
> https://s3-us-west-2.amazonaws.com/confluent-files/copycat-docs-wip/intro.html
> >
> > Guozhang
> >
> >
> > On Thu, Oct 22, 2015 at 12:52 PM, Henry Cai 
> > wrote:
> >
> > > Take a look at secor:
> > >
> > > https://github.com/pinterest/secor
> > >
> > > Secor is a no-frill kafka->HDFS/Ingesting tool, doesn't depend on any
> > > underlying systems such as Hadoop, it only uses Kafka high level
> consumer
> > > to balance the work loads.  Very easy to understand and manage.  It's
> > > probably the 2nd most popular kafka/HDFS ingestion tool (behind camus).
> > > Lots of web companies use this to do the kafka data ingestion
> > > (Pinterest/Uber/AirBnb).
> > >
> > >
> > > On Thu, Oct 22, 2015 at 3:56 AM, Adrian Woodhead  >
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > We're looking at options for getting data from Kafka onto HDFS and
> > Camus
> > > > looks like the natural choice for this. It's also evident that
> LinkedIn
> > > who
> > > > originally created Camus are taking things in a different direction
> and
> > > are
> > > > advising people to use their Gobblin ETL framework instead. We feel
> > that
> > > > Gobblin is overkill for many simple use cases and Camus seems a much
> > > > simpler and better fit. The problem now is that with LinkedIn
> > apparently
> > > > withdrawing official support for it it appears that any changes to
> > Camus
> > > > are being managed by various forks of it and it looks like everyone
> is
> > > > building and using their own versions. Wouldn't it be better for a
> > > > community to form around one official fork so development efforts can
> > be
> > > > focused on this? Any thoughts on this?
> > > >
> > > > Thanks,
> > > >
> > > > Adrian
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>