Too many commits

2019-04-24 Thread yuvraj singh
Hi all,

In my application I am committing every offset to Kafka one by one, and my
max poll size is 30. I am facing a lot of commit failures; could this setup
be the reason?

Thanks
Yubraj Singh
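
For reference, a minimal sketch of the per-record commit pattern described
above, assuming the 2.x Java consumer (broker address, group id, topic, and
the processing step are placeholders). Committing synchronously after every
record costs one blocking round trip per message, up to 30 per poll here, so
any broker slowness or a rebalance shows up as a commit failure; committing
once per poll() batch is the usual fix:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class PerRecordCommitExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "30"); // the "max poll size" above

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // process(record) would go here
                        // Per-record commit: the offset to commit is the NEXT offset to read.
                        consumer.commitSync(Collections.singletonMap(
                                new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1)));
                    }
                    // Cheaper alternative: a single consumer.commitSync() per poll() batch.
                }
            }
        }
    }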



RE: kafka consumer metadata expire

2019-04-24 Thread 赖剑清
Hi,

Have you tried setting METADATA_MAX_AGE_CONFIG (default: 300,000 ms) to a
smaller value? It seems the consumer won't actually update its metadata until
it is out of date.
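
For illustration, a minimal sketch of lowering the metadata refresh age on the
consumer (bootstrap servers and group id are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class LowMetadataAgeConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Refresh cluster metadata every 30 s instead of the default 5 minutes,
            // so deleted topics drop out of the consumer's view sooner.
            props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "30000");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            // ... subscribe and poll as usual
        }
    }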

>-Original Message-
>From: Shengnan YU [mailto:ysna...@hotmail.com]
>Sent: Wednesday, April 24, 2019 1:43 PM
>To: users@kafka.apache.org
>Subject: kafka consumer metadata expire
>
>Hi everyone
>How do I update a Kafka consumer's metadata when some topics are deleted?
>The metadata in the consumer is not expired and outdated topics are not
>removed, which leads to an UNKNOWN_TOPIC exception when fetching metadata
>from the cluster. Thank you very much.
>


Re: Source Connector Task in a distributed env

2019-04-24 Thread Venkata S A
Thank you Ryanne & Hans, I will look into it.
I explored spooldir too and found that it also suits standalone mode, as you
mentioned.

Venkata

On Wed 24 Apr, 2019, 22:34 Hans Jespersen,  wrote:

> Your connector sounds a lot like this one
> https://github.com/jcustenborder/kafka-connect-spooldir
>
> I do not think you can run such a connector in distributed mode though.
> Typically something like this runs in standalone mode to avoid conflicts.
>
> -hans
>
>
> On Wed, Apr 24, 2019 at 1:08 AM Venkata S A  wrote:
>
> > Hello Team,
> >
> >   I am developing a custom Source Connector that watches a
> > given directory for any new files. My question is: in a distributed
> > environment, how will the tasks on different nodes handle the file Queue?
> >
> >   Referring to this sample
> > <https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory>,
> > poll() in SourceTask polls the directory at a specified interval for new
> > files and fetches them into a Queue as below:
> >
> > Queue<File> queue = ((DirWatcher) task).getFilesQueue();
> >
> > So, in a 3-node cluster, this runs individually in each task. How is
> > synchronization handled between the tasks on different nodes to avoid
> > duplicate reads of the same file into Kafka?
> >
> >
> > Thank you,
> > Venkata S
> >
>


Re: Kafka question on Stream Processing

2019-04-24 Thread Bruno Cadonna
Hi Gagan,

If you want to read a message, you need to poll it from the broker. The
brokers have only a very limited notion of message content: they know that a
message has a key, a value, and some metadata, but they cannot interpret the
contents of those message components. The clients are responsible for reading
and processing the messages, and for that they need to poll the messages from
the brokers.

For the processing you want to do, you could use Kafka Streams. See
https://kafka.apache.org/documentation/streams/ for more information. Have
a look at the branch DSL operation there.

Best regards,
Bruno
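
For illustration, a minimal sketch of that kind of routing with the Streams
branch operation (the application id, topic names, and the isDue() time check
are invented for this example; since Kafka cannot read a message without
polling it, this splits due and not-yet-due tasks into separate topics rather
than re-appending to the same one):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class TaskRouter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "task-router");       // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> tasks = builder.stream("tasks"); // placeholder input topic

            // branch() routes each record to the first predicate it matches.
            KStream<String, String>[] branches = tasks.branch(
                    (key, value) -> isDue(value),   // due now: execute
                    (key, value) -> true);          // not due yet: defer

            branches[0].to("tasks-due");     // placeholder output topics
            branches[1].to("tasks-pending");

            new KafkaStreams(builder.build(), props).start();
        }

        // Hypothetical parser for a value like "Send an email - 1530" (HHmm).
        private static boolean isDue(String value) {
            int due = Integer.parseInt(value.substring(value.lastIndexOf('-') + 1).trim());
            int now = Integer.parseInt(java.time.LocalTime.now()
                    .format(java.time.format.DateTimeFormatter.ofPattern("HHmm")));
            return now >= due;
        }
    }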

On Wed, Apr 24, 2019 at 1:54 AM Gagan Sabharwal  wrote:

> Hi team,
>
> Say we have a client which has pushed a message to a topic. The message has
> a simple structure:
>
> Task - Time of task
> Send an email - 1530
>
> Now say that this message is consumed by a consumer subscribed to this
> topic.
> Since the topic already has storage, what I intend to do is just read the
> message (not poll it) and, if the current time is before 1530, send it to
> the tail of the partition of that topic. Does Kafka provide such an API?
> The next time the consumer reads the message, if the current time is after
> 1530, it will poll the message and execute the task.
>
> Regards
> Gagan
>


Re: Source Connector Task in a distributed env

2019-04-24 Thread Hans Jespersen
Your connector sounds a lot like this one
https://github.com/jcustenborder/kafka-connect-spooldir

I do not think you can run such a connector in distributed mode though.
Typically something like this runs in standalone mode to avoid conflicts.

-hans


On Wed, Apr 24, 2019 at 1:08 AM Venkata S A  wrote:

> Hello Team,
>
>   I am developing a custom Source Connector that watches a
> given directory for any new files. My question is: in a distributed
> environment, how will the tasks on different nodes handle the file Queue?
>
>   Referring to this sample
> <https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory>,
> poll() in SourceTask polls the directory at a specified interval for new
> files and fetches them into a Queue as below:
>
> Queue<File> queue = ((DirWatcher) task).getFilesQueue();
>
> So, in a 3-node cluster, this runs individually in each task. How is
> synchronization handled between the tasks on different nodes to avoid
> duplicate reads of the same file into Kafka?
>
>
> Thank you,
> Venkata S
>


Re: Source Connector Task in a distributed env

2019-04-24 Thread Ryanne Dolan
Venkata, the example you have linked creates a single task config, so that
there is no parallelism -- a single task runs on the cluster, regardless of
the number of nodes. In order to introduce parallelism, your SourceConnector
needs to group all known files among N partitions and return N task configs
for N tasks. You can use ConnectorUtils.groupPartitions() for this. In each
task config, specify the specific group of files for that task, as grouped
by groupPartitions().

Then your SourceConnector can watch for new files. Any time a new file is
detected, call context.requestTaskReconfiguration(), which will restart
this process.

Ryanne
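
For illustration, a rough sketch of that approach (the class names, the "dir"
and "files" config keys, and the listFiles() helper are invented; only
ConnectorUtils.groupPartitions() and context.requestTaskReconfiguration() are
from the Connect API):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.Task;
    import org.apache.kafka.connect.source.SourceConnector;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;
    import org.apache.kafka.connect.util.ConnectorUtils;

    public class DirectorySourceConnector extends SourceConnector {
        private String dir;

        @Override
        public void start(Map<String, String> props) {
            dir = props.get("dir"); // hypothetical config: the watched directory
        }

        @Override
        public List<Map<String, String>> taskConfigs(int maxTasks) {
            List<String> files = listFiles(dir); // hypothetical helper: all known files
            int numGroups = Math.min(Math.max(files.size(), 1), maxTasks);
            // Split the known files into at most maxTasks disjoint groups, one
            // per task, so no two tasks ever read the same file.
            List<Map<String, String>> configs = new ArrayList<>();
            for (List<String> group : ConnectorUtils.groupPartitions(files, numGroups)) {
                Map<String, String> config = new HashMap<>();
                config.put("files", String.join(",", group)); // hypothetical key the task reads
                configs.add(config);
            }
            return configs;
        }

        // When the directory watcher notices a new file, ask the framework to
        // regenerate the task configs (and thus regroup the files) by calling:
        //     context.requestTaskReconfiguration();

        @Override public Class<? extends Task> taskClass() { return DirectorySourceTask.class; }
        @Override public void stop() { }
        @Override public ConfigDef config() { return new ConfigDef(); }
        @Override public String version() { return "0.1"; }

        private List<String> listFiles(String dir) { return new ArrayList<>(); } // stub

        public static class DirectorySourceTask extends SourceTask {
            @Override public String version() { return "0.1"; }
            @Override public void start(Map<String, String> props) { /* read the "files" list */ }
            @Override public List<SourceRecord> poll() { return null; } // stub
            @Override public void stop() { }
        }
    }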

On Wed, Apr 24, 2019 at 3:08 AM Venkata S A  wrote:

> Hello Team,
>
>   I am developing a custom Source Connector that watches a
> given directory for any new files. My question is: in a distributed
> environment, how will the tasks on different nodes handle the file Queue?
>
>   Referring to this sample
> <https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory>,
> poll() in SourceTask polls the directory at a specified interval for new
> files and fetches them into a Queue as below:
>
> Queue<File> queue = ((DirWatcher) task).getFilesQueue();
>
> So, in a 3-node cluster, this runs individually in each task. How is
> synchronization handled between the tasks on different nodes to avoid
> duplicate reads of the same file into Kafka?
>
>
> Thank you,
> Venkata S
>


Kafka consumer downgrade issue

2019-04-24 Thread Andreas Nilsson
Hi all,

Recently we upgraded our application from the more primitive Java client APIs 
(kafka.javaapi.consumer.SimpleConsumer, kafka.api.FetchRequest and friends) to 
the more friendly poll-based org.apache.kafka.clients.consumer.KafkaConsumer 
using Kafka Java client libraries version 1.1.0.

The upgrade went fine and meant we could remove a LOT of custom code we had 
previously needed to use. This was also released into a version of the 
application that went into QA / staging environments of a client of ours. The 
upgrade (environment was not wiped) in the staging environment went fine as 
well without any errors. Due to an unrelated serious issue that we discovered a 
couple of days later, we had to advise the client to roll back the new 
application version (it still had not gone into production). We didn't think 
this would lead to any issues.

What DID occur when the application was downgraded in the staging client 
environment, however, was Kafka clients starting to get a lot of offset fetch 
errors:

WARN  [2019-04-24 01:19:43,269] custom.application.package.KafkaClient: Failed 
to fetch messages from Kafka. Error code: 1

Now, error code 1 means offset out of range as far as I'm aware. I would not 
have expected this since I was pretty sure offsets were stored in ZooKeeper 
with the old SimpleConsumer (I can see the offsets being stored in nodes like 
/consumers//offsets//0) and in the Kafka internal topic 
__consumer_offsets with the new KafkaConsumer (I can NOT see offsets in 
ZooKeeper for a couple of new-style consumers we already had).

The ZooKeeper-stored consumer offsets should also never have been "out of 
range", since there would strictly always be more events after the point where 
they were last used. Could this error simply mean that the ZooKeeper offsets 
were TOO OLD (meaning Kafka had already deleted the corresponding messages)?

I wouldn't have been surprised to see some kind of error if the offset data was 
shared.

This is NOT a serious issue for us at the moment, we were just surprised that 
the downgrade ran into errors.

Regards,
Andreas Nilsson


Kafka question on Stream Processing

2019-04-24 Thread Gagan Sabharwal
Hi team,

Say we have a client which has pushed a message to a topic. The message has
a simple structure:

Task - Time of task
Send an email - 1530

Now say that this message is consumed by a consumer subscribed to this
topic.
Since the topic already has storage, what I intend to do is just read the
message (not poll it) and, if the current time is before 1530, send it to the
tail of the partition of that topic. Does Kafka provide such an API? The next
time the consumer reads the message, if the current time is after 1530, it
will poll the message and execute the task.

Regards
Gagan


Source Connector Task in a distributed env

2019-04-24 Thread Venkata S A
Hello Team,

  I am developing a custom Source Connector that watches a
given directory for any new files. My question is: in a distributed
environment, how will the tasks on different nodes handle the file Queue?

  Referring to this sample
<https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory>,
poll() in SourceTask polls the directory at a specified interval for new
files and fetches them into a Queue as below:

Queue<File> queue = ((DirWatcher) task).getFilesQueue();

So, in a 3-node cluster, this runs individually in each task. How is
synchronization handled between the tasks on different nodes to avoid
duplicate reads of the same file into Kafka?


Thank you,
Venkata S