[jira] [Created] (KAFKA-4678) Create separate page for Connect docs

2017-01-20 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4678:
--

 Summary: Create separate page for Connect docs
 Key: KAFKA-4678
 URL: https://issues.apache.org/jira/browse/KAFKA-4678
 Project: Kafka
  Issue Type: Improvement
  Components: documentation, KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava
Priority: Minor


The single-page http://kafka.apache.org/documentation/ is quite long, and will 
get even longer once info on Kafka Connect's bundled transformations is 
added.

Recently Kafka Streams documentation was split off to its own page with a short 
overview in the main doc page. We should do the same for {{connect.html}}.





[GitHub] kafka pull request #2374: KAFKA-3209: KIP-66: more single message transforms

2017-01-13 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2374

KAFKA-3209: KIP-66: more single message transforms

WIP, in this PR I'd also like to add doc generation for transformations.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka more-smt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2374


commit f34cc71c9931ea7ec5dd045512c623196928a2a3
Author: Shikhar Bhushan 
Date:   2017-01-13T20:00:31Z

SetSchemaMetadata SMT

commit 09af66b3f1861bf2a0718aaf79ca1a3bd22adcec
Author: Shikhar Bhushan 
Date:   2017-01-13T21:44:57Z

Support schemaless and rename HoistToStruct->Hoist; add the inverse Extract 
transform

commit 022f4920c5f09d068bbf49e47091a1333dc48be2
Author: Shikhar Bhushan 
Date:   2017-01-13T21:51:43Z

InsertField transform -- fix bad name for interface containing config name 
constants

commit c5260a718e2f0ade66c4607a4b9c21abda61b90c
Author: Shikhar Bhushan 
Date:   2017-01-13T22:01:25Z

ValueToKey SMT
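
For context, transforms like the ones in these commits get wired up via
connector configuration. A rough sketch of what that looks like, following
the KIP-66 draft (class and property names were still in flux in this WIP,
so treat them as illustrative):

    # hypothetical connector config excerpt
    transforms=insertSource
    # $Value / $Key variants apply the transform to the record value or key
    transforms.insertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
    transforms.insertSource.static.field=data_source
    transforms.insertSource.static.value=my-connector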






[GitHub] kafka pull request #2365: MINOR: avoid closing over both pre & post-transfor...

2017-01-12 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2365

MINOR: avoid closing over both pre & post-transform record in 
WorkerSourceTask

Followup to #2299 for KAFKA-3209

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka 2299-followup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2365


commit b0abf743d0329bbbc9620b1b0ef6acd4b3b035b3
Author: Shikhar Bhushan 
Date:   2017-01-13T00:32:11Z

MINOR: avoid closing over both pre & post-transform record in 
WorkerSourceTask

Followup to #2299 for KAFKA-3209






[GitHub] kafka pull request #2182: ConfigDef experimentation - support List and Ma...

2017-01-11 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/2182




Re: [VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-09 Thread Shikhar Bhushan
Thanks all. The vote passed with +5 (binding).

On Fri, Jan 6, 2017 at 11:37 AM Shikhar Bhushan 
wrote:

That makes sense to me, I'll fold that into the PR and update the KIP if it
gets committed in that form.

On Fri, Jan 6, 2017 at 9:44 AM Jason Gustafson  wrote:

+1 One minor comment: would it make sense to let the `Transformation`
interface extend `o.a.k.c.Configurable` and remove the `init` method?

On Thu, Jan 5, 2017 at 5:48 PM, Neha Narkhede  wrote:

> +1 (binding)
>
> On Wed, Jan 4, 2017 at 2:36 PM Shikhar Bhushan 
> wrote:
>
> > I do plan on introducing a new `connect:transforms` module (which
> > `connect:runtime` will depend on), so they will live in a separate
module
> > in the source tree and output.
> >
> > ( https://github.com/apache/kafka/pull/2299 )
> >
> > On Wed, Jan 4, 2017 at 2:28 PM Ewen Cheslack-Postava 
> > wrote:
> >
> > > +1
> > >
> > > Gwen, re: bundling transformations, would it help at all to isolate
> them
> > to
> > > a separate jar or is the concern purely about maintaining them as part
> of
> > > Kafka?
> > >
> > > -Ewen
> > >
> > > On Wed, Jan 4, 2017 at 1:31 PM, Sriram Subramanian 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, Jan 4, 2017 at 1:29 PM, Gwen Shapira 
> > wrote:
> > > >
> > > > > I would have preferred not to bundle transformations, but since
SMT
> > > > > capability is a much needed feature, I'll take it in its current
> > form.
> > > > >
> > > > > +1
> > > > >
> > > > > On Wed, Jan 4, 2017 at 10:47 AM, Shikhar Bhushan <
> > shik...@confluent.io
> > > >
> > > > > wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start voting on KIP-66:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 66%3A+Single+Message+Transforms+for+Kafka+Connect
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Shikhar
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Gwen Shapira
> > > > > Product Manager | Confluent
> > > > > 650.450.2760 | @gwenshap
> > > > > Follow us: Twitter | blog
> > > > >
> > > >
> > >
> >
> --
> Thanks,
> Neha
>
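
Jason's suggestion above -- `Transformation` extending `o.a.k.c.Configurable`
in place of a dedicated `init` method -- would look roughly like this (a
sketch against the KIP-66 draft, not the committed API):

    import java.util.Map;
    import org.apache.kafka.common.Configurable;
    import org.apache.kafka.connect.connector.ConnectRecord;

    public interface Transformation<R extends ConnectRecord<R>> extends Configurable {
        // void configure(Map<String, ?> configs) is inherited from
        // Configurable, replacing a separate init(...) method
        R apply(R record);  // return the transformed record, or null to filter it out
        void close();
    }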


Re: [VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-06 Thread Shikhar Bhushan
That makes sense to me, I'll fold that into the PR and update the KIP if it
gets committed in that form.

On Fri, Jan 6, 2017 at 9:44 AM Jason Gustafson  wrote:

> +1 One minor comment: would it make sense to let the `Transformation`
> interface extend `o.a.k.c.Configurable` and remove the `init` method?
>
> On Thu, Jan 5, 2017 at 5:48 PM, Neha Narkhede  wrote:
>
> > +1 (binding)
> >
> > On Wed, Jan 4, 2017 at 2:36 PM Shikhar Bhushan 
> > wrote:
> >
> > > I do plan on introducing a new `connect:transforms` module (which
> > > `connect:runtime` will depend on), so they will live in a separate
> module
> > > in the source tree and output.
> > >
> > > ( https://github.com/apache/kafka/pull/2299 )
> > >
> > > On Wed, Jan 4, 2017 at 2:28 PM Ewen Cheslack-Postava <
> e...@confluent.io>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Gwen, re: bundling transformations, would it help at all to isolate
> > them
> > > to
> > > > a separate jar or is the concern purely about maintaining them as
> part
> > of
> > > > Kafka?
> > > >
> > > > -Ewen
> > > >
> > > > On Wed, Jan 4, 2017 at 1:31 PM, Sriram Subramanian  >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Wed, Jan 4, 2017 at 1:29 PM, Gwen Shapira 
> > > wrote:
> > > > >
> > > > > > I would have preferred not to bundle transformations, but since
> SMT
> > > > > > capability is a much needed feature, I'll take it in its current
> > > form.
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > On Wed, Jan 4, 2017 at 10:47 AM, Shikhar Bhushan <
> > > shik...@confluent.io
> > > > >
> > > > > > wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'd like to start voting on KIP-66:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 66%3A+Single+Message+Transforms+for+Kafka+Connect
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Shikhar
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Gwen Shapira
> > > > > > Product Manager | Confluent
> > > > > > 650.450.2760 | @gwenshap
> > > > > > Follow us: Twitter | blog
> > > > > >
> > > > >
> > > >
> > >
> > --
> > Thanks,
> > Neha
> >
>


[jira] [Commented] (KAFKA-4598) Create new SourceTask commit callback method that takes offsets param

2017-01-06 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805460#comment-15805460
 ] 

Shikhar Bhushan commented on KAFKA-4598:


Yeah, that's a reasonable alternative with the caveat you pointed out.

> Create new SourceTask commit callback method that takes offsets param
> -
>
> Key: KAFKA-4598
> URL: https://issues.apache.org/jira/browse/KAFKA-4598
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Ewen Cheslack-Postava
>
> {{SourceTask.commit()}} can be invoked concurrently with a 
> {{SourceTask.poll()}} in progress. Thus it is currently not possible to know 
> what offset state the commit call corresponds to.





Re: KafkaConnect SinkTask::put

2017-01-06 Thread Shikhar Bhushan
Sorry I forgot to specify, this needs to go into your Connect worker
configuration.
On Fri, Jan 6, 2017 at 02:57  wrote:

> Hi Shikhar,
>
> I've just added this to ~config/consumer.properties in the Kafka folder
> but it doesn't appear to have made any difference.  Have I put it in the
> wrong place?
>
> Thanks again,
> David
>
> -----Original Message-
> From: Shikhar Bhushan [mailto:shik...@confluent.io]
> Sent: 05 January 2017 18:12
> To: dev@kafka.apache.org
> Subject: Re: KafkaConnect SinkTask::put
>
> Hi David,
>
> You can override the underlying consumer's `max.poll.records` setting for
> this. E.g.
> consumer.max.poll.records=500
>
> Best,
>
> Shikhar
>
> On Thu, Jan 5, 2017 at 3:59 AM  wrote:
>
> > Is there any way of limiting the number of events that are passed into
> > the call to the put(Collection) method?
> >
> > I'm writing a set of events to Kafka via a source Connector/Task and
> > reading these from a sink Connector/Task.
> > If I generate of the order of 10k events the number of SinkRecords
> > passed to the put method starts off very low but quickly rises in
> > large increments such that 9k events are passed to a later invocation of
> the put method.
> >
> > Furthermore, processing a large number of events in a single call (I'm
> > writing to Elasticsearch) appears to cause the source task poll()
> > method to timeout, raising a CommitFailedException which,
> > incidentally, I can't see how to catch.
> >
> > Thanks for any help you can provide,
> > David
> >
>
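
To make the fix concrete, a sketch of where the override lives, assuming a
distributed worker (file name and values illustrative):

    # connect-distributed.properties -- the Connect worker config,
    # not config/consumer.properties
    bootstrap.servers=localhost:9092
    group.id=connect-cluster
    # settings prefixed with "consumer." are passed through to the worker's
    # underlying consumer, capping the batch handed to each put() call
    consumer.max.poll.records=500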


[jira] [Commented] (KAFKA-4598) Create new SourceTask commit callback method that takes offsets param

2017-01-05 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802110#comment-15802110
 ] 

Shikhar Bhushan commented on KAFKA-4598:


In the meantime the workaround is to use 
{{SourceTask.commitRecord(SourceRecord)}} to keep track of committable offset 
state.

> Create new SourceTask commit callback method that takes offsets param
> -
>
> Key: KAFKA-4598
> URL: https://issues.apache.org/jira/browse/KAFKA-4598
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Ewen Cheslack-Postava
>
> {{SourceTask.commit()}} can be invoked concurrently with a 
> {{SourceTask.poll()}} in progress. Thus it is currently not possible to know 
> what offset state the commit call corresponds to.
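
A rough sketch (not project code) of the workaround described above: track
the latest safely-committable offsets from commitRecord(), since commit()
itself carries no offset information:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    public abstract class OffsetTrackingSourceTask extends SourceTask {
        // latest acked offset per source partition
        private final Map<Map<String, ?>, Map<String, ?>> committable =
                new ConcurrentHashMap<>();

        @Override
        public void commitRecord(SourceRecord record) {
            // called once the record is acked by Kafka, so its offset is
            // safe to treat as committable
            committable.put(record.sourcePartition(), record.sourceOffset());
        }

        @Override
        public void commit() {
            // flush the offsets captured above to the source system, instead
            // of guessing what poll() state this call corresponds to
        }
    }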





[jira] [Created] (KAFKA-4598) Create new SourceTask commit callback method that takes offsets param

2017-01-05 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4598:
--

 Summary: Create new SourceTask commit callback method that takes 
offsets param
 Key: KAFKA-4598
 URL: https://issues.apache.org/jira/browse/KAFKA-4598
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava


{{SourceTask.commit()}} can be invoked concurrently with a 
{{SourceTask.poll()}} in progress. Thus it is currently not possible to know 
what offset state the commit call corresponds to.





Re: KafkaConnect SinkTask::put

2017-01-05 Thread Shikhar Bhushan
Hi David,

You can override the underlying consumer's `max.poll.records` setting for
this. E.g.
consumer.max.poll.records=500

Best,

Shikhar

On Thu, Jan 5, 2017 at 3:59 AM  wrote:

> Is there any way of limiting the number of events that are passed into the
> call to the put(Collection) method?
>
> I'm writing a set of events to Kafka via a source Connector/Task and
> reading these from a sink Connector/Task.
> If I generate of the order of 10k events the number of SinkRecords passed
> to the put method starts off very low but quickly rises in large increments
> such that 9k events are passed to a later invocation of the put method.
>
> Furthermore, processing a large number of events in a single call (I'm
> writing to Elasticsearch) appears to cause the source task poll() method to
> timeout, raising a CommitFailedException which, incidentally, I can't see
> how to catch.
>
> Thanks for any help you can provide,
> David
>


[GitHub] kafka pull request #2313: KAFKA-4575: ensure topic created before starting s...

2017-01-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2313

KAFKA-4575: ensure topic created before starting sink for 
ConnectDistributedTest.test_pause_resume_sink

Otherwise in this test the sink task goes through the pause/resume cycle 
with 0 assigned partitions, since the default metadata refresh interval is 
quite long
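
The refresh interval in question is presumably the consumer's
metadata.max.age.ms (default 300000 ms, i.e. 5 minutes); a sketch of the
worker-level override one could use to shorten it (value illustrative):

    # refresh topic metadata every 10s so newly created topics are seen quickly
    consumer.metadata.max.age.ms=10000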

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4575

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2313


commit fdc2bb353de995ba09398d0851b934d3aee4570c
Author: Shikhar Bhushan 
Date:   2017-01-05T00:28:14Z

KAFKA-4575: ensure topic created before starting sink for 
ConnectDistributedTest.test_pause_resume_sink

Otherwise in this test the sink task goes through the pause/resume cycle 
with 0 assigned partitions, since the default metadata refresh interval is 
quite long






[jira] [Commented] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming source connector

2017-01-04 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799807#comment-15799807
 ] 

Shikhar Bhushan commented on KAFKA-4575:


By the way, the error message is misleading; it should read 'after resuming 
_sink_ connector'.

> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming source connector
> 
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.





[jira] [Updated] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming sink connector

2017-01-04 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4575:
---
Summary: Transient failure in 
ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after 
resuming sink connector  (was: Transient failure in 
ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after 
resuming source connector)

> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming sink connector
> --
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.





[jira] [Work started] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming source connector

2017-01-04 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4575 started by Shikhar Bhushan.
--
> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming source connector
> 
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.





Re: [VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-04 Thread Shikhar Bhushan
I do plan on introducing a new `connect:transforms` module (which
`connect:runtime` will depend on), so they will live in a separate module
in the source tree and output.

( https://github.com/apache/kafka/pull/2299 )

On Wed, Jan 4, 2017 at 2:28 PM Ewen Cheslack-Postava 
wrote:

> +1
>
> Gwen, re: bundling transformations, would it help at all to isolate them to
> a separate jar or is the concern purely about maintaining them as part of
> Kafka?
>
> -Ewen
>
> On Wed, Jan 4, 2017 at 1:31 PM, Sriram Subramanian 
> wrote:
>
> > +1
> >
> > On Wed, Jan 4, 2017 at 1:29 PM, Gwen Shapira  wrote:
> >
> > > I would have preferred not to bundle transformations, but since SMT
> > > capability is a much needed feature, I'll take it in its current form.
> > >
> > > +1
> > >
> > > On Wed, Jan 4, 2017 at 10:47 AM, Shikhar Bhushan  >
> > > wrote:
> > > > Hi all,
> > > >
> > > > I'd like to start voting on KIP-66:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 66%3A+Single+Message+Transforms+for+Kafka+Connect
> > > >
> > > > Best,
> > > >
> > > > Shikhar
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>


[VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-04 Thread Shikhar Bhushan
Hi all,

I'd like to start voting on KIP-66:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect

Best,

Shikhar


[GitHub] kafka pull request #2299: KAFKA-3209: KIP-66: single message transforms

2017-01-03 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2299

KAFKA-3209: KIP-66: single message transforms

*WIP* particularly around testing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka smt-2017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2299


commit 1178670f36d8fbdf5cbdbb2728ace7bf4f0e7300
Author: Shikhar Bhushan 
Date:   2017-01-03T19:21:17Z

KAFKA-3209: KIP-66: single message transforms






Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2017-01-03 Thread Shikhar Bhushan
Makes sense, Ewen; I edited the KIP to include this criterion.

I'd like to start a voting thread soon unless anyone has additional points
for discussion.

On Fri, Dec 30, 2016 at 12:14 PM Ewen Cheslack-Postava 
wrote:

On Thu, Dec 15, 2016 at 7:41 PM, Shikhar Bhushan 
wrote:

> There is no decision being proposed on the final list of transformations
> that will ever be in Kafka :-) Just the initial set we should roll with.
>

I'd second this comment as well. I'm very wary of the slippery slope, which
is why I wasn't in favor of including any connectors except for very simple
demos.

But it might be useful to have some initial guidelines, and might even make
sense to include them in the KIP so they are easy for others to find. I
think both the examples Gwen gave are easily excluded with a simple rule:
SMTs that are shipped with Kafka should be general enough to apply to many
data sources & serialization formats. email is a very specific type of data
(email headers and HL7 are pretty similar) and Avro is a specific
serialization format where, presumably, the Connect data type you'd have to
receive to do this transformation is just a byte array of the original Avro
file. In contrast, the included transformations in the current KIP are
*really* broadly applicable; apart from timestamps, I think they pretty
much all could potentially be applied to *any* stream of data.

I think the more interesting cases that we'll probably end up debating are
around serialization formats that "fit" within other connectors, in
particular I'm thinking of CSV and line-oriented JSON parsing. Individual
connectors may avoid this (or not be aware that the data has this
structure), but users will want that type of transformation to be easy and
baked in.

-Ewen


>
> On Thu, Dec 15, 2016 at 3:34 PM Gwen Shapira  wrote:
>
> You are absolutely right that the vast majority of NiFi's processors are
> not what we would consider SMT.
>
> I went over the list and I think they still contain just short of 50 legit
> SMTs:
> https://cwiki.apache.org/confluence/display/KAFKA/Analyzing+
> NiFi+Transformations
>
> You are right that ExtractHL7 is an extreme that clearly doesn't belong in
> Apache Kafka, but just before that we have ExtractAvroMetadata that may
> fit? and ExtractEmailHeaders doesn't sound totally outlandish either...
>
> Nothing in the baked-in list by Shikhar looks out of place. I am concerned
> about a slippery slope. Or the arbitrariness of the decision if we say that
> this list is final and nothing else will ever make it into Kafka.
>
> Gwen
>
> On Thu, Dec 15, 2016 at 3:00 PM, Ewen Cheslack-Postava 
> wrote:
>
> > I think there are a couple of factors that make transformations and
> > connectors different.
> >
> > First, NiFi's 150 processors is a bit misleading. In NiFi, processors
> cover
> > data sources, data sinks, serialization/deserialization, *and*
> > transformations. I haven't filtered the list to see how many fall into
> the
> > first 3 categories, but it's a *lot* of the processors they have.
> >
> > Second, since transformations only apply to a single message and I'd
> think
> > they generally shouldn't be interacting with external services (i.e. I
> > think trying to do enrichment in SMT is probably a bad idea), the scope
> of
> > possible transformations is reasonably limited and the transformations
> > themselves tend to be small and easily maintainable. I think this is a
> > dramatic difference from connectors, which are each substantial projects
> in
> > their own right.
> >
> > While I get the slippery slope argument re: including specific
> > transformations, I think we can come up with a reasonable policy (and
via
> > KIPs we can, as a community, come to an agreement based purely on taste
> if
> > it comes down to that). In particular, I'd say keep the core general
> (i.e.
> > no domain-specific transformations/parsing like HL7), pure data
> > manipulation (i.e. no enrichment), and nothing that could just as well
be
> > done as a converter/serializer/deserializer/source connector/sink
> > connector.
> >
> > I was very staunchly against including connectors (aside from a simple
> > example) directly in Kafka, so this may seem like a reversal of
position.
> > But I think the % of use cases covered will look very different between
> > connectors and transformations. Sure, some connectors are very popular,
> and
> > moreso right now because they are the most thoroughly developed, tested,
> > etc. But the top 3 most common transformations will probably be used
> across
> > all the top 20 most popular connectors. I have no doubt people will end up
> > writing custom ones (which is why it's nice to make them pluggable rather
> > than choosing a fixed set), but they'll either be very niche (like people
> > write custom connectors for their internal systems) or be more broadly
> > applicable but very domain specific such that they are easy to reject for
> > inclusion.

[jira] [Work started] (KAFKA-3209) Support single message transforms in Kafka Connect

2017-01-02 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3209 started by Shikhar Bhushan.
--
> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
> Fix For: 0.10.2.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.





[jira] [Commented] (KAFKA-3513) Transient failure of OffsetValidationTest

2017-01-01 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15791790#comment-15791790
 ] 

Shikhar Bhushan commented on KAFKA-3513:


There was a failure in the last run  
http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-01-01--001.1483262815--apache--trunk--3d7e884/report.html

{noformat}
Module: kafkatest.tests.client.consumer_test
Class:  OffsetValidationTest
Method: test_broker_failure
Arguments:
{
  "clean_shutdown": true,
  "enable_autocommit": true
}
{noformat}

> Transient failure of OffsetValidationTest
> -
>
> Key: KAFKA-3513
> URL: https://issues.apache.org/jira/browse/KAFKA-3513
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-04-05--001.1459840046--apache--trunk--31e263e/report.html
> The version of the test fails in this case is:
> Module: kafkatest.tests.client.consumer_test
> Class:  OffsetValidationTest
> Method: test_broker_failure
> Arguments:
> {
>   "clean_shutdown": true,
>   "enable_autocommit": false
> }
> and others passed. It's unclear if the parameters actually have any impact on 
> the failure.
> I did some initial triage and it looks like the test code isn't seeing all 
> the group members join the group (receive partition assignments), but it 
> appears from the logs that they all did. This could indicate a simple timing 
> issue, but I haven't been able to verify that yet.





[jira] [Assigned] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming source connector

2016-12-29 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan reassigned KAFKA-4575:
--

Assignee: Shikhar Bhushan

> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming source connector
> 
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.





[jira] [Updated] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming source connector

2016-12-29 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4575:
---
Component/s: system tests
 KafkaConnect

> Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in 
> consuming messages after resuming source connector
> 
>
> Key: KAFKA-4575
> URL: https://issues.apache.org/jira/browse/KAFKA-4575
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, system tests
>Reporter: Shikhar Bhushan
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {noformat}
> [INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
>  Summary: Failed to consume messages after resuming source connector
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 267, in test_pause_and_resume_sink
> err_msg="Failed to consume messages after resuming source connector")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Failed to consume messages after resuming source connector
> {noformat}
> We recently fixed KAFKA-4527 and this is a new kind of failure in the same 
> test.





[jira] [Created] (KAFKA-4575) Transient failure in ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after resuming source connector

2016-12-29 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4575:
--

 Summary: Transient failure in 
ConnectDistributedTest.test_pause_and_resume_sink in consuming messages after 
resuming source connector
 Key: KAFKA-4575
 URL: https://issues.apache.org/jira/browse/KAFKA-4575
 Project: Kafka
  Issue Type: Test
Reporter: Shikhar Bhushan


http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html

{noformat}
[INFO  - 2016-12-29 08:56:23,050 - runner_client - log - lineno:221]: 
RunnerClient: 
kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_pause_and_resume_sink:
 Summary: Failed to consume messages after resuming source connector
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
 line 267, in test_pause_and_resume_sink
err_msg="Failed to consume messages after resuming source connector")
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: Failed to consume messages after resuming source connector
{noformat}

We recently fixed KAFKA-4527 and this is a new kind of failure in the same test.





[jira] [Created] (KAFKA-4574) Transient failure in ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade with security_protocol = SASL_PLAINTEXT, SSL

2016-12-29 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4574:
--

 Summary: Transient failure in 
ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade with security_protocol = 
SASL_PLAINTEXT, SSL
 Key: KAFKA-4574
 URL: https://issues.apache.org/jira/browse/KAFKA-4574
 Project: Kafka
  Issue Type: Test
  Components: system tests
Reporter: Shikhar Bhushan


http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html

{{ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade}} failed with these 
{{security_protocol}} parameters 

{noformat}

test_id:
kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=SASL_PLAINTEXT
status: FAIL
run time:   3 minutes 44.094 seconds


1 acked message did not make it to the Consumer. They are: [5076]. We 
validated that the first 1 of these missing messages correctly made it into 
Kafka's data files. This suggests they were lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py",
 line 117, in test_zk_security_upgrade
self.run_produce_consume_validate(self.run_zk_migration)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 101, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 163, in validate
assert success, msg
AssertionError: 1 acked message did not make it to the Consumer. They are: 
[5076]. We validated that the first 1 of these missing messages correctly made 
it into Kafka's data files. This suggests they were lost on their way to the 
consumer.
{noformat}

{noformat}

test_id:
kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=SSL
status: FAIL
run time:   3 minutes 50.578 seconds


1 acked message did not make it to the Consumer. They are: [3559]. We 
validated that the first 1 of these missing messages correctly made it into 
Kafka's data files. This suggests they were lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py",
 line 117, in test_zk_security_upgrade
self.run_produce_consume_validate(self.run_zk_migration)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 101, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 163, in validate
assert success, msg
AssertionError: 1 acked message did not make it to the Consumer. They are: 
[3559]. We validated that the first 1 of these missing messages correctly made 
it into Kafka's data files. This suggests they were lost on their way to the 
consumer.
{noformat}

Previously: KAFKA-3985





[GitHub] kafka pull request #2277: KAFKA-4527: task status was being updated before a...

2016-12-19 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2277

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2277


commit 9911f25e2b06a02c56deadf7c586b5c263a08027
Author: Shikhar Bhushan 
Date:   2016-12-19T21:32:26Z

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue






Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
There is no decision being proposed on the final list of transformations
that will ever be in Kafka :-) Just the initial set we should roll with.

On Thu, Dec 15, 2016 at 3:34 PM Gwen Shapira  wrote:

You are absolutely right that the vast majority of NiFi's processors are
not what we would consider SMT.

I went over the list and I think they still contain just short of 50 legit
SMTs:
https://cwiki.apache.org/confluence/display/KAFKA/Analyzing+NiFi+Transformations

You are right that ExtractHL7 is an extreme that clearly doesn't belong in
Apache Kafka, but just before that we have ExtractAvroMetadata that may
fit? and ExtractEmailHeaders doesn't sound totally outlandish either...

Nothing in the baked-in list by Shikhar looks out of place. I am concerned
about a slippery slope. Or the arbitrariness of the decision if we say that
this list is final and nothing else will ever make it into Kafka.

Gwen

On Thu, Dec 15, 2016 at 3:00 PM, Ewen Cheslack-Postava 
wrote:

> I think there are a couple of factors that make transformations and
> connectors different.
>
> First, NiFi's 150 processors is a bit misleading. In NiFi, processors
cover
> data sources, data sinks, serialization/deserialization, *and*
> transformations. I haven't filtered the list to see how many fall into the
> first 3 categories, but it's a *lot* of the processors they have.
>
> Second, since transformations only apply to a single message and I'd think
> they generally shouldn't be interacting with external services (i.e. I
> think trying to do enrichment in SMT is probably a bad idea), the scope of
> possible transformations is reasonably limited and the transformations
> themselves tend to be small and easily maintainable. I think this is a
> dramatic difference from connectors, which are each substantial projects
in
> their own right.
>
> While I get the slippery slope argument re: including specific
> transformations, I think we can come up with a reasonable policy (and via
> KIPs we can, as a community, come to an agreement based purely on taste if
> it comes down to that). In particular, I'd say keep the core general (i.e.
> no domain-specific transformations/parsing like HL7), pure data
> manipulation (i.e. no enrichment), and nothing that could just as well be
> done as a converter/serializer/deserializer/source connector/sink
> connector.
>
> I was very staunchly against including connectors (aside from a simple
> example) directly in Kafka, so this may seem like a reversal of position.
> But I think the % of use cases covered will look very different between
> connectors and transformations. Sure, some connectors are very popular,
and
> moreso right now because they are the most thoroughly developed, tested,
> etc. But the top 3 most common transformations will probably be used
across
> all the top 20 most popular connectors. I have no doubt people will end up
> writing custom ones (which is why it's nice to make them pluggable rather
> than choosing a fixed set), but they'll either be very niche (like people
> write custom connectors for their internal systems) or be more broadly
> applicable but very domain specific such that they are easy to reject for
> inclusion.
>
> @Gwen if we filtered the list of NiFi processors to ones that fit that
> criteria, would that still be too long a list for your taste? Similarly,
> let's say we were going to include some baked in; in that case, does
> anything look out of place to you in the list Shikhar has included in the
> KIP?
>
> -Ewen
>
> On Thu, Dec 15, 2016 at 2:01 PM, Gwen Shapira  wrote:
>
> > I agree about the ease of use in adding a small-subset of built-in
> > transformations.
> >
> > But the same thing is true for connectors - there are maybe 5 super
> popular
> > OSS connectors and the rest is a very long tail. We drew the line at not
> > adding any, because thats the easiest and because we did not want to
turn
> > Kafka into a collection of transformations.
> >
> > I really don't want to end up with 135 (or even 20) transformations in
> > Kafka. So either we have a super-clear definition of what belongs and
> what
> > doesn't - or we put in one minimal example and the rest goes into the
> > ecosystem.
> >
> > We can also start by putting transformations on github and just see if
> > there is huge demand for them in Apache. It is easier to add stuff to
the
> > project later than to remove functionality.
> >
> >
> >
> > On Thu, Dec 15, 2016 at 11:59 AM, Shikhar Bhushan 
> > wrote:
> >
> > > I have updated KIP-66
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 66%3A+Single+Message+Transforms+for+Kafka+Connect

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
I think the tradeoffs for including connectors are different. Connectors
are comparatively larger in scope, they tend to come with their own set of
dependencies for the systems they need to talk to. Transformations as I
imagine them - at least the ones on the table in the wiki currently -
should be a single not-very-large class (or 3 when there are simple *Key
and *Value variants deriving from a base implementing the common
functionality), in some cases relying on common utilities for munging with
the Connect data API. Correspondingly, the maintenance burden is also
smaller.

It's true that it would probably be easier to add specific transformations
down the line than evolve/remove, but I have faith we can strike a good
balance in making the call on what to include from the start.

On > super-clear definition of what belongs and what doesn't

How about: small and broadly applicable, configurable in an easily
understandable manner, no external dependencies, concrete use-case

On Thu, Dec 15, 2016 at 2:01 PM Gwen Shapira  wrote:

I agree about the ease of use in adding a small-subset of built-in
transformations.

But the same thing is true for connectors - there are maybe 5 super popular
OSS connectors and the rest is a very long tail. We drew the line at not
adding any, because thats the easiest and because we did not want to turn
Kafka into a collection of transformations.

I really don't want to end up with 135 (or even 20) transformations in
Kafka. So either we have a super-clear definition of what belongs and what
doesn't - or we put in one minimal example and the rest goes into the
ecosystem.

We can also start by putting transformations on github and just see if
there is huge demand for them in Apache. It is easier to add stuff to the
project later than to remove functionality.



On Thu, Dec 15, 2016 at 11:59 AM, Shikhar Bhushan 
wrote:

> I have updated KIP-66
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 66%3A+Single+Message+Transforms+for+Kafka+Connect
> with
> the changes I proposed in the design.
>
> Gwen, I think the main downside to not including some transformations with
> Kafka Connect is that it seems less user friendly if folks have to make
> sure to have the right transformation(s) on the classpath as well, besides
> their connector(s). Additionally by going in with a small set included, we
> can encourage a consistent configuration and implementation style and
> provide utilities for e.g. data transformations, which I expect we will
> definitely need (discussed under 'Patterns for data transformations').
>
> It does get hard to draw the line once you go from 'none' to 'some'. To
get
> discussion going, if we get agreement on 'none' vs 'some', I added a table
> under 'Bundled transformations' for transformations which I think are
worth
> including.
>
> For many of these, I have noticed their absence in the wild as a pain
point
> --
> TimestampRouter:
> https://github.com/confluentinc/kafka-connect-elasticsearch/issues/33
> Mask:
> https://groups.google.com/d/msg/confluent-platform/3yHb8_
> mCReQ/sTQc3dNgBwAJ
> Insert:
> http://stackoverflow.com/questions/40664745/elasticsearch-connector-for-
> kafka-connect-offset-and-timestamp
> RegexRouter:
> https://groups.google.com/d/msg/confluent-platform/
> yEBwu1rGcs0/gIAhRp6kBwAJ
> NumericCast:
> https://github.com/confluentinc/kafka-connect-
> jdbc/issues/101#issuecomment-249096119
> TimestampConverter:
> https://groups.google.com/d/msg/confluent-platform/
> gGAOsw3Qeu4/8JCqdDhGBwAJ
> ValueToKey: https://github.com/confluentinc/kafka-connect-jdbc/pull/166
>
> In other cases, their functionality is already being implemented by
> connectors in divergent ways: RegexRouter, Insert, Replace, HoistToStruct,
> ExtractFromStruct
>
> On Wed, Dec 14, 2016 at 6:00 PM Gwen Shapira  wrote:
>
> I'm a bit concerned about adding transformations in Kafka. NiFi has 150
> processors, presumably they are all useful for someone. I don't know if
I'd
> want all of that in Apache Kafka. What's the downside of keeping it out?
Or
> at least keeping the built-in set super minimal (Flume has like 3 built-in
> interceptors)?
>
> Gwen
>
> On Wed, Dec 14, 2016 at 1:36 PM, Shikhar Bhushan 
> wrote:
>
> > With regard to a), just using `ConnectRecord` with `newRecord` as a new
> > abstract method would be a fine choice. In prototyping, both options end
> up
> > looking pretty similar (in terms of how transformations are implemented
> and
> > the runtime initializes and uses them) and I'm starting to lean towards
> not
> > adding a new interface into the mix.
> >
> On b) I think we should include a small set of useful transformations with
> Connect, since they can be applicable across different connectors and we
> should encourage some standardization for common operations.

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
I have updated KIP-66
https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
with
the changes I proposed in the design.

Gwen, I think the main downside to not including some transformations with
Kafka Connect is that it seems less user friendly if folks have to make
sure to have the right transformation(s) on the classpath as well, besides
their connector(s). Additionally by going in with a small set included, we
can encourage a consistent configuration and implementation style and
provide utilities for e.g. data transformations, which I expect we will
definitely need (discussed under 'Patterns for data transformations').

It does get hard to draw the line once you go from 'none' to 'some'. To get
discussion going, if we get agreement on 'none' vs 'some', I added a table
under 'Bundled transformations' for transformations which I think are worth
including.

For many of these, I have noticed their absence in the wild as a pain point
--
TimestampRouter:
https://github.com/confluentinc/kafka-connect-elasticsearch/issues/33
Mask:
https://groups.google.com/d/msg/confluent-platform/3yHb8_mCReQ/sTQc3dNgBwAJ
Insert:
http://stackoverflow.com/questions/40664745/elasticsearch-connector-for-kafka-connect-offset-and-timestamp
RegexRouter:
https://groups.google.com/d/msg/confluent-platform/yEBwu1rGcs0/gIAhRp6kBwAJ
NumericCast:
https://github.com/confluentinc/kafka-connect-jdbc/issues/101#issuecomment-249096119
TimestampConverter:
https://groups.google.com/d/msg/confluent-platform/gGAOsw3Qeu4/8JCqdDhGBwAJ
ValueToKey: https://github.com/confluentinc/kafka-connect-jdbc/pull/166

In other cases, their functionality is already being implemented by
connectors in divergent ways: RegexRouter, Insert, Replace, HoistToStruct,
ExtractFromStruct
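
For a concrete sense of how small a bundled transformation can be, here is a
sketch of a TimestampRouter-style SMT. It assumes a configure()/apply()-shaped
Transformation interface, that ConnectRecord exposes the record timestamp, and
the `newRecord` copy method discussed later in this thread; the class and
config names are illustrative only, not a committed API.

{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.kafka.connect.connector.ConnectRecord;

public class TimestampRouterSketch<R extends ConnectRecord<R>> {
    private String topicFormat;      // e.g. "${topic}-${timestamp}"
    private SimpleDateFormat format; // e.g. "yyyyMMdd"

    public void configure(Map<String, ?> props) {
        Object f = props.get("topic.format");
        topicFormat = f != null ? f.toString() : "${topic}-${timestamp}";
        Object t = props.get("timestamp.format");
        format = new SimpleDateFormat(t != null ? t.toString() : "yyyyMMdd");
    }

    public R apply(R record) {
        Long timestamp = record.timestamp();
        if (timestamp == null)
            return record; // nothing to route on
        String updatedTopic = topicFormat
                .replace("${topic}", record.topic())
                .replace("${timestamp}", format.format(new Date(timestamp)));
        // newRecord(...) is the abstract copy method discussed in this thread
        return record.newRecord(updatedTopic, record.kafkaPartition(),
                record.keySchema(), record.key(),
                record.valueSchema(), record.value(), timestamp);
    }
}
{code}

A connector would then pick this up purely through configuration, which is the
user-facing simplicity the bundling argument is about.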

On Wed, Dec 14, 2016 at 6:00 PM Gwen Shapira  wrote:

I'm a bit concerned about adding transformations in Kafka. NiFi has 150
processors, presumably they are all useful for someone. I don't know if I'd
want all of that in Apache Kafka. What's the downside of keeping it out? Or
at least keeping the built-in set super minimal (Flume has like 3 built-in
interceptors)?

Gwen

On Wed, Dec 14, 2016 at 1:36 PM, Shikhar Bhushan 
wrote:

> > With regard to a), just using `ConnectRecord` with `newRecord` as a new
> > abstract method would be a fine choice. In prototyping, both options end up
> > looking pretty similar (in terms of how transformations are implemented and
> > the runtime initializes and uses them) and I'm starting to lean towards not
> > adding a new interface into the mix.
>
> On b) I think we should include a small set of useful transformations with
> Connect, since they can be applicable across different connectors and we
> should encourage some standardization for common operations. I'll update
> KIP-66 soon including a spec of transformations that I believe are worth
> including.
>
> On Sat, Dec 10, 2016 at 11:52 PM Ewen Cheslack-Postava 
> wrote:
>
> If anyone has time to review here, it'd be great to get feedback. I'd
> imagine that the proposal itself won't be too controversial -- keeps
> transformations simple (by only allowing map/filter), doesn't affect the
> rest of the framework much, and fits in with general config structure we've
> used elsewhere (although ConfigDef could use some updates to make this
> easier...).
>
> I think the main open questions for me are:
>
> a) Is TransformableRecord worth it to avoid reimplementing small bits of
> code (it allows for a single implementation of the interface to trivially
> apply to both Source and SinkRecords). I think I prefer this, but it does
> come with some commitment to another interface on top of ConnectRecord. We
> could alternatively modify ConnectRecord which would require fewer changes.
> b) How do folks feel about built-in transformations and the set that are
> mentioned here? This brings us way back to the discussion of built-in
> connectors. Transformations, especially when intended to be lightweight and
> touch nothing besides the data already in the record, seem different from
> connectors -- there might be quite a few, but hopefully limited. Since we
> (hopefully) already factor out most serialization-specific stuff via
> Converters, I think we can keep this pretty limited. That said, I have no
> doubt some folks will (in my opinion) abuse this feature to do data
> enrichment by querying external systems, so building a bunch of
> transformations in could potentially open the floodgates, or at least make
> decisions about what is included vs what should be 3rd party muddy.
>
> -Ewen
>
>
> On Wed, Dec 7, 2016 at 11:46 AM, Shikhar Bhushan 
> wrote:
>
> > Hi all,
> >
> > I have another iteration at a propo

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-14 Thread Shikhar Bhushan
With regard to a), just using `ConnectRecord` with `newRecord` as a new
abstract method would be a fine choice. In prototyping, both options end up
looking pretty similar (in terms of how transformations are implemented and
the runtime initializes and uses them) and I'm starting to lean towards not
adding a new interface into the mix.

On b) I think we should include a small set of useful transformations with
Connect, since they can be applicable across different connectors and we
should encourage some standardization for common operations. I'll update
KIP-66 soon including a spec of transformations that I believe are worth
including.

On Sat, Dec 10, 2016 at 11:52 PM Ewen Cheslack-Postava 
wrote:

If anyone has time to review here, it'd be great to get feedback. I'd
imagine that the proposal itself won't be too controversial -- keeps
transformations simple (by only allowing map/filter), doesn't affect the
rest of the framework much, and fits in with general config structure we've
used elsewhere (although ConfigDef could use some updates to make this
easier...).

I think the main open questions for me are:

a) Is TransformableRecord worth it to avoid reimplementing small bits of
code (it allows for a single implementation of the interface to trivially
apply to both Source and SinkRecords). I think I prefer this, but it does
come with some commitment to another interface on top of ConnectRecord. We
could alternatively modify ConnectRecord which would require fewer changes.
b) How do folks feel about built-in transformations and the set that are
mentioned here? This brings us way back to the discussion of built-in
connectors. Transformations, especially when intended to be lightweight and
touch nothing besides the data already in the record, seem different from
connectors -- there might be quite a few, but hopefully limited. Since we
(hopefully) already factor out most serialization-specific stuff via
Converters, I think we can keep this pretty limited. That said, I have no
doubt some folks will (in my opinion) abuse this feature to do data
enrichment by querying external systems, so building a bunch of
transformations in could potentially open the floodgates, or at least make
decisions about what is included vs what should be 3rd party muddy.

-Ewen


On Wed, Dec 7, 2016 at 11:46 AM, Shikhar Bhushan 
wrote:

> Hi all,
>
> I have another iteration at a proposal for this feature here:
> https://cwiki.apache.org/confluence/display/KAFKA/
> Connect+Transforms+-+Proposed+Design
>
> I'd welcome your feedback and comments.
>
> Thanks,
>
> Shikhar
>
> On Tue, Aug 2, 2016 at 7:21 PM Ewen Cheslack-Postava 
> wrote:
>
> On Thu, Jul 28, 2016 at 11:58 PM, Shikhar Bhushan 
> wrote:
>
> > >
> > >
> > > Hmm, operating on ConnectRecords probably doesn't work since you need to
> > > emit the right type of record, which might mean instantiating a new one. I
> > > think that means we either need 2 methods, one for SourceRecord, one for
> > > SinkRecord, or we'd need to limit what parts of the message you can modify
> > > (e.g. you can change the key/value via something like
> > > transformKey(ConnectRecord) and transformValue(ConnectRecord), but other
> > > fields would remain the same and the fmwk would handle allocating new
> > > Source/SinkRecords if needed)
> > >
> >
> > Good point, perhaps we could add an abstract method on ConnectRecord that
> > takes all the shared fields as parameters and the implementations return a
> > copy of the narrower SourceRecord/SinkRecord type as appropriate.
> > Transformers would only operate on ConnectRecord rather than caring about
> > SourceRecord or SinkRecord (in theory they could instanceof/cast, but the
> > API should discourage it)
> >
> >
> > > Is there a use case for hanging on to the original? I can't think of a
> > > transformation where you'd need to do that (or couldn't just order things
> > > differently so it isn't a problem).
> >
> >
> > Yeah maybe this isn't really necessary. No strong preference here.
> >
> > > That said, I do worry a bit that farming too much stuff out to transformers
> > > can result in "programming via config", i.e. a lot of the simplicity you
> > > get from Connect disappears in long config files. Standardization would be
> > > nice and might just avoid this (and doesn't cost that much implementing it
> > > in each connector), and I'd personally prefer something a bit less flexible
> > > but consistent and easy to configure.
> >
> >
> > Not sure w

[jira] [Commented] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-12-12 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743300#comment-15743300
 ] 

Shikhar Bhushan commented on KAFKA-3209:


Thanks [~snisarg]. I self-assigned it as I don't believe you were actively 
working on it and the ticket was unassigned, but I'd be happy to collaborate if 
you have time.

Yes, I think we should continue the work by updating KIP-66. I'm working on 
drafting the proposal into KIP form and I'll send a ML update when it's ready.

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>        Reporter: Neha Narkhede
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
> Fix For: 0.10.2.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4524) ConfigDef.Type.LIST does not handle escaping

2016-12-12 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4524:
--

 Summary: ConfigDef.Type.LIST does not handle escaping
 Key: KAFKA-4524
 URL: https://issues.apache.org/jira/browse/KAFKA-4524
 Project: Kafka
  Issue Type: Bug
Reporter: Shikhar Bhushan


{{ConfigDef.Type.LIST}} expects a CSV list, but does not handle escaping. It is 
not possible to provide values containing commas.

We should probably adopt the semi-standard way of escaping CSV as in 
https://tools.ietf.org/html/rfc4180
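
For illustration, a minimal sketch of the limitation (a naive comma split, not
the actual ConfigDef source): values containing commas cannot round-trip,
whereas an RFC 4180-style parser would honor quoting.

{code}
import java.util.Arrays;
import java.util.List;

public class ListParsingSketch {
    public static void main(String[] args) {
        String value = "a,\"b,c\",d"; // intent: three items, the middle one is b,c
        List<String> naive = Arrays.asList(value.split(","));
        System.out.println(naive); // [a, "b, c", d] -- four items, quotes preserved
        // An RFC 4180-style parser would instead yield three items: a | b,c | d
    }
}
{code}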



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-12-09 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan reassigned KAFKA-3209:
--

Assignee: Shikhar Bhushan

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
> Fix For: 0.10.2.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2232: HOTFIX: Fix HerderRequest.compareTo()

2016-12-08 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2232

HOTFIX: Fix HerderRequest.compareTo()

With KAFKA-3008 (#1788), the implementation does not respect the contract 
that 'sgn(x.compareTo(y)) == -sgn(y.compareTo(x))'

This fix addresses the hang with JDK8 in DistributedHerderTest.compareTo()
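
For context, a sketch of the contract-respecting pattern (not the actual
patch): comparing via Long.compare() keeps sgn(x.compareTo(y)) ==
-sgn(y.compareTo(x)), whereas subtracting two longs and casting to int can
overflow and break it. Field names here are illustrative.

{code}
class HerderRequestSketch implements Comparable<HerderRequestSketch> {
    private final long at;  // scheduled time
    private final long seq; // tie-breaking sequence number

    HerderRequestSketch(long at, long seq) { this.at = at; this.seq = seq; }

    @Override
    public int compareTo(HerderRequestSketch other) {
        int cmp = Long.compare(at, other.at);
        return cmp != 0 ? cmp : Long.compare(seq, other.seq);
    }
}
{code}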

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka herderreq-compareto

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2232


commit 4c5a8102a340478509a1c9331bf53ceccac394fb
Author: Shikhar Bhushan 
Date:   2016-12-08T23:18:59Z

HOTFIX: Fix HerderRequest.compareTo()

With KAFKA-3008 (#1788), the implementation does not respect the contract 
that 'sgn(x.compareTo(y)) == -sgn(y.compareTo(x))'

This fix addresses the hang with JDK8 in DistributedHerderTest.compareTo()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-12-07 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729737#comment-15729737
 ] 

Shikhar Bhushan commented on KAFKA-3209:


[~snisarg] and [~jjchorrobe], I revived the discussion thread and I'd welcome 
your thoughts there on this proposal: 
https://cwiki.apache.org/confluence/display/KAFKA/Connect+Transforms+-+Proposed+Design

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>  Labels: needs-kip
> Fix For: 0.10.2.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-07 Thread Shikhar Bhushan
Hi all,

I have another iteration at a proposal for this feature here:
https://cwiki.apache.org/confluence/display/KAFKA/Connect+Transforms+-+Proposed+Design

I'd welcome your feedback and comments.

Thanks,

Shikhar

On Tue, Aug 2, 2016 at 7:21 PM Ewen Cheslack-Postava 
wrote:

On Thu, Jul 28, 2016 at 11:58 PM, Shikhar Bhushan 
wrote:

> >
> >
> > Hmm, operating on ConnectRecords probably doesn't work since you need to
> > emit the right type of record, which might mean instantiating a new one. I
> > think that means we either need 2 methods, one for SourceRecord, one for
> > SinkRecord, or we'd need to limit what parts of the message you can modify
> > (e.g. you can change the key/value via something like
> > transformKey(ConnectRecord) and transformValue(ConnectRecord), but other
> > fields would remain the same and the fmwk would handle allocating new
> > Source/SinkRecords if needed)
> >
>
> Good point, perhaps we could add an abstract method on ConnectRecord that
> takes all the shared fields as parameters and the implementations return a
> copy of the narrower SourceRecord/SinkRecord type as appropriate.
> Transformers would only operate on ConnectRecord rather than caring about
> SourceRecord or SinkRecord (in theory they could instanceof/cast, but the
> API should discourage it)
>
>
> > Is there a use case for hanging on to the original? I can't think of a
> > transformation where you'd need to do that (or couldn't just order things
> > differently so it isn't a problem).
>
>
> Yeah maybe this isn't really necessary. No strong preference here.
>
> > That said, I do worry a bit that farming too much stuff out to transformers
> > can result in "programming via config", i.e. a lot of the simplicity you
> > get from Connect disappears in long config files. Standardization would be
> > nice and might just avoid this (and doesn't cost that much implementing it
> > in each connector), and I'd personally prefer something a bit less flexible
> > but consistent and easy to configure.
>
>
> Not sure what you're suggesting :-) Standardized config properties for
> a small set of transformations, leaving it up to connectors to integrate?
>

I just mean that you get to the point where you're practically writing a
Kafka Streams application, you're just doing it through either an
incredibly convoluted set of transformers and configs, or a single
transformer with an incredibly convoluted set of configs. You basically get
to the point where your config is a mini DSL and you're not really saving
that much.

The real question is how much we want to venture into the "T" part of ETL.
I tend to favor minimizing how much we take on since the rest of Connect
isn't designed for it, it's designed around the E & L parts.

-Ewen


> > Personally I'm skeptical of that level of flexibility in transformers --
> > it's getting awfully complex and certainly takes us pretty far from "config
> > only" realtime data integration. It's not clear to me what the use cases
> > are that aren't covered by a small set of common transformations that can
> > be chained together (e.g. rename/remove fields, mask values, and maybe a
> > couple more).
> >
>
> I agree that we should have some standard transformations that we ship with
> Connect that users would ideally lean towards for routine tasks. The ones
> you mention are some good candidates which I'd imagine can expose simple
> config, e.g.
>transform.filter.whitelist=x,y,z # filter to a whitelist of fields
>transform.rename.spec=oldName1=>newName1, oldName2=>newName2
>topic.rename.replace=-/_
>topic.rename.prefix=kafka_
> etc..
>
> However the ecosystem will invariably have more complex transformers if we
> make this pluggable. And because ETL is messy, that's probably a good thing
> if folks are able to do their data munging orthogonally to connectors, so
> that connectors can focus on the logic of how data should be copied from/to
> datastores and Kafka.
>
>
> > In any case, we'd probably also have to change configs of connectors if we
> > allowed configs like that since presumably transformer configs will just be
> > part of the connector config.
> >
>
> Yeah, haven't thought much about how all the configuration would tie
> together...
>
> I think we'd need the ability to:
> - spec transformer chain (fully-qualified class names? perhaps special
> aliases for built-in ones? perhaps third-party fqcns can be assigned
> aliases by users in the chain spec, for easier configuration and to
> uniquely identify a transformation when it occurs more than one time in a
> chain?)
> - configure each transformer -- all properties prefixed with that
> transformer's ID (fqcn / alias) get destined to it
>
> Additionally, I think we would probably want to allow for topic-specific
> overrides <https://issues.apache.org/jira/browse/KAFKA-3962> (e.g. you want
> certain transformations for one topic, but different ones for another...)
>
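
For concreteness, the chain-plus-prefix scheme sketched above might look like
this in a connector config. The aliases, class names, and property keys are
purely illustrative, not a committed API:

   name=jdbc-sink-orders
   connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
   topics=orders
   # ordered list of transformation aliases
   transforms=rename,route
   # per-transformation config, routed to each transformer by alias prefix
   transforms.rename.type=org.example.Rename
   transforms.rename.spec=oldName1=>newName1,oldName2=>newName2
   transforms.route.type=org.example.TimestampRouter
   transforms.route.topic.format=${topic}-${timestamp}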



--
Thanks,
Ewen


[GitHub] kafka pull request #2196: KAFKA-3910: prototype of another approach to cycli...

2016-11-30 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2196

KAFKA-3910: prototype of another approach to cyclic schemas



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka KAFKA-3910

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2196


commit 03a85d774e9d6a2feebba1ae1f50967619857040
Author: Shikhar Bhushan 
Date:   2016-12-01T01:39:22Z

KAFKA-3910: prototype of another approach to cyclic schemas




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2182: ConfigDef experimentation - support List and Ma...

2016-11-28 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2182

ConfigDef experimentation - support List and Map



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka configdef-experimentation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2182


commit 39a10054606c2ae9d26d7b0625c7c59210129b09
Author: Shikhar Bhushan 
Date:   2016-11-28T19:27:21Z

ConfigDef experimentation - support List and Map




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2139: KAFKA-4161: KIP-89: Allow sink connectors to decou...

2016-11-15 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2139

KAFKA-4161: KIP-89: Allow sink connectors to decouple flush and offset 
commit



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4161-deux

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2139


commit 706bcc860fed939a00171ebf61fdab8639d99b06
Author: Shikhar Bhushan 
Date:   2016-11-15T00:29:43Z

KAFKA-4161: KIP-89: Allow sink connectors to decouple flush and offset 
commit




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2131: Remove failing ConnectDistributedTest.test_bad_con...

2016-11-14 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2131

Remove failing ConnectDistributedTest.test_bad_connector_class

Since #1911 was merged it is hard to externally test a connector 
transitioning to FAILED state due to an initialization failure, which is what 
this test was attempting to verify.

The unit test added in #1778 already exercises exception-handling around 
Connector instantiation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka test_bad_connector_class

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2131


commit 32a2b8e7a09f5b002e5e058d70bdcec90cb12944
Author: Shikhar Bhushan 
Date:   2016-11-15T00:50:31Z

Remove failing ConnectDistributedTest.test_bad_connector_class

Since #1911 was merged it is hard to externally test a connector 
transitioning to FAILED state due to an initialization failure, which is what 
this test was attempting to verify.

The unit test added in #1778 already exercises exception-handling around 
Connector instantiation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3462) Allow SinkTasks to disable consumer offset commit

2016-11-14 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-3462:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

This will be handled with KIP-89 / KAFKA-4161. Tasks that wish to disable 
framework-managed offset commits can return an empty map from 
{{SinkTask.preCommit()}} to make it a no-op.
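
A minimal sketch of what that looks like for a task, assuming the preCommit() 
signature from KIP-89:

{code}
import java.util.Collections;
import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class SelfManagedOffsetsTask extends SinkTask {
    @Override
    public Map<TopicPartition, OffsetAndMetadata> preCommit(
            Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        // offsets are tracked in the sink data store, so tell the framework
        // not to commit anything on our behalf
        return Collections.emptyMap();
    }
}
{code}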

> Allow SinkTasks to disable consumer offset commit 
> --
>
> Key: KAFKA-3462
> URL: https://issues.apache.org/jira/browse/KAFKA-3462
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Liquan Pei
>Assignee: Liquan Pei
>Priority: Minor
> Fix For: 0.10.2.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
>  SinkTasks should be able to disable consumer offset commit if they manage 
> offsets in the sink data store rather than using Kafka consumer offsets.  For 
> example, an HDFS connector might record offsets in HDFS to provide exactly 
> once delivery. When the SinkTask is started or a rebalance occurs, the task 
> would reload offsets from HDFS. In this case, disabling consumer offset 
> commit will save some CPU cycles and network IOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-89: Allow sink connectors to decouple flush and offset commit

2016-11-14 Thread Shikhar Bhushan
The vote passed with +3 binding votes. Thanks all!

On Sun, Nov 13, 2016 at 1:42 PM Gwen Shapira  wrote:

+1 (binding)

On Nov 9, 2016 2:17 PM, "Shikhar Bhushan"  wrote:

> Hi,
>
> I would like to initiate a vote on KIP-89
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 89%3A+Allow+sink+connectors+to+decouple+flush+and+offset+commit
>
> Best,
>
> Shikhar
>


[jira] [Work started] (KAFKA-4161) Decouple flush and offset commits

2016-11-09 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4161 started by Shikhar Bhushan.
--
> Decouple flush and offset commits
> -
>
> Key: KAFKA-4161
> URL: https://issues.apache.org/jira/browse/KAFKA-4161
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
>
> It is desirable to have, in addition to the time-based flush interval, volume 
> or size-based commits. E.g. a sink connector which is buffering in terms of 
> number of records may want to request a flush when the buffer is full, or 
> when sufficient amount of data has been buffered in a file.
> Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
> allow for connectors to have flexible policies around flushes. This would be 
> in addition to the time interval based flushes that are controlled with 
> {{offset.flush.interval.ms}}, for which the clock should be reset when any 
> kind of flush happens.
> We should probably also support requesting flushes via the 
> {{SourceTaskContext}} for consistency though a use-case doesn't come to mind 
> off the bat.
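
A sketch of the intended usage, with a stand-in interface for the proposed 
(hypothetical) context hook:

{code}
import java.util.Collection;

public class BufferingSinkSketch {
    // stand-in for the requestFlush() addition proposed for SinkTaskContext
    interface FlushRequestingContext {
        void requestFlush();
    }

    static final int MAX_BUFFERED = 10_000;
    int buffered = 0;
    FlushRequestingContext context;

    void put(Collection<?> records) {
        buffered += records.size();
        if (buffered >= MAX_BUFFERED) {
            // after writing the buffer to the sink, ask for an offset commit
            // now rather than waiting for offset.flush.interval.ms to elapse
            context.requestFlush();
            buffered = 0;
        }
    }
}
{code}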



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[VOTE] KIP-89: Allow sink connectors to decouple flush and offset commit

2016-11-09 Thread Shikhar Bhushan
Hi,

I would like to initiate a vote on KIP-89

https://cwiki.apache.org/confluence/display/KAFKA/KIP-89%3A+Allow+sink+connectors+to+decouple+flush+and+offset+commit

Best,

Shikhar


[jira] [Assigned] (KAFKA-3910) Cyclic schema support in ConnectSchema and SchemaBuilder

2016-11-07 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan reassigned KAFKA-3910:
--

Assignee: Shikhar Bhushan  (was: Ewen Cheslack-Postava)

> Cyclic schema support in ConnectSchema and SchemaBuilder
> 
>
> Key: KAFKA-3910
> URL: https://issues.apache.org/jira/browse/KAFKA-3910
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: John Hofman
>Assignee: Shikhar Bhushan
>
> Cyclic schemas are not supported by ConnectSchema or SchemaBuilder. 
> Consequently the AvroConverter (confluentinc/schema-registry) hits a stack 
> overflow when converting a cyclic Avro schema, e.g.:
> {code}
> {"type":"record", 
> "name":"list","fields":[{"name":"value","type":"int"},{"name":"next","type":["null","list"]}]}
> {code}
> This is a blocking issue for all connectors running on the Connect framework 
> with data containing cyclic references. The AvroConverter cannot support 
> cyclic schemas until the underlying ConnectSchema and SchemaBuilder do.
> To reproduce the stack-overflow (Confluent-3.0.0):
> Produce some cyclic data:
> {code}
> bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test \
>   --property value.schema='{"type":"record","name":"list","fields":[{"name":"value","type":"int"},{"name":"next","type":["null","list"]}]}'
> {"value":1,"next":null} 
> {"value":1,"next":{"list":{"value":2,"next":null}}}
> {code}
> Then try to consume it with connect:
> {code:title=connect-console-sink.properties}
> name=local-console-sink 
> connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector 
> tasks.max=1 
> topics=test
> {code}
> {code}
> ./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties \
>   connect-console-sink.properties
> … start up logging … 
> java.lang.StackOverflowError 
>  at org.apache.avro.JsonProperties.getJsonProp(JsonProperties.java:54) 
>  at org.apache.avro.JsonProperties.getProp(JsonProperties.java:45) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1055) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1103) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1137) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1103) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1137)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4161) Decouple flush and offset commits

2016-11-04 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637718#comment-15637718
 ] 

Shikhar Bhushan commented on KAFKA-4161:


Created KIP-89 for this 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-89%3A+Allow+sink+connectors+to+decouple+flush+and+offset+commit

> Decouple flush and offset commits
> -
>
> Key: KAFKA-4161
> URL: https://issues.apache.org/jira/browse/KAFKA-4161
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
>
> It is desirable to have, in addition to the time-based flush interval, volume 
> or size-based commits. E.g. a sink connector which is buffering in terms of 
> number of records may want to request a flush when the buffer is full, or 
> when sufficient amount of data has been buffered in a file.
> Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
> allow for connectors to have flexible policies around flushes. This would be 
> in addition to the time interval based flushes that are controlled with 
> {{offset.flush.interval.ms}}, for which the clock should be reset when any 
> kind of flush happens.
> We should probably also support requesting flushes via the 
> {{SourceTaskContext}} for consistency though a use-case doesn't come to mind 
> off the bat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] KIP-89: Allow sink connectors to decouple flush and offset commit

2016-11-04 Thread Shikhar Bhushan
Hi all,

I created KIP-89 for making a Connect API change that allows for sink
connectors to decouple flush and offset commits.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-89%3A+Allow+sink+connectors+to+decouple+flush+and+offset+commit

I'd welcome your input.

Best,

Shikhar


[GitHub] kafka pull request #1968: MINOR: missing whitespace in doc for `ssl.cipher.s...

2016-11-04 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1968


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-4356) o.a.k.common.utils.SystemTime.sleep() swallows InterruptedException

2016-11-03 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4356.

Resolution: Duplicate

> o.a.k.common.utils.SystemTime.sleep() swallows InterruptedException
> ---
>
> Key: KAFKA-4356
> URL: https://issues.apache.org/jira/browse/KAFKA-4356
> Project: Kafka
>  Issue Type: Bug
>    Reporter: Shikhar Bhushan
>Priority: Minor
>
> {{org.apache.kafka.common.utils.SystemTime.sleep()}} catches and ignores 
> {{InterruptedException}}. When doing so normally the interruption state 
> should still be restored with {{Thread.currentThread().interrupt()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4375) Kafka consumer may swallow some interrupts meant for the calling thread

2016-11-03 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634044#comment-15634044
 ] 

Shikhar Bhushan commented on KAFKA-4375:


Good to have a report of this being a problem; I opened KAFKA-4356 recently 
after chancing on the code. It seems like an oversight rather than by design.

> Kafka consumer may swallow some interrupts meant for the calling thread
> ---
>
> Key: KAFKA-4375
> URL: https://issues.apache.org/jira/browse/KAFKA-4375
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Stig Rohde Døssing
>
> Apache Storm has added a new data source ("spout") based on the Kafka 0.9 
> consumer. Storm interacts with the consumer by having one thread per spout 
> instance loop calls to poll/commitSync etc. When Storm shuts down, another 
> thread indicates that the looping threads should shut down by interrupting 
> them, and joining them.
> If one of the looping threads happen to be interrupted while executing 
> certain sleeps in some consumer methods (commitSync and committed at least), 
> the interrupt can be lost because they contain a call to SystemTime.sleep, 
> which swallows the interrupt.
> Is this behavior by design, or can SystemTime be changed to reset the thread 
> interrupt flag when catching an InterruptedException? 
> I haven't checked the rest of the client code, so it's possible that this is 
> an issue in other parts of the code too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4356) o.a.k.common.utils.SystemTime.sleep() swallows InterruptedException

2016-10-28 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4356:
--

 Summary: o.a.k.common.utils.SystemTime.sleep() swallows 
InterruptedException
 Key: KAFKA-4356
 URL: https://issues.apache.org/jira/browse/KAFKA-4356
 Project: Kafka
  Issue Type: Bug
Reporter: Shikhar Bhushan
Priority: Minor


{{org.apache.kafka.common.utils.SystemTime.sleep()}} catches and ignores 
{{InterruptedException}}. When doing so normally the interruption state should 
still be restored with {{Thread.currentThread().interrupt()}}.
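
The standard fix pattern the ticket alludes to (a sketch, not the Kafka source):

{code}
public final class InterruptibleSleep {
    public static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            // preserve the caller's interruption status for upstream code
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();
        sleep(10); // returns promptly, but the flag survives
        System.out.println(Thread.interrupted()); // prints true
    }
}
{code}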



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2040: KAFKA-4161: prototype for exploring API change

2016-10-26 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/2040


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-4342) Kafka-connect- support tinyint values

2016-10-25 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4342.

Resolution: Not A Problem

The Connect schema type {{Schema.Type.INT8}} accurately maps to a signed Java 
{{byte}}. Given the absence of unsigned types in Java, I think we just have to 
live with that...

We can followup on the JDBC connector issue you created 
https://github.com/confluentinc/kafka-connect-jdbc/pull/152
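
For illustration, a minimal sketch of the signed-byte behavior at issue, and 
the masking trick a connector can use to recover the unsigned value in a 
wider type:

{code}
public class TinyintSketch {
    public static void main(String[] args) {
        byte raw = (byte) 200;               // bits 0xC8
        System.out.println(raw);             // -56: the signed interpretation
        short widened = (short) (raw & 0xFF);
        System.out.println(widened);         // 200: recovered via INT16
    }
}
{code}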

> Kafka-connect- support tinyint values
> -
>
> Key: KAFKA-4342
> URL: https://issues.apache.org/jira/browse/KAFKA-4342
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sagar Rao
>
> We have been using kafka-connect-jdbc actively for one of our projects, and 
> one of the issues we have noticed is the way it handles TINYINT 
> values.
> Our database is MySQL, which allows both signed and unsigned values to 
> be stored. So, a column can hold values up to 255, but when Kafka Connect sees 
> values above 127, it fails.
> The reason is that in the ConnectSchema class, INT8 maps to a Byte, which is a 
> signed value. The JDBC docs say the following about 
> handling TINYINT values:
> https://docs.oracle.com/javase/6/docs/technotes/guides/jdbc/getstart/mapping.html
> 8.3.4 TINYINT
> The JDBC type TINYINT represents an 8-bit integer value between 0 and 255 
> that may be signed or unsigned.
> The corresponding SQL type, TINYINT, is currently supported by only a subset 
> of the major databases. Portable code may therefore prefer to use the JDBC 
> SMALLINT type, which is widely supported.
> The recommended Java mapping for the JDBC TINYINT type is as either a Java 
> byte or a Java short. The 8-bit Java byte type represents a signed value from 
> -128 to 127, so it may not always be appropriate for larger TINYINT values, 
> whereas the 16-bit Java short will always be able to hold all TINYINT values.
> I submitted a PR for this last week, but it failed in the Jenkins build 
> on an unrelated test case. If someone can take a look at this or suggest 
> something, that would be great:
> https://github.com/apache/kafka/pull/2044



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4306) Connect workers won't shut down if brokers are not available

2016-10-18 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586568#comment-15586568
 ] 

Shikhar Bhushan commented on KAFKA-4306:


KAFKA-4154 is another issue relating to broker-unavailability preventing 
shutdown, but I believe there it's because it failed to finish startup in the 
first place.

> Connect workers won't shut down if brokers are not available
> 
>
> Key: KAFKA-4306
> URL: https://issues.apache.org/jira/browse/KAFKA-4306
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Gwen Shapira
>Assignee: Ewen Cheslack-Postava
>
> If brokers are not available and we try to shut down connect workers, sink 
> connectors will be stuck in a loop retrying to commit offsets:
> 2016-10-17 09:39:14,907] INFO Marking the coordinator 192.168.1.9:9092 (id: 
> 2147483647 rack: null) dead for group connect-dump-kafka-config1 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:600)
> [2016-10-17 09:39:14,907] ERROR Commit of 
> WorkerSinkTask{id=dump-kafka-config1-0} offsets threw an unexpected 
> exception:  (org.apache.kafka.connect.runtime.WorkerSinkTask:194)
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: 
> org.apache.kafka.common.errors.GroupCoordinatorNotAvailableException
> We should probably limit the number of retries before doing "unclean" 
> shutdown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3462) Allow SinkTasks to disable consumer offset commit

2016-10-18 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-3462:
---
Issue Type: Improvement  (was: Bug)

> Allow SinkTasks to disable consumer offset commit 
> --
>
> Key: KAFKA-3462
> URL: https://issues.apache.org/jira/browse/KAFKA-3462
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Liquan Pei
>Assignee: Liquan Pei
>Priority: Minor
> Fix For: 0.10.2.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
>  SinkTasks should be able to disable consumer offset commit if they manage 
> offsets in the sink data store rather than using Kafka consumer offsets.  For 
> example, an HDFS connector might record offsets in HDFS to provide exactly 
> once delivery. When the SinkTask is started or a rebalance occurs, the task 
> would reload offsets from HDFS. In this case, disabling consumer offset 
> commit will save some CPU cycles and network IOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4161) Decouple flush and offset commits

2016-10-18 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4161:
---
Issue Type: Improvement  (was: New Feature)

> Decouple flush and offset commits
> -
>
> Key: KAFKA-4161
> URL: https://issues.apache.org/jira/browse/KAFKA-4161
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> It is desirable to have, in addition to the time-based flush interval, volume 
> or size-based commits. E.g. a sink connector which is buffering in terms of 
> number of records may want to request a flush when the buffer is full, or 
> when sufficient amount of data has been buffered in a file.
> Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
> allow for connectors to have flexible policies around flushes. This would be 
> in addition to the time interval based flushes that are controlled with 
> {{offset.flush.interval.ms}}, for which the clock should be reset when any 
> kind of flush happens.
> We should probably also support requesting flushes via the 
> {{SourceTaskContext}} for consistency though a use-case doesn't come to mind 
> off the bat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2040: KAFKA-4161: prototype for exploring API change

2016-10-18 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2040

KAFKA-4161: prototype for exploring API change



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4161

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2040


commit ed75ad7b5618aff9fc85573748c23a5229144bc3
Author: Shikhar Bhushan 
Date:   2016-10-18T19:50:28Z

KAFKA-4161: prototype for exploring API change




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1969: MINOR: missing fullstop in doc for `max.partition....

2016-10-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1969

MINOR: missing fullstop in doc for `max.partition.fetch.bytes`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1969


commit 3a9568911ca226979f4058129a3f238d1f0187c1
Author: Shikhar Bhushan 
Date:   2016-10-04T22:43:21Z

MINOR: missing fullstop in doc for `max.partition.fetch.bytes`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1968: MINOR: missing whitespace in doc for `ssl.cipher.s...

2016-10-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1968

MINOR: missing whitespace in doc for `ssl.cipher.suites`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1968.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1968


commit 44001af72e226a3d683ab8169f8816e7cdf67a49
Author: Shikhar Bhushan 
Date:   2016-10-04T22:42:01Z

MINOR: missing whitespace in doc for `ssl.cipher.suites`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1964: KAFKA-4010: add ConfigDef toEnrichedRst() for addi...

2016-10-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1964

KAFKA-4010: add ConfigDef toEnrichedRst() for additional fields in output

followup on https://github.com/apache/kafka/pull/1696

cc @rekhajoshm 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4010

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1964


commit 630fd4c5220ca5c934492318037f8a493a305b01
Author: Joshi 
Date:   2016-08-03T04:08:41Z

KAFKA-4010; ConfigDef.toEnrichedRst() to have grouped sections with 
dependents info

commit b7d4e80f32714de351b3af3a26e34817258be0cc
Author: Joshi 
Date:   2016-09-08T22:24:25Z

KAFKA-4010; updated for review comments

commit 70cb9ff98f075376c1537feb8abc8fe41bea1b83
Author: Joshi 
Date:   2016-09-09T00:59:18Z

KAFKA-4010; updated for review comments

commit 4fabee350b0a7279c891d1a291fa04c346258703
Author: Joshi 
Date:   2016-09-15T19:22:55Z

KAFKA-4010; updated for review comments

commit fff244e6e523d6b506656fbacf252b6730d5ed98
Author: Joshi 
Date:   2016-09-16T05:32:52Z

KAFKA-4010; updated for review comments

commit ffc35f0f5a7fd8727465cbb5e481dfabe8c6b438
Author: Shikhar Bhushan 
Date:   2016-10-04T21:18:25Z

Merge branch 'KAFKA-4010' of https://github.com/rekhajoshm/kafka into 
kafka-4010

commit d8f1d8122a2b24442bf586e6be130970cdfda016
Author: Shikhar Bhushan 
Date:   2016-10-04T21:25:02Z

tweaks and tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3906) Connect logical types do not support nulls.

2016-09-23 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-3906.

Resolution: Fixed

I think we should handle null values at the converter layer to avoid 
duplication in logical type impls, as [~ewencp] suggested in the PR. KAFKA-4183 
fixes the null handling for logical types in {{JsonConverter}}.

> Connect logical types do not support nulls.
> ---
>
> Key: KAFKA-3906
> URL: https://issues.apache.org/jira/browse/KAFKA-3906
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Jeremy Custenborder
>Assignee: Ewen Cheslack-Postava
>
> The logical types for Kafka Connect do not support null data values. Date, 
> Decimal, Time, and Timestamp all will throw null reference exceptions if a 
> null is passed in to their fromLogical and toLogical methods. Date, Time, and 
> Timestamp require signature changes for these methods to support nullable 
> types.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1872: KAFKA-4183: centralize checking for optional and d...

2016-09-16 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1872

KAFKA-4183: centralize checking for optional and default values to avoid 
bugs

Cleaner to just check once for optional & default value from the 
`convertToConnect()` function.

It also helps address an issue with conversions for logical type schemas 
that have default values and null as the included value. That test case is 
_probably_ not an issue in practice, since when using the `JsonConverter` to 
serialize a missing field with a default value, it will serialize the default 
value for the field. But in the face of JSON data streaming in from a topic, 
being [generous on input, strict on 
output](http://tedwise.com/2009/05/27/generous-on-input-strict-on-output) seems 
best.
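
Roughly, the centralized check described above looks like this (a simplified 
sketch of the idea, not the actual JsonConverter code):

{code}
import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.errors.DataException;

class ConvertSketch {
    static Object convertToConnect(Schema schema, JsonNode jsonValue) {
        if (jsonValue == null || jsonValue.isNull()) {
            if (schema == null)
                return null;
            if (schema.defaultValue() != null)
                return schema.defaultValue(); // generous on input
            if (schema.isOptional())
                return null;
            throw new DataException("null value for non-optional schema without a default");
        }
        // ... per-type and logical-type conversion would continue here ...
        return jsonValue;
    }
}
{code}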

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4183

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1872.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1872


commit 1e09c6431f11361e7f3a5af4c09a8174c3547669
Author: Shikhar Bhushan 
Date:   2016-09-16T23:17:40Z

KAFKA-4183: centralize checking for optional and default values to avoid 
bugs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Reopened] (KAFKA-4183) Logical converters in JsonConverter don't properly handle null values

2016-09-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan reopened KAFKA-4183:

  Assignee: Shikhar Bhushan  (was: Ewen Cheslack-Postava)

[~rhauch] Reopening this, I noticed an issue with handling default values. E.g. 
this test

{noformat}
@Test
public void timestampToConnectDefval() {
    Schema schema = Timestamp.builder().defaultValue(new java.util.Date(42)).schema();
    String msg = "{ \"schema\": { \"type\": \"int64\", \"name\": \"org.apache.kafka.connect.data.Timestamp\", \"version\": 1, \"default\": 42 }, \"payload\": null }";
    SchemaAndValue schemaAndValue = converter.toConnectData(TOPIC, msg.getBytes());
    assertEquals(schema, schemaAndValue.schema());
    assertEquals(new java.util.Date(42), schemaAndValue.value());
}
{noformat}

Happy to create a followup PR since I'm poking around with it

> Logical converters in JsonConverter don't properly handle null values
> -
>
> Key: KAFKA-4183
> URL: https://issues.apache.org/jira/browse/KAFKA-4183
> Project: Kafka
>  Issue Type: Bug
>      Components: KafkaConnect
>Affects Versions: 0.10.0.1
>Reporter: Randall Hauch
>Assignee: Shikhar Bhushan
> Fix For: 0.10.1.0
>
>
> The {{JsonConverter.TO_CONNECT_LOGICAL_CONVERTERS}} map contains 
> {{LogicalTypeConverter}} implementations to convert from the raw value into 
> the corresponding logical type value, and they are used during 
> deserialization of message keys and/or values. However, these implementations 
> do not handle the case when the input raw value is null, which can happen 
> when a key or value has a schema that is or contains a field that is 
> _optional_.
> Consider a Kafka Connect schema of type STRUCT that contains a field "date" 
> with an optional schema of type {{org.apache.kafka.connect.data.Date}}. When 
> the key or value with this schema contains a null "date" field and is 
> serialized, the logical serializer properly will serialize the null value. 
> However, upon _deserialization_, the 
> {{JsonConverter.TO_CONNECT_LOGICAL_CONVERTERS}} are used to convert the 
> literal value (which is null) to a logical value. All of the 
> {{JsonConverter.TO_CONNECT_LOGICAL_CONVERTERS}} implementations will throw a 
> NullPointerException when the input value is null. 
> For example:
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.json.JsonConverter$14.convert(JsonConverter.java:224)
>   at 
> org.apache.kafka.connect.json.JsonConverter.convertToConnect(JsonConverter.java:731)
>   at 
> org.apache.kafka.connect.json.JsonConverter.access$100(JsonConverter.java:53)
>   at 
> org.apache.kafka.connect.json.JsonConverter$12.convert(JsonConverter.java:200)
>   at 
> org.apache.kafka.connect.json.JsonConverter.convertToConnect(JsonConverter.java:727)
>   at 
> org.apache.kafka.connect.json.JsonConverter.jsonToConnect(JsonConverter.java:354)
>   at 
> org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:343)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3906) Connect logical types do not support nulls.

2016-09-16 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497533#comment-15497533
 ] 

Shikhar Bhushan commented on KAFKA-3906:


[~jcustenborder] did this come up in the context of {{JsonConverter}}, and if 
so can it be closed since KAFKA-4183 patched that?

> Connect logical types do not support nulls.
> ---
>
> Key: KAFKA-3906
> URL: https://issues.apache.org/jira/browse/KAFKA-3906
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Jeremy Custenborder
>Assignee: Ewen Cheslack-Postava
>
> The logical types for Kafka Connect do not support null data values. Date, 
> Decimal, Time, and Timestamp all will throw null reference exceptions if a 
> null is passed in to their fromLogical and toLogical methods. Date, Time, and 
> Timestamp require signature changes for these methods to support nullable 
> types.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1865: KAFKA-4173: SchemaProjector should successfully pr...

2016-09-16 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1865

KAFKA-4173: SchemaProjector should successfully project missing Struct 
field when target field is optional
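
For context, a sketch of the behavior this targets, assuming the public 
SchemaProjector API: projecting a source struct that lacks a field onto a 
target schema where that field is optional should yield null (or the default) 
rather than throwing.

{code}
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.SchemaProjector;
import org.apache.kafka.connect.data.Struct;

public class ProjectSketch {
    public static void main(String[] args) {
        Schema source = SchemaBuilder.struct().field("a", Schema.INT32_SCHEMA).build();
        Schema target = SchemaBuilder.struct()
                .field("a", Schema.INT32_SCHEMA)
                .field("b", Schema.OPTIONAL_STRING_SCHEMA)
                .build();
        Struct value = new Struct(source).put("a", 1);
        // with this patch, the projection succeeds and the missing field is null
        Struct projected = (Struct) SchemaProjector.project(source, value, target);
        System.out.println(projected.get("b")); // null
    }
}
{code}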



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4173

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1865


commit 085e30ee5ab49981dbb9b1f353b3ea02fdadec7e
Author: Shikhar Bhushan 
Date:   2016-09-16T18:16:18Z

KAFKA-4173: SchemaProjector should successfully project missing Struct 
field when target field is optional




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4161) Decouple flush and offset commits

2016-09-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4161:
---
Summary: Decouple flush and offset commits  (was: Allow connectors to 
request flush via the context)

> Decouple flush and offset commits
> -
>
> Key: KAFKA-4161
> URL: https://issues.apache.org/jira/browse/KAFKA-4161
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> It is desirable to have, in addition to the time-based flush interval, volume 
> or size-based commits. E.g. a sink connector which is buffering in terms of 
> number of records may want to request a flush when the buffer is full, or 
> when sufficient amount of data has been buffered in a file.
> Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
> allow for connectors to have flexible policies around flushes. This would be 
> in addition to the time interval based flushes that are controlled with 
> {{offset.flush.interval.ms}}, for which the clock should be reset when any 
> kind of flush happens.
> We should probably also support requesting flushes via the 
> {{SourceTaskContext}} for consistency though a use-case doesn't come to mind 
> off the bat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4161) Allow connectors to request flush via the context

2016-09-16 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496820#comment-15496820
 ] 

Shikhar Bhushan commented on KAFKA-4161:


We could also implement KAFKA-3462 here by having the semantics that connectors 
that want to disable offset tracking by Connect can return an empty map from 
{{flushedOffsets()}}. Maybe {{flushedOffsets()}} isn't the best name - we really 
want a name that implies committability, like {{committableOffsets()}}.

> Allow connectors to request flush via the context
> -
>
> Key: KAFKA-4161
> URL: https://issues.apache.org/jira/browse/KAFKA-4161
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> It is desirable to have, in addition to the time-based flush interval, volume 
> or size-based commits. E.g. a sink connector which is buffering in terms of 
> number of records may want to request a flush when the buffer is full, or 
> when sufficient amount of data has been buffered in a file.
> Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
> allow for connectors to have flexible policies around flushes. This would be 
> in addition to the time interval based flushes that are controlled with 
> {{offset.flush.interval.ms}}, for which the clock should be reset when any 
> kind of flush happens.
> We should probably also support requesting flushes via the 
> {{SourceTaskContext}} for consistency though a use-case doesn't come to mind 
> off the bat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4173) SchemaProjector should successfully project when source schema field is missing and target schema field is optional

2016-09-14 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4173:
--

 Summary: SchemaProjector should successfully project when source 
schema field is missing and target schema field is optional
 Key: KAFKA-4173
 URL: https://issues.apache.org/jira/browse/KAFKA-4173
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


As reported in https://github.com/confluentinc/kafka-connect-hdfs/issues/115



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4161) Allow connectors to request flush via the context

2016-09-14 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491296#comment-15491296
 ] 

Shikhar Bhushan commented on KAFKA-4161:


bq. Probably worth clarifying whether we're really talking about just flush 
here or offset commit as well. Flush really only exists in order to support 
offset commit (from the framework's perspective), but since you mention full 
buffers I think you might be getting at a slightly different use case for 
connectors.

Sorry I wasn't clear: flushing data & offset commits are currently coupled, as 
you pointed out. If we want to avoid unnecessary redelivery of records, it is 
best to commit offsets with the 'most current' knowledge of them, which we 
currently have after calling {{flush()}}.

bq. In general, I think it'd actually be even better to just get rid of the 
idea of having to flush as a common operation as it hurts throughput to have to 
flush entirely to commit offsets (we are flushing the pipeline, which is never 
good). Ideally we could do what the framework does with source connectors and 
just track which data has been successfully delivered and use that for the 
majority of offset commits. We'd still need it for cases like shutdown where we 
want to make sure all data has been sent, but since the framework controls 
delivery of data, maybe it's even better just to wait for that data to be 
written. 

Good points; I agree it would be better to make it so {{flush()}} is not 
routine, since it can hurt throughput. I think we can deprecate it altogether. 
As a proposal:
{noformat}
abstract class SinkTask {
    ..
    // New method
    public Map<TopicPartition, OffsetAndMetadata> flushedOffsets() {
        throw new UnsupportedOperationException();
    }

    @Deprecated
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) { }
    ..
}
{noformat}

Then the periodic offset-commit logic would use {{flushedOffsets()}}, and if 
that is not implemented, fall back to calling {{flush()}} as it does currently, 
so it can commit the offset state as of the last {{put()}} call.
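
To make the fallback concrete, a rough sketch of the committer-side logic under 
this proposal ({{ProposedSinkTask}} is a stand-in for {{SinkTask}} with the new 
method; none of this is real Connect code):

{code:java}
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitLoopSketch {
    // Hypothetical task surface under the proposal above.
    interface ProposedSinkTask {
        Map<TopicPartition, OffsetAndMetadata> flushedOffsets();
        void flush(Map<TopicPartition, OffsetAndMetadata> offsets);
    }

    static Map<TopicPartition, OffsetAndMetadata> offsetsToCommit(
            ProposedSinkTask task, Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        try {
            // Preferred path: the task reports what it has durably written.
            return task.flushedOffsets();
        } catch (UnsupportedOperationException e) {
            // Legacy path: force a flush, then commit state as of the last put().
            task.flush(currentOffsets);
            return currentOffsets;
        }
    }
}
{code}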

I don't think {{flush()}} is needed even at shutdown. Tasks are already being 
advised via {{close()}} and can choose to flush any buffered data from there. 
We can do a final offset commit based on the {{flushedOffsets()}} after 
{{close()}} (though this does imply a quirk that even after a 
{{TopicPartition}} is closed we expect tasks to keep offset state around in the 
map returned by {{flushedOffsets()}}).

Additionally, it would be good to have a {{context.requestCommit()}} in the 
spirit of the {{context.requestFlush()}} I was originally proposing. The 
motivation is that connectors can optimize for avoiding unnecessary redelivery 
when recovering from failures. Connectors can choose whatever policies are best, 
like number-of-records or size-based batching/buffering for writing to the 
destination system as part of the normal flow of calls to {{put()}}, and 
request a commit when they have actually written data to the destination 
system. There need not be a strong guarantee about whether an offset commit 
actually happens after such a request, so that we don't commit offsets too 
often and can choose to only do one after some minimum interval, e.g. in case a 
connector always requests a commit after every put.

bq. The main reason I think we even need the explicit flush() is that some 
connectors may have very long delays between flushes (e.g. any object stores 
like S3) such that they need to be told directly that they need to write all 
their data (or discard it).

I don't believe it is currently possible for a connector to communicate that it 
wants to discard data rather than write it out when {{flush()}} is called 
(aside from, I guess, throwing an exception...). With the above proposal the 
decision of when and whether to write data would be completely up to 
connectors.

bq. Was there a specific connector & scenario you were thinking about here?

This came up in a thread on the user list ('Sink Connector feature request: 
SinkTask.putAndReport()')

> Allow connectors to request flush via the context
> -
>
> Key: KAFKA-4161
> URL: https://issues.apache.org/jira/browse/KAFKA-4161
> Project: Kafka
>      Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> It is desirable to have, in addition to the time-based flush interval, volume 
> or size-based commits. E.g. a sink connector which is buffering in terms of 
> number of records may want to request a flush when the buffer is full, or 
> when sufficient amount of data has been buffered in a file.
> Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
> 

[jira] [Created] (KAFKA-4161) Allow connectors to request flush via the context

2016-09-13 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4161:
--

 Summary: Allow connectors to request flush via the context
 Key: KAFKA-4161
 URL: https://issues.apache.org/jira/browse/KAFKA-4161
 Project: Kafka
  Issue Type: New Feature
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava


It is desirable to have, in addition to the time-based flush interval, volume 
or size-based commits. E.g. a sink connector which is buffering in terms of 
number of records may want to request a flush when the buffer is full, or when 
sufficient amount of data has been buffered in a file.

Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
allow for connectors to have flexible policies around flushes. This would be in 
addition to the time interval based flushes that are controlled with 
{{offset.flush.interval.ms}}, for which the clock should be reset when any kind 
of flush happens.

We should probably also support requesting flushes via the 
{{SourceTaskContext}} for consistency though a use-case doesn't come to mind 
off the bat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4159) Allow overriding producer & consumer properties at the connector level

2016-09-13 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4159:
--

 Summary: Allow overriding producer & consumer properties at the 
connector level
 Key: KAFKA-4159
 URL: https://issues.apache.org/jira/browse/KAFKA-4159
 Project: Kafka
  Issue Type: New Feature
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava


As an example use case: overriding a sink connector's consumer partition 
assignment strategy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4154) Kafka Connect fails to shutdown if it has not completed startup

2016-09-12 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4154:
---
Fix Version/s: (was: 0.10.0.2)

> Kafka Connect fails to shutdown if it has not completed startup
> ---
>
> Key: KAFKA-4154
> URL: https://issues.apache.org/jira/browse/KAFKA-4154
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> To reproduce:
> 1. Start Kafka Connect in distributed mode without Kafka running 
> {{./bin/connect-distributed.sh config/connect-distributed.properties}}
> 2. Ctrl+C fails to terminate the process
> thread dump:
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode):
> "Thread-1" #13 prio=5 os_prio=31 tid=0x7fc29a18a800 nid=0x7007 waiting on 
> condition [0x73129000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007bd7d91d8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.stop(DistributedHerder.java:357)
>   at 
> org.apache.kafka.connect.runtime.Connect.stop(Connect.java:71)
>   at 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook.run(Connect.java:93)
> "SIGINT handler" #27 daemon prio=9 os_prio=31 tid=0x7fc29aa6a000 
> nid=0x560f in Object.wait() [0x71a61000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>   at 
> java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>   at java.lang.Shutdown.runHooks(Shutdown.java:123)
>   at java.lang.Shutdown.sequence(Shutdown.java:167)
>   at java.lang.Shutdown.exit(Shutdown.java:212)
>   - locked <0x0007b0244600> (a java.lang.Class for 
> java.lang.Shutdown)
>   at java.lang.Terminator$1.handle(Terminator.java:52)
>   at sun.misc.Signal$1.run(Signal.java:212)
>   at java.lang.Thread.run(Thread.java:745)
> "kafka-producer-network-thread | producer-1" #15 daemon prio=5 os_prio=31 
> tid=0x7fc29a0b7000 nid=0x7a03 runnable [0x72608000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
>   at 
> sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
>   at 
> sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   - locked <0x0007bd7788d8> (a sun.nio.ch.Util$2)
>   - locked <0x0007bd7788e8> (a 
> java.util.Collections$UnmodifiableSet)
>   - locked <0x0007bd77> (a sun.nio.ch.KQueueSelectorImpl)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.kafka.common.network.Selector.select(Selector.java:470)
>   at 
> org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>   at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
>   at 
> org.apache.kafka.clients.produc

[jira] [Updated] (KAFKA-4154) Kafka Connect fails to shutdown if it has not completed startup

2016-09-12 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4154:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.0.2

> Kafka Connect fails to shutdown if it has not completed startup
> ---
>
> Key: KAFKA-4154
> URL: https://issues.apache.org/jira/browse/KAFKA-4154
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
> Fix For: 0.10.0.2
>
>
> To reproduce:
> 1. Start Kafka Connect in distributed mode without Kafka running 
> {{./bin/connect-distributed.sh config/connect-distributed.properties}}
> 2. Ctrl+C fails to terminate the process
> thread dump:
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode):
> "Thread-1" #13 prio=5 os_prio=31 tid=0x7fc29a18a800 nid=0x7007 waiting on 
> condition [0x73129000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007bd7d91d8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.stop(DistributedHerder.java:357)
>   at 
> org.apache.kafka.connect.runtime.Connect.stop(Connect.java:71)
>   at 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook.run(Connect.java:93)
> "SIGINT handler" #27 daemon prio=9 os_prio=31 tid=0x7fc29aa6a000 
> nid=0x560f in Object.wait() [0x71a61000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>   at 
> java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>   at java.lang.Shutdown.runHooks(Shutdown.java:123)
>   at java.lang.Shutdown.sequence(Shutdown.java:167)
>   at java.lang.Shutdown.exit(Shutdown.java:212)
>   - locked <0x0007b0244600> (a java.lang.Class for 
> java.lang.Shutdown)
>   at java.lang.Terminator$1.handle(Terminator.java:52)
>   at sun.misc.Signal$1.run(Signal.java:212)
>   at java.lang.Thread.run(Thread.java:745)
> "kafka-producer-network-thread | producer-1" #15 daemon prio=5 os_prio=31 
> tid=0x7fc29a0b7000 nid=0x7a03 runnable [0x72608000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
>   at 
> sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
>   at 
> sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   - locked <0x0007bd7788d8> (a sun.nio.ch.Util$2)
>   - locked <0x0007bd7788e8> (a 
> java.util.Collections$UnmodifiableSet)
>   - locked <0x0007bd77> (a sun.nio.ch.KQueueSelectorImpl)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.kafka.common.network.Selector.select(Selector.java:470)
>   at 
> org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>   at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:2

[jira] [Created] (KAFKA-4154) Kafka Connect fails to shutdown if it has not completed startup

2016-09-12 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4154:
--

 Summary: Kafka Connect fails to shutdown if it has not completed 
startup
 Key: KAFKA-4154
 URL: https://issues.apache.org/jira/browse/KAFKA-4154
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan
 Fix For: 0.10.1.0


To reproduce:
1. Start Kafka Connect in distributed mode without Kafka running 
{{./bin/connect-distributed.sh config/connect-distributed.properties}}
2. Ctrl+C fails to terminate the process

thread dump:
{noformat}
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode):

"Thread-1" #13 prio=5 os_prio=31 tid=0x7fc29a18a800 nid=0x7007 waiting on 
condition [0x73129000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007bd7d91d8> (a 
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.stop(DistributedHerder.java:357)
at org.apache.kafka.connect.runtime.Connect.stop(Connect.java:71)
at 
org.apache.kafka.connect.runtime.Connect$ShutdownHook.run(Connect.java:93)

"SIGINT handler" #27 daemon prio=9 os_prio=31 tid=0x7fc29aa6a000 nid=0x560f 
in Object.wait() [0x71a61000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0007bd63db38> (a 
org.apache.kafka.connect.runtime.Connect$ShutdownHook)
at java.lang.Thread.join(Thread.java:1245)
- locked <0x0007bd63db38> (a 
org.apache.kafka.connect.runtime.Connect$ShutdownHook)
at java.lang.Thread.join(Thread.java:1319)
at 
java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
at 
java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
at java.lang.Shutdown.runHooks(Shutdown.java:123)
at java.lang.Shutdown.sequence(Shutdown.java:167)
at java.lang.Shutdown.exit(Shutdown.java:212)
- locked <0x0007b0244600> (a java.lang.Class for java.lang.Shutdown)
at java.lang.Terminator$1.handle(Terminator.java:52)
at sun.misc.Signal$1.run(Signal.java:212)
at java.lang.Thread.run(Thread.java:745)

"kafka-producer-network-thread | producer-1" #15 daemon prio=5 os_prio=31 
tid=0x7fc29a0b7000 nid=0x7a03 runnable [0x72608000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0007bd7788d8> (a sun.nio.ch.Util$2)
- locked <0x0007bd7788e8> (a java.util.Collections$UnmodifiableSet)
- locked <0x0007bd77> (a sun.nio.ch.KQueueSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:470)
at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Thread.java:745)

"DistributedHerder" #14 prio=5 os_prio=31 tid=0x7fc29a11e000 nid=0x7803 
waiting on condition [0x72505000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.kafka.common.utils.SystemTime.sleep(SystemTime.java:37)
at 
org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:299)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1310)
at 
org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:131)
at 
org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:86)
  

[jira] [Resolved] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-09-06 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4048.

Resolution: Not A Problem

Turns out all exceptions from {{task.flush()}} are treated as retriable (see 
{{WorkerSinkTask.commitOffsets()}}), so there is nothing to do here.
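
For reference, the asymmetry being resolved here: {{RetriableException}} is only 
special-cased for {{put()}}, as in this sketch of a sink task signalling a 
transient failure (the availability check is illustrative):

{code:java}
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class RetryingSinkTask extends SinkTask {
    @Override public String version() { return "0"; }
    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }

    @Override
    public void put(Collection<SinkRecord> records) {
        boolean destinationUnavailable = true; // illustrative condition
        if (destinationUnavailable)
            // The framework pauses and redelivers the batch rather than failing the task.
            throw new RetriableException("destination temporarily unavailable");
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Per the comment above, any exception thrown here is treated as retriable.
    }
}
{code}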

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but this is something we should support also for 
> {{flush()}}  and arguably also {{open()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3962) ConfigDef support for resource-specific configuration

2016-09-06 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468352#comment-15468352
 ] 

Shikhar Bhushan commented on KAFKA-3962:


This is also realizable today by using {{ConfigDef.originalsWithPrefix()}}, if 
the template format is {{some.property.$resource}} rather than 
{{$resource.some.property}} as suggested above. There isn't framework-level 
validation support when doing this, though.
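
A sketch of that workaround (note {{originalsWithPrefix()}} is exposed on 
{{AbstractConfig}}; the property and resource names here are illustrative):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.config.AbstractConfig;
import org.apache.kafka.common.config.ConfigDef;

public class PrefixOverrideExample {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("some.property", "default-value");
        props.put("some.property.topicA", "override-for-topicA");

        AbstractConfig config = new AbstractConfig(new ConfigDef(), props);
        // Keys come back with the prefix stripped: {topicA=override-for-topicA}
        Map<String, Object> overrides = config.originalsWithPrefix("some.property.");

        Object value = overrides.getOrDefault("topicA", props.get("some.property"));
        System.out.println(value); // override-for-topicA
    }
}
{code}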

> ConfigDef support for resource-specific configuration
> -
>
> Key: KAFKA-3962
> URL: https://issues.apache.org/jira/browse/KAFKA-3962
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Shikhar Bhushan
>
> It often comes up with connectors that you want some piece of configuration 
> that should be overridable at the topic-level, table-level, etc.
> The ConfigDef API should allow for defining these resource-overridable config 
> properties and we should have getter variants that accept a resource 
> argument, and return the more specific config value (falling back to the 
> default).
> There are a couple of possible ways to allow for this:
> 1. Support for map-style config properties "resource1:v1,resource2:v2". There 
> are escaping considerations to think through here. Also, how should the user 
> override fallback/default values -- perhaps {{*}} as a special resource?
> 2. Templatized configs -- so you would define {{$resource.some.property}}. 
> The default value is more naturally overridable here, by the user setting 
> {{some.property}} without the {{$resource}} prefix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3962) ConfigDef support for resource-specific configuration

2016-09-06 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468352#comment-15468352
 ] 

Shikhar Bhushan edited comment on KAFKA-3962 at 9/6/16 7:49 PM:


This is also realizable today by using {{ConfigDef.originalsWithPrefix()}}, if 
the template format is {{some.property.$resource}} rather than 
{{$resource.some.property}} as suggested above. There isn't library-level 
validation support when doing this, though.


was (Author: shikhar):
This is also realizable today by using {{ConfigDef.originalsWithPrefix()}}, if 
the template format is {{some.property.$resource}} rather than 
{{$resource.some.property}} as suggested above. There isn't framework-level 
validation support when doing this, though.

> ConfigDef support for resource-specific configuration
> -
>
> Key: KAFKA-3962
> URL: https://issues.apache.org/jira/browse/KAFKA-3962
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Shikhar Bhushan
>
> It often comes up with connectors that you want some piece of configuration 
> that should be overridable at the topic-level, table-level, etc.
> The ConfigDef API should allow for defining these resource-overridable config 
> properties and we should have getter variants that accept a resource 
> argument, and return the more specific config value (falling back to the 
> default).
> There are a couple of possible ways to allow for this:
> 1. Support for map-style config properties "resource1:v1,resource2:v2". There 
> are escaping considerations to think through here. Also, how should the user 
> override fallback/default values -- perhaps {{*}} as a special resource?
> 2. Templatized configs -- so you would define {{$resource.some.property}}. 
> The default value is more naturally overridable here, by the user setting 
> {{some.property}} without the {{$resource}} prefix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4127) Possible data loss

2016-09-06 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468012#comment-15468012
 ] 

Shikhar Bhushan commented on KAFKA-4127:


Dupe of KAFKA-3968

> Possible data loss
> --
>
> Key: KAFKA-4127
> URL: https://issues.apache.org/jira/browse/KAFKA-4127
> Project: Kafka
>  Issue Type: Bug
> Environment: Normal three node Kafka cluster. All machines running 
> linux.
>Reporter: Ramnatthan Alagappan
>
> I am running a three node Kafka cluster. ZooKeeper runs in a standalone mode.
> When I create a new message topic, I see the following sequence of system 
> calls:
> mkdir("/appdir/my-topic1-0")
> creat("/appdir/my-topic1-0/.log")
> I have configured Kafka to write the messages persistently to the disk before 
> acknowledging the client. Specifically, I have set flush.interval_messages to 
> 1, min_insync_replicas to 3, and disabled dirty election.  Now, I insert a 
> new message into the created topic.
> I see that Kafka writes the message to the log file and flushes the data down 
> to disk by carefully fsync'ing the log file. I get an acknowledgment back 
> from the cluster after the message is safely persisted on all three replicas 
> and written to disk. 
> Unfortunately, Kafka can still lose data since it does not explicitly fsync 
> the directory to persist the directory entries of the topic directory and the 
> log file. If a crash happens after acknowledging the client, it is possible 
> for Kafka to lose the directory entry for the topic directory or the log file. 
> Many systems carefully issue fsync to the parent directory when a new file or 
> directory is created. This is required for the file to be completely 
> persisted to the disk.   
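
For reference, a minimal sketch of the parent-directory fsync the report asks 
for, using standard Java NIO (opening a directory for read is Linux-specific; 
the paths and file name are illustrative):

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirFsyncExample {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/appdir/my-topic1-0"); // illustrative path
        Files.createDirectories(dir);
        Files.createFile(dir.resolve("00000000.log")); // illustrative log segment
        // fsync the directory so the new file's directory entry is durable;
        // the same would be done on /appdir for the topic directory itself.
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        }
    }
}
{code}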



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4127) Possible data loss

2016-09-06 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4127.

Resolution: Duplicate

> Possible data loss
> --
>
> Key: KAFKA-4127
> URL: https://issues.apache.org/jira/browse/KAFKA-4127
> Project: Kafka
>  Issue Type: Bug
> Environment: Normal three node Kafka cluster. All machines running 
> linux.
>Reporter: Ramnatthan Alagappan
>
> I am running a three node Kafka cluster. ZooKeeper runs in a standalone mode.
> When I create a new message topic, I see the following sequence of system 
> calls:
> mkdir("/appdir/my-topic1-0")
> creat("/appdir/my-topic1-0/.log")
> I have configured Kafka to write the messages persistently to the disk before 
> acknowledging the client. Specifically, I have set flush.interval_messages to 
> 1, min_insync_replicas to 3, and disabled dirty election.  Now, I insert a 
> new message into the created topic.
> I see that Kafka writes the message to the log file and flushes the data down 
> to disk by carefully fsync'ing the log file. I get an acknowledgment back 
> from the cluster after the message is safely persisted on all three replicas 
> and written to disk. 
> Unfortunately, Kafka can still lose data since it does not explicitly fsync 
> the directory to persist the directory entries of the topic directory and the 
> log file. If a crash happens after acknowledging the client, it is possible 
> for Kafka to lose the directory entry for the topic directory or the log file. 
> Many systems carefully issue fsync to the parent directory when a new file or 
> directory is created. This is required for the file to be completely 
> persisted to the disk.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (KAFKA-4127) Possible data loss

2016-09-06 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan closed KAFKA-4127.
--

> Possible data loss
> --
>
> Key: KAFKA-4127
> URL: https://issues.apache.org/jira/browse/KAFKA-4127
> Project: Kafka
>  Issue Type: Bug
> Environment: Normal three node Kafka cluster. All machines running 
> linux.
>Reporter: Ramnatthan Alagappan
>
> I am running a three node Kafka cluster. ZooKeeper runs in a standalone mode.
> When I create a new message topic, I see the following sequence of system 
> calls:
> mkdir("/appdir/my-topic1-0")
> creat("/appdir/my-topic1-0/.log")
> I have configured Kafka to write the messages persistently to the disk before 
> acknowledging the client. Specifically, I have set flush.interval_messages to 
> 1, min_insync_replicas to 3, and disabled dirty election.  Now, I insert a 
> new message into the created topic.
> I see that Kafka writes the message to the log file and flushes the data down 
> to disk by carefully fsync'ing the log file. I get an acknowledgment back 
> from the cluster after the message is safely persisted on all three replicas 
> and written to disk. 
> Unfortunately, Kafka can still lose data since it does not explicitly fsync 
> the directory to persist the directory entries of the topic directory and the 
> log file. If a crash happens after acknowledging the client, it is possible 
> for Kafka to lose the directory entry for the topic directory or the log file. 
> Many systems carefully issue fsync to the parent directory when a new file or 
> directory is created. This is required for the file to be completely 
> persisted to the disk.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1815: KAFKA-4115: grow default heap size for connect-dis...

2016-09-01 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1815

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka connect-heap-opts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1815.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1815


commit 1ceb80ed040ed0a5b988857273af5a2606f1df0b
Author: Shikhar Bhushan 
Date:   2016-09-02T04:01:27Z

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-4115) Grow default heap settings for distributed Connect from 256M to 1G

2016-09-01 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4115:
--

 Summary: Grow default heap settings for distributed Connect from 
256M to 1G
 Key: KAFKA-4115
 URL: https://issues.apache.org/jira/browse/KAFKA-4115
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


Currently, both {{connect-standalone.sh}} and {{connect-distributed.sh}} start 
the Connect JVM with the default heap settings from {{kafka-run-class.sh}} of 
{{-Xmx256M}}.

At least for distributed connect, we should default to a much higher limit like 
1G. While the 'correct' sizing is workload dependent, with a system where you 
can run arbitrary connector plugins which may perform buffering of data, we 
should provide for more headroom.
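
Until the defaults change, the larger heap can be had with the standard 
{{KAFKA_HEAP_OPTS}} override that {{kafka-run-class.sh}} honors, e.g.:

{noformat}
KAFKA_HEAP_OPTS="-Xms256M -Xmx1G" ./bin/connect-distributed.sh config/connect-distributed.properties
{noformat}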



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-4100) Connect Struct schemas built using SchemaBuilder with no fields cause NPE in Struct constructor

2016-08-29 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4100 started by Shikhar Bhushan.
--
> Connect Struct schemas built using SchemaBuilder with no fields cause NPE in 
> Struct constructor
> ---
>
> Key: KAFKA-4100
> URL: https://issues.apache.org/jira/browse/KAFKA-4100
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.1
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> Avro records can legitimately have 0 fields (though it is arguable how useful 
> that is).
> When using the Confluent Schema Registry's {{AvroConverter}} with such a 
> schema,
> {noformat}
> java.lang.NullPointerException
>   at org.apache.kafka.connect.data.Struct.(Struct.java:56)
>   at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:980)
>   at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:782)
>   at 
> io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:103)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:358)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:227)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:171)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is because it is using the {{SchemaBuilder}} to create the Struct 
> schema, which provides a {{field(..)}} builder for each field. If there are 
> no fields, the list stays as null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1800: KAFKA-4100: ensure 'fields' and 'fieldsByName' are...

2016-08-29 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1800

KAFKA-4100: ensure 'fields' and 'fieldsByName' are not null for Struct 
schemas



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4100

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1800.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1800


commit 1918046f3247d135cbeeddfbadafbe333bde2d55
Author: Shikhar Bhushan 
Date:   2016-08-29T21:55:33Z

KAFKA-4100: ensure 'fields' and 'fieldsByName' are not null for Struct 
schemas




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-4100) Connect Struct schemas built using SchemaBuilder with no fields cause NPE in Struct constructor

2016-08-29 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4100:
--

 Summary: Connect Struct schemas built using SchemaBuilder with no 
fields cause NPE in Struct constructor
 Key: KAFKA-4100
 URL: https://issues.apache.org/jira/browse/KAFKA-4100
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.0.1
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan
Priority: Minor
 Fix For: 0.10.1.0


Avro records can legitimately have 0 fields (though it is arguable how useful 
that is).

When using the Confluent Schema Registry's {{AvroConverter}} with such a schema,
{noformat}
java.lang.NullPointerException
at org.apache.kafka.connect.data.Struct.(Struct.java:56)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:980)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:782)
at 
io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:103)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:358)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:227)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:171)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
at 
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

This is because it is using the {{SchemaBuilder}} to create the Struct schema, 
which provides a {{field(..)}} builder for each field. If there are no fields, 
the list stays as null.
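
A minimal reproduction sketch:

{code:java}
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

public class EmptyStructRepro {
    public static void main(String[] args) {
        // A legal zero-field schema: no field(..) calls, so the internal
        // fields list is never initialized.
        Schema empty = SchemaBuilder.struct().name("Empty").build();
        new Struct(empty); // NullPointerException in the Struct constructor
    }
}
{code}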



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1790: KAFKA-4070: implement Connect Struct.toString()

2016-08-25 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1790

KAFKA-4070: implement Connect Struct.toString()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka add-struct-tostring

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1790.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1790


commit 00a47cca8f18f9de8f69718fc41c02a2162e07c6
Author: Shikhar Bhushan 
Date:   2016-08-25T23:38:27Z

KAFKA-4070: implement Connect Struct.toString()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-24 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1745


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1778: KAFKA-4042: Contain connector & task start/stop fa...

2016-08-24 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1778

KAFKA-4042: Contain connector & task start/stop failures within the Worker

Invoke the statusListener.onFailure() callback on start failures so that 
the statusBackingStore is updated. This involved a fix to the putSafe() 
functionality, which prevented any update that was not preceded by a (non-safe) 
put() from completing, as is the case here when a connector or task transitions 
directly to FAILED.

Worker start methods can still throw if the same connector name or task ID 
is already registered with the worker, as this condition should not happen.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup-take4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1778


commit 050b80331f63ec71f16a644e7fa8006823c94ecc
Author: Shikhar Bhushan 
Date:   2016-08-23T23:00:10Z

KAFKA-4042: Contain connector & task start/stop failures within the Worker

Invoke the statusListener.onFailure() callback on start failures so that 
the statusBackingStore is updated. This involved a fix to the putSafe() 
functionality, which prevented any update that was not preceded by a (non-safe) 
put() from completing, as is the case here when a connector or task transitions 
directly to FAILED.

Worker start methods can still throw if the same connector name or task ID 
is already registered with the worker, as this condition should not happen.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-4068) FileSinkTask - use JsonConverter to serialize

2016-08-20 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4068.

Resolution: Not A Problem

I was thinking JSON since it would be easy to serialize to a human-readable 
format with that. But if we want to implement a more useful 
{{Struct.toString()}} in any case for debugging purposes, we should probably do 
that instead. Fair point about keeping the file sink connector as simple as 
possible. Closing this in favor of KAFKA-4070.

> FileSinkTask - use JsonConverter to serialize
> -
>
> Key: KAFKA-4068
> URL: https://issues.apache.org/jira/browse/KAFKA-4068
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>Priority: Minor
>
> People new to Connect often try out hooking up e.g. a Kafka topic with Avro 
> data to the file sink connector, only to find the file contain values like:
> {noformat}
> org.apache.kafka.connect.data.Struct@ca1bf85a
> org.apache.kafka.connect.data.Struct@c298db6a
> org.apache.kafka.connect.data.Struct@44108fbd
> {noformat}
> This is because currently the {{FileSinkConnector}} is meant as a toy example 
> that expects the schema to be {{Schema.STRING_SCHEMA}}, though it just 
> {{toString()}}'s the value without verifying that. 
> A better experience would probably be if we used 
> {{JsonConverter.fromConnectData()}} for serializing to the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4070) Implement a useful Struct.toString()

2016-08-20 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4070:
---
Description: Logging of {{Struct}}'s does not currently provide any useful 
output, and users also find it unhelpful e.g. when hooking up a Kafka topic 
with Avro data with the {{FileSinkConnector}} which simply {{toString()}}'s the 
values to the file.  (was: Logging of {{Struct}}'s does not currently provide 
any useful output, and users also find it unhelpful e.g. when hooking up a 
Kafka topic with Avro data with the {{FileSinkConnector}} which simply 
{{toString()}}s the values to the file.)

> Implement a useful Struct.toString()
> 
>
> Key: KAFKA-4070
> URL: https://issues.apache.org/jira/browse/KAFKA-4070
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Priority: Minor
>
> Logging of {{Struct}}'s does not currently provide any useful output, and 
> users also find it unhelpful e.g. when hooking up a Kafka topic with Avro 
> data with the {{FileSinkConnector}} which simply {{toString()}}'s the values 
> to the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4070) Implement a useful Struct.toString()

2016-08-20 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4070:
---
Description: Logging of {{Struct}}'s does not currently provide any useful 
output, and users also find it unhelpful e.g. when hooking up a Kafka topic 
with Avro data with the {{FileSinkConnector}} which simply {{toString()}}s the 
values to the file.  (was: Logging of {{Struct}}s does not currently provide 
any useful output, and users also find it unhelpful e.g. when hooking up a 
Kafka topic with Avro data with the {{FileSinkConnector}} which simply 
{{toString()}}s the values to the file.)

> Implement a useful Struct.toString()
> 
>
> Key: KAFKA-4070
> URL: https://issues.apache.org/jira/browse/KAFKA-4070
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Priority: Minor
>
> Logging of {{Struct}}'s does not currently provide any useful output, and 
> users also find it unhelpful e.g. when hooking up a Kafka topic with Avro 
> data with the {{FileSinkConnector}} which simply {{toString()}}s the values 
> to the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4070) Implement a useful Struct.toString()

2016-08-20 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4070:
--

 Summary: Implement a useful Struct.toString()
 Key: KAFKA-4070
 URL: https://issues.apache.org/jira/browse/KAFKA-4070
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Priority: Minor


Logging of {{Struct}}s does not currently provide any useful output, and users 
also find it unhelpful e.g. when hooking up a Kafka topic with Avro data with 
the {{FileSinkConnector}} which simply {{toString()}}s the values to the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4068) FileSinkTask - use JsonConverter to serialize

2016-08-19 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4068:
--

 Summary: FileSinkTask - use JsonConverter to serialize
 Key: KAFKA-4068
 URL: https://issues.apache.org/jira/browse/KAFKA-4068
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan
Priority: Minor


People new to Connect often try out hooking up e.g. a Kafka topic with Avro 
data to the file sink connector, only to find the file contain values like:

{noformat}
org.apache.kafka.connect.data.Struct@ca1bf85a
org.apache.kafka.connect.data.Struct@c298db6a
org.apache.kafka.connect.data.Struct@44108fbd
{noformat}

This is because currently the {{FileSinkConnector}} is meant as a toy example 
that expects the schema to be {{Schema.STRING_SCHEMA}}, though it just 
{{toString()}}'s the value without verifying that. 

A better experience would probably be if we used 
{{JsonConverter.fromConnectData()}} for serializing to the file.
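
For illustration, roughly what that would look like (schema, topic name, and 
converter config are illustrative):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.json.JsonConverter;

public class JsonSerializeExample {
    public static void main(String[] args) {
        Schema schema = SchemaBuilder.struct().field("name", Schema.STRING_SCHEMA).build();
        Struct value = new Struct(schema).put("name", "kafka");

        JsonConverter converter = new JsonConverter();
        // Disable the schema envelope so the file gets plain JSON payloads.
        converter.configure(Collections.singletonMap("schemas.enable", "false"), false);

        byte[] bytes = converter.fromConnectData("my-topic", schema, value);
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // {"name":"kafka"}
    }
}
{code}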



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4042) DistributedHerder thread can die because of connector & task lifecycle exceptions

2016-08-19 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4042:
---
Fix Version/s: 0.10.1.0

> DistributedHerder thread can die because of connector & task lifecycle 
> exceptions
> -
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
> Fix For: 0.10.1.0
>
>
> As one example, there isn't exception handling in 
> {{DistributedHerder.startConnector()}} or the call-chain for it originating 
> in the {{tick()}} on the herder thread, and it can throw an exception because 
> of a bad class name in the connector config. (report of issue in wild: 
> https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-18 Thread shikhar
GitHub user shikhar reopened a pull request:

https://github.com/apache/kafka/pull/1745

KAFKA-4042: prevent DistributedHerder thread from dying from connector/task 
lifecycle exceptions

- `worker.startConnector()` and `worker.startTask()` can throw (e.g. 
`ClassNotFoundException`, `ConnectException`, or any other exception arising 
from the constructor of the connector or task class when we `newInstance()`), 
so add catch blocks around those calls from the `DistributedHerder` and handle 
by invoking `onFailure()`, which updates the `StatusBackingStore` (see the 
sketch after this list).
- `worker.stopConnector()` throws `ConnectException` if start failed 
causing the connector to not be registered with the worker, so guard with 
`worker.ownsConnector()`
- `worker.stopTasks()` and `worker.awaitStopTasks()` throw 
`ConnectException` if any of them failed to start and are hence not registered 
with the worker, so guard those calls by filtering the task IDs with 
`worker.ownsTask()`
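
A rough sketch of that catch-and-report pattern (types and signatures are 
simplified stand-ins, not the real Connect APIs):

{code:java}
public class StartGuardSketch {
    interface StatusListener { void onFailure(String name, Throwable cause); }

    interface Worker {
        void startConnector(String name) throws Exception;
        boolean ownsConnector(String name);
        void stopConnector(String name);
    }

    static void safeStart(Worker worker, StatusListener status, String name) {
        try {
            worker.startConnector(name);
        } catch (Throwable t) {
            // Record the failure in the status backing store instead of
            // letting the exception kill the herder thread.
            status.onFailure(name, t);
        }
    }

    static void safeStop(Worker worker, String name) {
        if (worker.ownsConnector(name)) // guard: start may have failed earlier
            worker.stopConnector(name);
    }
}
{code}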

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1745


commit 4bb02e610b01d7b425f5c39b435d4d7484b89ee9
Author: Shikhar Bhushan 
Date:   2016-08-17T23:29:30Z

KAFKA-4042: prevent `DistributedHerder` thread from dying from 
connector/task lifecycle exceptions

- `worker.startConnector()` and `worker.startTask()` can throw (e.g. 
`ClassNotFoundException`, `ConnectException`), so add catch blocks around those 
calls from the `DistributedHerder` and handle by invoking `onFailure()` which 
updates the `StatusBackingStore`.
- `worker.stopConnector()` throws `ConnectException` if start failed 
causing the connector to not be registered with the worker, so guard with 
`worker.ownsConnector()`
- `worker.stopTasks()` and `worker.awaitStopTasks()` throw 
`ConnectException` if any of them failed to start and are hence not registered 
with the worker, so guard those calls by filtering the task IDs with 
`worker.ownsTask()`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-18 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1745


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-18 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-3054.

Resolution: Done

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> Connector Herder throws ConnectException and shuts down if sent a wrong 
> config; restarting the herder will keep failing with the wrong config. It 
> makes sense that the herder should stay available when starting a connector 
> or task fails. After receiving a delete connector request, the herder can 
> delete the wrong config from "config storage"





[jira] [Commented] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-18 Thread Shikhar Bhushan (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427362#comment-15427362 ]

Shikhar Bhushan commented on KAFKA-3054:


Addressing this in KAFKA-4042, which should take care of remaining robustness 
issues in the {{DistributedHerder}} from bad connector or task configs.

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> The Connector Herder throws ConnectException and shuts down if sent a wrong 
> config, and restarting the herder keeps failing with that config. It makes 
> sense for the herder to stay available when starting a connector or task 
> fails; after receiving a delete connector request, the herder can then delete 
> the wrong config from "config storage".





[jira] [Updated] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shikhar Bhushan updated KAFKA-4048:
---
Description: We only allow for handling {{RetriableException}} from calls 
to {{SinkTask.put()}}, but we should also support it for {{flush()}} and 
arguably {{open()}}.  (was: We only allow for handling {{RetriableException}} 
from calls to {{SinkTask.put()}}, but we should also support it for {{flush()}} 
and arguably {{open()}}.

We don't have support for {{RetriableException}} with {{SourceTask}}.)

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but we should also support it for {{flush()}} and 
> arguably {{open()}}.





[jira] [Updated] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shikhar Bhushan updated KAFKA-4048:
---
Summary: Connect does not support RetriableException consistently for sinks 
 (was: Connect does not support RetriableException consistently for sources & 
sinks)

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but we should also support it for {{flush()}} and 
> arguably {{open()}}.
> We don't have support for {{RetriableException}} with {{SourceTask}}.
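
For illustration, a minimal sketch of the one supported path, using the public 
{{SinkTask}} API; the task class and its {{writeBatch()}} helper are 
hypothetical names, not part of Kafka:

{code:java}
import java.io.IOException;
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class FlakyStoreSinkTask extends SinkTask {

    @Override
    public void put(Collection<SinkRecord> records) {
        try {
            writeBatch(records);
        } catch (IOException e) {
            // Supported: the framework catches RetriableException thrown from
            // put(), backs off, and redelivers the same batch of records.
            throw new RetriableException(e);
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // The inconsistency this issue describes: a RetriableException thrown
        // here (or from open()) is not given the same retry treatment.
    }

    // Hypothetical write to an external system that can fail transiently.
    private void writeBatch(Collection<SinkRecord> records) throws IOException {
    }

    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }
    @Override public String version() { return "0.0.1"; }
}
{code}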




