[jira] [Commented] (STORM-3060) Configuration mapping between storm-kafka & storm-kafka-client

2018-05-07 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466630#comment-16466630
 ] 

Srishty Agrawal commented on STORM-3060:


Created [a GitHub 
gist|https://gist.github.com/srishtyagrawal/850b0c3f661cf3c620c27f314791224b] 
with all the information I had about this configuration mapping; it was reviewed 
by [~Srdo]. The changes were then tracked in 
https://github.com/apache/storm/pull/2637.
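For illustration, the kind of mapping the gist documents can be sketched as a simple lookup table. The entries below are illustrative examples from memory, not the reviewed list; the gist and PR above are the authoritative source for the exact equivalences.

```python
# Illustrative sketch of the storm-kafka -> storm-kafka-client configuration
# mapping. Example entries only; the reviewed gist/PR is the authoritative list.
SPOUT_CONFIG_TO_KAFKA_SPOUT_CONFIG = {
    # SpoutConfig field   -> assumed KafkaSpoutConfig.Builder equivalent
    "startOffsetTime":      "setFirstPollOffsetStrategy(...)",
    "fetchSizeBytes":       'setProp("max.partition.fetch.bytes", ...)',
    "retryInitialDelayMs":  "setRetry(new KafkaSpoutRetryExponentialBackoff(...))",
}

def migrate(option):
    """Look up the storm-kafka-client equivalent of a storm-kafka option."""
    return SPOUT_CONFIG_TO_KAFKA_SPOUT_CONFIG.get(option, "no direct equivalent")
```

A table like this makes it easy to spot options that have no one-to-one replacement, which is the hard part of any such migration.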

> Configuration mapping between storm-kafka & storm-kafka-client
> --
>
> Key: STORM-3060
> URL: https://issues.apache.org/jira/browse/STORM-3060
> Project: Apache Storm
>  Issue Type: Documentation
>  Components: storm-kafka, storm-kafka-client
>Reporter: Srishty Agrawal
>Assignee: Srishty Agrawal
>Priority: Minor
>  Labels: storm-kafka-client
>
> A document that maps configuration options from the storm-kafka 
> {{SpoutConfig}} to the storm-kafka-client {{KafkaSpoutConfig}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3060) Configuration mapping between storm-kafka & storm-kafka-client

2018-05-07 Thread Srishty Agrawal (JIRA)
Srishty Agrawal created STORM-3060:
--

 Summary: Configuration mapping between storm-kafka & 
storm-kafka-client
 Key: STORM-3060
 URL: https://issues.apache.org/jira/browse/STORM-3060
 Project: Apache Storm
  Issue Type: Documentation
  Components: storm-kafka, storm-kafka-client
Reporter: Srishty Agrawal
Assignee: Srishty Agrawal


A document that maps configuration options from the storm-kafka 
{{SpoutConfig}} to the storm-kafka-client {{KafkaSpoutConfig}}. 





[jira] [Comment Edited] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2018-04-30 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458903#comment-16458903
 ] 

Srishty Agrawal edited comment on STORM-2514 at 4/30/18 7:17 PM:
-

[~Srdo] Sorry, I had forgotten about this ticket altogether. I see that the PR 
you had linked earlier has been fixed. I can keep this on my to-do list 
and will update it when I start working on it again. 


was (Author: srishtyagraw...@gmail.com):
[~Srdo] Sorry, I had forgotten about this ticket altogether. I can keep it in 
my `To Do list`, will update it when I start working on it again.

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
>Assignee: Hugo Louro
>Priority: Major
> Attachments: NewClass.java, worker.log
>
>
> While working on 
> [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker 
> logs were generated with debug mode on. The information printed about the mapping 
> between task IDs and Kafka partitions contradicted my assumptions. I 
> ran a topology which used KafkaSpout from the storm-kafka-client module; it 
> had a parallelism hint of 2 (number of executors) and a total of 16 tasks. 
> The log lines below show the assigned mapping between executors and 
> Kafka partitions:
> {noformat}
> o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
> Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
> reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
> topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
> o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
> Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
> reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
> topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
> {noformat}
> It is evident that only the tasks (with IDs 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 
> (3 10) will be reading from Kafka partitions 4, 5, 6 and 7. Similarly, the tasks 
> in Executor2 (11 18) will be reading from Kafka partitions 0, 1, 2 and 3. 
> These log lines are printed by the tasks with IDs 10 and 15 in their respective 
> executors. 
> Logs which emit individual messages do not abide by the above assumption. For 
> example, in the log line below, Task ID 3 (I added code, as part of 
> debugging STORM-2506, to print the task ID right next to the component ID), which 
> runs on Executor1, reads from partition 2 (the second value inside the square 
> brackets) instead of 4, 5, 6 or 7. 
> {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
> default [8topic, 2, 0, null, 1]{noformat}
> This behavior is summarized in the table below: 
> {noformat}
> Task IDs 3, 4, 7, 8, 9, 11, 15, 18    -> Partitions 0, 1, 2, 3
> Task IDs 5, 6, 10, 12, 13, 14, 16, 17 -> Partitions 4, 5, 6, 7
> {noformat}
> [You can find the relevant parts of log file 
> here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351]
>  
> Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 
> 17}} correspond to Executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
> Executor2? Are (3 10) not the starting and ending task IDs in Executor1? 
> Another interesting thing to note is that Task IDs 10 and 15 always 
> read from the partitions they claimed to be reading from (while setting 
> partition assignments). 
> If my assumptions are correct, there is a bug in the way the mapping 
> information is being passed to worker logs. If not, we need to make changes 
> in our docs.
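The reading described above can be put in a minimal sketch. It assumes (as the report does) that "[start end]" in the thread name is the inclusive task-ID range of an executor, and that each executor's single KafkaConsumer owns the partitions it logged; this models the report's interpretation of the logs, not Storm's actual scheduler.

```python
# Sketch of the executor/task/partition reading described in the report.
# Assumption: "[start end]" in the thread name is the inclusive task-ID range.
def executor_tasks(start, end):
    """Task IDs covered by an executor labeled '[start end]'."""
    return list(range(start, end + 1))

# Per the "Setting newly assigned partitions" lines, each executor's single
# KafkaConsumer owns four of the eight partitions of "8topic".
assignment = {
    (3, 10): [4, 6, 5, 7],    # Thread-12-kafkaspout-executor[3 10]
    (11, 18): [2, 1, 3, 0],   # Thread-8-kafkaspout-executor[11 18]
}

# Under this reading, task 3 belongs to executor (3, 10), so an emit by task 3
# from partition 2 contradicts the assignment above -- the reported anomaly.
```

This is exactly the contradiction in the emit line quoted in the report: partition 2 is in the (11 18) consumer's set, yet the tuple is emitted by task 3 of executor (3 10).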





[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2018-04-30 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458903#comment-16458903
 ] 

Srishty Agrawal commented on STORM-2514:


[~Srdo] Sorry, I had forgotten about this ticket altogether. I can keep it in 
my `To Do list`, will update it when I start working on it again.

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
>Assignee: Hugo Louro
>Priority: Major
> Attachments: NewClass.java, worker.log
>





[jira] [Commented] (STORM-2877) Introduce an option to configure pagination in Storm UI

2018-01-05 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313708#comment-16313708
 ] 

Srishty Agrawal commented on STORM-2877:


Link to PR : https://github.com/apache/storm/pull/2505

> Introduce an option to configure pagination in Storm UI 
> 
>
> Key: STORM-2877
> URL: https://issues.apache.org/jira/browse/STORM-2877
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-ui
>Affects Versions: 1.x
>Reporter: Srishty Agrawal
>Assignee: Srishty Agrawal
>Priority: Minor
>  Labels: pull-request-available, storm
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The default pagination value for Storm UI is hard-coded to 20. 
> Pagination was introduced in Storm 1.x. Limiting the list to 20 items 
> restricts searching through the page. It would be beneficial to have a 
> configuration option, say {{ui.pagination}}, which sets the default 
> pagination value when Storm UI loads. This option could be added to 
> {{storm.yaml}} along with other configuration options.
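If the option is added as proposed, the {{storm.yaml}} entry might look like the fragment below. The key name {{ui.pagination}} is the one suggested in the issue; 50 is an arbitrary example value.

```yaml
# Hypothetical storm.yaml entry for the proposed option; 50 is an example value
# replacing the hard-coded default of 20.
ui.pagination: 50
```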



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (STORM-2877) Introduce an option to configure pagination in Storm UI

2018-01-02 Thread Srishty Agrawal (JIRA)
Srishty Agrawal created STORM-2877:
--

 Summary: Introduce an option to configure pagination in Storm UI 
 Key: STORM-2877
 URL: https://issues.apache.org/jira/browse/STORM-2877
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-ui
Affects Versions: 1.x
Reporter: Srishty Agrawal
Assignee: Srishty Agrawal
Priority: Minor


The default pagination value for Storm UI is hard-coded to 20. 
Pagination was introduced in Storm 1.x. Limiting the list to 20 items 
restricts searching through the page. It would be beneficial to have a 
configuration option, say {{ui.pagination}}, which sets the default 
pagination value when Storm UI loads. This option could be added to 
{{storm.yaml}} along with other configuration options.





[jira] [Comment Edited] (STORM-2153) New Metrics Reporting API

2017-06-20 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056219#comment-16056219
 ] 

Srishty Agrawal edited comment on STORM-2153 at 6/20/17 11:25 PM:
--

We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1+}} and 
were wondering whether there is a chance of the new metrics framework being 
backported to {{v1.1.x}} in the future?


was (Author: srishtyagraw...@gmail.com):
We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and 
were wondering if there are chances of new metrics framework being backported 
to {{v1.1.x}} in the future?

> New Metrics Reporting API
> -
>
> Key: STORM-2153
> URL: https://issues.apache.org/jira/browse/STORM-2153
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
>
[jira] [Comment Edited] (STORM-2153) New Metrics Reporting API

2017-06-20 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056219#comment-16056219
 ] 

Srishty Agrawal edited comment on STORM-2153 at 6/20/17 11:23 PM:
--

We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and 
were wondering if there are chances of new metrics framework being backported 
to {{v1.1.x}} in the future?


was (Author: srishtyagraw...@gmail.com):
We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and 
were wondering if there are chances of new metrics framework being backported 
to {{v1.1.0}} in the future?

> New Metrics Reporting API
> -
>
> Key: STORM-2153
> URL: https://issues.apache.org/jira/browse/STORM-2153
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
>

[jira] [Commented] (STORM-2153) New Metrics Reporting API

2017-06-20 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056219#comment-16056219
 ] 

Srishty Agrawal commented on STORM-2153:


We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and 
were wondering if there are chances of new metrics framework being backported 
to {{v1.1.0}} in the future?

> New Metrics Reporting API
> -
>
> Key: STORM-2153
> URL: https://issues.apache.org/jira/browse/STORM-2153
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
>
> This is a proposal to provide a new metrics reporting API based on [Coda 
> Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA 
> Dropwizard/Yammer metrics).
> h2. Background
> In a [discussion on the dev@ mailing list | 
> http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e]
>   a number of community and PMC members recommended replacing Storm’s metrics 
> system with a new API as opposed to enhancing the existing metrics system. 
> Some of the objections to the existing metrics API include:
> # Metrics are reported as an untyped Java object, making it very difficult to 
> reason about how to report it (e.g. is it a gauge, a counter, etc.?)
> # It is difficult to determine if metrics coming into the consumer are 
> pre-aggregated or not.
> # Storm’s metrics collection occurs through a specialized bolt, which in 
> addition to potentially affecting system performance, complicates certain 
> types of aggregation when the parallelism of that bolt is greater than one.
> In the discussion on the developer mailing list, there is growing consensus 
> for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics 
> library. This approach has the following benefits:
> # Coda Hale’s metrics library is very stable, performant, well thought out, 
> and widely adopted among open source projects (e.g. Kafka).
> # The metrics library provides many existing metric types: Meters, Gauges, 
> Counters, Histograms, and more.
> # The library has a pluggable “reporter” API for publishing metrics to 
> various systems, with existing implementations for: JMX, console, CSV, SLF4J, 
> Graphite, Ganglia.
> # Reporters are straightforward to implement, and can be reused by any 
> project that uses the metrics library (i.e. would have broader application 
> outside of Storm)
> As noted earlier, the metrics library supports pluggable reporters for 
> sending metrics data to other systems, and implementing a reporter is fairly 
> straightforward (an example reporter implementation can be found here). For 
> example if someone develops a reporter based on Coda Hale’s metrics, it could 
> not only be used for pushing Storm metrics, but also for any system that used 
> the metrics library, such as Kafka.
> h2. Scope of Effort
> The effort to implement a new metrics API for Storm can be broken down into 
> the following development areas:
> # Implement an API for Storm's internal worker metrics: latencies, queue sizes, 
> capacity, etc.
> # Implement an API for user-defined, topology-specific metrics (exposed via the 
> {{org.apache.storm.task.TopologyContext}} class)
> # Implement an API for Storm daemons: nimbus, supervisor, etc.
> h2. Relationship to Existing Metrics
> This would be a new API that would not affect the existing metrics API. Upon 
> completion, the old metrics API would presumably be deprecated, but kept in 
> place for backward compatibility.
> Internally the current metrics API uses Storm bolts for the reporting 
> mechanism. The proposed metrics API would not depend on any of Storm's 
> messaging capabilities and instead use the [metrics library's built-in 
> reporter mechanism | 
> http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This 
> would allow users to use existing {{Reporter}} implementations which are not 
> Storm-specific, and would simplify the process of collecting metrics. 
> Compared to Storm's {{IMetricCollector}} interface, implementing a reporter 
> for the metrics library is much more straightforward (an example can be found 
> [here | 
> https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]).
> The new metrics capability would not use or affect the ZooKeeper-based 
> metrics used by Storm UI.
> h2. Relationship to JStorm Metrics
> [TBD]
> h2. Target Branches
> [TBD]
> h2. Performance Implications
> [TBD]
> h2. Metrics Namespaces
> [TBD]
> h2. Metrics Collected
> *Worker*
> || Namespace || Metric Type || Description ||
> *Nimbus*
> || Namespace || Metric Type || Description ||
> *Supervisor*
> || Namespace || Metric Type || Description ||
> h2. User-Define
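The registry-plus-pluggable-reporter pattern the proposal describes can be sketched in miniature. This is a toy model of the Coda Hale pattern only; the class and method names below are illustrative and are not the actual Dropwizard or Storm API.

```python
# Toy model of the Coda Hale metrics pattern described in the proposal:
# typed metrics live in a registry, and reporters plug in against the
# registry rather than against any one system (not the real Dropwizard API).
class Counter:
    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

class MetricRegistry:
    def __init__(self):
        self.metrics = {}

    def counter(self, name):
        # Return the existing counter for this name, or register a new one.
        return self.metrics.setdefault(name, Counter())

class ConsoleReporter:
    """A reporter only needs the registry, so it is reusable outside Storm."""
    def __init__(self, registry):
        self.registry = registry

    def report(self):
        return {name: m.count for name, m in self.registry.metrics.items()}

registry = MetricRegistry()
registry.counter("spout.acked").inc(3)
snapshot = ConsoleReporter(registry).report()
```

Because the reporter depends only on the registry, swapping the console reporter for a JMX, CSV, or Graphite one is a constructor change, which is the portability benefit the proposal claims.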

[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-06-05 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037423#comment-16037423
 ] 

Srishty Agrawal commented on STORM-2514:


Sorry [~Srdo], I completely missed your comment. No, I have not been able to 
dedicate enough time to test the ManualPartitioner; I will update the ticket with 
results when I am able to get it working.

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
>Assignee: Hugo Louro
> Attachments: NewClass.java, worker.log
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (STORM-2530) Make trident Kafka Spout log its assigned partition

2017-05-24 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2530:
---
Description: Include the taskID in logs generated by the Trident Kafka spout. This 
is to maintain consistency between the plain Kafka spout and the Trident Kafka 
spout. Refer to [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506] 
for detailed information about the exact log lines where the taskID field was added.   
(was: Include taskID in logs generated by trident Kafka Spout. This is to 
maintain consistency between the plain Kafka Spout and trident Kafka Spout. )

> Make trident Kafka Spout log its assigned partition 
> 
>
> Key: STORM-2530
> URL: https://issues.apache.org/jira/browse/STORM-2530
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-kafka, storm-kafka-client, trident
>Reporter: Srishty Agrawal
>Priority: Minor
>  Labels: information, log, trident
>
> Include taskID in logs generated by trident Kafka Spout. This is to maintain 
> consistency between the plain Kafka Spout and trident Kafka Spout. Refer to 
> [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506] for detailed 
> information about the exact log lines where the taskID field was added. 





[jira] [Updated] (STORM-2530) Make trident Kafka Spout log its assigned partition

2017-05-24 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2530:
---
 Labels: information log trident  (was: information log)
Component/s: trident

> Make trident Kafka Spout log its assigned partition 
> 
>
> Key: STORM-2530
> URL: https://issues.apache.org/jira/browse/STORM-2530
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-kafka, storm-kafka-client, trident
>Reporter: Srishty Agrawal
>Priority: Minor
>  Labels: information, log, trident
>
> Include taskID in logs generated by trident Kafka Spout. This is to maintain 
> consistency between the plain Kafka Spout and trident Kafka Spout. 





[jira] [Updated] (STORM-2530) Make trident Kafka Spout log its assigned partition

2017-05-24 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2530:
---
Description: Include taskID in logs generated by trident Kafka Spout. This 
is to maintain consistency between the plain Kafka Spout and trident Kafka 
Spout.   (was: When using the storm-kafka KafkaSpout, the generated worker logs 
print the task index rather than the task ID when logging the assigned Kafka 
partitions. This makes it difficult to determine which executor is reading a 
particular Kafka partition when only the task index is present in the logs.

[Example from generated worker.log while using storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
 There were a total of 16 tasks and 2 executors assigned for this topology.
The executor IDs as indicated by storm-UI were (5 12) and (13 20). 

This problem has been eliminated in the KafkaSpout from the storm-kafka-client 
module. Logs generated with the new KafkaSpout print the mapping between 
executor ID and the Kafka partitions. However, the task ID is still not printed 
in the log line which prints this mapping. 

[Example from worker.log while using storm-kafka-client module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
 There were a total of 16 tasks and 2 executors for this topology as well.

)

> Make trident Kafka Spout log its assigned partition 
> 
>
> Key: STORM-2530
> URL: https://issues.apache.org/jira/browse/STORM-2530
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-kafka, storm-kafka-client, trident
>Reporter: Srishty Agrawal
>Priority: Minor
>  Labels: information, log, trident
>
> Include taskID in logs generated by trident Kafka Spout. This is to maintain 
> consistency between the plain Kafka Spout and trident Kafka Spout. 





[jira] [Created] (STORM-2530) Make trident Kafka Spout log its assigned partition

2017-05-24 Thread Srishty Agrawal (JIRA)
Srishty Agrawal created STORM-2530:
--

 Summary: Make trident Kafka Spout log its assigned partition 
 Key: STORM-2530
 URL: https://issues.apache.org/jira/browse/STORM-2530
 Project: Apache Storm
  Issue Type: Task
  Components: storm-kafka, storm-kafka-client
Reporter: Srishty Agrawal
Priority: Minor


When using the storm-kafka KafkaSpout, the generated worker logs print the task 
index rather than the task ID when logging the assigned Kafka partitions. This 
makes it difficult to determine which executor is reading a particular Kafka 
partition when only the task index is present in the logs.

[Example from generated worker.log while using storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
 There were a total of 16 tasks and 2 executors assigned for this topology.
The executor IDs as indicated by storm-UI were (5 12) and (13 20). 
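As an aside, the executor IDs above can be read as inclusive task-ID ranges: with 16 tasks numbered consecutively from 5 and split evenly across 2 executors, the ranges come out to (5 12) and (13 20). A minimal sketch of that arithmetic (illustrative only; `ranges` is a made-up helper, not a Storm API):

```java
// Illustrative sketch, not Storm source: derive the inclusive task-ID
// range of each executor, assuming tasks are numbered consecutively and
// split evenly across executors (matching the (5 12) / (13 20) IDs above).
public class ExecutorRanges {
    // Returns {firstTaskId, lastTaskId} for each executor.
    static int[][] ranges(int firstTaskId, int numTasks, int numExecutors) {
        int perExecutor = numTasks / numExecutors; // assumes an even split
        int[][] out = new int[numExecutors][2];
        for (int e = 0; e < numExecutors; e++) {
            out[e][0] = firstTaskId + e * perExecutor;
            out[e][1] = firstTaskId + (e + 1) * perExecutor - 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] r : ranges(5, 16, 2)) {
            System.out.println("(" + r[0] + " " + r[1] + ")"); // (5 12), then (13 20)
        }
    }
}
```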

This problem has been eliminated in the KafkaSpout from the storm-kafka-client 
module. Logs generated with the new KafkaSpout print the mapping between 
executor ID and the Kafka partitions. However, the task ID is still not printed 
in the log line which prints this mapping. 

[Example from worker.log while using storm-kafka-client module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
 There were a total of 16 tasks and 2 executors for this topology as well.







[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-19 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018177#comment-16018177
 ] 

Srishty Agrawal commented on STORM-2514:


Thanks for the explanation [~Srdo]. Yes, I agree that any combination of 
partition and offset number should refer to the same message.

I am still struggling to get the example working with the 
ManualPartitioner (using RoundRobinManualPartitioner). I will be posting the 
results soon.
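For reference, the round-robin idea behind the partitioner being tested can be sketched as follows. This is a hypothetical sketch of the scheme only, not the actual RoundRobinManualPartitioner source, and the method name is made up:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin partition assignment: spout task i of
// n takes every partition p with p % n == i. Not the Storm implementation.
public class RoundRobinSketch {
    static List<Integer> partitionsFor(int taskIndex, int totalTasks, int numPartitions) {
        List<Integer> mine = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (p % totalTasks == taskIndex) {
                mine.add(p); // this task owns partition p
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // 8 partitions over 2 spout tasks:
        System.out.println(partitionsFor(0, 2, 8)); // [0, 2, 4, 6]
        System.out.println(partitionsFor(1, 2, 8)); // [1, 3, 5, 7]
    }
}
```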

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
> Attachments: NewClass.java, worker.log
>
>
> While working on 
> [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker 
> logs were generated with debug mode on. The information printed about the 
> mapping between Task IDs and Kafka partitions contradicted my assumptions. I 
> ran a topology which used KafkaSpout from the storm-kafka-client module; it 
> had a parallelism hint of 2 (number of executors) and a total of 16 tasks. 
> The log lines mentioned below show assigned mapping between executors and 
> kafka partitions:
> {noformat}
> o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
> Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
> reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
> topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
> o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
> Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
> reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
> topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
> {noformat}
> It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 
> (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks 
> in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. 
> These log lines are being printed by Tasks with IDs 10 and 15 in their 
> respective executors. 
> Logs which emit individual messages do not abide by the above assumption. For 
> example in the log mentioned below, Task ID 3 (added code, as a part of 
> debugging STORM-2506, to print the Task ID right next to component ID) which 
> runs on Executor1 reads from partition 2 (the second value inside the square 
> brackets), instead of 4, 5, 6 or 7. 
> {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
> default [8topic, 2, 0, null, 1]{noformat}
> This behavior has been summarized in the table below: 
> {noformat}
> Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
> Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
> {noformat}
> [You can find the relevant parts of log file 
> here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351]
>  
> Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 
> 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
> executor2? Are (3 10) not the starting and ending task IDs in Executor1? 
> Another interesting thing to note is that Task IDs 10 and 15 are always 
> reading from the partitions they claimed to be reading from (while setting 
> partition assignments). 
> If my assumptions are correct, there is a bug in the way the mapping 
> information is being passed to worker logs. If not, we need to make changes 
> in our docs.





[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-16 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013262#comment-16013262
 ] 

Srishty Agrawal commented on STORM-2514:


[~Srdo] the logs are as expected if the number of tasks is equal to the 
number of executors.

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
> Attachments: worker.log
>
>
> While working on 
> [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker 
> logs were generated with debug mode on. The information printed about the 
> mapping between Task IDs and Kafka partitions contradicted my assumptions. I 
> ran a topology which used KafkaSpout from the storm-kafka-client module; it 
> had a parallelism hint of 2 (number of executors) and a total of 16 tasks. 
> The log lines mentioned below show assigned mapping between executors and 
> kafka partitions:
> {noformat}
> o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
> Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
> reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
> topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
> o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
> Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
> reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
> topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
> {noformat}
> It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 
> (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks 
> in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. 
> These log lines are being printed by Tasks with IDs 10 and 15 in their 
> respective executors. 
> Logs which emit individual messages do not abide by the above assumption. For 
> example in the log mentioned below, Task ID 3 (added code, as a part of 
> debugging STORM-2506, to print the Task ID right next to component ID) which 
> runs on Executor1 reads from partition 2 (the second value inside the square 
> brackets), instead of 4, 5, 6 or 7. 
> {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
> default [8topic, 2, 0, null, 1]{noformat}
> This behavior has been summarized in the table below: 
> {noformat}
> Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
> Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
> {noformat}
> [You can find the relevant parts of log file 
> here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351]
>  
> Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 
> 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
> executor2? Are (3 10) not the starting and ending task IDs in Executor1? 
> Another interesting thing to note is that Task IDs 10 and 15 are always 
> reading from the partitions they claimed to be reading from (while setting 
> partition assignments). 
> If my assumptions are correct, there is a bug in the way the mapping 
> information is being passed to worker logs. If not, we need to make changes 
> in our docs.





[jira] [Comment Edited] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-16 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013239#comment-16013239
 ] 

Srishty Agrawal edited comment on STORM-2514 at 5/16/17 11:08 PM:
--

Thanks for the clarification. Please find answers to your questions inline:

*How did you get the task lists you posted?*
{noformat}
Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
{noformat}
The above table has been derived from the worker logs. For instance, if you 
search for {{kafkaspout 3}}, which means Task ID 3, you will find results for 
messages which are being read from partitions 0, 1, 2 and 3. This table seems 
to convey that {noformat}  5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 
and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat}
If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 
10, then only these tasks should be reading from partitions 4, 5, 6 or 7.

*I don't think it's exactly right that all the tasks in executor1 (3 10) will 
be reading from topics 4,6,5,7.*
Sorry for the confusion, I meant only tasks in Executor1 should read from the 
partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I 
agree that there is a possibility that not all the tasks will end up reading 
from partitions 4, 5, 6 and 7.

*Each task has its own spout instance (and corresponding KafkaConsumer), so 
only task 10 is actually assigned those partitions at the start of the log. I'm 
not sure how task 3 is even emitting anything?*
Shouldn't 3 more tasks (from Executor1), apart from task ID 10, be assigned to 
read from partitions 4, 5, 6 and 7? I remember reading in the docs that when 
there are more tasks than partitions, there should be a one-to-one mapping 
between partitions and tasks. Hence it is not surprising to me that 
task 3 is reading from a Kafka partition, although it does not seem to read 
from the assigned Kafka partition (according to the {{Setting newly assigned}} 
log line).
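The recalled rule (more tasks than partitions, each partition owned by exactly one task, surplus tasks idle) can be sketched like this. This is purely illustrative, with a made-up method name, not Storm or Kafka source:

```java
import java.util.Arrays;

// Illustrative sketch: with more spout tasks than partitions, each
// partition can be owned by exactly one task, and the surplus tasks idle.
public class OneToOne {
    // Returns owner[p] = index of the task that owns partition p.
    static int[] ownerOfPartition(int numTasks, int numPartitions) {
        if (numTasks < numPartitions) {
            throw new IllegalArgumentException("fewer tasks than partitions");
        }
        int[] owner = new int[numPartitions];
        for (int p = 0; p < numPartitions; p++) {
            owner[p] = p; // one partition per task; tasks numPartitions..numTasks-1 idle
        }
        return owner;
    }

    public static void main(String[] args) {
        // 16 tasks, 8 partitions: partitions 0..7 go to tasks 0..7.
        System.out.println(Arrays.toString(ownerOfPartition(16, 8)));
    }
}
```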

*I also noticed that many tasks are emitting what appears to be the same 
message?*
{noformat}
e.g.
2017-05-03 13:50:47.655 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] 
Emitting: kafkaspout 18 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.658 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] 
Emitting: kafkaspout 11 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.660 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] 
Emitting: kafkaspout 7 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.670 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] 
Emitting: kafkaspout 8 default [8topic, 2, 0, null, 1]
{noformat}
Are you deriving this information from the fact that all of the above messages 
have the same message ID (0)? 

*Is the start of the log the first occurrence of partition assignments in the 
log?*
No, I have randomly taken a segment from the worker logs. I am attaching the 
full worker.log file in this ticket.

*Are you using automatic or manual subscription to Kafka?*
I did not understand the question. I am running Kafka on my local machine and 
using a KafkaSpout to read a topic from this local instance of Kafka. 


was (Author: srishtyagraw...@gmail.com):
Thanks for the clarification. Please find answers to your questions inline:

*How did you get the task lists you posted?*
{noformat}
Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
{noformat}
The above table has been derived from the worker logs. For instance, if you 
search for {{kafkaspout 3}}, which means Task ID 3, you will find results for 
messages which are being read from partitions 0, 1, 2 and 3. This table seems 
to convey that {noformat}  5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 
and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat}
If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 
10, then only these tasks should be reading from partitions 4, 5, 6 or 7.

*I don't think it's exactly right that all the tasks in executor1 (3 10) will 
be reading from topics 4,6,5,7. *
Sorry for the confusion, I meant only tasks in Executor1 should read from the 
partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I 
agree that there is a possibility that not all the tasks will end up reading 
from partitions 4, 5, 6 and 7.

*Each task has its own spout instance (and corresponding KafkaConsumer), so 
only task 10 is actually assigned those partitions at the start of the log. I'm 
not sure how task 3 is even emitting anything?*
Shouldn’t 3 more tasks (from Executor1) apart from task ID 10 be assigned to 
read from partitions 4, 5, 6 and 7. If there are more tasks than

[jira] [Comment Edited] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-16 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013239#comment-16013239
 ] 

Srishty Agrawal edited comment on STORM-2514 at 5/16/17 11:08 PM:
--

Thanks for the clarification. Please find answers to your questions inline:

*How did you get the task lists you posted?*
{noformat}
Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
{noformat}
The above table has been derived from the worker logs. For instance, if you 
search for {{kafkaspout 3}}, which means Task ID 3, you will find results for 
messages which are being read from partitions 0, 1, 2 and 3. This table seems 
to convey that {noformat}  5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 
and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat}
If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 
10, then only these tasks should be reading from partitions 4, 5, 6 or 7.

*I don't think it's exactly right that all the tasks in executor1 (3 10) will 
be reading from topics 4,6,5,7. *
Sorry for the confusion, I meant only tasks in Executor1 should read from the 
partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I 
agree that there is a possibility that not all the tasks will end up reading 
from partitions 4, 5, 6 and 7.

*Each task has its own spout instance (and corresponding KafkaConsumer), so 
only task 10 is actually assigned those partitions at the start of the log. I'm 
not sure how task 3 is even emitting anything?*
Shouldn't 3 more tasks (from Executor1), apart from task ID 10, be assigned to 
read from partitions 4, 5, 6 and 7? I remember reading in the docs that when 
there are more tasks than partitions, there should be a one-to-one mapping 
between partitions and tasks. Hence it is not surprising to me that 
task 3 is reading from a Kafka partition, although it does not seem to read 
from the assigned Kafka partition (according to the {{Setting newly assigned}} 
log line).

*I also noticed that many tasks are emitting what appears to be the same 
message?*
{noformat}
e.g.
2017-05-03 13:50:47.655 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] 
Emitting: kafkaspout 18 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.658 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] 
Emitting: kafkaspout 11 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.660 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] 
Emitting: kafkaspout 7 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.670 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] 
Emitting: kafkaspout 8 default [8topic, 2, 0, null, 1]
{noformat}
Are you deriving this information from the fact that all of the above messages 
have the same message ID (0)? 

*Is the start of the log the first occurrence of partition assignments in the 
log?*
No, I have randomly taken a segment from the worker logs. I am attaching the 
full worker.log file in this ticket.

*Are you using automatic or manual subscription to Kafka?*
I did not understand the question. I am running Kafka on my local machine and 
using a KafkaSpout to read a topic from this local instance of Kafka. 


was (Author: srishtyagraw...@gmail.com):
Thanks for the clarification. Please find answers to your questions inline:

*How did you get the task lists you posted?*
{noformat}
Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
{noformat}
The above table has been derived from the worker logs. For instance, if you 
search for {{kafkaspout 3}}, which means Task ID 3, you will find results for 
messages which are being read from partitions 0, 1, 2 and 3. This table seems 
to convey that {noformat}  5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 
and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat}
If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 
10, then only these tasks should be reading from partitions 0, 1, 2 or 3.

*I don't think it's exactly right that all the tasks in executor1 (3 10) will 
be reading from topics 4,6,5,7. *
Sorry for the confusion, I meant only tasks in Executor1 should read from the 
partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I 
agree that there is a possibility that not all the tasks will end up reading 
from partitions 4, 5, 6 and 7.

*Each task has its own spout instance (and corresponding KafkaConsumer), so 
only task 10 is actually assigned those partitions at the start of the log. I'm 
not sure how task 3 is even emitting anything?*
Shouldn’t 3 more tasks (from Executor1) apart from task ID 10 be assigned to 
read from partitions 4, 5, 6 and 7. If there are more tasks tha

[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-16 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2514:
---
Attachment: worker.log

Attaching the original worker.log file.

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
> Attachments: worker.log
>
>
> While working on 
> [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker 
> logs were generated with debug mode on. The information printed about the 
> mapping between Task IDs and Kafka partitions contradicted my assumptions. I 
> ran a topology which used KafkaSpout from the storm-kafka-client module; it 
> had a parallelism hint of 2 (number of executors) and a total of 16 tasks. 
> The log lines mentioned below show assigned mapping between executors and 
> kafka partitions:
> {noformat}
> o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
> Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
> reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
> topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
> o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
> Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
> reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
> topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
> {noformat}
> It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 
> (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks 
> in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. 
> These log lines are being printed by Tasks with IDs 10 and 15 in their 
> respective executors. 
> Logs which emit individual messages do not abide by the above assumption. For 
> example in the log mentioned below, Task ID 3 (added code, as a part of 
> debugging STORM-2506, to print the Task ID right next to component ID) which 
> runs on Executor1 reads from partition 2 (the second value inside the square 
> brackets), instead of 4, 5, 6 or 7. 
> {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
> default [8topic, 2, 0, null, 1]{noformat}
> This behavior has been summarized in the table below: 
> {noformat}
> Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
> Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
> {noformat}
> [You can find the relevant parts of log file 
> here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351]
>  
> Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 
> 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
> executor2? Are (3 10) not the starting and ending task IDs in Executor1? 
> Another interesting thing to note is that Task IDs 10 and 15 are always 
> reading from the partitions they claimed to be reading from (while setting 
> partition assignments). 
> If my assumptions are correct, there is a bug in the way the mapping 
> information is being passed to worker logs. If not, we need to make changes 
> in our docs.





[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-16 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013239#comment-16013239
 ] 

Srishty Agrawal commented on STORM-2514:


Thanks for the clarification. Please find answers to your questions inline:

*How did you get the task lists you posted?*
{noformat}
Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18  Partitions 0, 1, 2, 3
Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7
{noformat}
The above table has been derived from the worker logs. For instance, if you 
search for {{kafkaspout 3}}, which means Task ID 3, you will find results for 
messages which are being read from partitions 0, 1, 2 and 3. This table seems 
to convey that {noformat}  5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 
and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat}
If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 
10, then only these tasks should be reading from partitions 0, 1, 2 or 3.

*I don't think it's exactly right that all the tasks in executor1 (3 10) will 
be reading from topics 4,6,5,7. *
Sorry for the confusion, I meant only tasks in Executor1 should read from the 
partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I 
agree that there is a possibility that not all the tasks will end up reading 
from partitions 4, 5, 6 and 7.

*Each task has its own spout instance (and corresponding KafkaConsumer), so 
only task 10 is actually assigned those partitions at the start of the log. I'm 
not sure how task 3 is even emitting anything?*
Shouldn't 3 more tasks (from Executor1), apart from task ID 10, be assigned to 
read from partitions 4, 5, 6 and 7? I remember reading in the docs that when 
there are more tasks than partitions, there should be a one-to-one mapping 
between partitions and tasks. Hence it is not surprising to me that 
task 3 is reading from a Kafka partition, although it does not seem to read 
from the assigned Kafka partition (according to the {{Setting newly assigned}} 
log line).

*I also noticed that many tasks are emitting what appears to be the same 
message?*
{noformat}
e.g.
2017-05-03 13:50:47.655 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] 
Emitting: kafkaspout 18 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.658 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] 
Emitting: kafkaspout 11 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.660 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] 
Emitting: kafkaspout 7 default [8topic, 2, 0, null, 1]
2017-05-03 13:50:47.670 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] 
Emitting: kafkaspout 8 default [8topic, 2, 0, null, 1]
{noformat}
Are you deriving this information from the fact that all of the above messages 
have the same message ID (0)? 

*Is the start of the log the first occurrence of partition assignments in the 
log?*
No, I have randomly taken a segment from the worker logs. I am attaching the 
full worker.log file in this ticket.

*Are you using automatic or manual subscription to Kafka?*
I did not understand the question. I am running Kafka on my local machine and 
using a KafkaSpout to read a topic from this local instance of Kafka. 

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
>
> While working on 
> [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker 
> logs were generated with debug mode on. The information printed about the 
> mapping between Task IDs and Kafka partitions contradicted my assumptions. I 
> ran a topology which used KafkaSpout from the storm-kafka-client module; it 
> had a parallelism hint of 2 (number of executors) and a total of 16 tasks. 
> The log lines mentioned below show assigned mapping between executors and 
> kafka partitions:
> {noformat}
> o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
> Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
> reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
> topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
> o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
> Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
> reassignment

[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-16 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2514:
---
Description: 
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that only tasks (with IDs 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 (3 
10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in 
Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These 
log lines are being printed by Tasks with IDs 10 and 15 in respective 
executors. 

Logs which emit individual messages do not abide by the above assumption. For 
example in the log mentioned below, Task ID 3 (added code, as a part of 
debugging STORM-2506, to print the Task ID right next to component ID) which 
runs on Executor1 reads from partition 2 (the second value inside the square 
brackets), instead of 4, 5, 6 or 7. 

{noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
default [8topic, 2, 0, null, 1]{noformat}

This behavior has been summarized in the table below:
{noformat}
Task IDs 3, 4, 7, 8, 9, 11, 15, 18    ->  Partitions 0, 1, 2, 3
Task IDs 5, 6, 10, 12, 13, 14, 16, 17 ->  Partitions 4, 5, 6, 7
{noformat}
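The assumption being questioned can be sketched as follows (this is an illustration of the reporter's reading of the labels, not code from the Storm project): an executor label such as [3 10] is taken to denote the inclusive range of task IDs hosted by that executor, so Executor1 [3 10] would hold tasks 3..10 and Executor2 [11 18] tasks 11..18.

```java
import java.util.ArrayList;
import java.util.List;

// Expands an executor label [first last] into the inclusive list of
// task IDs it would host under the assumption described above.
public class ExecutorTaskRange {
    static List<Integer> tasksOf(int first, int last) {
        List<Integer> ids = new ArrayList<>();
        for (int t = first; t <= last; t++) ids.add(t);
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(tasksOf(3, 10));  // [3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(tasksOf(11, 18)); // [11, 12, 13, 14, 15, 16, 17, 18]
    }
}
```

The table above groups task IDs across both ranges, which is exactly what contradicts this reading.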

[You can find the relevant parts of log file 
here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] 

Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} 
correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
executor2? Are (3 10) not the starting and ending task IDs in Executor1? 

Another interesting thing to note is that Task IDs 10 and 15 always read from 
the partitions they claimed to be reading from (while setting partition 
assignments). 

If my assumptions are correct, there is a bug in the way the mapping 
information is being passed to worker logs. If not, we need to make changes in 
our docs.

  was:
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors.

[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-15 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2514:
---
Description: 
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors. 

Logs which emit individual messages do not abide by the above assumption. For 
example in the log mentioned below, Task ID 3 (added code, as a part of 
debugging STORM-2506, to print the Task ID right next to component ID) which 
runs on Executor1 reads from partition 2 (the second value inside the square 
brackets), instead of 4, 5, 6 or 7. 

{noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
default [8topic, 2, 0, null, 1]{noformat}

This behavior has been summarized in the table below:
{noformat}
Task IDs 3, 4, 7, 8, 9, 11, 15, 18    ->  Partitions 0, 1, 2, 3
Task IDs 5, 6, 10, 12, 13, 14, 16, 17 ->  Partitions 4, 5, 6, 7
{noformat}

[You can find the relevant parts of log file 
here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] 

Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} 
correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
executor2? Are (3 10) not the starting and ending task IDs in Executor1? 

Another interesting thing to note is that Task IDs 10 and 15 always read from 
the partitions they claimed to be reading from (while setting partition 
assignments). 

If my assumptions are correct, there is a bug in the way the mapping 
information is being passed to worker logs. If not, we need to make changes in 
our docs.

  was:
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors.

[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-15 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2514:
---
Description: 
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors. 

Logs which emit individual messages do not abide by the above assumption. For 
example in the log mentioned below, Task ID 3 (added code, as a part of 
debugging STORM-2506, to print the Task ID right next to component ID) which 
runs on Executor1 reads from partition 2 (the second value inside the square 
brackets), instead of 4, 5, 6 or 7. 

{noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
default [8topic, 2, 0, null, 1]{noformat}

This behavior has been summarized in the table below:
{noformat}
Task IDs 3, 4, 7, 8, 9, 11, 15, 18    ->  Partitions 0, 1, 2, 3
Task IDs 5, 6, 10, 12, 13, 14, 16, 17 ->  Partitions 4, 5, 6, 7
{noformat}

[You can find the relevant parts of log file 
here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] 

Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} 
correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
executor2? Are (3 10) not the starting and ending task IDs in Executor1? 

Another interesting thing to note is that Task IDs 10 and 15 always read from 
the partitions they claimed to be reading from (while setting partition 
assignments). 

If my assumptions are correct, there is a bug in the way the mapping 
information is being passed to worker logs. If not, we need to make changes in 
our docs.

  was:
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors.

[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-15 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2514:
---
Description: 
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors. 

Logs which emit individual messages do not abide by the above assumption. For 
example in the log mentioned below, Task ID 3 (added code, as a part of 
debugging, to print the Task ID right next to component ID) which runs on 
Executor1 reads from partition 2 (the second value inside the square brackets), 
instead of 4, 5, 6 or 7. 

{noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
default [8topic, 2, 0, null, 1]{noformat}

This behavior has been summarized in the table below:
{noformat}
Task IDs 3, 4, 7, 8, 9, 11, 15, 18    ->  Partitions 0, 1, 2, 3
Task IDs 5, 6, 10, 12, 13, 14, 16, 17 ->  Partitions 4, 5, 6, 7
{noformat}

[You can find the relevant parts of log file 
here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] 

Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} 
correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
executor2? Are (3 10) not the starting and ending task IDs in Executor1? 

Another interesting thing to note is that Task IDs 10 and 15 always read from 
the partitions they claimed to be reading from (while setting partition 
assignments). 

If my assumptions are correct, there is a bug in the way the mapping 
information is being passed to worker logs. If not, we need to make changes in 
our docs.

  was:
While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors.

[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-15 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011341#comment-16011341
 ] 

Srishty Agrawal commented on STORM-2514:


[~erikdw], [~Srdo], [~revans2], [~harshach] kindly provide your insight on this 
issue. 

> Incorrect logs for mapping between Kafka partitions and task IDs
> 
>
> Key: STORM-2514
> URL: https://issues.apache.org/jira/browse/STORM-2514
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Reporter: Srishty Agrawal
>
> While working on 
> [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker 
> logs were generated with debug mode on. The information printed about mapping 
> between Task IDs and kafka partitions was contradictory to my assumptions. I 
> ran a topology which used KafkaSpout from the storm-kafka-client module, it 
> had a parallelism hint of 2 (number of executors) and a total of 16 tasks. 
> The log lines mentioned below show assigned mapping between executors and 
> kafka partitions:
> {noformat}
> o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
> Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
> reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
> topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
> o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
> Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] 
> for group kafkaSpoutTestGroup
> o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
> reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
> consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
> topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
> {noformat}
> It is evident that all the tasks in Executor1 (3 10) will be reading from 
> kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be 
> reading from kafka partitions 0, 1, 2 and 3. These log lines are being 
> printed by Tasks with IDs 10 and 15 in respective executors. 
> Logs which emit individual messages do not abide by the above assumption. For 
> example in the log mentioned below, Task ID 3 which runs on Executor1 reads 
> from partition 2 (the second value inside the square brackets), instead of 4, 
> 5, 6 or 7. 
> {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
> default [8topic, 2, 0, null, 1]{noformat}
> This behavior has been summarized in the table below:
> {noformat}
> Task IDs 3, 4, 7, 8, 9, 11, 15, 18    ->  Partitions 0, 1, 2, 3
> Task IDs 5, 6, 10, 12, 13, 14, 16, 17 ->  Partitions 4, 5, 6, 7
> {noformat}
> [You can find the relevant parts of log file 
> here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351]
>  
> Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 
> 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
> executor2? Are (3 10) not the starting and ending task IDs in Executor1? 
> Another interesting thing to note is that Task IDs 10 and 15 always read 
> from the partitions they claimed to be reading from (while setting 
> partition assignments). 
> If my assumptions are correct, there is a bug in the way the mapping 
> information is being passed to worker logs. If not, we need to make changes 
> in our docs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs

2017-05-15 Thread Srishty Agrawal (JIRA)
Srishty Agrawal created STORM-2514:
--

 Summary: Incorrect logs for mapping between Kafka partitions and 
task IDs
 Key: STORM-2514
 URL: https://issues.apache.org/jira/browse/STORM-2514
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-kafka-client
Reporter: Srishty Agrawal


While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], 
the worker logs were generated with debug mode on. The information printed 
about mapping between Task IDs and kafka partitions was contradictory to my 
assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client 
module, it had a parallelism hint of 2 (number of executors) and a total of 16 
tasks. 
The log lines mentioned below show assigned mapping between executors and kafka 
partitions:
{noformat}
o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] 
Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions 
reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, 
topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]]
o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] 
Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for 
group kafkaSpoutTestGroup
o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions 
reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, 
consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, 
topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]]
{noformat}

It is evident that all the tasks in Executor1 (3 10) will be reading from kafka 
partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading 
from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks 
with IDs 10 and 15 in respective executors. 

Logs which emit individual messages do not abide by the above assumption. For 
example in the log mentioned below, Task ID 3 which runs on Executor1 reads 
from partition 2 (the second value inside the square brackets), instead of 4, 
5, 6 or 7. 

{noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 
default [8topic, 2, 0, null, 1]{noformat}

This behavior has been summarized in the table below:
{noformat}
Task IDs 3, 4, 7, 8, 9, 11, 15, 18    ->  Partitions 0, 1, 2, 3
Task IDs 5, 6, 10, 12, 13, 14, 16, 17 ->  Partitions 4, 5, 6, 7
{noformat}

[You can find the relevant parts of log file 
here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] 

Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} 
correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to 
executor2? Are (3 10) not the starting and ending task IDs in Executor1? 

Another interesting thing to note is that Task IDs 10 and 15 always read from 
the partitions they claimed to be reading from (while setting partition 
assignments). 

If my assumptions are correct, there is a bug in the way the mapping 
information is being passed to worker logs. If not, we need to make changes in 
our docs.





[jira] [Commented] (STORM-2506) Make Kafka Spout log its assigned partition

2017-05-11 Thread Srishty Agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007350#comment-16007350
 ] 

Srishty Agrawal commented on STORM-2506:


[Link to the pull request|https://github.com/apache/storm/pull/2109]

> Make Kafka Spout log its assigned partition
> ---
>
> Key: STORM-2506
> URL: https://issues.apache.org/jira/browse/STORM-2506
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-kafka, storm-kafka-client
>Reporter: Srishty Agrawal
>Priority: Minor
>  Labels: information, log
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When using the storm-kafka KafkaSpout, the generated worker logs have 
> information about the task index rather than the task ID while printing the 
> assigned kafka partitions. It is difficult to work out which executor is 
> reading a particular kafka partition if only the task index is present in 
> the logs.
> [Example from generated worker.log while using storm-kafka module's 
> KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
>  There were a total of 16 tasks and 2 executors assigned for this topology.
> The executor IDs as indicated by storm-UI were (5 12) and (13 20). 
> This problem has been eliminated in the KafkaSpout from storm-kafka-client 
> module. Logs generated with the new KafkaSpout print the mapping between 
> executor ID and the kafka partitions. The task ID is still not printed in 
> the log line which prints this mapping. 
> [Example from worker.log while using the storm-kafka-client module's 
> KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
>  There were a total of 16 tasks and 2 executors for this topology as well.





[jira] [Updated] (STORM-2506) Make Kafka Spout log its assigned partition

2017-05-09 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2506:
---
Description: 
When using the storm-kafka KafkaSpout, the generated worker logs have 
information about the task index rather than the task ID while printing the 
assigned kafka partitions. It is difficult to work out which executor is 
reading a particular kafka partition if only the task index is present in the 
logs.

[Example from generated worker.log while using storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
 There were a total of 16 tasks and 2 executors assigned for this topology.
The executor IDs as indicated by storm-UI were (5 12) and (13 20). 

This problem has been eliminated in the KafkaSpout from storm-kafka-client 
module. Logs generated with the new KafkaSpout print the mapping between 
executor ID and the kafka partitions. The task ID is still not printed in the 
log line which prints this mapping. 

[Example from worker.log while using the storm-kafka-client module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
 There were a total of 16 tasks and 2 executors for this topology as well.
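The index-vs-ID distinction above can be sketched as follows. This is an assumption about Storm's bookkeeping for illustration, not code from the project: the "task index" appears to be the position of a task's globally unique ID within the component's sorted task-ID list. With 16 spout tasks and executors (5 12) and (13 20), the task IDs run 5..20, so an index of 0 in a log line corresponds to task ID 5, which is why an index alone is hard to trace back to an executor.

```java
import java.util.ArrayList;
import java.util.List;

// Maps a globally unique task ID back to its per-component task index,
// assuming the index is the position in the sorted task-ID list.
public class TaskIndexSketch {
    static int taskIndex(List<Integer> sortedComponentTaskIds, int taskId) {
        return sortedComponentTaskIds.indexOf(taskId);
    }

    public static void main(String[] args) {
        List<Integer> taskIds = new ArrayList<>();
        for (int t = 5; t <= 20; t++) taskIds.add(t); // 16 task IDs: 5..20
        System.out.println(taskIndex(taskIds, 5));    // 0
        System.out.println(taskIndex(taskIds, 13));   // 8
    }
}
```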



  was:
When using the storm-kafka KafkaSpout, the generated worker logs have 
information about the task index rather than the task ID while printing the 
assigned kafka partitions. It is difficult to work out which executor is 
reading a particular kafka partition if only the task index is present in the 
logs.

[Example from generated worker.log while using storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
 There were a total of 16 tasks and 2 executors assigned for this topology.
The executor IDs as indicated by storm-UI were (5 12) and (13 20). 

This problem has been eliminated in the KafkaSpout from storm-kafka-client 
module. Logs generated with the new KafkaSpout print the mapping between 
executor ID running KafkaSpout and the kafka parititions. The task ID is still 
not printed in the log line which prints this mapping. 

[Example from worker.log while using storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
 There were a total of 16 tasks and 2 executors for this topology as well.




> Make Kafka Spout log its assigned partition
> ---
>
> Key: STORM-2506
> URL: https://issues.apache.org/jira/browse/STORM-2506
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-kafka, storm-kafka-client
>Reporter: Srishty Agrawal
>Priority: Minor
>  Labels: information, log
>
> When using the storm-kafka KafkaSpout, the generated worker logs print the 
> task index rather than the task ID when listing the assigned Kafka 
> partitions. This makes it difficult to identify the executor ID reading a 
> particular Kafka partition when only the task index is present in the logs.
> [Example from the generated worker.log while using the storm-kafka module's 
> KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
>  There were a total of 16 tasks and 2 executors assigned for this topology.
> The executor IDs as indicated by the Storm UI were (5 12) and (13 20). 
> This problem has been eliminated in the KafkaSpout from the storm-kafka-client 
> module. Logs generated with the new KafkaSpout print the mapping between 
> executor ID and the Kafka partitions. However, the task ID is still not 
> printed in the log line which prints this mapping. 
> [Example from worker.log while using the storm-kafka-client module's 
> KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
>  There were a total of 16 tasks and 2 executors for this topology as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (STORM-2506) Make Kafka Spout log its assigned partition

2017-05-09 Thread Srishty Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srishty Agrawal updated STORM-2506:
---
Description: 
When using the storm-kafka KafkaSpout, the generated worker logs print the task 
index rather than the task ID when listing the assigned Kafka partitions. This 
makes it difficult to identify the executor ID reading a particular Kafka 
partition when only the task index is present in the logs.

[Example from the generated worker.log while using the storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
 There were a total of 16 tasks and 2 executors assigned for this topology.
The executor IDs as indicated by the Storm UI were (5 12) and (13 20). 

This problem has been eliminated in the KafkaSpout from the storm-kafka-client 
module. Logs generated with the new KafkaSpout print the mapping between the 
executor ID running KafkaSpout and the Kafka partitions. The task ID is still 
not printed in the log line which prints this mapping. 

[Example from worker.log while using the storm-kafka-client module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
 There were a total of 16 tasks and 2 executors for this topology as well.



  was:
When using the storm-kafka KafkaSpout, the generated worker logs print the task 
index rather than the task ID when listing the assigned Kafka partitions. This 
makes it difficult to identify the executor ID reading a particular Kafka 
partition when only the task index is present in the logs.

[Example from the generated worker.log while using the storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
 There were a total of 16 tasks and 2 executors assigned for this topology.

The executor IDs as indicated by the Storm UI were (5 12) and (13 20). 


This problem has been eliminated in the KafkaSpout from the storm-kafka-client 
module. Logs generated with the new KafkaSpout print the mapping between the 
executor running KafkaSpout and the Kafka partitions it is assigned to read. 
The task ID is still not printed in the log line which prints this mapping. 

[Example from worker.log while using the storm-kafka-client module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
 There were a total of 16 tasks and 2 executors for the run below:




> Make Kafka Spout log its assigned partition
> ---
>
> Key: STORM-2506
> URL: https://issues.apache.org/jira/browse/STORM-2506
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-kafka, storm-kafka-client
>Reporter: Srishty Agrawal
>Priority: Minor
>  Labels: information, log
>
> When using the storm-kafka KafkaSpout, the generated worker logs print the 
> task index rather than the task ID when listing the assigned Kafka 
> partitions. This makes it difficult to identify the executor ID reading a 
> particular Kafka partition when only the task index is present in the logs.
> [Example from the generated worker.log while using the storm-kafka module's 
> KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
>  There were a total of 16 tasks and 2 executors assigned for this topology.
> The executor IDs as indicated by the Storm UI were (5 12) and (13 20). 
> This problem has been eliminated in the KafkaSpout from the storm-kafka-client 
> module. Logs generated with the new KafkaSpout print the mapping between the 
> executor ID running KafkaSpout and the Kafka partitions. The task ID is 
> still not printed in the log line which prints this mapping. 
> [Example from worker.log while using the storm-kafka-client module's 
> KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
>  There were a total of 16 tasks and 2 executors for this topology as well.





[jira] [Created] (STORM-2506) Make Kafka Spout log its assigned partition

2017-05-09 Thread Srishty Agrawal (JIRA)
Srishty Agrawal created STORM-2506:
--

 Summary: Make Kafka Spout log its assigned partition
 Key: STORM-2506
 URL: https://issues.apache.org/jira/browse/STORM-2506
 Project: Apache Storm
  Issue Type: Task
  Components: storm-kafka, storm-kafka-client
Reporter: Srishty Agrawal
Priority: Minor


When using the storm-kafka KafkaSpout, the generated worker logs print the task 
index rather than the task ID when listing the assigned Kafka partitions. This 
makes it difficult to identify the executor ID reading a particular Kafka 
partition when only the task index is present in the logs.

[Example from the generated worker.log while using the storm-kafka module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f].
 There were a total of 16 tasks and 2 executors assigned for this topology.

The executor IDs as indicated by the Storm UI were (5 12) and (13 20). 


This problem has been eliminated in the KafkaSpout from the storm-kafka-client 
module. Logs generated with the new KafkaSpout print the mapping between the 
executor running KafkaSpout and the Kafka partitions it is assigned to read. 
The task ID is still not printed in the log line which prints this mapping. 

[Example from worker.log while using the storm-kafka-client module's 
KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b].
 There were a total of 16 tasks and 2 executors for the run below:




