[jira] [Commented] (STORM-3060) Configuration mapping between storm-kafka & storm-kafka-client
[ https://issues.apache.org/jira/browse/STORM-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466630#comment-16466630 ] Srishty Agrawal commented on STORM-3060: Created [a GitHub gist|https://gist.github.com/srishtyagrawal/850b0c3f661cf3c620c27f314791224b] with all the information I had about this configuration map and it was reviewed by [~Srdo]. The changes were then tracked in https://github.com/apache/storm/pull/2637. > Configuration mapping between storm-kafka & storm-kafka-client > -- > > Key: STORM-3060 > URL: https://issues.apache.org/jira/browse/STORM-3060 > Project: Apache Storm > Issue Type: Documentation > Components: storm-kafka, storm-kafka-client >Reporter: Srishty Agrawal >Assignee: Srishty Agrawal >Priority: Minor > Labels: storm-kafka-client > > A document which contains mapping of configurations from {{storm-kafka > SpoutConfig}} to {{storm-kafka-client KafkaSpoutConfig}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-3060) Configuration mapping between storm-kafka & storm-kafka-client
Srishty Agrawal created STORM-3060: -- Summary: Configuration mapping between storm-kafka & storm-kafka-client Key: STORM-3060 URL: https://issues.apache.org/jira/browse/STORM-3060 Project: Apache Storm Issue Type: Documentation Components: storm-kafka, storm-kafka-client Reporter: Srishty Agrawal Assignee: Srishty Agrawal A document which contains mapping of configurations from {{storm-kafka SpoutConfig}} to {{storm-kafka-client KafkaSpoutConfig}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458903#comment-16458903 ] Srishty Agrawal edited comment on STORM-2514 at 4/30/18 7:17 PM: - [~Srdo] Sorry, I had forgotten about this ticket altogether. I see that the PR that you had linked earlier has been fixed. I can keep this in my `To Do list` and will update it when I start working on it again. was (Author: srishtyagraw...@gmail.com): [~Srdo] Sorry, I had forgotten about this ticket altogether. I can keep it in my `To Do list`, will update it when I start working on it again. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal >Assignee: Hugo Louro >Priority: Major > Attachments: NewClass.java, worker.log > > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, > topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] > {noformat} > It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 > (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks > in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. > These log lines are being printed by Tasks with IDs 10 and 15 in respective > executors. > Logs which emit individual messages do not abide by the above assumption. For > example in the log mentioned below, Task ID 3 (added code, as a part of > debugging STORM-2506, to print the Task ID right next to component ID) which > runs on Executor1 reads from partition 2 (the second value inside the square > brackets), instead of 4, 5, 6 or 7. > {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 > default [8topic, 2, 0, null, 1]{noformat} > This behavior has been summarized in the table below : > {noformat} > Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 > Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 > {noformat} > [You can find the relevant parts of log file > here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] > > Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, > 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to > executor2? Are (3 10) not the starting and ending task IDs in Executor1? > Another interesting thing to note is that, Task IDs 10 and 15 are always > reading from the partitions they claimed to be reading from (while setting > partition assignments). > If my assumptions are correct, there is a bug in the way the mapping > information is being/passed to worker logs. If not, we need to make changes > in our docs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458903#comment-16458903 ] Srishty Agrawal commented on STORM-2514: [~Srdo] Sorry, I had forgotten about this ticket altogether. I can keep it in my `To Do list`, will update it when I start working on it again. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal >Assignee: Hugo Louro >Priority: Major > Attachments: NewClass.java, worker.log > > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, > topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] > {noformat} > It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 > (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks > in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. > These log lines are being printed by Tasks with IDs 10 and 15 in respective > executors. > Logs which emit individual messages do not abide by the above assumption. For > example in the log mentioned below, Task ID 3 (added code, as a part of > debugging STORM-2506, to print the Task ID right next to component ID) which > runs on Executor1 reads from partition 2 (the second value inside the square > brackets), instead of 4, 5, 6 or 7. > {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 > default [8topic, 2, 0, null, 1]{noformat} > This behavior has been summarized in the table below : > {noformat} > Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 > Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 > {noformat} > [You can find the relevant parts of log file > here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] > > Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, > 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to > executor2? Are (3 10) not the starting and ending task IDs in Executor1? > Another interesting thing to note is that, Task IDs 10 and 15 are always > reading from the partitions they claimed to be reading from (while setting > partition assignments). > If my assumptions are correct, there is a bug in the way the mapping > information is being/passed to worker logs. If not, we need to make changes > in our docs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2877) Introduce an option to configure pagination in Storm UI
[ https://issues.apache.org/jira/browse/STORM-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313708#comment-16313708 ] Srishty Agrawal commented on STORM-2877: Link to PR : https://github.com/apache/storm/pull/2505 > Introduce an option to configure pagination in Storm UI > > > Key: STORM-2877 > URL: https://issues.apache.org/jira/browse/STORM-2877 > Project: Apache Storm > Issue Type: Improvement > Components: storm-ui >Affects Versions: 1.x >Reporter: Srishty Agrawal >Assignee: Srishty Agrawal >Priority: Minor > Labels: pull-request-available, storm > Time Spent: 10m > Remaining Estimate: 0h > > The current pagination default value for Storm UI is hard-coded to be 20. > Pagination has been introduced in Storm 1.x. Having 20 items in the list > restricts searching through the page. It will be beneficial to have a > configuration option, say {{ui.pagination}}, which will set the default > pagination value when Storm UI loads. This option can be added to > {{storm.yaml}} along with other configurations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (STORM-2877) Introduce an option to configure pagination in Storm UI
Srishty Agrawal created STORM-2877: -- Summary: Introduce an option to configure pagination in Storm UI Key: STORM-2877 URL: https://issues.apache.org/jira/browse/STORM-2877 Project: Apache Storm Issue Type: Improvement Components: storm-ui Affects Versions: 1.x Reporter: Srishty Agrawal Assignee: Srishty Agrawal Priority: Minor The current pagination default value for Storm UI is hard-coded to be 20. Pagination has been introduced in Storm 1.x. Having 20 items in the list restricts searching through the page. It will be beneficial to have a configuration option, say {{ui.pagination}}, which will set the default pagination value when Storm UI loads. This option can be added to {{storm.yaml}} along with other configurations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (STORM-2153) New Metrics Reporting API
[ https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056219#comment-16056219 ] Srishty Agrawal edited comment on STORM-2153 at 6/20/17 11:25 PM: -- We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1+}} and were wondering if there are chances of new metrics framework being backported to {{v1.1.x}} in the future? was (Author: srishtyagraw...@gmail.com): We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and were wondering if there are chances of new metrics framework being backported to {{v1.1.x}} in the future? > New Metrics Reporting API > - > > Key: STORM-2153 > URL: https://issues.apache.org/jira/browse/STORM-2153 > Project: Apache Storm > Issue Type: Improvement >Reporter: P. Taylor Goetz >Assignee: P. Taylor Goetz > > This is a proposal to provide a new metrics reporting API based on [Coda > Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA > Dropwizard/Yammer metrics). > h2. Background > In a [discussion on the dev@ mailing list | > http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e] > a number of community and PMC members recommended replacing Storm’s metrics > system with a new API as opposed to enhancing the existing metrics system. > Some of the objections to the existing metrics API include: > # Metrics are reported as an untyped Java object, making it very difficult to > reason about how to report it (e.g. is it a gauge, a counter, etc.?) > # It is difficult to determine if metrics coming into the consumer are > pre-aggregated or not. > # Storm’s metrics collection occurs through a specialized bolt, which in > addition to potentially affecting system performance, complicates certain > types of aggregation when the parallelism of that bolt is greater than one. > In the discussion on the developer mailing list, there is growing consensus > for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics > library. This approach has the following benefits: > # Coda Hale’s metrics library is very stable, performant, well thought out, > and widely adopted among open source projects (e.g. Kafka). > # The metrics library provides many existing metric types: Meters, Gauges, > Counters, Histograms, and more. > # The library has a pluggable “reporter” API for publishing metrics to > various systems, with existing implementations for: JMX, console, CSV, SLF4J, > Graphite, Ganglia. > # Reporters are straightforward to implement, and can be reused by any > project that uses the metrics library (i.e. would have broader application > outside of Storm) > As noted earlier, the metrics library supports pluggable reporters for > sending metrics data to other systems, and implementing a reporter is fairly > straightforward (an example reporter implementation can be found here). For > example if someone develops a reporter based on Coda Hale’s metrics, it could > not only be used for pushing Storm metrics, but also for any system that used > the metrics library, such as Kafka. > h2. Scope of Effort > The effort to implement a new metrics API for Storm can be broken down into > the following development areas: > # Implement API for Storms internal worker metrics: latencies, queue sizes, > capacity, etc. > # Implement API for user defined, topology-specific metrics (exposed via the > {{org.apache.storm.task.TopologyContext}} class) > # Implement API for storm daemons: nimbus, supervisor, etc. > h2. Relationship to Existing Metrics > This would be a new API that would not affect the existing metrics API. Upon > completion, the old metrics API would presumably be deprecated, but kept in > place for backward compatibility. > Internally the current metrics API uses Storm bolts for the reporting > mechanism. The proposed metrics API would not depend on any of Storm's > messaging capabilities and instead use the [metrics library's built-in > reporter mechanism | > http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This > would allow users to use existing {{Reporter}} implementations which are not > Storm-specific, and would simplify the process of collecting metrics. > Compared to Storm's {{IMetricCollector}} interface, implementing a reporter > for the metrics library is much more straightforward (an example can be found > [here | > https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]. > The new metrics capability would not use or affect the ZooKeeper-based > metrics used by Storm UI. > h2. Relationship to JStorm Metrics > [TBD] > h2. Target Branches > [TBD] > h2. Perform
[jira] [Comment Edited] (STORM-2153) New Metrics Reporting API
[ https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056219#comment-16056219 ] Srishty Agrawal edited comment on STORM-2153 at 6/20/17 11:23 PM: -- We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and were wondering if there are chances of new metrics framework being backported to {{v1.1.x}} in the future? was (Author: srishtyagraw...@gmail.com): We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and were wondering if there are chances of new metrics framework being backported to {{v1.1.0}} in the future? > New Metrics Reporting API > - > > Key: STORM-2153 > URL: https://issues.apache.org/jira/browse/STORM-2153 > Project: Apache Storm > Issue Type: Improvement >Reporter: P. Taylor Goetz >Assignee: P. Taylor Goetz > > This is a proposal to provide a new metrics reporting API based on [Coda > Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA > Dropwizard/Yammer metrics). > h2. Background > In a [discussion on the dev@ mailing list | > http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e] > a number of community and PMC members recommended replacing Storm’s metrics > system with a new API as opposed to enhancing the existing metrics system. > Some of the objections to the existing metrics API include: > # Metrics are reported as an untyped Java object, making it very difficult to > reason about how to report it (e.g. is it a gauge, a counter, etc.?) > # It is difficult to determine if metrics coming into the consumer are > pre-aggregated or not. > # Storm’s metrics collection occurs through a specialized bolt, which in > addition to potentially affecting system performance, complicates certain > types of aggregation when the parallelism of that bolt is greater than one. > In the discussion on the developer mailing list, there is growing consensus > for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics > library. This approach has the following benefits: > # Coda Hale’s metrics library is very stable, performant, well thought out, > and widely adopted among open source projects (e.g. Kafka). > # The metrics library provides many existing metric types: Meters, Gauges, > Counters, Histograms, and more. > # The library has a pluggable “reporter” API for publishing metrics to > various systems, with existing implementations for: JMX, console, CSV, SLF4J, > Graphite, Ganglia. > # Reporters are straightforward to implement, and can be reused by any > project that uses the metrics library (i.e. would have broader application > outside of Storm) > As noted earlier, the metrics library supports pluggable reporters for > sending metrics data to other systems, and implementing a reporter is fairly > straightforward (an example reporter implementation can be found here). For > example if someone develops a reporter based on Coda Hale’s metrics, it could > not only be used for pushing Storm metrics, but also for any system that used > the metrics library, such as Kafka. > h2. Scope of Effort > The effort to implement a new metrics API for Storm can be broken down into > the following development areas: > # Implement API for Storms internal worker metrics: latencies, queue sizes, > capacity, etc. > # Implement API for user defined, topology-specific metrics (exposed via the > {{org.apache.storm.task.TopologyContext}} class) > # Implement API for storm daemons: nimbus, supervisor, etc. > h2. Relationship to Existing Metrics > This would be a new API that would not affect the existing metrics API. Upon > completion, the old metrics API would presumably be deprecated, but kept in > place for backward compatibility. > Internally the current metrics API uses Storm bolts for the reporting > mechanism. The proposed metrics API would not depend on any of Storm's > messaging capabilities and instead use the [metrics library's built-in > reporter mechanism | > http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This > would allow users to use existing {{Reporter}} implementations which are not > Storm-specific, and would simplify the process of collecting metrics. > Compared to Storm's {{IMetricCollector}} interface, implementing a reporter > for the metrics library is much more straightforward (an example can be found > [here | > https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]. > The new metrics capability would not use or affect the ZooKeeper-based > metrics used by Storm UI. > h2. Relationship to JStorm Metrics > [TBD] > h2. Target Branches > [TBD] > h2. Perfor
[jira] [Commented] (STORM-2153) New Metrics Reporting API
[ https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056219#comment-16056219 ] Srishty Agrawal commented on STORM-2153: We are planning to upgrade the Storm version from {{v0.9.6}} to {{v1.1.0}} and were wondering if there are chances of new metrics framework being backported to {{v1.1.0}} in the future? > New Metrics Reporting API > - > > Key: STORM-2153 > URL: https://issues.apache.org/jira/browse/STORM-2153 > Project: Apache Storm > Issue Type: Improvement >Reporter: P. Taylor Goetz >Assignee: P. Taylor Goetz > > This is a proposal to provide a new metrics reporting API based on [Coda > Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA > Dropwizard/Yammer metrics). > h2. Background > In a [discussion on the dev@ mailing list | > http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e] > a number of community and PMC members recommended replacing Storm’s metrics > system with a new API as opposed to enhancing the existing metrics system. > Some of the objections to the existing metrics API include: > # Metrics are reported as an untyped Java object, making it very difficult to > reason about how to report it (e.g. is it a gauge, a counter, etc.?) > # It is difficult to determine if metrics coming into the consumer are > pre-aggregated or not. > # Storm’s metrics collection occurs through a specialized bolt, which in > addition to potentially affecting system performance, complicates certain > types of aggregation when the parallelism of that bolt is greater than one. > In the discussion on the developer mailing list, there is growing consensus > for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics > library. This approach has the following benefits: > # Coda Hale’s metrics library is very stable, performant, well thought out, > and widely adopted among open source projects (e.g. Kafka). > # The metrics library provides many existing metric types: Meters, Gauges, > Counters, Histograms, and more. > # The library has a pluggable “reporter” API for publishing metrics to > various systems, with existing implementations for: JMX, console, CSV, SLF4J, > Graphite, Ganglia. > # Reporters are straightforward to implement, and can be reused by any > project that uses the metrics library (i.e. would have broader application > outside of Storm) > As noted earlier, the metrics library supports pluggable reporters for > sending metrics data to other systems, and implementing a reporter is fairly > straightforward (an example reporter implementation can be found here). For > example if someone develops a reporter based on Coda Hale’s metrics, it could > not only be used for pushing Storm metrics, but also for any system that used > the metrics library, such as Kafka. > h2. Scope of Effort > The effort to implement a new metrics API for Storm can be broken down into > the following development areas: > # Implement API for Storms internal worker metrics: latencies, queue sizes, > capacity, etc. > # Implement API for user defined, topology-specific metrics (exposed via the > {{org.apache.storm.task.TopologyContext}} class) > # Implement API for storm daemons: nimbus, supervisor, etc. > h2. Relationship to Existing Metrics > This would be a new API that would not affect the existing metrics API. Upon > completion, the old metrics API would presumably be deprecated, but kept in > place for backward compatibility. > Internally the current metrics API uses Storm bolts for the reporting > mechanism. The proposed metrics API would not depend on any of Storm's > messaging capabilities and instead use the [metrics library's built-in > reporter mechanism | > http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This > would allow users to use existing {{Reporter}} implementations which are not > Storm-specific, and would simplify the process of collecting metrics. > Compared to Storm's {{IMetricCollector}} interface, implementing a reporter > for the metrics library is much more straightforward (an example can be found > [here | > https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]. > The new metrics capability would not use or affect the ZooKeeper-based > metrics used by Storm UI. > h2. Relationship to JStorm Metrics > [TBD] > h2. Target Branches > [TBD] > h2. Performance Implications > [TBD] > h2. Metrics Namespaces > [TBD] > h2. Metrics Collected > *Worker* > || Namespace || Metric Type || Description || > *Nimbus* > || Namespace || Metric Type || Description || > *Supervisor* > || Namespace || Metric Type || Description || > h2. User-Define
[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037423#comment-16037423 ] Srishty Agrawal commented on STORM-2514: Sorry [~Srdo] I completely missed your comment. No, I have not been able to dedicate enough time to test the ManualPartitioner, will update the ticket with results when I am able to get it working. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal >Assignee: Hugo Louro > Attachments: NewClass.java, worker.log > > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, > topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] > {noformat} > It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 > (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks > in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. > These log lines are being printed by Tasks with IDs 10 and 15 in respective > executors. > Logs which emit individual messages do not abide by the above assumption. For > example in the log mentioned below, Task ID 3 (added code, as a part of > debugging STORM-2506, to print the Task ID right next to component ID) which > runs on Executor1 reads from partition 2 (the second value inside the square > brackets), instead of 4, 5, 6 or 7. > {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 > default [8topic, 2, 0, null, 1]{noformat} > This behavior has been summarized in the table below : > {noformat} > Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 > Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 > {noformat} > [You can find the relevant parts of log file > here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] > > Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, > 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to > executor2? Are (3 10) not the starting and ending task IDs in Executor1? > Another interesting thing to note is that, Task IDs 10 and 15 are always > reading from the partitions they claimed to be reading from (while setting > partition assignments). > If my assumptions are correct, there is a bug in the way the mapping > information is being/passed to worker logs. If not, we need to make changes > in our docs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (STORM-2530) Make trident Kafka Spout log its assigned partition
[ https://issues.apache.org/jira/browse/STORM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2530: --- Description: Include taskID in logs generated by trident Kafka Spout. This is to maintain consistency between the plain Kafka Spout and trident Kafka Spout. Refer to [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506] for detailed information about exact log lines where taskID field was added. (was: Include taskID in logs generated by trident Kafka Spout. This is to maintain consistency between the plain Kafka Spout and trident Kafka Spout. ) > Make trident Kafka Spout log its assigned partition > > > Key: STORM-2530 > URL: https://issues.apache.org/jira/browse/STORM-2530 > Project: Apache Storm > Issue Type: Task > Components: storm-kafka, storm-kafka-client, trident >Reporter: Srishty Agrawal >Priority: Minor > Labels: information, log, trident > > Include taskID in logs generated by trident Kafka Spout. This is to maintain > consistency between the plain Kafka Spout and trident Kafka Spout. Refer to > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506] for detailed > information about exact log lines where taskID field was added. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (STORM-2530) Make trident Kafka Spout log its assigned partition
[ https://issues.apache.org/jira/browse/STORM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2530: --- Labels: information log trident (was: information log) Component/s: trident > Make trident Kafka Spout log its assigned partition > > > Key: STORM-2530 > URL: https://issues.apache.org/jira/browse/STORM-2530 > Project: Apache Storm > Issue Type: Task > Components: storm-kafka, storm-kafka-client, trident >Reporter: Srishty Agrawal >Priority: Minor > Labels: information, log, trident > > Include taskID in logs generated by trident Kafka Spout. This is to maintain > consistency between the plain Kafka Spout and trident Kafka Spout. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (STORM-2530) Make trident Kafka Spout log its assigned partition
[ https://issues.apache.org/jira/browse/STORM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2530: --- Description: Include taskID in logs generated by trident Kafka Spout. This is to maintain consistency between the plain Kafka Spout and trident Kafka Spout. (was: On using storm-kafka KafkaSpout, the generated worker logs have information about the task index rather than task ID while printing the assigned kafka partitions. It is difficult to fetch the executor ID reading a particular kafka partition if only task index is present in the logs. [Example from generated worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. There were a total of 16 tasks and 2 executors assigned for this topology. The executor IDs as indicated by storm-UI were (5 12) and (13 20). This problem has been eliminated in the KafkaSpout from storm-kafka-client module. Logs generated with the new KafkaSpout print the mapping between executor ID and the kafka parititions. The task ID is still not printed in the log line which prints this mapping. [Example from worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. There were a total of 16 tasks and 2 executors for this topology as well. ) > Make trident Kafka Spout log its assigned partition > > > Key: STORM-2530 > URL: https://issues.apache.org/jira/browse/STORM-2530 > Project: Apache Storm > Issue Type: Task > Components: storm-kafka, storm-kafka-client, trident >Reporter: Srishty Agrawal >Priority: Minor > Labels: information, log, trident > > Include taskID in logs generated by trident Kafka Spout. This is to maintain > consistency between the plain Kafka Spout and trident Kafka Spout. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (STORM-2530) Make trident Kafka Spout log its assigned partition
Srishty Agrawal created STORM-2530: -- Summary: Make trident Kafka Spout log its assigned partition Key: STORM-2530 URL: https://issues.apache.org/jira/browse/STORM-2530 Project: Apache Storm Issue Type: Task Components: storm-kafka, storm-kafka-client Reporter: Srishty Agrawal Priority: Minor On using storm-kafka KafkaSpout, the generated worker logs have information about the task index rather than task ID while printing the assigned kafka partitions. It is difficult to fetch the executor ID reading a particular kafka partition if only task index is present in the logs. [Example from generated worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. There were a total of 16 tasks and 2 executors assigned for this topology. The executor IDs as indicated by storm-UI were (5 12) and (13 20). This problem has been eliminated in the KafkaSpout from storm-kafka-client module. Logs generated with the new KafkaSpout print the mapping between executor ID and the kafka parititions. The task ID is still not printed in the log line which prints this mapping. [Example from worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. There were a total of 16 tasks and 2 executors for this topology as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018177#comment-16018177 ] Srishty Agrawal commented on STORM-2514: Thanks for the explanation [~Srdo]. Yes, I agree that any combination of partition and offset number should refer to the same message. I am still struggling to get the example working with the ManualPartitioner(using RoundRobinManualPartitioner). I will be posting the results soon. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal > Attachments: NewClass.java, worker.log > > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, > topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] > {noformat} > It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 > (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks > in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. > These log lines are being printed by Tasks with IDs 10 and 15 in respective > executors. > Logs which emit individual messages do not abide by the above assumption. For > example in the log mentioned below, Task ID 3 (added code, as a part of > debugging STORM-2506, to print the Task ID right next to component ID) which > runs on Executor1 reads from partition 2 (the second value inside the square > brackets), instead of 4, 5, 6 or 7. > {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 > default [8topic, 2, 0, null, 1]{noformat} > This behavior has been summarized in the table below : > {noformat} > Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 > Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 > {noformat} > [You can find the relevant parts of log file > here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] > > Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, > 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to > executor2? Are (3 10) not the starting and ending task IDs in Executor1? > Another interesting thing to note is that, Task IDs 10 and 15 are always > reading from the partitions they claimed to be reading from (while setting > partition assignments). > If my assumptions are correct, there is a bug in the way the mapping > information is being/passed to worker logs. If not, we need to make changes > in our docs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013262#comment-16013262 ] Srishty Agrawal commented on STORM-2514: [~Srdo] the logs are as expected if the number of tasks are equivalent to number of executors. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal > Attachments: worker.log > > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, > topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] > {noformat} > It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 > (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks > in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. > These log lines are being printed by Tasks with IDs 10 and 15 in respective > executors. > Logs which emit individual messages do not abide by the above assumption. For > example in the log mentioned below, Task ID 3 (added code, as a part of > debugging STORM-2506, to print the Task ID right next to component ID) which > runs on Executor1 reads from partition 2 (the second value inside the square > brackets), instead of 4, 5, 6 or 7. > {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 > default [8topic, 2, 0, null, 1]{noformat} > This behavior has been summarized in the table below : > {noformat} > Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 > Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 > {noformat} > [You can find the relevant parts of log file > here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] > > Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, > 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to > executor2? Are (3 10) not the starting and ending task IDs in Executor1? > Another interesting thing to note is that, Task IDs 10 and 15 are always > reading from the partitions they claimed to be reading from (while setting > partition assignments). > If my assumptions are correct, there is a bug in the way the mapping > information is being/passed to worker logs. If not, we need to make changes > in our docs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013239#comment-16013239 ] Srishty Agrawal edited comment on STORM-2514 at 5/16/17 11:08 PM: -- Thanks for the clarification. Please find answers to your questions inline: *How did you get the task lists you posted?* {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} The above table has been derived from the worker logs. For instance, if you search for {{kafkaspout 3}}, which means Task ID 3, you will find results for messages which are being read from partitions 0, 1, 2 and 3. This table seems to convey that {noformat} 5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat} If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 10, then only these tasks should be reading from partitions 4, 5, 6 or 7. *I don't think it's exactly right that all the tasks in executor1 (3 10) will be reading from topics 4,6,5,7.* Sorry for the confusion, I meant only tasks in Executor1 should read from the partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I agree that there is a possibility that not all the tasks will end up reading from partitions 4, 5, 6 and 7. *Each task has its own spout instance (and corresponding KafkaConsumer), so only task 10 is actually assigned those partitions at the start of the log. I'm not sure how task 3 is even emitting anything?* Shouldn’t 3 more tasks (from Executor1) apart from task ID 10 be assigned to read from partitions 4, 5, 6 and 7. If there are more tasks than the number of partitions there should be one to one mapping between tasks and partitions is what I remember reading in the docs. Hence it is not surprising for me that task 3 is reading from a Kafka partition, although it does not seem to read from the assigned Kafka partition (according to the {{Setting newly assigned}} log line ). *I also noticed that many tasks are emitting what appears to be the same message?* {noformat} e.g. 2017-05-03 13:50:47.655 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] Emitting: kafkaspout 18 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.658 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] Emitting: kafkaspout 11 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.660 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 7 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.670 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 8 default [8topic, 2, 0, null, 1] {noformat} Are you deriving this information from the fact that all of the above messages have the same message ID (0) ? *Is the start of the log the first occurrence of partition assignments in the log?* No, I have randomly taken a segment from the worker logs. I am attaching the full worker.log file in this ticket. *Are you using automatic or manual subscription to Kafka?* I did not understand the question. I am running Kafka on my local machine and using a KafkaSpout to read a topic from this local instance of Kafka. was (Author: srishtyagraw...@gmail.com): Thanks for the clarification. Please find answers to your questions inline: *How did you get the task lists you posted?* {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} The above table has been derived from the worker logs. For instance, if you search for {{kafkaspout 3}}, which means Task ID 3, you will find results for messages which are being read from partitions 0, 1, 2 and 3. This table seems to convey that {noformat} 5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat} If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 10, then only these tasks should be reading from partitions 4, 5, 6 or 7. *I don't think it's exactly right that all the tasks in executor1 (3 10) will be reading from topics 4,6,5,7. * Sorry for the confusion, I meant only tasks in Executor1 should read from the partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I agree that there is a possibility that not all the tasks will end up reading from partitions 4, 5, 6 and 7. *Each task has its own spout instance (and corresponding KafkaConsumer), so only task 10 is actually assigned those partitions at the start of the log. I'm not sure how task 3 is even emitting anything?* Shouldn’t 3 more tasks (from Executor1) apart from task ID 10 be assigned to read from partitions 4, 5, 6 and 7. If there are more tasks than
[jira] [Comment Edited] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013239#comment-16013239 ] Srishty Agrawal edited comment on STORM-2514 at 5/16/17 11:08 PM: -- Thanks for the clarification. Please find answers to your questions inline: *How did you get the task lists you posted?* {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} The above table has been derived from the worker logs. For instance, if you search for {{kafkaspout 3}}, which means Task ID 3, you will find results for messages which are being read from partitions 0, 1, 2 and 3. This table seems to convey that {noformat} 5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat} If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 10, then only these tasks should be reading from partitions 4, 5, 6 or 7. *I don't think it's exactly right that all the tasks in executor1 (3 10) will be reading from topics 4,6,5,7. * Sorry for the confusion, I meant only tasks in Executor1 should read from the partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I agree that there is a possibility that not all the tasks will end up reading from partitions 4, 5, 6 and 7. *Each task has its own spout instance (and corresponding KafkaConsumer), so only task 10 is actually assigned those partitions at the start of the log. I'm not sure how task 3 is even emitting anything?* Shouldn’t 3 more tasks (from Executor1) apart from task ID 10 be assigned to read from partitions 4, 5, 6 and 7. If there are more tasks than the number of partitions there should be one to one mapping between tasks and partitions is what I remember reading in the docs. Hence it is not surprising for me that task 3 is reading from a Kafka partition, although it does not seem to read from the assigned Kafka partition (according to the {{Setting newly assigned}} log line ). *I also noticed that many tasks are emitting what appears to be the same message?* {noformat} e.g. 2017-05-03 13:50:47.655 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] Emitting: kafkaspout 18 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.658 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] Emitting: kafkaspout 11 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.660 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 7 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.670 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 8 default [8topic, 2, 0, null, 1] {noformat} Are you deriving this information from the fact that all of the above messages have the same message ID (0) ? *Is the start of the log the first occurrence of partition assignments in the log?* No, I have randomly taken a segment from the worker logs. I am attaching the full worker.log file in this ticket. *Are you using automatic or manual subscription to Kafka?* I did not understand the question. I am running Kafka on my local machine and using a KafkaSpout to read a topic from this local instance of Kafka. was (Author: srishtyagraw...@gmail.com): Thanks for the clarification. Please find answers to your questions inline: *How did you get the task lists you posted?* {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} The above table has been derived from the worker logs. For instance, if you search for {{kafkaspout 3}}, which means Task ID 3, you will find results for messages which are being read from partitions 0, 1, 2 and 3. This table seems to convey that {noformat} 5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat} If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 10, then only these tasks should be reading from partitions 0, 1, 2 or 3. *I don't think it's exactly right that all the tasks in executor1 (3 10) will be reading from topics 4,6,5,7. * Sorry for the confusion, I meant only tasks in Executor1 should read from the partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I agree that there is a possibility that not all the tasks will end up reading from partitions 4, 5, 6 and 7. *Each task has its own spout instance (and corresponding KafkaConsumer), so only task 10 is actually assigned those partitions at the start of the log. I'm not sure how task 3 is even emitting anything?* Shouldn’t 3 more tasks (from Executor1) apart from task ID 10 be assigned to read from partitions 4, 5, 6 and 7. If there are more tasks tha
[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2514: --- Attachment: worker.log Attaching the original worker.log file. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal > Attachments: worker.log > > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, > topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] > {noformat} > It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 > (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks > in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. > These log lines are being printed by Tasks with IDs 10 and 15 in respective > executors. > Logs which emit individual messages do not abide by the above assumption. For > example in the log mentioned below, Task ID 3 (added code, as a part of > debugging STORM-2506, to print the Task ID right next to component ID) which > runs on Executor1 reads from partition 2 (the second value inside the square > brackets), instead of 4, 5, 6 or 7. > {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 > default [8topic, 2, 0, null, 1]{noformat} > This behavior has been summarized in the table below : > {noformat} > Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 > Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 > {noformat} > [You can find the relevant parts of log file > here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] > > Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, > 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to > executor2? Are (3 10) not the starting and ending task IDs in Executor1? > Another interesting thing to note is that, Task IDs 10 and 15 are always > reading from the partitions they claimed to be reading from (while setting > partition assignments). > If my assumptions are correct, there is a bug in the way the mapping > information is being/passed to worker logs. If not, we need to make changes > in our docs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013239#comment-16013239 ] Srishty Agrawal commented on STORM-2514: Thanks for the clarification. Please find answers to your questions inline: *How did you get the task lists you posted?* {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} The above table has been derived from the worker logs. For instance, if you search for {{kafkaspout 3}}, which means Task ID 3, you will find results for messages which are being read from partitions 0, 1, 2 and 3. This table seems to convey that {noformat} 5, 6, 10, 12, 13, 14, 16, 17 correspond to executor1 and 3, 4, 7, 8, 9, 11, 15, 18 correspond to executor2 {noformat} If I assume that Executor1 should only be running tasks 3, 4, 5, 6, 7, 8, 9, 10, then only these tasks should be reading from partitions 0, 1, 2 or 3. *I don't think it's exactly right that all the tasks in executor1 (3 10) will be reading from topics 4,6,5,7. * Sorry for the confusion, I meant only tasks in Executor1 should read from the partitions 4, 5, 6 or 7. I have re-worded the same in the description above. I agree that there is a possibility that not all the tasks will end up reading from partitions 4, 5, 6 and 7. *Each task has its own spout instance (and corresponding KafkaConsumer), so only task 10 is actually assigned those partitions at the start of the log. I'm not sure how task 3 is even emitting anything?* Shouldn’t 3 more tasks (from Executor1) apart from task ID 10 be assigned to read from partitions 4, 5, 6 and 7. If there are more tasks than the number of partitions there should be one to one mapping between tasks and partitions is what I remember reading in the docs. Hence it is not surprising for me that task 3 is reading from a Kafka partition, although it does not seem to read from the assigned Kafka partition (according to the {{Setting newly assigned}} log line ). *I also noticed that many tasks are emitting what appears to be the same message?* {noformat} e.g. 2017-05-03 13:50:47.655 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] Emitting: kafkaspout 18 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.658 o.a.s.d.task Thread-8-kafkaspout-executor[11 18] [INFO] Emitting: kafkaspout 11 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.660 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 7 default [8topic, 2, 0, null, 1] 2017-05-03 13:50:47.670 o.a.s.d.task Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 8 default [8topic, 2, 0, null, 1] {noformat} Are you deriving this information from the fact that all of the above messages have the same message ID (0) ? *Is the start of the log the first occurrence of partition assignments in the log?* No, I have randomly taken a segment from the worker logs. I am attaching the full worker.log file in this ticket. *Are you using automatic or manual subscription to Kafka?* I did not understand the question. I am running Kafka on my local machine and using a KafkaSpout to read a topic from this local instance of Kafka. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment
[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2514: --- Description: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that only tasks (with ID 3, 4, 5, 6, 7, 8, 9, 10) in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15 in respective executors. Logs which emit individual messages do not abide by the above assumption. For example in the log mentioned below, Task ID 3 (added code, as a part of debugging STORM-2506, to print the Task ID right next to component ID) which runs on Executor1 reads from partition 2 (the second value inside the square brackets), instead of 4, 5, 6 or 7. {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 default [8topic, 2, 0, null, 1]{noformat} This behavior has been summarized in the table below : {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} [You can find the relevant parts of log file here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to executor2? Are (3 10) not the starting and ending task IDs in Executor1? Another interesting thing to note is that, Task IDs 10 and 15 are always reading from the partitions they claimed to be reading from (while setting partition assignments). If my assumptions are correct, there is a bug in the way the mapping information is being/passed to worker logs. If not, we need to make changes in our docs. was: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being print
[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2514: --- Description: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15 in respective executors. Logs which emit individual messages do not abide by the above assumption. For example in the log mentioned below, Task ID 3 (added code, as a part of debugging STORM-2506, to print the Task ID right next to component ID) which runs on Executor1 reads from partition 2 (the second value inside the square brackets), instead of 4, 5, 6 or 7. {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 default [8topic, 2, 0, null, 1]{noformat} This behavior has been summarized in the table below : {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} [You can find the relevant parts of log file here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to executor2? Are (3 10) not the starting and ending task IDs in Executor1? Another interesting thing to note is that, Task IDs 10 and 15 are always reading from the partitions they claimed to be reading from (while setting partition assignments). If my assumptions are correct, there is a bug in the way the mapping information is being/passed to worker logs. If not, we need to make changes in our docs. was: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15
[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2514: --- Description: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15 in respective executors. Logs which emit individual messages do not abide by the above assumption. For example in the log mentioned below, Task ID 3 (added code, as a part of debugging STOMR-2506, to print the Task ID right next to component ID) which runs on Executor1 reads from partition 2 (the second value inside the square brackets), instead of 4, 5, 6 or 7. {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 default [8topic, 2, 0, null, 1]{noformat} This behavior has been summarized in the table below : {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} [You can find the relevant parts of log file here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to executor2? Are (3 10) not the starting and ending task IDs in Executor1? Another interesting thing to note is that, Task IDs 10 and 15 are always reading from the partitions they claimed to be reading from (while setting partition assignments). If my assumptions are correct, there is a bug in the way the mapping information is being/passed to worker logs. If not, we need to make changes in our docs. was: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15
[jira] [Updated] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2514: --- Description: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15 in respective executors. Logs which emit individual messages do not abide by the above assumption. For example in the log mentioned below, Task ID 3 (added code, as a part of debugging, to print the Task ID right next to component ID) which runs on Executor1 reads from partition 2 (the second value inside the square brackets), instead of 4, 5, 6 or 7. {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 default [8topic, 2, 0, null, 1]{noformat} This behavior has been summarized in the table below : {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} [You can find the relevant parts of log file here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to executor2? Are (3 10) not the starting and ending task IDs in Executor1? Another interesting thing to note is that, Task IDs 10 and 15 are always reading from the partitions they claimed to be reading from (while setting partition assignments). If my assumptions are correct, there is a bug in the way the mapping information is being/passed to worker logs. If not, we need to make changes in our docs. was: While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15 in respecti
[jira] [Commented] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
[ https://issues.apache.org/jira/browse/STORM-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011341#comment-16011341 ] Srishty Agrawal commented on STORM-2514: [~erikdw], [~Srdo], [~revans2], [~harshach] kindly provide your insight on this issue. > Incorrect logs for mapping between Kafka partitions and task IDs > > > Key: STORM-2514 > URL: https://issues.apache.org/jira/browse/STORM-2514 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka-client >Reporter: Srishty Agrawal > > While working on > [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker > logs were generated with debug mode on. The information printed about mapping > between Task IDs and kafka partitions was contradictory to my assumptions. I > ran a topology which used KafkaSpout from the storm-kafka-client module, it > had a parallelism hint of 2 (number of executors) and a total of 16 tasks. > The log lines mentioned below show assigned mapping between executors and > kafka partitions: > {noformat} > o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] > Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions > reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, > topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] > o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] > Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] > for group kafkaSpoutTestGroup > o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions > reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, > consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, > topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] > {noformat} > It is evident that all the tasks in Executor1 (3 10) will be reading from > kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be > reading from kafka partitions 0, 1, 2 and 3. These log lines are being > printed by Tasks with IDs 10 and 15 in respective executors. > Logs which emit individual messages do not abide by the above assumption. For > example in the log mentioned below, Task ID 3 which runs on Executor1 reads > from partition 2 (the second value inside the square brackets), instead of 4, > 5, 6 or 7. > {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 > default [8topic, 2, 0, null, 1]{noformat} > This behavior has been summarized in the table below : > {noformat} > Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 > Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 > {noformat} > [You can find the relevant parts of log file > here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] > > Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, > 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to > executor2? Are (3 10) not the starting and ending task IDs in Executor1? > Another interesting thing to note is that, Task IDs 10 and 15 are always > reading from the partitions they claimed to be reading from (while setting > partition assignments). > If my assumptions are correct, there is a bug in the way the mapping > information is being/passed to worker logs. If not, we need to make changes > in our docs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (STORM-2514) Incorrect logs for mapping between Kafka partitions and task IDs
Srishty Agrawal created STORM-2514: -- Summary: Incorrect logs for mapping between Kafka partitions and task IDs Key: STORM-2514 URL: https://issues.apache.org/jira/browse/STORM-2514 Project: Apache Storm Issue Type: Bug Components: storm-kafka-client Reporter: Srishty Agrawal While working on [STORM-2506|https://issues.apache.org/jira/browse/STORM-2506], the worker logs were generated with debug mode on. The information printed about mapping between Task IDs and kafka partitions was contradictory to my assumptions. I ran a topology which used KafkaSpout from the storm-kafka-client module, it had a parallelism hint of 2 (number of executors) and a total of 16 tasks. The log lines mentioned below show assigned mapping between executors and kafka partitions: {noformat} o.a.k.c.c.i.ConsumerCoordinator Thread-12-kafkaspout-executor[3 10] [INFO] Setting newly assigned partitions [8topic-4, 8topic-6, 8topic-5, 8topic-7] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-12-kafkaspout-executor[3 10] [INFO] Partitions reassignment. [taskID=10, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@108e79ce, topic-partitions=[8topic-4, 8topic-6, 8topic-5, 8topic-7]] o.a.k.c.c.i.ConsumerCoordinator Thread-8-kafkaspout-executor[11 18] [INFO] Setting newly assigned partitions [8topic-2, 8topic-1, 8topic-3, 8topic-0] for group kafkaSpoutTestGroup o.a.s.k.s.KafkaSpout Thread-8-kafkaspout-executor[11 18] [INFO] Partitions reassignment. [taskID=15, consumer-group=kafkaSpoutTestGroup, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@2dc37126, topic-partitions=[8topic-2, 8topic-1, 8topic-3, 8topic-0]] {noformat} It is evident that all the tasks in Executor1 (3 10) will be reading from kafka partitions 4, 5, 6 and 7. Similarly, tasks in Executor2 (11 18) will be reading from kafka partitions 0, 1, 2 and 3. These log lines are being printed by Tasks with IDs 10 and 15 in respective executors. Logs which emit individual messages do not abide by the above assumption. For example in the log mentioned below, Task ID 3 which runs on Executor1 reads from partition 2 (the second value inside the square brackets), instead of 4, 5, 6 or 7. {noformat}Thread-12-kafkaspout-executor[3 10] [INFO] Emitting: kafkaspout 3 default [8topic, 2, 0, null, 1]{noformat} This behavior has been summarized in the table below : {noformat} Task IDs --- 3, 4, 7, 8, 9, 11, 15, 18 Partitions 0, 1, 2, 3 Task IDs --- 5, 6, 10, 12, 13, 14, 16, 17 - Partition 4, 5, 6, 7 {noformat} [You can find the relevant parts of log file here.|https://gist.github.com/srishtyagrawal/f7c53db6b8391e2c3bd522afc93b5351] Am I misunderstanding something here? Do tasks {{5, 6, 10, 12, 13, 14, 16, 17}} correspond to executor1 and {{3, 4, 7, 8, 9, 11, 15, 18}} correspond to executor2? Are (3 10) not the starting and ending task IDs in Executor1? Another interesting thing to note is that, Task IDs 10 and 15 are always reading from the partitions they claimed to be reading from (while setting partition assignments). If my assumptions are correct, there is a bug in the way the mapping information is being/passed to worker logs. If not, we need to make changes in our docs. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2506) Make Kafka Spout log its assigned partition
[ https://issues.apache.org/jira/browse/STORM-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007350#comment-16007350 ] Srishty Agrawal commented on STORM-2506: [Link to the pull request|https://github.com/apache/storm/pull/2109] > Make Kafka Spout log its assigned partition > --- > > Key: STORM-2506 > URL: https://issues.apache.org/jira/browse/STORM-2506 > Project: Apache Storm > Issue Type: Task > Components: storm-kafka, storm-kafka-client >Reporter: Srishty Agrawal >Priority: Minor > Labels: information, log > Time Spent: 10m > Remaining Estimate: 0h > > On using storm-kafka KafkaSpout, the generated worker logs have information > about the task index rather than task ID while printing the assigned kafka > partitions. It is difficult to fetch the executor ID reading a particular > kafka partition if only task index is present in the logs. > [Example from generated worker.log while using storm-kafka module's > KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. > There were a total of 16 tasks and 2 executors assigned for this topology. > The executor IDs as indicated by storm-UI were (5 12) and (13 20). > This problem has been eliminated in the KafkaSpout from storm-kafka-client > module. Logs generated with the new KafkaSpout print the mapping between > executor ID and the kafka parititions. The task ID is still not printed in > the log line which prints this mapping. > [Example from worker.log while using storm-kafka module's > KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. > There were a total of 16 tasks and 2 executors for this topology as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (STORM-2506) Make Kafka Spout log its assigned partition
[ https://issues.apache.org/jira/browse/STORM-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2506: --- Description: On using storm-kafka KafkaSpout, the generated worker logs have information about the task index rather than task ID while printing the assigned kafka partitions. It is difficult to fetch the executor ID reading a particular kafka partition if only task index is present in the logs. [Example from generated worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. There were a total of 16 tasks and 2 executors assigned for this topology. The executor IDs as indicated by storm-UI were (5 12) and (13 20). This problem has been eliminated in the KafkaSpout from storm-kafka-client module. Logs generated with the new KafkaSpout print the mapping between executor ID and the kafka parititions. The task ID is still not printed in the log line which prints this mapping. [Example from worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. There were a total of 16 tasks and 2 executors for this topology as well. was: On using storm-kafka KafkaSpout, the generated worker logs have information about the task index rather than task ID while printing the assigned kafka partitions. It is difficult to fetch the executor ID reading a particular kafka partition if only task index is present in the logs. [Example from generated worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. There were a total of 16 tasks and 2 executors assigned for this topology. The executor IDs as indicated by storm-UI were (5 12) and (13 20). This problem has been eliminated in the KafkaSpout from storm-kafka-client module. Logs generated with the new KafkaSpout print the mapping between executor ID running KafkaSpout and the kafka parititions. The task ID is still not printed in the log line which prints this mapping. [Example from worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. There were a total of 16 tasks and 2 executors for this topology as well. > Make Kafka Spout log its assigned partition > --- > > Key: STORM-2506 > URL: https://issues.apache.org/jira/browse/STORM-2506 > Project: Apache Storm > Issue Type: Task > Components: storm-kafka, storm-kafka-client >Reporter: Srishty Agrawal >Priority: Minor > Labels: information, log > > On using storm-kafka KafkaSpout, the generated worker logs have information > about the task index rather than task ID while printing the assigned kafka > partitions. It is difficult to fetch the executor ID reading a particular > kafka partition if only task index is present in the logs. > [Example from generated worker.log while using storm-kafka module's > KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. > There were a total of 16 tasks and 2 executors assigned for this topology. > The executor IDs as indicated by storm-UI were (5 12) and (13 20). > This problem has been eliminated in the KafkaSpout from storm-kafka-client > module. Logs generated with the new KafkaSpout print the mapping between > executor ID and the kafka parititions. The task ID is still not printed in > the log line which prints this mapping. > [Example from worker.log while using storm-kafka module's > KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. > There were a total of 16 tasks and 2 executors for this topology as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (STORM-2506) Make Kafka Spout log its assigned partition
[ https://issues.apache.org/jira/browse/STORM-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srishty Agrawal updated STORM-2506: --- Description: On using storm-kafka KafkaSpout, the generated worker logs have information about the task index rather than task ID while printing the assigned kafka partitions. It is difficult to fetch the executor ID reading a particular kafka partition if only task index is present in the logs. [Example from generated worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. There were a total of 16 tasks and 2 executors assigned for this topology. The executor IDs as indicated by storm-UI were (5 12) and (13 20). This problem has been eliminated in the KafkaSpout from storm-kafka-client module. Logs generated with the new KafkaSpout print the mapping between executor ID running KafkaSpout and the kafka parititions. The task ID is still not printed in the log line which prints this mapping. [Example from worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. There were a total of 16 tasks and 2 executors for this topology as well. was: On using storm-kafka KafkaSpout, the generated worker logs have information about the task index rather than task ID while printing the assigned kafka partitions. It is difficult to fetch the executor ID reading a particular kafka partition if only Task Index is present in the logs. [Example from generated worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. There were a total of 16 tasks and 2 executors assigned for this topology. The executor IDs as indicated by storm-UI were (5 12) and (13 20). This problem has been eliminated in the KafkaSpout from storm-kafka-client module. Logs generated with the new KafkaSpout print the mapping between executor running KafkaSpout and the kafka parititions it is assigned to read. The task ID is still not printed in the log line which prints this mapping. [Example from worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. There were a total of 16 tasks and 2 executors for the below run: > Make Kafka Spout log its assigned partition > --- > > Key: STORM-2506 > URL: https://issues.apache.org/jira/browse/STORM-2506 > Project: Apache Storm > Issue Type: Task > Components: storm-kafka, storm-kafka-client >Reporter: Srishty Agrawal >Priority: Minor > Labels: information, log > > On using storm-kafka KafkaSpout, the generated worker logs have information > about the task index rather than task ID while printing the assigned kafka > partitions. It is difficult to fetch the executor ID reading a particular > kafka partition if only task index is present in the logs. > [Example from generated worker.log while using storm-kafka module's > KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. > There were a total of 16 tasks and 2 executors assigned for this topology. > The executor IDs as indicated by storm-UI were (5 12) and (13 20). > This problem has been eliminated in the KafkaSpout from storm-kafka-client > module. Logs generated with the new KafkaSpout print the mapping between > executor ID running KafkaSpout and the kafka parititions. The task ID is > still not printed in the log line which prints this mapping. > [Example from worker.log while using storm-kafka module's > KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. > There were a total of 16 tasks and 2 executors for this topology as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (STORM-2506) Make Kafka Spout log its assigned partition
Srishty Agrawal created STORM-2506: -- Summary: Make Kafka Spout log its assigned partition Key: STORM-2506 URL: https://issues.apache.org/jira/browse/STORM-2506 Project: Apache Storm Issue Type: Task Components: storm-kafka, storm-kafka-client Reporter: Srishty Agrawal Priority: Minor On using storm-kafka KafkaSpout, the generated worker logs have information about the task index rather than task ID while printing the assigned kafka partitions. It is difficult to fetch the executor ID reading a particular kafka partition if only Task Index is present in the logs. [Example from generated worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/e068bd934094d685a08ece5924e3e71f]. There were a total of 16 tasks and 2 executors assigned for this topology. The executor IDs as indicated by storm-UI were (5 12) and (13 20). This problem has been eliminated in the KafkaSpout from storm-kafka-client module. Logs generated with the new KafkaSpout print the mapping between executor running KafkaSpout and the kafka parititions it is assigned to read. The task ID is still not printed in the log line which prints this mapping. [Example from worker.log while using storm-kafka module's KafkaSpout|https://gist.github.com/srishtyagrawal/04939d3ac9931ead24b89405a5f8be6b]. There were a total of 16 tasks and 2 executors for the below run: -- This message was sent by Atlassian JIRA (v6.3.15#6346)