[jira] [Commented] (STORM-2693) Topology submission or kill takes too much time when topologies grow to a few hundred

2017-09-25 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180275#comment-16180275
 ] 

Jungtaek Lim commented on STORM-2693:
-

[~danny0405]
The patch is meant to be compatible with the current behavior. When we move 
completely to metrics V2 (also built-in), we could get rid of the metrics in 
the heartbeat.
Indeed, you're right about the current status, so let's forget about metrics 
V2 for now and go ahead.

> Topology submission or kill takes too much time when topologies grow to a few 
> hundred
> -
>
> Key: STORM-2693
> URL: https://issues.apache.org/jira/browse/STORM-2693
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-core
>Affects Versions: 0.9.6, 1.0.2, 1.1.0, 1.0.3
>Reporter: Yuzhao Chen
>  Labels: pull-request-available
> Attachments: 2FA30CD8-AF15-4352-992D-A67BD724E7FB.png, 
> D4A30D40-25D5-4ACF-9A96-252EBA9E6EF6.png
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Now, for a Storm cluster with 40 hosts [each with 32 cores/128G memory] and 
> hundreds of topologies, nimbus submission and killing take minutes to finish. 
> For example, on a cluster with 300 topologies, it takes about 8 minutes to 
> submit a topology, which seriously hurts our efficiency.
> So I checked the nimbus code and found two factors that affect nimbus 
> submission/killing time for a scheduling round:
> * reading existing assignments from ZooKeeper for every topology [about 4 
> seconds for a 300-topology cluster]
> * reading all the workers' heartbeats and updating the state in the nimbus 
> cache [about 30 seconds for a 300-topology cluster]
> The key here is that Storm currently uses ZooKeeper to collect heartbeats 
> [not RPC], and also keeps the physical plan [assignments] in ZooKeeper, even 
> though it could be kept entirely local to nimbus.
> So I think we should make some changes to Storm's heartbeat and assignment 
> management.
> For assignment improvement:
> 1. nimbus will keep the assignments on local disk
> 2. on restart or an HA leader change, nimbus will recover assignments from 
> zk to local disk
> 3. nimbus will push each supervisor its assignment through RPC every 
> scheduling round
> 4. supervisors will sync assignments at a fixed interval
> For heartbeat improvement (see the sketch after this description):
> 1. workers will report executor health (ok or not) to their supervisor at a 
> fixed interval
> 2. supervisors will report worker heartbeats to nimbus at a fixed interval
> 3. if a supervisor dies, it will tell nimbus through a runtime hook, or 
> nimbus will discover it by checking whether the supervisor is still alive
> 4. the supervisor will decide whether a worker is running ok or is invalid, 
> and will tell nimbus which executors of every topology are ok
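
A minimal sketch of the proposed RPC flow is below. All names here are 
illustrative assumptions, not taken from any actual patch; it only shows the 
shape of the two pushes (heartbeats up to nimbus, assignments down to 
supervisors) that would replace the per-topology ZooKeeper reads:

{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical sketch only -- none of these names come from the Storm
// code base. Supervisors aggregate worker heartbeats locally and push
// them to nimbus over RPC, so nimbus no longer reads one heartbeat
// znode per topology in each scheduling round.
interface NimbusHeartbeatRpc {
    // Called by each supervisor at a fixed interval (heartbeat step 2):
    // for every topology on this node, the ids of healthy executors.
    void reportWorkerHeartbeats(String supervisorId,
                                Map<String, List<Integer>> okExecutorsByTopology);
}

interface SupervisorAssignmentRpc {
    // Called by nimbus after each scheduling round (assignment step 3);
    // supervisors also pull periodically to resync (assignment step 4).
    // The assignment payload is left opaque here.
    void syncAssignment(byte[] serializedAssignment);
}
{code}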



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2755) Sharing data between Bolts

2017-09-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179307#comment-16179307
 ] 

Stig Rohde Døssing commented on STORM-2755:
---

Hi Oussama,

Yes, this is what stream groupings are for. Please see 
https://storm.apache.org/releases/2.0.0-SNAPSHOT/Concepts.html under the 
"Stream Groupings" header. With an appropriate grouping you can make sure that 
all related data goes to the same bolt instance. I would recommend you start by 
looking at the fields grouping. If you'd like a code example, the 
WordCountTopology uses a fields grouping here: 
https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/WordCountTopology.java#L88
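
For reference, here is a minimal sketch of that wiring, following the linked 
WordCountTopology (RandomSentenceSpout, SplitSentence, and WordCount are the 
spout and bolts defined in storm-starter; the parallelism numbers are just 
the example's):

{code:java}
import org.apache.storm.starter.spout.RandomSentenceSpout;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
// fieldsGrouping routes every tuple with the same "word" value to the
// same "count" bolt instance, even when the instances run on different
// machines -- exactly what grouping-dependent state needs.
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
{code}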

> Sharing data between Bolts
> --
>
> Key: STORM-2755
> URL: https://issues.apache.org/jira/browse/STORM-2755
> Project: Apache Storm
>  Issue Type: Question
>Reporter: Oussama BEN LAMINE
>
> Hello, 
> I am facing a problem, and your help would be really appreciated. 
> I am using Storm 1.0.1, and I prepared a topology that passes data through a 
> preparation bolt and then to a Python prediction bolt. 
> In the prediction bolt I am collecting data in real time and creating groups 
> of data, then sending them to a prediction function. 
> This topology works perfectly on one machine, where that machine can process 
> all the data, but not in a distributed cluster, as each machine works 
> separately and some of the data needed for grouping may be processed by a 
> different machine.
> Is there a way for bolts to communicate with each other so we can group data 
> even if they are on different machines? 
> Can you please advise? Thank you.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2757) Links are broken when logviewer https port is used

2017-09-25 Thread Ethan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Li updated STORM-2757:

Description: 
Some links are broken when logviewer.https.port is configured.  

For example,  with the configuration:

{code:java}
logviewer.https.port: 9093
logviewer.https.keystore.type: "JKS"
logviewer.https.keystore.path: "/keystore-path"
logviewer.https.keystore.password: "xx"
logviewer.https.key.password: "xx"
{code}

We will get:
[^screenshot-1.png]

The logLink still uses the http port, which is not reachable. 


 

  was:
Some links are broken when logviewer.https.port is configured.  

For example,  with the configuration:

{code:java}
logviewer.https.port: 9093
logviewer.https.keystore.type: "JKS"
logviewer.https.keystore.path: "/keystore-path"
logviewer.https.keystore.password: "xx"
logviewer.https.key.password: "xx"
{code}




 


> Links are broken when logviewer https port is used
> --
>
> Key: STORM-2757
> URL: https://issues.apache.org/jira/browse/STORM-2757
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
> Attachments: screenshot-1.png
>
>
> Some links are broken when logviewer.https.port is configured.  
> For example,  with the configuration:
> {code:java}
> logviewer.https.port: 9093
> logviewer.https.keystore.type: "JKS"
> logviewer.https.keystore.path: "/keystore-path"
> logviewer.https.keystore.password: "xx"
> logviewer.https.key.password: "xx"
> {code}
> We will get:
> [^screenshot-1.png]
> The logLink still uses the http port, which is not reachable. 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2757) Links are broken when logviewer https port is used

2017-09-25 Thread Ethan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Li updated STORM-2757:

Attachment: screenshot-1.png

> Links are broken when logviewer https port is used
> --
>
> Key: STORM-2757
> URL: https://issues.apache.org/jira/browse/STORM-2757
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
> Attachments: screenshot-1.png
>
>
> Some links are broken when logviewer.https.port is configured.  
> For example,  with the configuration:
> {code:java}
> logviewer.https.port: 9093
> logviewer.https.keystore.type: "JKS"
> logviewer.https.keystore.path: "/keystore-path"
> logviewer.https.keystore.password: "xx"
> logviewer.https.key.password: "xx"
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (STORM-2757) Links are broken when logviewer https port is used

2017-09-25 Thread Ethan Li (JIRA)
Ethan Li created STORM-2757:
---

 Summary: Links are broken when logviewer https port is used
 Key: STORM-2757
 URL: https://issues.apache.org/jira/browse/STORM-2757
 Project: Apache Storm
  Issue Type: Bug
Reporter: Ethan Li
Assignee: Ethan Li
 Attachments: screenshot-1.png

Some links are broken when logviewer.https.port is configured.  

For example,  with the configuration:

{code:java}
logviewer.https.port: 9093
logviewer.https.keystore.type: "JKS"
logviewer.https.keystore.path: "/keystore-path"
logviewer.https.keystore.password: "xx"
logviewer.https.key.password: "xx"
{code}




 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2083) Blacklist Scheduler

2017-09-25 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2083:
--
Labels: blacklist pull-request-available scheduling  (was: blacklist 
scheduling)

> Blacklist Scheduler
> ---
>
> Key: STORM-2083
> URL: https://issues.apache.org/jira/browse/STORM-2083
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Howard Lee
>  Labels: blacklist, pull-request-available, scheduling
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> My company went through a fault in production in which a critical switch 
> caused an unstable network for a set of machines, with a packet loss rate of 
> 30%-50%. In such a fault, the supervisors and workers on these machines are 
> not definitively dead, which would be easy to handle; instead they are still 
> alive but very unstable. They occasionally lose their heartbeat to nimbus. 
> Nimbus, in such circumstances, will still assign jobs to these machines, but 
> will soon find them invalid again, resulting in very slow convergence to a 
> stable state.
> To deal with such unstable cases, we intend to implement a blacklist 
> scheduler, which will temporarily add the unstable nodes (supervisors, 
> slots) to a blacklist and resume them later. 
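
A minimal sketch of the blacklisting idea follows, under stated assumptions: 
the class, method names, and thresholds are hypothetical, not the actual 
STORM-2083 implementation.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: count heartbeat misses per supervisor and exclude
// a node from scheduling for a cool-down period once it misses too often.
public class SimpleBlacklist {
    private final Map<String, Integer> missCounts = new HashMap<>();
    private final Map<String, Long> blacklistedUntilMs = new HashMap<>();
    private final int tolerance;       // misses allowed before blacklisting
    private final long resumeAfterMs;  // how long a node stays blacklisted

    public SimpleBlacklist(int tolerance, long resumeAfterMs) {
        this.tolerance = tolerance;
        this.resumeAfterMs = resumeAfterMs;
    }

    // Called once per scheduling round with the supervisors that failed
    // to heartbeat in time during that round.
    public void recordMisses(Set<String> missedSupervisors, long nowMs) {
        for (String id : missedSupervisors) {
            if (missCounts.merge(id, 1, Integer::sum) > tolerance) {
                blacklistedUntilMs.put(id, nowMs + resumeAfterMs);
                missCounts.remove(id);
            }
        }
    }

    // The scheduler consults this before assigning work to a node;
    // expired entries are resumed automatically.
    public boolean isBlacklisted(String supervisorId, long nowMs) {
        Long until = blacklistedUntilMs.get(supervisorId);
        if (until == null) {
            return false;
        }
        if (nowMs >= until) {
            blacklistedUntilMs.remove(supervisorId);
            return false;
        }
        return true;
    }
}
{code}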



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2755) Sharing data between Bolts

2017-09-25 Thread Oussama BEN LAMINE (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178710#comment-16178710
 ] 

Oussama BEN LAMINE commented on STORM-2755:
---

Good morning, 
any help please?

> Sharing data between Bolts
> --
>
> Key: STORM-2755
> URL: https://issues.apache.org/jira/browse/STORM-2755
> Project: Apache Storm
>  Issue Type: Question
>Reporter: Oussama BEN LAMINE
>
> Hello, 
> I am facing a problem, and your help would be really appreciated. 
> I am using Storm 1.0.1, and I prepared a topology that passes data through a 
> preparation bolt and then to a Python prediction bolt. 
> In the prediction bolt I am collecting data in real time and creating groups 
> of data, then sending them to a prediction function. 
> This topology works perfectly on one machine, where that machine can process 
> all the data, but not in a distributed cluster, as each machine works 
> separately and some of the data needed for grouping may be processed by a 
> different machine.
> Is there a way for bolts to communicate with each other so we can group data 
> even if they are on different machines? 
> Can you please advise? Thank you.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2693) Topology submission or kill takes too much time when topologies grow to a few hundred

2017-09-25 Thread Yuzhao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178703#comment-16178703
 ] 

Yuzhao Chen commented on STORM-2693:


[~kabhwan]
I have read STORM-2153 and didn't find any improvement for the heartbeats 
part, so I decided to re-design it; this can be done independently of the 
metrics V2 work.

> Topology submission or kill takes too much time when topologies grow to a few 
> hundred
> -
>
> Key: STORM-2693
> URL: https://issues.apache.org/jira/browse/STORM-2693
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-core
>Affects Versions: 0.9.6, 1.0.2, 1.1.0, 1.0.3
>Reporter: Yuzhao Chen
>  Labels: pull-request-available
> Attachments: 2FA30CD8-AF15-4352-992D-A67BD724E7FB.png, 
> D4A30D40-25D5-4ACF-9A96-252EBA9E6EF6.png
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Now, for a Storm cluster with 40 hosts [each with 32 cores/128G memory] and 
> hundreds of topologies, nimbus submission and killing take minutes to finish. 
> For example, on a cluster with 300 topologies, it takes about 8 minutes to 
> submit a topology, which seriously hurts our efficiency.
> So I checked the nimbus code and found two factors that affect nimbus 
> submission/killing time for a scheduling round:
> * reading existing assignments from ZooKeeper for every topology [about 4 
> seconds for a 300-topology cluster]
> * reading all the workers' heartbeats and updating the state in the nimbus 
> cache [about 30 seconds for a 300-topology cluster]
> The key here is that Storm currently uses ZooKeeper to collect heartbeats 
> [not RPC], and also keeps the physical plan [assignments] in ZooKeeper, even 
> though it could be kept entirely local to nimbus.
> So I think we should make some changes to Storm's heartbeat and assignment 
> management.
> For assignment improvement:
> 1. nimbus will keep the assignments on local disk
> 2. on restart or an HA leader change, nimbus will recover assignments from 
> zk to local disk
> 3. nimbus will push each supervisor its assignment through RPC every 
> scheduling round
> 4. supervisors will sync assignments at a fixed interval
> For heartbeat improvement:
> 1. workers will report executor health (ok or not) to their supervisor at a 
> fixed interval
> 2. supervisors will report worker heartbeats to nimbus at a fixed interval
> 3. if a supervisor dies, it will tell nimbus through a runtime hook, or 
> nimbus will discover it by checking whether the supervisor is still alive
> 4. the supervisor will decide whether a worker is running ok or is invalid, 
> and will tell nimbus which executors of every topology are ok



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2153) New Metrics Reporting API

2017-09-25 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2153:
--
Labels: pull-request-available  (was: )

> New Metrics Reporting API
> -
>
> Key: STORM-2153
> URL: https://issues.apache.org/jira/browse/STORM-2153
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> This is a proposal to provide a new metrics reporting API based on [Coda 
> Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA 
> Dropwizard/Yammer metrics).
> h2. Background
> In a [discussion on the dev@ mailing list | 
> http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e]
>   a number of community and PMC members recommended replacing Storm’s metrics 
> system with a new API as opposed to enhancing the existing metrics system. 
> Some of the objections to the existing metrics API include:
> # Metrics are reported as an untyped Java object, making it very difficult to 
> reason about how to report it (e.g. is it a gauge, a counter, etc.?)
> # It is difficult to determine if metrics coming into the consumer are 
> pre-aggregated or not.
> # Storm’s metrics collection occurs through a specialized bolt, which in 
> addition to potentially affecting system performance, complicates certain 
> types of aggregation when the parallelism of that bolt is greater than one.
> In the discussion on the developer mailing list, there is growing consensus 
> for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics 
> library. This approach has the following benefits:
> # Coda Hale’s metrics library is very stable, performant, well thought out, 
> and widely adopted among open source projects (e.g. Kafka).
> # The metrics library provides many existing metric types: Meters, Gauges, 
> Counters, Histograms, and more.
> # The library has a pluggable “reporter” API for publishing metrics to 
> various systems, with existing implementations for: JMX, console, CSV, SLF4J, 
> Graphite, Ganglia.
> # Reporters are straightforward to implement, and can be reused by any 
> project that uses the metrics library (i.e. would have broader application 
> outside of Storm)
> As noted earlier, the metrics library supports pluggable reporters for 
> sending metrics data to other systems, and implementing a reporter is fairly 
> straightforward (an example reporter implementation can be found here). For 
> example, if someone develops a reporter based on Coda Hale's metrics, it 
> could be used not only for pushing Storm metrics, but also with any system 
> that uses the metrics library, such as Kafka.
> h2. Scope of Effort
> The effort to implement a new metrics API for Storm can be broken down into 
> the following development areas:
> # Implement API for Storm's internal worker metrics: latencies, queue sizes, 
> capacity, etc.
> # Implement API for user defined, topology-specific metrics (exposed via the 
> {{org.apache.storm.task.TopologyContext}} class)
> # Implement API for storm daemons: nimbus, supervisor, etc.
> h2. Relationship to Existing Metrics
> This would be a new API that would not affect the existing metrics API. Upon 
> completion, the old metrics API would presumably be deprecated, but kept in 
> place for backward compatibility.
> Internally the current metrics API uses Storm bolts for the reporting 
> mechanism. The proposed metrics API would not depend on any of Storm's 
> messaging capabilities and instead use the [metrics library's built-in 
> reporter mechanism | 
> http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This 
> would allow users to use existing {{Reporter}} implementations which are not 
> Storm-specific, and would simplify the process of collecting metrics. 
> Compared to Storm's {{IMetricCollector}} interface, implementing a reporter 
> for the metrics library is much more straightforward (an example can be found 
> [here | 
> https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]).
> The new metrics capability would not use or affect the ZooKeeper-based 
> metrics used by Storm UI.
> h2. Relationship to JStorm Metrics
> [TBD]
> h2. Target Branches
> [TBD]
> h2. Performance Implications
> [TBD]
> h2. Metrics Namespaces
> [TBD]
> h2. Metrics Collected
> *Worker*
> || Namespace || Metric Type || Description ||
> *Nimbus*
> || Namespace || Metric Type || Description ||
> *Supervisor*
> || Namespace || Metric Type || Description ||
> h2. User-Defined Metrics
> [TBD]
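
To make the proposal concrete, here is a small, self-contained example of the 
Coda Hale (Dropwizard) metrics library it builds on. This is plain 
dropwizard-metrics usage, not the proposed Storm API; the metric name is made 
up for illustration:

{code:java}
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import java.util.concurrent.TimeUnit;

public class MetricsLibraryExample {
    public static void main(String[] args) throws InterruptedException {
        MetricRegistry registry = new MetricRegistry();

        // A typed metric: a Meter tracks event rates (e.g. tuples acked
        // per second), instead of an untyped Java object.
        Meter acked = registry.meter(MetricRegistry.name("spout", "acked"));

        // A pluggable reporter: the same registry could instead be wired
        // to the JMX, CSV, SLF4J, Graphite, or Ganglia reporters.
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.SECONDS)
                .build();
        reporter.start(1, TimeUnit.SECONDS);

        for (int i = 0; i < 100; i++) {
            acked.mark();      // record one event
            Thread.sleep(50);
        }
        reporter.report();     // print a final snapshot to stdout
    }
}
{code}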



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)