[jira] [Created] (STORM-2318) Look at combining FieldNameTopicSelector and FieldIndexTopicSelector

2017-01-23 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2318:
--

 Summary: Look at combining FieldNameTopicSelector and 
FieldIndexTopicSelector
 Key: STORM-2318
 URL: https://issues.apache.org/jira/browse/STORM-2318
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-kafka-client
Reporter: Robert Joseph Evans


Especially in 2.x it would be nice to combine the logic of 
FieldNameTopicSelector and FieldIndexTopicSelector so that topics can be 
selected by name, by index, or by both.
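A combined selector could look something like the following sketch. The class and method names are hypothetical stand-ins (not the actual storm-kafka-client API), and plain lists stand in for a Storm tuple: select by field name when the field exists, fall back to the index otherwise.

```java
import java.util.List;

// Hypothetical combined topic selector: prefers selection by field name,
// falls back to selection by index. Illustrative only.
public class CombinedTopicSelector {
    private final String fieldName;
    private final int fieldIndex;

    public CombinedTopicSelector(String fieldName, int fieldIndex) {
        this.fieldName = fieldName;
        this.fieldIndex = fieldIndex;
    }

    // fields/values stand in for a Storm tuple's schema and contents.
    public String getTopic(List<String> fields, List<Object> values) {
        int i = fields.indexOf(fieldName);   // try selection by name first
        if (i < 0) {
            i = fieldIndex;                  // fall back to selection by index
        }
        if (i < 0 || i >= values.size() || values.get(i) == null) {
            return null;                     // no usable topic field
        }
        return values.get(i).toString();
    }
}
```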



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (STORM-2317) Once STORM-2225 goes in remove deprecated classes from master

2017-01-23 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2317:
--

 Summary: Once STORM-2225 goes in remove deprecated classes from 
master
 Key: STORM-2317
 URL: https://issues.apache.org/jira/browse/STORM-2317
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


We should remove the deprecated APIs from 2.x and only leave them in 1.x.





[jira] [Commented] (STORM-2228) KafkaSpout does not replay properly when a topic maps to multiple streams

2017-01-18 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828165#comment-15828165
 ] 

Robert Joseph Evans commented on STORM-2228:


Sorry I got the review comments fixed, but STORM-2236 went into master and it 
is not going to be a simple merge by any means.

> KafkaSpout does not replay properly when a topic maps to multiple streams
> -
>
> Key: STORM-2228
> URL: https://issues.apache.org/jira/browse/STORM-2228
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 1.0.0, 2.0.0, 1.0.1, 1.0.2, 1.1.0, 1.0.3
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Blocker
>
> In the example.
> KafkaSpoutTopologyMainNamedTopics.java
> The code creates a TuplesBuilder and a KafkaSpoutStreams
> {code}
> protected KafkaSpoutTuplesBuilder<String, String> getTuplesBuilder() {
>     return new KafkaSpoutTuplesBuilderNamedTopics.Builder<>(
>             new TopicsTest0Test1TupleBuilder<String, String>(TOPICS[0], TOPICS[1]),
>             new TopicTest2TupleBuilder<String, String>(TOPICS[2]))
>         .build();
> }
>
> protected KafkaSpoutStreams getKafkaSpoutStreams() {
>     final Fields outputFields = new Fields("topic", "partition", "offset", "key", "value");
>     final Fields outputFields1 = new Fields("topic", "partition", "offset");
>     return new KafkaSpoutStreamsNamedTopics.Builder(outputFields, STREAMS[0],
>                 new String[]{TOPICS[0], TOPICS[1]})  // contents of topics test, test1, sent to test_stream
>         .addStream(outputFields, STREAMS[0], new String[]{TOPICS[2]})   // contents of topic test2 sent to test_stream
>         .addStream(outputFields1, STREAMS[2], new String[]{TOPICS[2]})  // contents of topic test2 sent to test2_stream
>         .build();
> }
> {code}
> Essentially the code is trying to take {{TOPICS\[0]}}, {{TOPICS\[1]}}, and 
> {{TOPICS\[2]}} translate them to {{Fields("topic", "partition", "offset", 
> "key", "value")}} and output them on {{STREAMS\[0]}}. Then just for 
> {{TOPICS\[2]}} they want it to be output as {{Fields("topic", "partition", 
> "offset")}} to {{STREAMS\[2]}}.  (Don't know what happened to {{STREAMS\[1]}})
> There are two issues here.  The first is with how the TupleBuilder and the 
> SpoutStreams are split up but coupled: {{STREAMS\[2]}} is actually getting 
> the full "topic" "partition" "offset" "key" "value", but this is minor.  The 
> real issue is that the code uses the same KafkaSpoutMessageId for all the 
> tuples emitted to both {{STREAMS\[0]}} and {{STREAMS\[2]}}.
> https://git.corp.yahoo.com/storm/storm/blob/5bcbb8d6d700d0d238d23f8f6d3976667aaedab9/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L284-L304
> The code, however, is written to assume that it will only ever get one 
> ack/fail for a given KafkaSpoutMessageId.  This means that if one of the 
> emitted tuple trees succeeds and then the other fails, the failure will not 
> result in anything being replayed!  This violates how storm is intended to 
> work.
> I discovered this as a part of STORM-2225, and I am fine with fixing it on 
> STORM-2225 (I would just remove support for that functionality because there 
> are other ways of doing this correctly).  But that would not maintain 
> backwards compatibility and I am not sure it would be appropriate for 1.x 
> releases.  I really would like to have feedback from others on this.
> I can put something into 1.x where it will throw an exception if acking is 
> enabled and this situation is present, but I don't want to spend the time 
> trying to do reference counting on the number of tuples actually emitted.  If 
> someone else wants to do that I would be happy to turn this JIRA over to them.
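The reference counting mentioned above could be sketched roughly like this, in plain Java rather than the real KafkaSpoutMessageId: track how many tuples were emitted for one message id, treat the message as done only when every emitted tuple has been acked, and let any single fail mark the whole message for replay.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of reference counting for a message id shared by several emitted
// tuples (not the actual KafkaSpoutMessageId class).
public class RefCountedMessageId {
    private final AtomicInteger pending;
    private volatile boolean failed = false;

    public RefCountedMessageId(int emittedTuples) {
        this.pending = new AtomicInteger(emittedTuples);
    }

    /** Returns true when this was the last outstanding tuple. */
    public boolean ack() {
        return pending.decrementAndGet() == 0;
    }

    /** A single failure taints the whole message. */
    public boolean fail() {
        failed = true;
        return pending.decrementAndGet() == 0;
    }

    /** Safe to commit the offset only if everything acked and nothing failed. */
    public boolean shouldCommit() {
        return pending.get() == 0 && !failed;
    }
}
```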





[jira] [Commented] (STORM-2214) Cache Kerberos tickets for long lived Daemons

2017-01-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812534#comment-15812534
 ] 

Robert Joseph Evans commented on STORM-2214:


Actually, there was a change in 2.x: a function in AuthUtils was renamed 
({{s/PullConfig/pullConfig/}}).  So if you want the change I can backport it, 
but it will be a one-character difference.

> Cache Kerberos tickets for long lived Daemons
> -
>
> Key: STORM-2214
> URL: https://issues.apache.org/jira/browse/STORM-2214
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Fix For: 2.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The UI and supervisors authenticate to nimbus through kerberos.  But they 
> initiate a new connection each time they do so, and even use a new Subject 
> for each connection.  This means that the client must fetch a new service 
> ticket each time and adds a lot of load on the KDC.
> In order to reuse the subject between sessions we propose caching the Login 
> instead of creating a new one each time.
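The caching idea can be sketched as follows: key a shared login by its JAAS section so repeated connections reuse one authenticated subject instead of fetching a new service ticket each time. "Login" here is a plain stand-in for Storm's Login wrapper, not the real class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of caching one shared login per JAAS configuration section.
public class LoginCache {
    // Stand-in for the real Login class that holds a Kerberos Subject.
    static class Login {
        final String jaasSection;
        Login(String jaasSection) { this.jaasSection = jaasSection; }
    }

    private static final Map<String, Login> CACHE = new ConcurrentHashMap<>();

    public static Login get(String jaasSection) {
        // computeIfAbsent yields exactly one shared Login per JAAS section
        return CACHE.computeIfAbsent(jaasSection, Login::new);
    }
}
```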





[jira] [Commented] (STORM-2214) Cache Kerberos tickets for long lived Daemons

2017-01-06 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804698#comment-15804698
 ] 

Robert Joseph Evans commented on STORM-2214:


No. If you want it elsewhere I am happy to do it.  We did an audit of our code 
base recently and have been making sure that we push back all of our changes to 
open source.  This is one of the patches that was missed.  We have been running 
with code like this for several years now, so it should be solid to go into 
any currently active branch.

> Cache Kerberos tickets for long lived Daemons
> -
>
> Key: STORM-2214
> URL: https://issues.apache.org/jira/browse/STORM-2214
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Fix For: 2.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The UI and supervisors authenticate to nimbus through kerberos.  But they 
> initiate a new connection each time they do so, and even use a new Subject 
> for each connection.  This means that the client must fetch a new service 
> ticket each time and adds a lot of load on the KDC.
> In order to reuse the subject between sessions we propose caching the Login 
> instead of creating a new one each time.





[jira] [Created] (STORM-2278) Allow max number of disruptor queue flusher threads to be configurable.

2017-01-05 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2278:
--

 Summary: Allow max number of disruptor queue flusher threads to be 
configurable.
 Key: STORM-2278
 URL: https://issues.apache.org/jira/browse/STORM-2278
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-core
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


We have a customer that has very bursty traffic.  Having a maximum of 100 
threads for flushing the disruptor queues ends up creating a lot of CPU load 
when these bursts happen.  They really want to be able to set the maximum 
number of threads lower to reduce the CPU utilization and are OK with the 
increased latency that comes with it.
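Making the ceiling configurable could be as small as the sketch below: read an optional config value and fall back to the current hard-coded maximum. The config key name here is an assumption, not an actual Storm config constant.

```java
import java.util.Map;

// Sketch of a configurable flusher-pool ceiling. The key name is assumed;
// the real one would live in Storm's Config class.
public class FlusherPoolConfig {
    static final String MAX_FLUSHER_THREADS = "topology.disruptor.flusher.max.threads"; // assumed key
    static final int DEFAULT_MAX = 100;   // the current hard-coded maximum

    public static int maxThreads(Map<String, Object> conf) {
        Object v = conf.get(MAX_FLUSHER_THREADS);
        return v instanceof Number ? ((Number) v).intValue() : DEFAULT_MAX;
    }
}
```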





[jira] [Resolved] (STORM-2243) Add ip address to supervisor id for easier debugging

2017-01-05 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-2243.

   Resolution: Fixed
Fix Version/s: 2.0.0

Thanks [~abellina],

I merged this into master.

> Add ip address to supervisor id for easier debugging
> 
>
> Key: STORM-2243
> URL: https://issues.apache.org/jira/browse/STORM-2243
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Alessandro Bellina
>Assignee: Alessandro Bellina
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The supervisor id is a simple UUID currently. If there are issues with the 
> heartbeat messages from a supervisor, looking at Zookeeper makes it hard to 
> find which node is having issues. Adding the ip address to the supervisor id 
> makes debugging these cases much simpler.
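The change amounts to prefixing the random id with the host address, roughly as sketched below (the method name is illustrative, not the actual supervisor code):

```java
import java.net.InetAddress;
import java.util.UUID;

// Sketch: prefix the supervisor's random id with the host address so the
// ZooKeeper entry identifies the node at a glance.
public class SupervisorId {
    public static String generate() {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostAddress();
        } catch (Exception e) {
            host = "unknown-host";   // fall back if the address cannot be resolved
        }
        return host + "-" + UUID.randomUUID();
    }
}
```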





[jira] [Updated] (STORM-2269) Dynamic reconfiguration for the nodes in Nimbus/Pacemaker clusters

2017-01-04 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-2269:
---
Assignee: Sachin Goyal

> Dynamic reconfiguration for the nodes in Nimbus/Pacemaker clusters
> --
>
> Key: STORM-2269
> URL: https://issues.apache.org/jira/browse/STORM-2269
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Sachin Goyal
>Assignee: Sachin Goyal
>  Labels: nimbus, pacemaker
>
> Reference: https://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html
> It would be nice to have similar functionality for Nimbus/Pacemaker clusters 
> too, as that would eliminate the need to restart servers in those clusters 
> whenever a node exits or joins.
> --
> Reply from Bobby Evans on the dev group:
> --
> There is nothing for that right now on pacemaker.
> You can do it with nimbus so long as at least one of the original nodes is 
> still up.
> But in either case it would not be too difficult to make it all fully 
> functional.
> The two critical pieces would be giving the workers and daemons a way to 
> reload these specific configs dynamically, and then documenting the order of 
> operations to be sure nothing goes wrong.
> *Adding Pacemaker Node(s)*
> # bring up the new node(s).
> # update nimbus configs to start reading from the new nodes.
> # update all of the worker nodes to let workers start writing to the new node.
> *Removing Pacemaker Node(s)*
> # Shut down pacemaker nodes/update configs on workers (order should not 
> matter so long as there are enough pacemaker nodes up to handle the load)
> # update the nimbus configs to not try and read from the old nodes
> *Adding new Nimbus Node(s)*
> # Bring up the new nimbus with the new config.
> # update all of the other nodes (including any machines that clients come 
> from) with new config (order does not matter)
> *Removing Nimbus Node(s)*
> # Shut down the old nodes and update the configs on all the boxes in any 
> order you want.
> # This should just work so long as you have at least one nimbus node still up.
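The add/remove procedures above all hinge on the one missing piece named earlier: daemons being able to reload the server-list config without a restart. A minimal sketch of that piece, with the loader standing in for re-parsing storm.yaml, is to re-read on demand and swap the list atomically so concurrent readers always see a complete list.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Sketch of a dynamically reloadable server list for a daemon.
public class ReloadableServerList {
    private final AtomicReference<List<String>> servers;
    private final Supplier<List<String>> loader;   // e.g. re-parse storm.yaml

    public ReloadableServerList(Supplier<List<String>> loader) {
        this.loader = loader;
        this.servers = new AtomicReference<>(loader.get());
    }

    public void reload() {
        servers.set(loader.get());   // readers atomically see old or new list
    }

    public List<String> current() {
        return servers.get();
    }
}
```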





[jira] [Commented] (STORM-2104) New Kafka spout crashes if partitions are reassigned while tuples are in-flight

2016-12-08 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732959#comment-15732959
 ] 

Robert Joseph Evans commented on STORM-2104:


I merged the pull request into master branch, but it has some minor merge 
conflicts and java 7 incompatibilities.  If you want this to go into 1.1 you 
will need to create a new pull request for that.  Overall it is really good.

> New Kafka spout crashes if partitions are reassigned while tuples are 
> in-flight
> ---
>
> Key: STORM-2104
> URL: https://issues.apache.org/jira/browse/STORM-2104
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The new KafkaSpout may throw NPEs if partitions are reassigned while tuples 
> are in-flight. The ack function assumes that the spout instance is always 
> responsible for tuples it emitted, which isn't true if partitions were 
> reassigned since the tuple was emitted. The fail function also assumes that 
> failed tuples should be replayed, which is useless if the tuple is for a 
> partition the spout isn't assigned, since it then can't commit the tuple if 
> it succeeds. Both functions should check that the spout instance is 
> responsible for the incoming tuple before scheduling it for retry or adding 
> it to the acked list.
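The check described above can be sketched as follows, with a plain String standing in for Kafka's TopicPartition: on ack or fail, first verify this spout instance still owns the tuple's partition; if not, drop the event rather than retrying or recording it, since the offset could never be committed here anyway.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of guarding ack/fail handling behind the current partition
// assignment (simplified stand-in types, not the Kafka client classes).
public class AssignmentGuard {
    private final Set<String> assignedPartitions = new HashSet<>();

    /** Called when a rebalance hands this consumer a new assignment. */
    public void onPartitionsAssigned(Set<String> partitions) {
        assignedPartitions.clear();
        assignedPartitions.addAll(partitions);
    }

    /** Retry a failed tuple only for partitions we can still commit. */
    public boolean shouldRetry(String topicPartition) {
        return assignedPartitions.contains(topicPartition);
    }

    /** Record an ack only for partitions we still own. */
    public boolean shouldRecordAck(String topicPartition) {
        return assignedPartitions.contains(topicPartition);
    }
}
```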





[jira] [Created] (STORM-2225) Kafka New API make simple things simple

2016-11-29 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2225:
--

 Summary: Kafka New API make simple things simple
 Key: STORM-2225
 URL: https://issues.apache.org/jira/browse/STORM-2225
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-kafka-client
Affects Versions: 1.0.0, 2.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


The Kafka spouts in storm-kafka-client use the new API and are very extensible, 
but doing very simple things takes way too many lines of code.

For example to create a KafkaTridentSpoutOpaque you need the following code 
(from the example).

{code}
private KafkaTridentSpoutOpaque<String, String> newKafkaTridentSpoutOpaque() {
    return new KafkaTridentSpoutOpaque<>(new KafkaTridentSpoutManager<>(
            newKafkaSpoutConfig(newKafkaSpoutStreams())));
}

private KafkaSpoutConfig<String, String> newKafkaSpoutConfig(KafkaSpoutStreams kafkaSpoutStreams) {
    return new KafkaSpoutConfig.Builder<>(newKafkaConsumerProps(),
            kafkaSpoutStreams, newTuplesBuilder(), newRetryService())
        .setOffsetCommitPeriodMs(10_000)
        .setFirstPollOffsetStrategy(EARLIEST)
        .setMaxUncommittedOffsets(250)
        .build();
}

protected Map<String, Object> newKafkaConsumerProps() {
    Map<String, Object> props = new HashMap<>();
    props.put(KafkaSpoutConfig.Consumer.BOOTSTRAP_SERVERS, "127.0.0.1:9092");
    props.put(KafkaSpoutConfig.Consumer.GROUP_ID, "kafkaSpoutTestGroup");
    props.put(KafkaSpoutConfig.Consumer.KEY_DESERIALIZER, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(KafkaSpoutConfig.Consumer.VALUE_DESERIALIZER, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("max.partition.fetch.bytes", 200);
    return props;
}

protected KafkaSpoutTuplesBuilder<String, String> newTuplesBuilder() {
    return new KafkaSpoutTuplesBuilderNamedTopics.Builder<>(
            new TopicsTupleBuilder<String, String>(TOPIC_1, TOPIC_2))
        .build();
}

protected KafkaSpoutRetryService newRetryService() {
    return new KafkaSpoutRetryExponentialBackoff(
            new KafkaSpoutRetryExponentialBackoff.TimeInterval(500L, TimeUnit.MICROSECONDS),
            KafkaSpoutRetryExponentialBackoff.TimeInterval.milliSeconds(2),
            Integer.MAX_VALUE,
            KafkaSpoutRetryExponentialBackoff.TimeInterval.seconds(10));
}

protected KafkaSpoutStreams newKafkaSpoutStreams() {
    return new KafkaSpoutStreamsNamedTopics.Builder(new Fields("str"),
            new String[]{"test-trident", "test-trident-1"}).build();
}

protected static class TopicsTupleBuilder<K, V> extends KafkaSpoutTupleBuilder<K, V> {
    public TopicsTupleBuilder(String... topics) {
        super(topics);
    }
    @Override
    public List<Object> buildTuple(ConsumerRecord<K, V> consumerRecord) {
        return new Values(consumerRecord.value());
    }
}
{code}

All of this so I can have a trident spout that reads values from 
"localhost:9092" on the topics "test-trident" and "test-trident-1" and outputs 
the value as the field "str".

I shouldn't need 50 lines of code for something I can explain in 3 lines of 
text.  It feels like we need better defaults and less overhead on a lot of 
these things.
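What "simple things simple" might look like, sketched as a plain builder: the required values up front, everything else defaulted. The class name, the default values, and the fluent method are all hypothetical, not the real KafkaSpoutConfig API.

```java
// Hypothetical builder sketch: required values in the constructor,
// sensible defaults for everything else.
public class SpoutConfigSketch {
    final String bootstrapServers;
    final String[] topics;
    long offsetCommitPeriodMs = 30_000;   // assumed default
    int maxUncommittedOffsets = 10_000;   // assumed default

    public SpoutConfigSketch(String bootstrapServers, String... topics) {
        this.bootstrapServers = bootstrapServers;
        this.topics = topics;
    }

    public SpoutConfigSketch offsetCommitPeriodMs(long ms) {
        this.offsetCommitPeriodMs = ms;
        return this;
    }
}
```

With defaults like these the whole example above collapses to roughly one call: `new SpoutConfigSketch("127.0.0.1:9092", "test-trident", "test-trident-1")`.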





[jira] [Commented] (STORM-1765) KafkaBolt with new producer api should be part of storm-kafka-client

2016-11-29 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705809#comment-15705809
 ] 

Robert Joseph Evans commented on STORM-1765:


[~hmclouro],

Is this something you still intend to do?

> KafkaBolt with new producer api should be part of storm-kafka-client
> ---
>
> Key: STORM-1765
> URL: https://issues.apache.org/jira/browse/STORM-1765
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Hugo Louro
>
> During the discussion of storm-kafka-client we agreed on the following:
> "We talked about this some more and it seems to me it would make more sense 
> to leave this new spout in storm-kafka-client (or whatever you want to call 
> it) and move the KafkaBolt which uses the new producer api over here also. 
> That way this component only needs to depend on the new kafka-clients java 
> api and not on the entire scala kafka core. We can make the old storm-kafka 
> depend on this component so it still picks up the bolt, so if anyone is 
> using that it still works. We can deprecate the old KafkaSpout but keep it 
> around for people using older versions of Kafka - tgravescs"





[jira] [Commented] (STORM-1311) port backtype.storm.ui.core to java

2016-11-29 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705296#comment-15705296
 ] 

Robert Joseph Evans commented on STORM-1311:


I don't have a time frame; I was just starting on DRPC and wanted to be sure 
that things were consistent.  I have a pull request up for DRPC at 
https://github.com/apache/storm/pull/1800 and would love some feedback on it.  
I am not using Jackson, because DRPC is not JSON based, but I am using Jersey 
2.24.1, and I would like things to be mostly consistent if possible.

> port backtype.storm.ui.core to java
> ---
>
> Key: STORM-1311
> URL: https://issues.apache.org/jira/browse/STORM-1311
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Hugo Louro
>  Labels: java-migration, jstorm-merger
>
> User Interface + REST -> java





[jira] [Created] (STORM-2217) Finish porting drpc to java

2016-11-23 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2217:
--

 Summary: Finish porting drpc to java
 Key: STORM-2217
 URL: https://issues.apache.org/jira/browse/STORM-2217
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-core
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


drpc.clj is all but gone.  All that remains is main and the REST API.  We 
should finish translating it all to java.





[jira] [Commented] (STORM-1280) port backtype.storm.daemon.logviewer to java

2016-11-23 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691335#comment-15691335
 ] 

Robert Joseph Evans commented on STORM-1280:


[~csivaguru],

Have you started on this yet?  If so, are you using Jersey?  I am hoping to 
finish moving the HTTP DRPC server to java and wanted to base my changes on 
what others are doing.  If not, that is OK; I'll set up the basics myself.

> port backtype.storm.daemon.logviewer to java
> 
>
> Key: STORM-1280
> URL: https://issues.apache.org/jira/browse/STORM-1280
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Sivaguru Kannan
>  Labels: java-migration, jstorm-merger
>
> This is providing a UI for accessing and searching logs.  hiccup will need to 
> be replaced, possibly with just hard coded HTML + escaping.





[jira] [Resolved] (STORM-2212) Remove Redundant Declarations in Maven POM Files

2016-11-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-2212.

   Resolution: Fixed
Fix Version/s: 2.0.0

Thanks [~hmclouro],

I merged this into master.  Keep up the good work.

> Remove Redundant Declarations in Maven POM Files
> 
>
> Key: STORM-2212
> URL: https://issues.apache.org/jira/browse/STORM-2212
> Project: Apache Storm
>  Issue Type: Bug
>  Components: build
>Reporter: Hugo Louro
>Assignee: Hugo Louro
> Fix For: 2.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> These redundant declarations cause warnings and make the build files harder 
> to extend and maintain.





[jira] [Resolved] (STORM-2210) ShuffleGrouping does not produce even distribution

2016-11-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-2210.

   Resolution: Fixed
 Assignee: Kevin Peek
Fix Version/s: 1.0.3
   1.1.0
   2.0.0

Thanks [~kevpeek],

I merged this into master, 1.x-branch and 1.0.x-branch.  Keep up the good work.

> ShuffleGrouping does not produce even distribution
> --
>
> Key: STORM-2210
> URL: https://issues.apache.org/jira/browse/STORM-2210
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 1.0.2
>Reporter: Kevin Peek
>Assignee: Kevin Peek
>Priority: Critical
> Fix For: 2.0.0, 1.1.0, 1.0.3
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When testing the ShuffleGrouping in a multithreaded environment, it produces 
> an extremely uneven distribution.
> This appears to be a result of the Collections.shuffle call here: 
> https://github.com/apache/storm/blob/1.0.x-branch/storm-core/src/jvm/org/apache/storm/grouping/ShuffleGrouping.java#L58
> Because current was set to zero before the shuffle, other threads are able to 
> access the ArrayList while it is being shuffled.
> Stephen's gist here includes a test that results in a very uneven 
> distribution of taskIds from the ShuffleGrouping: 
> https://gist.github.com/Crim/61537958df65a5e13b3844b2d5e28cde
> I would have expected the taskIds from the ShuffleGrouping to be almost 
> uniformly distributed.
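One way to remove the hazard entirely is to stop shuffling a shared list in place and instead step an atomic round-robin counter over an immutable snapshot of the task ids, as sketched below. This is a sketch of one possible cure; the actual fix in Storm may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Thread-safe, evenly distributed task selection without in-place shuffling.
public class SafeShuffleGrouping {
    private final List<Integer> taskIds;
    private final AtomicInteger current = new AtomicInteger();

    public SafeShuffleGrouping(List<Integer> taskIds) {
        // immutable snapshot: no thread can observe a half-shuffled list
        this.taskIds = Collections.unmodifiableList(new ArrayList<>(taskIds));
    }

    public int chooseTask() {
        int i = Math.floorMod(current.getAndIncrement(), taskIds.size());
        return taskIds.get(i);
    }
}
```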





[jira] [Assigned] (STORM-1306) port backtype.storm.testing4j to java

2016-11-18 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reassigned STORM-1306:
--

Assignee: Robert Joseph Evans  (was: Abhishek Agarwal)

> port  backtype.storm.testing4j to java
> --
>
> Key: STORM-1306
> URL: https://issues.apache.org/jira/browse/STORM-1306
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>  Labels: java-migration, jstorm-merger
>
> backtype.storm.Testing class for use from java





[jira] [Assigned] (STORM-1281) port backtype.storm.testing to java

2016-11-18 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reassigned STORM-1281:
--

Assignee: Robert Joseph Evans  (was: Abhishek Agarwal)

> port  backtype.storm.testing to java
> 
>
> Key: STORM-1281
> URL: https://issues.apache.org/jira/browse/STORM-1281
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>  Labels: java-migration, jstorm-merger
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> lots of helper functions/macros for running tests.  Some might need to stay 
> in Clojure with a java equivalent that can be used when other tests are 
> ported over.





[jira] [Created] (STORM-2193) Storm UI/Logviewer passing in params in wrong order to FilterConfiguration

2016-11-08 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2193:
--

 Summary: Storm UI/Logviewer passing in params in wrong order to 
FilterConfiguration
 Key: STORM-2193
 URL: https://issues.apache.org/jira/browse/STORM-2193
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core
Affects Versions: 2.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker


FilterConfiguration has a few constructors, and the order of the params on one 
of them seems off; in any case, both the ui and logviewer are passing the 
params to it in the wrong order.





[jira] [Created] (STORM-2190) Topology submission blocked behind scheduling

2016-11-07 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2190:
--

 Summary: Topology submission blocked behind scheduling
 Key: STORM-2190
 URL: https://issues.apache.org/jira/browse/STORM-2190
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core
Affects Versions: 0.10.0, 1.0.0, 2.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


The submit-lock in nimbus seems to protect some very large and slow sections of 
code.  As more and more topologies are submitted, scheduling can take longer 
and longer to complete, making submitting a topology take increasingly long.  
But most of scheduling does not need to be protected by this lock.  Only a 
small section of the scheduler pulls state from zookeeper that the lock 
protects elsewhere.

We should split this lock up and protect scheduling separately from protecting 
the StormBase stored in zk.
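The lock split can be sketched as two independent monitors, so a topology submission no longer waits behind a full scheduling pass. The class and method names are illustrative; the runnables stand in for the real nimbus code paths.

```java
// Sketch of splitting the single submit-lock into two: one for the long
// scheduling pass, one for the StormBase state kept in zookeeper.
public class NimbusLocks {
    private final Object schedulingLock = new Object();
    private final Object stormBaseLock = new Object();

    public void runScheduling(Runnable computeAssignments) {
        synchronized (schedulingLock) {   // long-running; no longer blocks submits
            computeAssignments.run();
        }
    }

    public void submitTopology(Runnable updateStormBase) {
        synchronized (stormBaseLock) {    // short critical section around zk state
            updateStormBase.run();
        }
    }
}
```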





[jira] [Commented] (STORM-1281) port backtype.storm.testing to java

2016-11-05 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15639711#comment-15639711
 ] 

Robert Joseph Evans commented on STORM-1281:


[~abhishek.agarwal],

now that we have pull requests open for nimbus and worker are you planning on 
working on this, or can I take a crack at it?

> port  backtype.storm.testing to java
> 
>
> Key: STORM-1281
> URL: https://issues.apache.org/jira/browse/STORM-1281
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Abhishek Agarwal
>  Labels: java-migration, jstorm-merger
>
> lots of helper functions/macros for running tests.  Some might need to stay 
> in Clojure with a java equivalent that can be used when other tests are 
> ported over.





[jira] [Commented] (STORM-2175) Supervisor V2 can possibly shut down workers twice in local mode

2016-10-31 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622806#comment-15622806
 ] 

Robert Joseph Evans commented on STORM-2175:


[~Srdo],

I updated the code with your suggestions.  If you could try again that would be 
great.

> Supervisor V2 can possibly shut down workers twice in local mode
> 
>
> Key: STORM-2175
> URL: https://issues.apache.org/jira/browse/STORM-2175
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See https://github.com/apache/storm/pull/1697#issuecomment-256456889
> {code}
> java.lang.NullPointerException
> at 
> org.apache.storm.utils.DisruptorQueue$FlusherPool.stop(DisruptorQueue.java:110)
> at 
> org.apache.storm.utils.DisruptorQueue$Flusher.close(DisruptorQueue.java:293)
> at 
> org.apache.storm.utils.DisruptorQueue.haltWithInterrupt(DisruptorQueue.java:410)
> at 
> org.apache.storm.disruptor$halt_with_interrupt_BANG_.invoke(disruptor.clj:77)
> at 
> org.apache.storm.daemon.executor$mk_executor$reify__4923.shutdown(executor.clj:412)
> at sun.reflect.GeneratedMethodAccessor303.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify__5552$shutdown_STAR___5572.invoke(worker.clj:668)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify$reify__5598.shutdown(worker.clj:706)
> at org.apache.storm.ProcessSimulator.killProcess(ProcessSimulator.java:66)
> at 
> org.apache.storm.ProcessSimulator.killAllProcesses(ProcessSimulator.java:79)
> at 
> org.apache.storm.testing$kill_local_storm_cluster.invoke(testing.clj:207)
> at org.apache.storm.testing4j$_withLocalCluster.invoke(testing4j.clj:93)
> at org.apache.storm.Testing.withLocalCluster(Unknown Source)
> {code}
> and
> {code}
> java.lang.IllegalStateException: Timer is not active
> at org.apache.storm.timer$check_active_BANG_.invoke(timer.clj:87)
> at org.apache.storm.timer$cancel_timer.invoke(timer.clj:120)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify__5552$shutdown_STAR___5572.invoke(worker.clj:682)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify$reify__5598.shutdown(worker.clj:706)
> at org.apache.storm.ProcessSimulator.killProcess(ProcessSimulator.java:66)
> at 
> org.apache.storm.ProcessSimulator.killAllProcesses(ProcessSimulator.java:79)
> at 
> org.apache.storm.testing$kill_local_storm_cluster.invoke(testing.clj:207)
> at org.apache.storm.testing4j$_withLocalCluster.invoke(testing4j.clj:93)
> at org.apache.storm.Testing.withLocalCluster(Unknown Source)
> {code}
> [~Srdo] is still working on getting a reproducible use case for us, but I 
> will try to reproduce/fix it myself in the meantime.





[jira] [Commented] (STORM-2175) Supervisor V2 can possibly shut down workers twice in local mode

2016-10-31 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622401#comment-15622401
 ] 

Robert Joseph Evans commented on STORM-2175:


I am fine with giving them more time to shut down.  I am even OK with throwing 
an exception if it does not shut down within the desired amount of time.  We 
already limit the amount of time a worker has before it kills itself to 1 
second, so I can see how something that is normally at-most-once needing to be 
at-least-once in local mode can be confusing.

I'll close the pull requests and update them accordingly.

> Supervisor V2 can possibly shut down workers twice in local mode
> 
>
> Key: STORM-2175
> URL: https://issues.apache.org/jira/browse/STORM-2175
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See https://github.com/apache/storm/pull/1697#issuecomment-256456889
> {code}
> java.lang.NullPointerException
> at 
> org.apache.storm.utils.DisruptorQueue$FlusherPool.stop(DisruptorQueue.java:110)
> at 
> org.apache.storm.utils.DisruptorQueue$Flusher.close(DisruptorQueue.java:293)
> at 
> org.apache.storm.utils.DisruptorQueue.haltWithInterrupt(DisruptorQueue.java:410)
> at 
> org.apache.storm.disruptor$halt_with_interrupt_BANG_.invoke(disruptor.clj:77)
> at 
> org.apache.storm.daemon.executor$mk_executor$reify__4923.shutdown(executor.clj:412)
> at sun.reflect.GeneratedMethodAccessor303.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify__5552$shutdown_STAR___5572.invoke(worker.clj:668)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify$reify__5598.shutdown(worker.clj:706)
> at org.apache.storm.ProcessSimulator.killProcess(ProcessSimulator.java:66)
> at 
> org.apache.storm.ProcessSimulator.killAllProcesses(ProcessSimulator.java:79)
> at 
> org.apache.storm.testing$kill_local_storm_cluster.invoke(testing.clj:207)
> at org.apache.storm.testing4j$_withLocalCluster.invoke(testing4j.clj:93)
> at org.apache.storm.Testing.withLocalCluster(Unknown Source)
> {code}
> and
> {code}
> java.lang.IllegalStateException: Timer is not active
> at org.apache.storm.timer$check_active_BANG_.invoke(timer.clj:87)
> at org.apache.storm.timer$cancel_timer.invoke(timer.clj:120)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify__5552$shutdown_STAR___5572.invoke(worker.clj:682)
> at 
> org.apache.storm.daemon.worker$fn__5550$exec_fn__1372__auto__$reify$reify__5598.shutdown(worker.clj:706)
> at org.apache.storm.ProcessSimulator.killProcess(ProcessSimulator.java:66)
> at 
> org.apache.storm.ProcessSimulator.killAllProcesses(ProcessSimulator.java:79)
> at 
> org.apache.storm.testing$kill_local_storm_cluster.invoke(testing.clj:207)
> at org.apache.storm.testing4j$_withLocalCluster.invoke(testing4j.clj:93)
> at org.apache.storm.Testing.withLocalCluster(Unknown Source)
> {code}
> [~Srdo] is still working on getting a reproducible use case for us. But I 
> will try to reproduce/fix it myself in the mean time.





[jira] [Commented] (STORM-1278) port backtype.storm.daemon.worker to java

2016-10-31 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622351#comment-15622351
 ] 

Robert Joseph Evans commented on STORM-1278:


Cool, it had been several months and I just wanted to be sure it wasn't dropped.

> port backtype.storm.daemon.worker to java
> -
>
> Key: STORM-1278
> URL: https://issues.apache.org/jira/browse/STORM-1278
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Abhishek Agarwal
>  Labels: java-migration, jstorm-merger
>
> https://github.com/apache/storm/tree/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/daemon/worker
>  as an example





[jira] [Commented] (STORM-2175) Supervisor V2 can possibly shut down workers twice in local mode

2016-10-28 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616570#comment-15616570
 ] 

Robert Joseph Evans commented on STORM-2175:


I put up two pull requests, one for master and another for 1.x.  I manually 
forced Interrupted exceptions for the test [~Srdo] posted and verified that 
this works.

I also found a number of leaked threads from nimbus and a few other possible 
places. I'll file a follow-on JIRA for them, because they are not causing 
issues right now.

> Supervisor V2 can possibly shut down workers twice in local mode
> 
>
> Key: STORM-2175
> URL: https://issues.apache.org/jira/browse/STORM-2175
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>  Time Spent: 20m
>  Remaining Estimate: 0h





[jira] [Commented] (STORM-2175) Supervisor V2 can possibly shut down workers twice in local mode

2016-10-28 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616424#comment-15616424
 ] 

Robert Joseph Evans commented on STORM-2175:


Yes, that is just what I am doing, and also making ProcessSimulator eat 
InterruptedExceptions (because a test shouldn't fail for an exception that we 
expect to happen and ignore in other parts of the code).

> Supervisor V2 can possibly shut down workers twice in local mode
> 
>
> Key: STORM-2175
> URL: https://issues.apache.org/jira/browse/STORM-2175
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans





[jira] [Commented] (STORM-2175) Supervisor V2 can possibly shut down workers twice in local mode

2016-10-28 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616279#comment-15616279
 ] 

Robert Joseph Evans commented on STORM-2175:


OK, I was looking through the logs and found what happened (and honestly I 
think it is actually a good thing).

When shutting down a local mode cluster inside testing.clj we have:

{code}
  (doseq [s @(:supervisors cluster-map)]
(.shutdownAllWorkers s)
;; race condition here? will it launch the workers again?
(.close s))
  (ProcessSimulator/killAllProcesses)
{code}

NOTE: the comment above is not what is causing this issue.

So all of the supervisors are shut down, first by killing all of the worker 
processes and then by closing the supervisor.
After all of the supervisors are shut down, just to be sure, we then kill all 
of the processes still registered with the process simulator.

The code for killing all of the workers in a supervisor is the following...

{code}
public synchronized void shutdownAllWorkers() {
    for (Slot slot : slots.values()) {
        slot.setNewAssignment(null);
    }

    for (Slot slot : slots.values()) {
        try {
            int count = 0;
            while (slot.getMachineState() != MachineState.EMPTY) {
                if (count > 10) {
                    LOG.warn("DONE waiting for {} to finish {}", slot,
                        slot.getMachineState());
                    break;
                }
                if (Time.isSimulating()) {
                    Time.advanceTime(1000);
                    Thread.sleep(100);
                } else {
                    Time.sleep(100);
                }
                count++;
            }
        } catch (Exception e) {
            LOG.error("Error trying to shutdown workers in {}", slot, e);
        }
    }
}
{code}

It tells the Slot that it is not assigned anything any more and waits for it to 
kill the worker under it.  I saw in the logs that for the one worker in 
question it timed out ("DONE waiting for...") and went on to kill/shut down 
other things. 

The ProcessSimulator code to kill a local mode worker is:

{code}
/**
 * Kill a process
 *
 * @param pid
 */
public static void killProcess(String pid) {
    synchronized (lock) {
        LOG.info("Begin killing process " + pid);
        Shutdownable shutdownHandle = processMap.get(pid);
        if (shutdownHandle != null) {
            shutdownHandle.shutdown();
        }
        processMap.remove(pid);
        LOG.info("Successfully killed process " + pid);
    }
}

/**
 * Kill all processes
 */
public static void killAllProcesses() {
    Set<String> pids = processMap.keySet();
    for (String pid : pids) {
        killProcess(pid);
    }
}
{code}

In the logs I don't see a corresponding "Successfully killed process" entry for 
the "Begin killing process" entry.  This means that an exception was thrown 
during the shutdown process.  Looking at the code, the only exception that 
could have caused this is an InterruptedException or something caused by it (or 
else the entire process would have exited).

So the ProcessSimulator partially shut down the worker, the worker threw the 
exception, and the ProcessSimulator failed to remove the worker from its map.  
When the follow-on code then killed anything still registered with the 
ProcessSimulator, in case something had leaked, it ended up trying to shoot the 
same process again, which the code we found does not handle.

Everything here seems perfectly reasonable, except for the part where some of 
our data structures don't like being shut down multiple times.  That is 
contrary to what most other software does.  Exactly once is hard, so let's make 
sure at least once works.

> Supervisor V2 can possibly shut down workers twice in local mode
> 
>
> Key: STORM-2175
> URL: https://issues.apache.org/jira/browse/STORM-2175
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans

[jira] [Updated] (STORM-2175) Supervisor V2 can possibly shut down workers twice in local mode

2016-10-28 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-2175:
---
Affects Version/s: 1.1.0

> Supervisor V2 can possibly shut down workers twice in local mode
> 
>
> Key: STORM-2175
> URL: https://issues.apache.org/jira/browse/STORM-2175
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans





[jira] [Closed] (STORM-1985) Provide a tool for showing and killing corrupted topology

2016-10-25 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans closed STORM-1985.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Thanks [~bkamal],

I merged this into master.

> Provide a tool for showing and killing corrupted topology
> -
>
> Key: STORM-1985
> URL: https://issues.apache.org/jira/browse/STORM-1985
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Reporter: Jungtaek Lim
>Assignee: Kamal
>  Labels: newbie
> Fix For: 2.0.0
>
> Attachments: AdminCommands.java, proposal_admin_tool_design.docx
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> After STORM-1976, Nimbus doesn't clean up corrupted topologies.
> (corrupted topology means the topology whose codes are not available on 
> blobstore.)
> Also after STORM-1977, no Nimbus is gaining leadership if one or more 
> topologies are corrupted, which means all nimbuses will be no-op.
> So we should provide a tool to kill specific topology without accessing 
> leader nimbus (because there's no leader nimbus at that time). The tool 
> should also determine which topologies are corrupted, and show its list or 
> clean up automatically.





[jira] [Commented] (STORM-2153) New Metrics Reporting API

2016-10-21 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595879#comment-15595879
 ] 

Robert Joseph Evans commented on STORM-2153:


I have not dug into the internals of how Dropwizard represents a histogram, 
but all of the histogram implementations I know of allow for simple aggregation 
over time.  They usually reduce their size by bucketizing values.  Look at 
HdrHistogram and how it lets you merge multiple sub-histograms together and 
pull out percentiles after the merge.  Meters typically use different 
aggregations from counters: a meter is a set value, so I don't have to worry 
about how much it has changed over time, while counters can roll over or reset, 
and detecting which one happened can be difficult to get right.
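As a rough illustration of the merge-friendly, bucketized representation 
described above (a simplified sketch for this discussion, not HdrHistogram's or 
Dropwizard's actual implementation), merging is just element-wise addition of 
bucket counts, which is why percentiles can still be computed after combining 
per-worker sub-histograms:

```java
// Illustrative sketch only: a fixed-width bucketized histogram that supports
// lossless merging, so percentiles can be pulled out after aggregation.
public class MergeableHistogram {
    // bucket i covers values [i * width, (i + 1) * width)
    private final long[] counts;
    private final long width;
    private long total;

    public MergeableHistogram(long maxValue, int numBuckets) {
        this.width = Math.max(1, maxValue / numBuckets);
        this.counts = new long[numBuckets + 1];
    }

    public void record(long value) {
        int i = (int) Math.min(value / width, counts.length - 1);
        counts[i]++;
        total++;
    }

    // Merging two histograms with the same bucket layout is just adding the
    // bucket counts, so sub-histograms from many workers can be aggregated.
    public void merge(MergeableHistogram other) {
        for (int i = 0; i < counts.length; i++) {
            counts[i] += other.counts[i];
        }
        total += other.total;
    }

    // Upper bound of the bucket that contains the requested percentile.
    public long valueAtPercentile(double pct) {
        long rank = (long) Math.ceil(pct / 100.0 * total);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return (i + 1) * width;
            }
        }
        return counts.length * width;
    }

    public static void main(String[] args) {
        MergeableHistogram a = new MergeableHistogram(1000, 100); // worker 1
        MergeableHistogram b = new MergeableHistogram(1000, 100); // worker 2
        for (long v = 1; v <= 900; v++) a.record(v);    // bulk latencies
        for (long v = 901; v <= 1000; v++) b.record(v); // tail latencies
        a.merge(b);
        long p99 = a.valueAtPercentile(99.0);
        if (p99 < 980 || p99 > 1000) throw new AssertionError("p99=" + p99);
        System.out.println("merged p99 bucket upper bound: " + p99);
    }
}
```

The precision games HdrHistogram plays (variable-width buckets with a bounded 
relative error) are more involved, but the merge-then-query property is the same.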

> New Metrics Reporting API
> -
>
> Key: STORM-2153
> URL: https://issues.apache.org/jira/browse/STORM-2153
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: P. Taylor Goetz
>
> This is a proposal to provide a new metrics reporting API based on [Coda 
> Hale's metrics library | http://metrics.dropwizard.io/3.1.0/] (AKA 
> Dropwizard/Yammer metrics).
> h2. Background
> In a [discussion on the dev@ mailing list | 
> http://mail-archives.apache.org/mod_mbox/storm-dev/201610.mbox/%3ccagx0urh85nfh0pbph11pmc1oof6htycjcxsxgwp2nnofukq...@mail.gmail.com%3e]
>   a number of community and PMC members recommended replacing Storm’s metrics 
> system with a new API as opposed to enhancing the existing metrics system. 
> Some of the objections to the existing metrics API include:
> # Metrics are reported as an untyped Java object, making it very difficult to 
> reason about how to report it (e.g. is it a gauge, a counter, etc.?)
> # It is difficult to determine if metrics coming into the consumer are 
> pre-aggregated or not.
> # Storm’s metrics collection occurs through a specialized bolt, which in 
> addition to potentially affecting system performance, complicates certain 
> types of aggregation when the parallelism of that bolt is greater than one.
> In the discussion on the developer mailing list, there is growing consensus 
> for replacing Storm’s metrics API with a new API based on Coda Hale’s metrics 
> library. This approach has the following benefits:
> # Coda Hale’s metrics library is very stable, performant, well thought out, 
> and widely adopted among open source projects (e.g. Kafka).
> # The metrics library provides many existing metric types: Meters, Gauges, 
> Counters, Histograms, and more.
> # The library has a pluggable “reporter” API for publishing metrics to 
> various systems, with existing implementations for: JMX, console, CSV, SLF4J, 
> Graphite, Ganglia.
> # Reporters are straightforward to implement, and can be reused by any 
> project that uses the metrics library (i.e. would have broader application 
> outside of Storm)
> As noted earlier, the metrics library supports pluggable reporters for 
> sending metrics data to other systems, and implementing a reporter is fairly 
> straightforward (an example reporter implementation can be found here). For 
> example if someone develops a reporter based on Coda Hale’s metrics, it could 
> not only be used for pushing Storm metrics, but also for any system that used 
> the metrics library, such as Kafka.
> h2. Scope of Effort
> The effort to implement a new metrics API for Storm can be broken down into 
> the following development areas:
> # Implement API for Storms internal worker metrics: latencies, queue sizes, 
> capacity, etc.
> # Implement API for user defined, topology-specific metrics (exposed via the 
> {{org.apache.storm.task.TopologyContext}} class)
> # Implement API for storm daemons: nimbus, supervisor, etc.
> h2. Relationship to Existing Metrics
> This would be a new API that would not affect the existing metrics API. Upon 
> completion, the old metrics API would presumably be deprecated, but kept in 
> place for backward compatibility.
> Internally the current metrics API uses Storm bolts for the reporting 
> mechanism. The proposed metrics API would not depend on any of Storm's 
> messaging capabilities and instead use the [metrics library's built-in 
> reporter mechanism | 
> http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This 
> would allow users to use existing {{Reporter}} implementations which are not 
> Storm-specific, and would simplify the process of collecting metrics. 
> Compared to Storm's {{IMetricCollector}} interface, implementing a reporter 
> for the metrics library is much more straightforward (an example can be found 
> [here | 
> https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java].
> The new metrics capability would not use or affect the 

[jira] [Created] (STORM-2171) blob recovery on a single host results in deadlock

2016-10-21 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2171:
--

 Summary: blob recovery on a single host results in deadlock
 Key: STORM-2171
 URL: https://issues.apache.org/jira/browse/STORM-2171
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core
Affects Versions: 2.0.0
Reporter: Robert Joseph Evans


It might affect more versions, but I have only tested this on 2.x.

Essentially, when trying to find replicas to copy blobs from, LocalFSBlobStore 
does not exclude itself.  This results in a deadlock: the node holds a lock 
while trying to download the blob, and at the same time has issued a request 
back to itself for the same blob, which can never finish because it is blocked 
on the same lock.
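A hypothetical sketch of the obvious fix (names here are made up for 
illustration, not Storm's actual BlobStore API): exclude the local node when 
selecting replicas to download from, so a node can never issue a blob request 
back to itself while holding the download lock.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: filter the local node out of the replica candidate list
// before requesting a blob, breaking the self-request deadlock described above.
public class ReplicaSelector {
    public static List<String> downloadCandidates(List<String> replicas, String self) {
        return replicas.stream()
                .filter(node -> !node.equals(self)) // never ask ourselves
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("host1:6627", "host2:6627", "host1:6627");
        List<String> safe = downloadCandidates(replicas, "host1:6627");
        if (safe.contains("host1:6627")) throw new AssertionError("self not excluded");
        System.out.println(safe); // prints [host2:6627]
    }
}
```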





[jira] [Commented] (STORM-2153) New Metrics Reporting API

2016-10-20 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592490#comment-15592490
 ] 

Robert Joseph Evans commented on STORM-2153:


I would rather keep it in nimbus.  I know the primary use case is the UI, but I 
also want to be able to query it from nimbus so the scheduler can use the 
metrics for elasticity.  Architecturally, and from a security standpoint, I 
would rather keep it with nimbus.  The UI could be running on a separate box 
from nimbus that is more open in some ways and less open in others; I could 
easily see a setup where the UI is not directly accessible from the worker 
nodes.  But I really don't care where it is, so long as it is separately 
configurable by both host and port.

> New Metrics Reporting API
> -
>
> Key: STORM-2153
> URL: https://issues.apache.org/jira/browse/STORM-2153
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: P. Taylor Goetz

[jira] [Commented] (STORM-2153) New Metrics Reporting API

2016-10-19 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589782#comment-15589782
 ] 

Robert Joseph Evans commented on STORM-2153:


I agree that we eventually want the metrics to be HA.  It should be fairly 
simple to set up a periodic sync to the other nimbus instances (and I think 
RocksDB supports versioning, so we might be able to get a lot of it for free).  
In the worst case we would lose some metrics, which I don't see as being that 
critical.

I would say let's start out with no HA for metrics and then add it in 
afterwards.  It is no worse than what we have now, where we can lose metrics 
when a worker goes down.

> New Metrics Reporting API
> -
>
> Key: STORM-2153
> URL: https://issues.apache.org/jira/browse/STORM-2153
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: P. Taylor Goetz
> the metrics library, such as Kafka.
> h2. Scope of Effort
> The effort to implement a new metrics API for Storm can be broken down into 
> the following development areas:
> # Implement API for Storms internal worker metrics: latencies, queue sizes, 
> capacity, etc.
> # Implement API for user defined, topology-specific metrics (exposed via the 
> {{org.apache.storm.task.TopologyContext}} class)
> # Implement API for storm daemons: nimbus, supervisor, etc.
> h2. Relationship to Existing Metrics
> This would be a new API that would not affect the existing metrics API. Upon 
> completion, the old metrics API would presumably be deprecated, but kept in 
> place for backward compatibility.
> Internally the current metrics API uses Storm bolts for the reporting 
> mechanism. The proposed metrics API would not depend on any of Storm's 
> messaging capabilities and instead use the [metrics library's built-in 
> reporter mechanism | 
> http://metrics.dropwizard.io/3.1.0/manual/core/#man-core-reporters]. This 
> would allow users to use existing {{Reporter}} implementations which are not 
> Storm-specific, and would simplify the process of collecting metrics. 
> Compared to Storm's {{IMetricCollector}} interface, implementing a reporter 
> for the metrics library is much more straightforward (an example can be found 
> [here | 
> https://github.com/dropwizard/metrics/blob/3.2-development/metrics-core/src/main/java/com/codahale/metrics/ConsoleReporter.java]).
> The new metrics capability would not use or affect the ZooKeeper-based 
> metrics used by Storm UI.
> h2. Relationship to JStorm Metrics
> [TBD]
> h2. Target Branches
> [TBD]
> h2. Performance Implications
> 
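To make the pluggable-reporter idea in the proposal above concrete, here is a 
self-contained sketch of the registry/reporter shape using only JDK types. 
This is NOT the real {{com.codahale.metrics}} API (there one extends 
{{ScheduledReporter}} and implements a single {{report(...)}} callback); every 
name below is invented purely for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical mini-registry mimicking the shape of Coda Hale's metrics
// library: named gauges (read-on-demand values) and counters, plus a
// "reporter" that snapshots them. Not the real Dropwizard API.
class MetricsSketch {
    private final Map<String, LongSupplier> gauges = new TreeMap<>();
    private final Map<String, AtomicLong> counters = new TreeMap<>();

    // Register a gauge: a callback sampled at report time.
    LongSupplier gauge(String name, LongSupplier g) {
        gauges.put(name, g);
        return g;
    }

    // Get (or lazily create) a named counter.
    AtomicLong counter(String name) {
        return counters.computeIfAbsent(name, k -> new AtomicLong());
    }

    // A "reporter" is just a consumer of the current snapshot; here it
    // renders name=value lines the way a console reporter would.
    String report() {
        StringBuilder out = new StringBuilder();
        gauges.forEach((n, g) -> out.append(n).append('=').append(g.getAsLong()).append('\n'));
        counters.forEach((n, c) -> out.append(n).append('=').append(c.get()).append('\n'));
        return out.toString();
    }
}
```

In the real library, switching from console to Graphite or JMX output is just 
a different {{ScheduledReporter}} subclass; the registry and metric types stay 
unchanged, which is the reuse argument made above.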

[jira] [Commented] (STORM-1276) port backtype.storm.daemon.nimbus to java

2016-10-01 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15538684#comment-15538684
 ] 

Robert Joseph Evans commented on STORM-1276:


[~basti.lj],

It has been several months now.  Would it be possible for you to post what you 
have done so far?  If not, I am happy to take this over for you.

> port backtype.storm.daemon.nimbus to java
> -
>
> Key: STORM-1276
> URL: https://issues.apache.org/jira/browse/STORM-1276
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: Robert Joseph Evans
>Assignee: Basti Liu
>  Labels: java-migration, jstorm-merger
> Attachments: Nimbus_Diff.pdf
>
>
> https://github.com/apache/storm/tree/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/daemon/nimbus
> as a possible example



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (STORM-2130) 1.0.x does not compile (BaseConfigurationDeclarer)

2016-09-29 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans closed STORM-2130.
--
Resolution: Duplicate

Actually I just needed to pull STORM-1913 into the 1.0.x branch and it fixed 
the issue, so there is no need for a separate JIRA/pull request. 

> 1.0.x does not compile (BaseConfigurationDeclarer)
> --
>
> Key: STORM-2130
> URL: https://issues.apache.org/jira/browse/STORM-2130
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 1.0.3
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Blocker
>
> Looks like we missed pulling something into 1.0.x-branch because if I use the 
> version of 
> storm-core/src/jvm/org/apache/storm/topology/BaseConfigurationDeclarer.java 
> from 1.x-branch everything works.





[jira] [Created] (STORM-2130) 1.0.x does not compile (BaseConfigurationDeclarer)

2016-09-29 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2130:
--

 Summary: 1.0.x does not compile (BaseConfigurationDeclarer)
 Key: STORM-2130
 URL: https://issues.apache.org/jira/browse/STORM-2130
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core
Affects Versions: 1.0.3
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker


Looks like we missed pulling something into 1.0.x-branch because if I use the 
version of 
storm-core/src/jvm/org/apache/storm/topology/BaseConfigurationDeclarer.java 
from 1.x-branch everything works.





[jira] [Commented] (STORM-2124) Show requested cpu/memory for each component on topology and component page

2016-09-29 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532919#comment-15532919
 ] 

Robert Joseph Evans commented on STORM-2124:


Also merged it to 1.x-branch.

> Show requested cpu/memory for each component on topology and component page
> ---
>
> Key: STORM-2124
> URL: https://issues.apache.org/jira/browse/STORM-2124
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Alessandro Bellina
>Assignee: Alessandro Bellina
>Priority: Minor
> Fix For: 2.0.0, 1.1.0
>
>
> Work done by [~jerrypeng] at Yahoo to add requested cpu and memory per 
> component in both the topology and component pages.





[jira] [Resolved] (STORM-2017) ShellBolt stops reporting task ids

2016-09-28 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-2017.

   Resolution: Fixed
Fix Version/s: 1.0.3
   1.1.0
   2.0.0

Thanks [~kluoto],

I merged this into master, 1.x-branch and 1.0.x-branch.  Keep up the good work.

> ShellBolt stops reporting task ids
> --
>
> Key: STORM-2017
> URL: https://issues.apache.org/jira/browse/STORM-2017
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 1.0.1, 1.0.3
>Reporter: Lasse Kiviluoto
>Assignee: Lasse Kiviluoto
> Fix For: 2.0.0, 1.1.0, 1.0.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After running enough flow through ShellBolt, in some cases after tens of 
> minutes ShellBolt stopped reporting task ids. After this error condition no 
> new task ids were reported back. When acking of the tuples processed by the 
> bolt was done in a callback tied to the arrival of the task ids, all tuple 
> trees going through the bolt would fail after reporting stopped. ShellBolt 
> would continue to process new tuples and respond to heartbeats.
> After running some tests and making some changes to the code, I have the 
> following hypothesis for the cause:
> org.apache.storm.utils.ShellBoltMessageQueue has two queues, one for task 
> ids and the other for bolt messages.
> The task id queue is implemented with a LinkedList and the bolt message 
> queue with a LinkedBlockingQueue. Both queues are operated on similarly.
> One major difference between the two structures is that LinkedList is not 
> synchronized.
> In the code:
> ShellBoltMessageQueue.java:58 calls the add method without holding the 
> lock, whereas ShellBoltMessageQueue.java:110 calls the poll method with the 
> lock held. As BoltReaderRunnable and BoltWriterRunnable in ShellBolt run 
> concurrently, this can lead to a race condition.
> If I move the add at ShellBoltMessageQueue.java:58 inside the lock and run 
> the test in a similar fashion, it seems to solve the issue.
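The locking fix described above can be sketched with plain JDK concurrency 
primitives. This is a hypothetical stand-in, not Storm's actual 
ShellBoltMessageQueue: the point is simply that the unsynchronized LinkedList 
and the signalling Condition are only ever touched while holding the same 
lock, on both the add side and the poll side.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the fixed task-id queue: every access to the
// unsynchronized LinkedList happens under the same ReentrantLock, which
// removes the reader/writer race between the add and poll threads.
class TaskIdQueueSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Queue<List<Integer>> taskIds = new LinkedList<>();

    void addTaskIds(List<Integer> ids) {
        lock.lock();               // writer takes the lock BEFORE add()
        try {
            taskIds.add(ids);
            notEmpty.signal();     // wake a blocked poller
        } finally {
            lock.unlock();
        }
    }

    // Returns the next batch of task ids, or null on timeout/interrupt.
    List<Integer> poll(long timeoutMs) {
        lock.lock();               // reader holds the SAME lock for poll()
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (taskIds.isEmpty()) {
                if (nanos <= 0) {
                    return null;   // timed out, nothing arrived
                }
                try {
                    nanos = notEmpty.awaitNanos(nanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
            return taskIds.poll();
        } finally {
            lock.unlock();
        }
    }
}
```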





[jira] [Updated] (STORM-2017) ShellBolt stops reporting task ids

2016-09-28 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-2017:
---
Assignee: Lasse Kiviluoto

> ShellBolt stops reporting task ids
> --
>
> Key: STORM-2017
> URL: https://issues.apache.org/jira/browse/STORM-2017
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 1.0.1, 1.0.3
>Reporter: Lasse Kiviluoto
>Assignee: Lasse Kiviluoto
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After running enough flow through ShellBolt, in some cases after tens of 
> minutes ShellBolt stopped reporting task ids. After this error condition no 
> new task ids were reported back. When acking of the tuples processed by the 
> bolt was done in a callback tied to the arrival of the task ids, all tuple 
> trees going through the bolt would fail after reporting stopped. ShellBolt 
> would continue to process new tuples and respond to heartbeats.
> After running some tests and making some changes to the code, I have the 
> following hypothesis for the cause:
> org.apache.storm.utils.ShellBoltMessageQueue has two queues, one for task 
> ids and the other for bolt messages.
> The task id queue is implemented with a LinkedList and the bolt message 
> queue with a LinkedBlockingQueue. Both queues are operated on similarly.
> One major difference between the two structures is that LinkedList is not 
> synchronized.
> In the code:
> ShellBoltMessageQueue.java:58 calls the add method without holding the 
> lock, whereas ShellBoltMessageQueue.java:110 calls the poll method with the 
> lock held. As BoltReaderRunnable and BoltWriterRunnable in ShellBolt run 
> concurrently, this can lead to a race condition.
> If I move the add at ShellBoltMessageQueue.java:58 inside the lock and run 
> the test in a similar fashion, it seems to solve the issue.





[jira] [Closed] (STORM-2117) Supervisor V2 with local mode extracts resources directory to topology root directory instead of temporary directory

2016-09-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans closed STORM-2117.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

thanks [~kabhwan],

I merged this into master and will pull it into my 1.x pull request.

> Supervisor V2 with local mode extracts resources directory to topology root 
> directory instead of temporary directory
> 
>
> Key: STORM-2117
> URL: https://issues.apache.org/jira/browse/STORM-2117
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Critical
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> 20:43:57.692 [timer] INFO  o.a.s.d.nimbus - Setting new assignment for 
> topology id storm-sql-1-1474544636: 
> #org.apache.storm.daemon.common.Assignment{:master-code-dir 
> "/var/folders/t7/dd7tgll1435bm2kbsdy35888gn/T//ba7ef925-0dd5-4ddd-91ad-02ff877b99ef",
>  :node->host {"35e30049-e731-40e7-a093-2ac22d9efa34" "192.168.0.9"}, 
> :executor->node+port {[4 4] ["35e30049-e731-40e7-a093-2ac22d9efa34" 1024], [3 
> 3] ["35e30049-e731-40e7-a093-2ac22d9efa34" 1024], [6 6] 
> ["35e30049-e731-40e7-a093-2ac22d9efa34" 1024], [2 2] 
> ["35e30049-e731-40e7-a093-2ac22d9efa34" 1024], [5 5] 
> ["35e30049-e731-40e7-a093-2ac22d9efa34" 1024], [1 1] 
> ["35e30049-e731-40e7-a093-2ac22d9efa34" 1024]}, :executor->start-time-secs 
> {[1 1] 1474544637, [2 2] 1474544637, [3 3] 1474544637, [4 4] 1474544637, [5 
> 5] 1474544637, [6 6] 1474544637}, :worker->resources 
> {["35e30049-e731-40e7-a093-2ac22d9efa34" 1024] [0.0 0.0 0.0]}}
> 20:43:57.824 [Async Localizer] INFO  o.a.c.f.i.CuratorFrameworkImpl - Starting
> 20:43:57.825 [SLOT_1024] INFO  o.a.s.d.s.Slot - STATE EMPTY msInState: 1016 
> -> WAITING_FOR_BASIC_LOCALIZATION msInState: 0
> 20:43:57.827 [Async Localizer] INFO  o.a.s.b.FileBlobStoreImpl - Creating new 
> blob store based in 
> /var/folders/t7/dd7tgll1435bm2kbsdy35888gn/T/ba7ef925-0dd5-4ddd-91ad-02ff877b99ef/blobs
> 20:43:57.830 [Async Localizer-EventThread] INFO  
> o.a.c.f.s.ConnectionStateManager - State change: CONNECTED
> 20:43:57.846 [Curator-Framework-0] INFO  o.a.c.f.i.CuratorFrameworkImpl - 
> backgroundOperationsLoop exiting
> 20:43:57.920 [Async Localizer] INFO  o.a.s.l.AsyncLocalizer - Extracting 
> resources from jar at 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/lib/ant-javafx.jar
>  to 
> /var/folders/t7/dd7tgll1435bm2kbsdy35888gn/T/043fbb03-fbdc-4271-bbae-923f1138745d/supervisor/tmp/0271204f-6007-4571-b8e7-4a7aa124f274/resources
> 20:43:57.941 [Async Localizer] WARN  o.a.s.l.AsyncLocalizer - Failed to 
> download basic resources for topology-id storm-sql-1-1474544636
> 20:43:57.941 [Async Localizer] INFO  o.a.s.d.s.AdvancedFSOps - Deleting path 
> /var/folders/t7/dd7tgll1435bm2kbsdy35888gn/T/043fbb03-fbdc-4271-bbae-923f1138745d/supervisor/tmp/0271204f-6007-4571-b8e7-4a7aa124f274
> 20:43:57.944 [Async Localizer] INFO  o.a.s.d.s.AdvancedFSOps - Deleting path 
> /var/folders/t7/dd7tgll1435bm2kbsdy35888gn/T/043fbb03-fbdc-4271-bbae-923f1138745d/supervisor/stormdist/storm-sql-1-1474544636
> 20:43:57.947 [Async Localizer] WARN  o.a.s.l.AsyncLocalizer - Caught 
> Exception While Downloading (rethrowing)... 
> java.nio.file.FileSystemException: 
> /var/folders/t7/dd7tgll1435bm2kbsdy35888gn/T/043fbb03-fbdc-4271-bbae-923f1138745d/supervisor/tmp/0271204f-6007-4571-b8e7-4a7aa124f274
>  -> 
> /var/folders/t7/dd7tgll1435bm2kbsdy35888gn/T/043fbb03-fbdc-4271-bbae-923f1138745d/supervisor/stormdist/storm-sql-1-1474544636:
>  Directory not empty
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) 
> ~[?:1.8.0_66]
>   at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) 
> ~[?:1.8.0_66]
>   at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396) ~[?:1.8.0_66]
>   at 
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262) 
> ~[?:1.8.0_66]
>   at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_66]
>   at 
> org.apache.storm.daemon.supervisor.AdvancedFSOps.moveDirectoryPreferAtomic(AdvancedFSOps.java:159)
>  ~[classes/:?]
>   at 
> org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:144)
>  [classes/:?]
>   at 
> org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:98)
>  [classes/:?]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_66]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_66]
>   at 
> 
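The moveDirectoryPreferAtomic call in the trace above boils down to 
java.nio.file.Files.move, which fails exactly as logged when the target 
directory already exists and is non-empty. A minimal sketch of such a 
"prefer atomic" move (a hypothetical helper, not Storm's actual 
implementation):

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of a "prefer atomic" directory move: try an atomic
// rename first, falling back to a plain move (e.g. across filesystems).
// Note Files.move on a directory throws when the destination already
// exists and is non-empty -- the failure mode shown in the log above.
class AtomicMoveSketch {
    static void movePreferAtomic(Path from, Path to) throws IOException {
        try {
            Files.move(from, to, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(from, to); // non-atomic fallback
        }
    }
}
```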

[jira] [Created] (STORM-2122) Supervisor V2 can use a lot more memory

2016-09-23 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-2122:
--

 Summary: Supervisor V2 can use a lot more memory
 Key: STORM-2122
 URL: https://issues.apache.org/jira/browse/STORM-2122
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


When launching a worker the supervisor will read in the complete topology to 
get a small amount of data out of it.  This can be very large.  In the past we 
would launch the workers one at a time, so we only needed enough memory to read 
it in once; now we do it in parallel, and lots of these launches tend to show 
up very close to one another.  We need to find a way to read the data once, and 
ideally cache it someplace that everyone can share.
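The "read once and share" idea can be sketched with 
ConcurrentHashMap.computeIfAbsent, which invokes the loader at most once per 
key even when several worker launches race on the same topology id. All names 
here are hypothetical illustrations, not Storm code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: the first worker launch for a topology pays the
// read/deserialization cost; concurrent launches for the same topology id
// reuse the cached bytes instead of re-reading the (possibly very large)
// topology. ConcurrentHashMap.computeIfAbsent runs the loader atomically
// and at most once per key.
class TopologyCacheSketch {
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader;

    TopologyCacheSketch(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] get(String topologyId) {
        return cache.computeIfAbsent(topologyId, loader);
    }
}
```

A real version would also need an eviction policy (e.g. drop entries once all 
workers for that topology have started) so the cache itself does not become 
the memory problem.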





[jira] [Updated] (STORM-2109) Under supervisor V2 SUPERVISOR_MEMORY_CAPACITY_MB and SUPERVISOR_CPU_CAPACITY must be Doubles

2016-09-21 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-2109:
---
Priority: Critical  (was: Major)

> Under supervisor V2 SUPERVISOR_MEMORY_CAPACITY_MB and SUPERVISOR_CPU_CAPACITY 
> must be Doubles
> -
>
> Key: STORM-2109
> URL: https://issues.apache.org/jira/browse/STORM-2109
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> Just found this rolling out Supervisor V2 to staging env, but it is a simple 
> fix.





[jira] [Updated] (STORM-2018) Simplify Threading Model of the Supervisor

2016-09-19 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated STORM-2018:
---
Fix Version/s: 2.0.0

> Simplify Threading Model of the Supervisor
> --
>
> Key: STORM-2018
> URL: https://issues.apache.org/jira/browse/STORM-2018
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Fix For: 2.0.0
>
> Attachments: Slot.dot, Slot.svg
>
>  Time Spent: 40h 10m
>  Remaining Estimate: 0h
>
> We have been trying to roll out CGROUP enforcement and right now are running 
> into a number of race conditions in the supervisor.  When using CGROUPS the 
> timing of some operations is different, exposing issues that we would not 
> see without this.
> In order to make progress with testing/deploying CGROUPs and RAS we are going 
> to try to refactor the supervisor to have a simpler threading model, but 
> likely with more threads.  We will base the code off of the Java code 
> currently in master, and may replace that in the 2.0 release, but plan on 
> having it be a part of 1.x too, if it truly is more stable.
> I will try to keep this JIRA up to date with what we are doing and the 
> architecture to keep the community informed.  We need to move quickly to meet 
> some of our company goals but will not just shove this in.  We welcome any 
> feedback on the design and code before it goes into the community.




