[jira] [Resolved] (STORM-2807) Integration test should shut down topologies immediately after the test

2017-11-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing resolved STORM-2807.
---
   Resolution: Fixed
Fix Version/s: 1.1.2, 1.2.0, 2.0.0

> Integration test should shut down topologies immediately after the test
> ---
>
> Key: STORM-2807
> URL: https://issues.apache.org/jira/browse/STORM-2807
> Project: Apache Storm
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 1.1.1
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.2.0, 1.1.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The integration test kills topologies with the default 30 second timeout. 
> This is unnecessary and delays the following tests, because the killed 
> topology is still occupying worker slots.
> When the integration test kills topologies, it tries sending the kill message 
> to Nimbus once, and may fail quietly. This breaks following tests, because 
> the default Storm install has only 4 worker slots, and the test topologies 
> each take up 3. When a topology is not shut down, it prevents the following 
> topologies from being assigned.
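
The slot arithmetic and the retry behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the integration test's actual code: `kill_fn` is a hypothetical stand-in for whatever sends the kill message to Nimbus, and the attempt count and delay are assumed values. The slot numbers (4 total, 3 per topology) and the zero-second wait come from the issue text.

```python
import time

TOTAL_SLOTS = 4          # worker slots in a default Storm install (from the issue)
SLOTS_PER_TOPOLOGY = 3   # slots each test topology takes up (from the issue)


def can_schedule(running_topologies):
    """Return True if one more test topology fits into the free slots."""
    free = TOTAL_SLOTS - running_topologies * SLOTS_PER_TOPOLOGY
    return free >= SLOTS_PER_TOPOLOGY


def kill_with_retry(kill_fn, name, attempts=5, delay_secs=0.0):
    """Call kill_fn(name, wait_secs=0) until it succeeds or attempts run out.

    kill_fn is a hypothetical callable standing in for the Nimbus kill call.
    The failure is raised to the caller instead of being dropped quietly.
    """
    last_error = None
    for _ in range(attempts):
        try:
            kill_fn(name, wait_secs=0)
            return
        except Exception as error:  # a real client would catch narrower types
            last_error = error
            time.sleep(delay_secs)
    raise RuntimeError(f"could not kill topology {name!r}") from last_error
```

With one test topology still running, `can_schedule(1)` is false, which is why a topology that was not shut down blocks every following test.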



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2807) Integration test should shut down topologies immediately after the test

2017-11-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2807:
--
Affects Version/s: 1.1.1






[jira] [Updated] (STORM-2525) Fix flaky integration tests

2017-11-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2525:
--
Labels: pull-request-available  (was: )

> Fix flaky integration tests
> ---
>
> Key: STORM-2525
> URL: https://issues.apache.org/jira/browse/STORM-2525
> Project: Apache Storm
>  Issue Type: Bug
>  Components: integration-test
>Affects Versions: 2.0.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The integration tests fail fairly often, e.g. 
> https://travis-ci.org/apache/storm/jobs/233690012. The tests should be fixed 
> so they're more reliable.





[jira] [Updated] (STORM-2722) JMSSpout test fails way too often

2017-11-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2722:
--
Affects Version/s: 1.1.1

> JMSSpout test fails way too often
> -
>
> Key: STORM-2722
> URL: https://issues.apache.org/jira/browse/STORM-2722
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-jms
>Affects Versions: 2.0.0, 1.1.1
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.2.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:92)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at org.junit.Assert.assertTrue(Assert.java:54)
>   at 
> org.apache.storm.jms.spout.JmsSpoutTest.testFailure(JmsSpoutTest.java:62)
> {code}
> Which corresponds to 
> https://github.com/apache/storm/blob/d6e5e6d4e0a20c4c9f0ce0e3000e730dcb4700da/external/storm-jms/src/test/java/org/apache/storm/jms/spout/JmsSpoutTest.java?utf8=%E2%9C%93#L62





[jira] [Updated] (STORM-2722) JMSSpout test fails way too often

2017-11-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2722:
--
Fix Version/s: 1.2.0






[jira] [Commented] (STORM-2722) JMSSpout test fails way too often

2017-11-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248480#comment-16248480
 ] 

Stig Rohde Døssing commented on STORM-2722:
---

Pulled this back to 1.x-branch since the test also exists there.






[jira] [Created] (STORM-2809) Integration test is failing consistently and topologies sometimes fail to start workers

2017-11-11 Thread JIRA
Stig Rohde Døssing created STORM-2809:
-

 Summary: Integration test is failing consistently and topologies 
sometimes fail to start workers
 Key: STORM-2809
 URL: https://issues.apache.org/jira/browse/STORM-2809
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Stig Rohde Døssing
 Fix For: 2.0.0


The integration test has been failing fairly consistently since 
https://github.com/apache/storm/pull/2363. I tried running the test outside a 
VM with a locally installed Storm setup, and it has failed every time for me.

Most runs seem to fail in ways that make it look like the integration test is 
just flaky (e.g. tuple windows not matching the calculated window), but in at 
least a few tests I saw the topology get submitted to Nimbus followed by about 
3 minutes of nothing happening. The workers never started and the supervisor 
didn't seem aware of the scheduling. The only evidence that the topology was 
submitted was in the Nimbus log. This still happens even if the test topologies 
are killed with a timeout of 0, so there should be slots free for the next test 
immediately.

I tried reverting https://github.com/apache/storm/pull/2363 and it seems to 
make the integration test pass much more often. Over 5 runs there was still an 
instance of a supervisor failing to start the workers, but the other 4 passed.

We should try to fix whatever is causing the supervisor to fail to start 
workers, and make the integration test more stable.





[jira] [Updated] (STORM-2535) test-reset-timeout is flaky. Replace with a more reliable test.

2017-11-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2535:
--
Labels: pull-request-available  (was: )

> test-reset-timeout is flaky. Replace with a more reliable test.
> ---
>
> Key: STORM-2535
> URL: https://issues.apache.org/jira/browse/STORM-2535
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> test-reset-timeout is flaky, because the Time.sleep calls in the test bolt 
> can race with the calls to advanceClusterTime in the main thread. Also the 
> test breaks if the spout's pending map gets rotated at an unlucky time.
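
A simplified model of the race described above, written in Python. It is an assumption-laden toy, not Storm's Time class: the `SimulatedClock` type and its methods are hypothetical, and the model only assumes that a simulated sleep computes its wake-up time from the moment it is called. Under that assumption, the outcome depends on whether the sleep is registered before or after the main thread advances time.

```python
class SimulatedClock:
    """Toy simulated clock: time only moves when advance() is called."""

    def __init__(self):
        self.now = 0

    def sleep_deadline(self, duration):
        # A sleeper wakes once simulated time reaches now + duration,
        # measured from the moment the sleep is called.
        return self.now + duration

    def advance(self, amount):
        self.now += amount


def sleeper_woke(sleep_first):
    """Run one sleep of 10 and one advance of 10 in the given order."""
    clock = SimulatedClock()
    if sleep_first:
        deadline = clock.sleep_deadline(10)
        clock.advance(10)
    else:
        clock.advance(10)
        deadline = clock.sleep_deadline(10)
    # The sleeper has woken only if simulated time reached its deadline.
    return clock.now >= deadline
```

In this model the sleeper wakes when the sleep is registered first, and stays asleep when the advance wins the race, so a test that relies on one particular ordering is flaky.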





[jira] [Resolved] (STORM-2549) The fix for STORM-2343 is incomplete, and the spout can still get stuck on failed tuples

2017-11-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing resolved STORM-2549.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

> The fix for STORM-2343 is incomplete, and the spout can still get stuck on 
> failed tuples
> 
>
> Key: STORM-2549
> URL: https://issues.apache.org/jira/browse/STORM-2549
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.2.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Example:
> Say maxUncommittedOffsets is 10, maxPollRecords is 5, and the committedOffset 
> is 0.
> The spout will initially emit up to offset 10, because it is allowed to poll 
> until numNonRetriableTuples is >= maxUncommittedOffsets.
> The spout will be allowed to emit another 5 tuples if offset 10 fails, so if 
> that happens, offsets 10-14 will get emitted. If offset 1 fails and 2-14 get 
> acked, the spout gets stuck because it will count the "extra tuples" 11-14 in 
> numNonRetriableTuples.
> A similar case is the one where maxPollRecords doesn't divide 
> maxUncommittedOffsets evenly. If maxPollRecords were 3 in the example above, 
> the spout might just immediately emit offsets 1-12. If 2-12 get acked, offset 
> 1 cannot be reemitted.
> The proposed solution is the following:
> * Enforce maxUncommittedOffsets on a per-partition basis (i.e. actual limit 
> will be multiplied by the number of partitions) by always allowing poll for 
> retriable tuples that are within maxUncommittedOffsets tuples of the 
> committed offset. Pause any non-retriable partitions if the partition has 
> passed the maxUncommittedOffsets limit, and some other partition is polling 
> for retries while also at the maxUncommittedOffsets limit. 
> Example of this functionality:
> MaxUncommittedOffsets is 100
> MaxPollRecords is 10
> Committed offset for partition 0 and 1 is 0.
> Partition 0 has emitted 0
> Partition 1 has emitted 0...95, 97, 99, 101, 103 (some offsets compacted away)
> Partition 1, message 99 is retriable
> We check that message 99 is within 100 emitted tuples of offset 0 (it is the 
> 97th tuple after offset 0, so it is)
> We do not pause partition 0 because that partition isn't at the 
> maxUncommittedOffsets limit.
> Seek to offset 99 on partition 1 and poll
> We get back offset 99, 101, 103 and potentially 7 new tuples. Say the lowest 
> of these is at offset 104.
> The spout emits offset 99, filters out 101 and 103 because they were already 
> emitted, and emits the 7 new tuples.
> If offset 104 (or a later offset) becomes retriable, it is not retried until 
> the committed offset moves, because offset 104 is the 101st tuple emitted 
> after offset 0.
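
The per-partition retry check in the example above might look roughly like this. It is a sketch in Python, not the storm-kafka-client implementation: the function names are hypothetical, and the counting convention is an assumption (tuples are counted from the committed offset itself, which is the convention under which offset 104 comes out as the 101st tuple). The 7 new tuples are assumed to sit at offsets 104 to 110 with nothing compacted away.

```python
from bisect import bisect_left, bisect_right


def rank_after_committed(emitted, committed_offset, offset):
    """1-based position of offset among emitted offsets >= committed_offset.

    emitted must be a sorted list of the offsets emitted for one partition.
    """
    start = bisect_left(emitted, committed_offset)
    return bisect_right(emitted, offset) - start


def may_retry(emitted, committed_offset, offset, max_uncommitted_offsets):
    """A failed offset may be retried only while its rank is within the limit."""
    rank = rank_after_committed(emitted, committed_offset, offset)
    return rank <= max_uncommitted_offsets


# Partition 1 from the example: 0..95, 97, 99, 101, 103, then 7 new tuples.
emitted = list(range(0, 96)) + [97, 99, 101, 103] + list(range(104, 111))
```

With a limit of 100 and a committed offset of 0, `may_retry(emitted, 0, 99, 100)` holds while `may_retry(emitted, 0, 104, 100)` does not. Once the committed offset moves forward, offset 104 falls back inside the window and becomes eligible for retry.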





[jira] [Created] (STORM-2810) Storm-hdfs tests are leaking resources

2017-11-11 Thread JIRA
Stig Rohde Døssing created STORM-2810:
-

 Summary: Storm-hdfs tests are leaking resources
 Key: STORM-2810
 URL: https://issues.apache.org/jira/browse/STORM-2810
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-hdfs
Affects Versions: 2.0.0, 1.1.2
Reporter: Stig Rohde Døssing
Assignee: Stig Rohde Døssing


The Storm-hdfs tests are leaking resources, and it seems to be making the tests 
fail. 





[jira] [Updated] (STORM-2810) Storm-hdfs tests are leaking resources

2017-11-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2810:
--
Labels: pull-request-available  (was: )

> Storm-hdfs tests are leaking resources
> --
>
> Key: STORM-2810
> URL: https://issues.apache.org/jira/browse/STORM-2810
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-hdfs
>Affects Versions: 2.0.0, 1.1.2
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>  Labels: pull-request-available
>
> The Storm-hdfs tests are leaking resources, and it seems to be making the 
> tests fail. 





[jira] [Updated] (STORM-2797) LogViewer worker logs broken on Windows

2017-11-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-2797:
--
Labels: pull-request-available  (was: )

> LogViewer worker logs broken on Windows
> ---
>
> Key: STORM-2797
> URL: https://issues.apache.org/jira/browse/STORM-2797
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-webapp
>Affects Versions: 1.x
> Environment: Windows
>Reporter: Lawrence Craft
>Priority: Minor
>  Labels: pull-request-available
> Attachments: logviewer.log
>
>
> LogViewer worker logs are broken on Windows. Attempting to access the log 
> (e.g. 
> http://localhost:8000/log?file=word-topo-5-1509750559%5C6701%5Cworker.log) 
> leads to a 500 Server Error.
> I've attached the LogViewer logs which show the stack trace. The issue is 
> pretty clear from the log: on line 123 of logviewer.clj, the path is split 
> using the path separator as a regex. This is fine on POSIX systems as / is a 
> normal character in regex; however, on Windows, backslash is the path 
> separator. As backslash is also the regex escape character, a lone backslash 
> is not a valid regular expression.
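
The same class of failure can be reproduced outside Storm. The sketch below uses Python's `re` module purely for illustration; the reported code is Clojure on the JVM, and the function names here are hypothetical. It shows that a lone backslash is rejected as a pattern, and that escaping the separator first makes the split work for both separators. On the JVM, `Pattern.quote` plays the role that `re.escape` plays here.

```python
import re


def split_path_naive(path, separator):
    # Uses the separator directly as a regex pattern.
    # Works for "/", but a lone backslash is an incomplete escape sequence.
    return re.split(separator, path)


def split_path_safe(path, separator):
    # re.escape turns the separator into a literal pattern first.
    return re.split(re.escape(separator), path)
```

Calling `split_path_naive("topo\\6701\\worker.log", "\\")` raises `re.error`, while `split_path_safe` with the same arguments returns the three path components.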


