[jira] [Commented] (STORM-2489) Overlap and data loss on WindowedBolt based on Duration
[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986049#comment-15986049 ]

wangkui commented on STORM-2489:

[~arunmahadevan] I tested again with the same arguments as before, and it seems all right. But when I make the spout emit more frequently, the tuples at the head of the stream show up directly in the expired list. In the following test the spout emits data without sleep() and the bolt uses a Duration(4 sec) window:

Timestamp_start-end=1493271145887-1493271149887
expired=1...291972
get=291973...1189440
new=291973...1189440
Recived=897468,RecivedTotal=897468
13:32:34.470 [pool-45-thread-1] WARN o.a.s.w.TimeEvictionPolicy - Possible clock drift or long running computation in window; Previous eviction time: 1493271149887, current eviction time: 1493271154469
Timestamp_start-end=1493271150469-1493271154469
expired=291973...1189440
get=1189441...203
new=1189441...203
Recived=1699363,RecivedTotal=2596831
13:32:39.213 [pool-45-thread-1] WARN o.a.s.w.TimeEvictionPolicy - Possible clock drift or long running computation in window; Previous eviction time: 1493271154469, current eviction time: 1493271159212
SendTotal=5435440
Timestamp_start-end=1493271155212-1493271159212
expired=1189441...203
get=204...5069647
new=204...5069647
Recived=2180844,RecivedTotal=4777675
13:32:42.311 [pool-45-thread-1] WARN o.a.s.w.TimeEvictionPolicy - Possible clock drift or long running computation in window; Previous eviction time: 1493271159212, current eviction time: 1493271162310
Timestamp_start-end=1493271158310-1493271162310
expired=204...5069647
get=5069648...5435440
new=5069648...5435440
Recived=365793,RecivedTotal=5143468
13:32:45.195 [pool-45-thread-1] WARN o.a.s.w.TimeEvictionPolicy - Possible clock drift or long running computation in window; Previous eviction time: 1493271162310, current eviction time: 1493271165194

> Overlap and data loss on WindowedBolt based on Duration
> ---
>
> Key: STORM-2489
> URL: https://issues.apache.org/jira/browse/STORM-2489
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.2
> Environment: windows 10, eclipse, jdk1.7
> Reporter: wangkui
> Assignee: Arun Mahadevan
> Attachments: TumblingWindowIssue.java
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The attachment is my test script. One of my test results is:
> ```
> expired=1...55
> get=56...4024
> new=56...4024
> Recived=3969,RecivedTotal=3969
> expired=56...4020
> get=4021...8191
> new=4025...8191
> Recived=4171,RecivedTotal=8140
> SendTotal=12175
> expired=4021...8188
> get=8189...12175
> new=8192...12175
> Recived=3987,RecivedTotal=12127
> ```
> This test result shows that some tuples appear directly in the expired list,
> so we lose this data if we only use get() to fetch tuples. This is the first bug.
> The second: the tuples returned by get() overlap between windows, while getNew() seems all right.
> The problem does not happen every time; you may need to try several times.
> Actually, I'm a newbie with Storm, so I'm not sure whether this is indeed a bug
> or whether I am using it in the wrong way.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
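The counts in the quoted test result can be cross-checked mechanically. The sketch below is not part of the attached TumblingWindowIssue.java; it assumes tuple ids run from 1 to SendTotal and feeds in the three get= ranges from the report. The names WindowAudit, Report and audit are hypothetical.

```java
import java.util.BitSet;

public class WindowAudit {
    /** Totals derived from the get() ranges of successive windows. */
    static final class Report {
        final int received;   // deliveries summed over all windows
        final int duplicates; // deliveries of an id that an earlier delivery already covered
        final int lost;       // ids in 1..sendTotal that no window delivered

        Report(int received, int duplicates, int lost) {
            this.received = received;
            this.duplicates = duplicates;
            this.lost = lost;
        }
    }

    /** Each range is {firstId, lastId}, inclusive, as printed in the get= lines. */
    static Report audit(int[][] getRanges, int sendTotal) {
        BitSet seen = new BitSet(sendTotal + 1);
        int received = 0;
        int duplicates = 0;
        for (int[] range : getRanges) {
            for (int id = range[0]; id <= range[1]; id++) {
                received++;
                if (seen.get(id)) {
                    duplicates++; // this id was already returned by an earlier window
                } else {
                    seen.set(id);
                }
            }
        }
        return new Report(received, duplicates, sendTotal - seen.cardinality());
    }

    public static void main(String[] args) {
        // get= ranges and SendTotal copied from the test result in the issue description
        int[][] getRanges = {{56, 4024}, {4021, 8191}, {8189, 12175}};
        Report r = audit(getRanges, 12175);
        System.out.println("received=" + r.received
                + " duplicates=" + r.duplicates + " lost=" + r.lost);
    }
}
```

For these ranges the audit accounts for 12127 deliveries, which matches RecivedTotal. Of those, 7 are repeats (ids 4021...4024 and 8189...8191), and 55 ids (1...55) are never returned by get().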
[jira] [Created] (STORM-2493) update documents to reflect the changes
Xin Wang created STORM-2493:
---

Summary: update documents to reflect the changes
Key: STORM-2493
URL: https://issues.apache.org/jira/browse/STORM-2493
Project: Apache Storm
Issue Type: Bug
Components: documentation
Reporter: Xin Wang
Assignee: Xin Wang
[jira] [Updated] (STORM-2493) update documents to reflect the changes
[ https://issues.apache.org/jira/browse/STORM-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xin Wang updated STORM-2493:

Issue Type: Improvement (was: Bug)

> update documents to reflect the changes
> ---
>
> Key: STORM-2493
> URL: https://issues.apache.org/jira/browse/STORM-2493
> Project: Apache Storm
> Issue Type: Improvement
> Components: documentation
> Reporter: Xin Wang
> Assignee: Xin Wang
[jira] [Commented] (STORM-2191) shorten classpaths in worker and LogWriter commands
[ https://issues.apache.org/jira/browse/STORM-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985866#comment-15985866 ]

Erik Weathers commented on STORM-2191:
--

[~revans2] sweet. :-) I'm just glad that when Mr. Marz added {{get_jars_full}}, he didn't put in some sort method. I'm hopeful this just means it was done for functionality reasons as [I guessed above|https://issues.apache.org/jira/browse/STORM-2191?focusedCommentId=15981693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15981693].

I do think we should still address the conflicts we've noted above, and enable some CI test to bail out if a conflict arises, as that is probably the best way to avoid such problems arising in the future. That can be separate, I'll file a ticket for it.

I'll send a PR soon with my changes for wildcarding.

> shorten classpaths in worker and LogWriter commands
> ---
>
> Key: STORM-2191
> URL: https://issues.apache.org/jira/browse/STORM-2191
> Project: Apache Storm
> Issue Type: Task
> Components: storm-core
> Affects Versions: 1.0.2
> Reporter: Erik Weathers
> Priority: Minor
> Labels: cli, command-line
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When launching the worker daemon and its wrapping LogWriter daemon, the
> commands can become so long that they eclipse the default Linux limit of 4096
> bytes. That results in commands that are cut off in {{ps}} output, and
> prevents easily inspecting the system to see even what processes are running.
> The specific scenario in which this problem can be easily triggered: *running
> Storm on Mesos*.
> h5. Details on why it happens:
> # using the default Mesos containerizer instead of Docker containers, which
> causes the storm-mesos package to be unpacked into the Mesos executor sandbox.
> # The ["expand all jars on classpath"|https://github.com/apache/storm/blob/6dc6407a01d032483edebb1c1b4d8b69a304d81c/bin/storm.py#L114-L140]
> functionality in the {{bin/storm.py}} script causes every one of the jars
> that storm bundles into its lib directory to be explicitly listed in the
> command.
> #* e.g., say the mesos work dir is {{/var/run/mesos/work_dir/}}
> #* and say that the original classpath argument in the supervisor cmd
> includes the following for the {{lib/}} dir in the binary storm package:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/*}}
> #* That leads to a hugely expanded classpath argument for the LogWriter and
> Worker daemons that get launched:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/asm-4.0.jar:/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/carbonite-1.4.0.jar:...}}
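To get a feel for how quickly the expanded form outgrows the 4096-byte limit mentioned in the description, here is a rough sketch that compares the two classpath styles. It is not taken from {{bin/storm.py}}. The lib directory is the one quoted above, {{asm-4.0.jar}} and {{carbonite-1.4.0.jar}} come from the description, and the other jar names and the count of 40 are made up for illustration. ClasspathLength, expanded and wildcard are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathLength {
    /** Classpath with every jar listed explicitly, one absolute path per jar. */
    static String expanded(String libDir, List<String> jarNames) {
        List<String> paths = new ArrayList<>();
        for (String jar : jarNames) {
            paths.add(libDir + "/" + jar);
        }
        return String.join(":", paths);
    }

    /** Classpath that leaves the expansion to the JVM through the dir/* wildcard. */
    static String wildcard(String libDir) {
        return libDir + "/*";
    }

    public static void main(String[] args) {
        // lib directory as quoted in the issue description
        String libDir = "/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12"
                + "/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID"
                + "/runs/e6a1407e-73fd-4be4-8d00-e882117b3391"
                + "/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib";
        List<String> jars = new ArrayList<>();
        jars.add("asm-4.0.jar");
        jars.add("carbonite-1.4.0.jar");
        // Assumption: 40 further jars with invented names stand in for the rest of lib/
        for (int i = 0; i < 40; i++) {
            jars.add("hypothetical-dep-" + i + ".jar");
        }
        System.out.println("expanded length: " + expanded(libDir, jars).length());
        System.out.println("wildcard length: " + wildcard(libDir).length());
    }
}
```

The wildcard form stays at the length of one directory path plus two characters no matter how many jars the directory holds, while the expanded form repeats the long sandbox prefix once per jar.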
[jira] [Resolved] (STORM-2490) Lambda support
[ https://issues.apache.org/jira/browse/STORM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xin Wang resolved STORM-2490.
-
Resolution: Fixed
Fix Version/s: 2.0.0

Merged into master.

> Lambda support
> --
>
> Key: STORM-2490
> URL: https://issues.apache.org/jira/browse/STORM-2490
> Project: Apache Storm
> Issue Type: New Feature
> Components: storm-client
> Reporter: Xin Wang
> Assignee: Xin Wang
> Fix For: 2.0.0
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> In the past, if we wanted to print tuples, we needed to write the following code:
> {code}
> class PrintingBolt extends BaseBasicBolt {
>     @Override
>     public void execute(Tuple input, BasicOutputCollector collector) {
>         System.out.println(input);
>     }
>     @Override
>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>         // nothing
>     }
> }
> builder.setBolt("bolt2", new PrintingBolt());
> {code}
> Now, with this patch:
> {code}
> builder.setBolt("bolt2", tuple -> System.out.println(tuple));
> {code}
> The above is just the simplest demo. This patch provides some new methods in
> TopologyBuilder to allow you to use Java 8 lambda expressions:
> {code}
> setSpout(String id, SerializableSupplier supplier)
> setSpout(String id, SerializableSupplier supplier, Number parallelism_hint)
> // receiving tuple, and emitting to downstream
> setBolt(String id, SerializableBiConsumer biConsumer, String... fields)
> setBolt(String id, SerializableBiConsumer biConsumer, Number parallelism_hint, String... fields)
> // receiving tuple, and never emitting to downstream
> setBolt(String id, SerializableConsumer consumer)
> setBolt(String id, SerializableConsumer consumer, Number parallelism_hint)
> {code}
> Here is another example that uses all three interfaces:
> {code}
> // example. spout1: generate random strings
> // bolt1: get the first part of a string
> // bolt2: output the tuple
> builder.setSpout("spout1", () -> UUID.randomUUID().toString());
> builder.setBolt("bolt1", (tuple, collector) -> {
>     String[] parts = tuple.getStringByField("lambda").split("\\-");
>     collector.emit(new Values(parts[0]));
> }, "field").shuffleGrouping("spout1");
> builder.setBolt("bolt2", tuple -> System.out.println(tuple)).shuffleGrouping("bolt1");
> {code}
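The Serializable prefix on the three interfaces matters because a topology's spouts and bolts are serialized before they are shipped to workers, and a plain java.util.function lambda is not serializable. The sketch below shows only that mechanism, without any Storm dependency. The nested SerializableSupplier is a local stand-in and is assumed, not confirmed by this thread, to resemble the one in the patch; roundTrip is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.UUID;
import java.util.function.Supplier;

public class SerializableLambdaDemo {
    /** Local stand-in: a Supplier whose lambdas the compiler makes serializable. */
    interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

    /** Writes an object to bytes and reads it back, as shipping it elsewhere would. */
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("object did not survive serialization", e);
        }
    }

    public static void main(String[] args) {
        // Same lambda body as spout1 in the example above
        SerializableSupplier<String> supplier = () -> UUID.randomUUID().toString();
        SerializableSupplier<String> copy = roundTrip(supplier);
        System.out.println(copy.get());
    }
}
```

Assigning the same lambda to a plain Supplier<String> and passing it to roundTrip would fail with a NotSerializableException wrapped in the IllegalStateException. That failure is the reason a dedicated interface is needed rather than the JDK one.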
[jira] [Commented] (STORM-2191) shorten classpaths in worker and LogWriter commands
[ https://issues.apache.org/jira/browse/STORM-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984827#comment-15984827 ]

Robert Joseph Evans commented on STORM-2191:

[~erikdw], I thought we were sorting the jars, but looking at the code we are not. I am officially gobsmacked. So my reason for not using {{*}} was completely wrong, and I am changing my vote to using the {{*}} 100%. There is no need for my script, because the {{*}} is identical to what we are doing now in terms of functionality.

> shorten classpaths in worker and LogWriter commands
> ---
>
> Key: STORM-2191
> URL: https://issues.apache.org/jira/browse/STORM-2191
> Project: Apache Storm
> Issue Type: Task
> Components: storm-core
> Affects Versions: 1.0.2
> Reporter: Erik Weathers
> Priority: Minor
> Labels: cli, command-line
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When launching the worker daemon and its wrapping LogWriter daemon, the
> commands can become so long that they eclipse the default Linux limit of 4096
> bytes. That results in commands that are cut off in {{ps}} output, and
> prevents easily inspecting the system to see even what processes are running.
> The specific scenario in which this problem can be easily triggered: *running
> Storm on Mesos*.
> h5. Details on why it happens:
> # using the default Mesos containerizer instead of Docker containers, which
> causes the storm-mesos package to be unpacked into the Mesos executor sandbox.
> # The ["expand all jars on classpath"|https://github.com/apache/storm/blob/6dc6407a01d032483edebb1c1b4d8b69a304d81c/bin/storm.py#L114-L140]
> functionality in the {{bin/storm.py}} script causes every one of the jars
> that storm bundles into its lib directory to be explicitly listed in the
> command.
> #* e.g., say the mesos work dir is {{/var/run/mesos/work_dir/}}
> #* and say that the original classpath argument in the supervisor cmd
> includes the following for the {{lib/}} dir in the binary storm package:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/*}}
> #* That leads to a hugely expanded classpath argument for the LogWriter and
> Worker daemons that get launched:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/asm-4.0.jar:/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/carbonite-1.4.0.jar:...}}
[jira] [Comment Edited] (STORM-2191) shorten classpaths in worker and LogWriter commands
[ https://issues.apache.org/jira/browse/STORM-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984827#comment-15984827 ]

Robert Joseph Evans edited comment on STORM-2191 at 4/26/17 1:43 PM:
-

[~erikdw], I thought we were sorting the jars, but looking at the code we are not. I am officially gobsmacked. So my reason for not using '\*' was completely wrong, and I am changing my vote to using the '\*' 100%. There is no need for my script, because the '\*' is identical to what we are doing now in terms of functionality.

was (Author: revans2):
[~erikdw], I thought we were sorting the jars, but looking at the code we are not. I am officially gobsmacked. So my reason for not using {{*}} was completely wrong, and I am changing my vote to using the {{*}} 100%. There is no need for my script, because the {{*}} is identical to what we are doing now in terms of functionality.

> shorten classpaths in worker and LogWriter commands
> ---
>
> Key: STORM-2191
> URL: https://issues.apache.org/jira/browse/STORM-2191
> Project: Apache Storm
> Issue Type: Task
> Components: storm-core
> Affects Versions: 1.0.2
> Reporter: Erik Weathers
> Priority: Minor
> Labels: cli, command-line
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When launching the worker daemon and its wrapping LogWriter daemon, the
> commands can become so long that they eclipse the default Linux limit of 4096
> bytes. That results in commands that are cut off in {{ps}} output, and
> prevents easily inspecting the system to see even what processes are running.
> The specific scenario in which this problem can be easily triggered: *running
> Storm on Mesos*.
> h5. Details on why it happens:
> # using the default Mesos containerizer instead of Docker containers, which
> causes the storm-mesos package to be unpacked into the Mesos executor sandbox.
> # The ["expand all jars on classpath"|https://github.com/apache/storm/blob/6dc6407a01d032483edebb1c1b4d8b69a304d81c/bin/storm.py#L114-L140]
> functionality in the {{bin/storm.py}} script causes every one of the jars
> that storm bundles into its lib directory to be explicitly listed in the
> command.
> #* e.g., say the mesos work dir is {{/var/run/mesos/work_dir/}}
> #* and say that the original classpath argument in the supervisor cmd
> includes the following for the {{lib/}} dir in the binary storm package:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/*}}
> #* That leads to a hugely expanded classpath argument for the LogWriter and
> Worker daemons that get launched:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/asm-4.0.jar:/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/carbonite-1.4.0.jar:...}}
[jira] [Commented] (STORM-2489) Overlap and data loss on WindowedBolt based on Duration
[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984624#comment-15984624 ]

Arun Mahadevan commented on STORM-2489:
---

[~wangkui], when you get some time, can you pull the updated changes and test again? The earlier approach could run into issues if the clock drifts continuously, leading to lost and overlapping tuples. I made some changes to address it.

> Overlap and data loss on WindowedBolt based on Duration
> ---
>
> Key: STORM-2489
> URL: https://issues.apache.org/jira/browse/STORM-2489
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.2
> Environment: windows 10, eclipse, jdk1.7
> Reporter: wangkui
> Assignee: Arun Mahadevan
> Attachments: TumblingWindowIssue.java
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The attachment is my test script. One of my test results is:
> ```
> expired=1...55
> get=56...4024
> new=56...4024
> Recived=3969,RecivedTotal=3969
> expired=56...4020
> get=4021...8191
> new=4025...8191
> Recived=4171,RecivedTotal=8140
> SendTotal=12175
> expired=4021...8188
> get=8189...12175
> new=8192...12175
> Recived=3987,RecivedTotal=12127
> ```
> This test result shows that some tuples appear directly in the expired list,
> so we lose this data if we only use get() to fetch tuples. This is the first bug.
> The second: the tuples returned by get() overlap between windows, while getNew() seems all right.
> The problem does not happen every time; you may need to try several times.
> Actually, I'm a newbie with Storm, so I'm not sure whether this is indeed a bug
> or whether I am using it in the wrong way.
[jira] [Comment Edited] (STORM-2191) shorten classpaths in worker and LogWriter commands
[ https://issues.apache.org/jira/browse/STORM-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984440#comment-15984440 ]

Erik Weathers edited comment on STORM-2191 at 4/26/17 9:45 AM:
---

A few thoughts:

[~revans2]: I dug a bit into the kryo clashes and it's because of using {{carbonite}} but also having a direct dependency on {{kryo}}. Though carbonite's kryo dependency gets ignored because of our own kryo dependency existing, carbonite's dependency on {{chill-java}} pulls in {{kryo-shaded}}. kryo-shaded is just kryo with some reflectasm stuff shaded in. I imagine we can work around this problem by either:
# excluding kryo-shaded from the carbonite dependency in storm's pom.
- OR -
# excluding kryo from the carbonite dependency and adding an explicit dependency on kryo-shaded to storm's pom.

Notably, the general {{lib/}} and the {{lib-webapp/}} dependencies show another conflict when I run [Bobby's script from above|https://issues.apache.org/jira/browse/STORM-2191?focusedCommentId=15982927&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15982927] \[1]: {{javax/servlet}}. That conflict arises from {{jetty-server}} doing something weird with its dependencies where it puts the {{javax.servlet}} bits under {{org.eclipse}}, but keeps the internal package path of the classes the same (doesn't relocate). We can presumably overcome this through exclusions too.

[~revans2] & [~sriharsha]: I still am coming back to my doubt about the "full jar" expansion being any better than the wildcard. Both rely on running on the same host file system. The current mechanism of doing "full jar" expansion uses both Python and Java methods to get the list of files from the filesystem. My proposed mechanism of having the JVM do the wildcard expansion would switch this to doing whatever the JVM implementation does for listing files, which would presumably be just using the same filesystem primitives. I'm not seeing how the "full jar" expansion produces a better, more predictable ordering of the jars.

Finally, I had another wacky idea: is there any possibility of building an uber jar of storm's dependencies? i.e., instead of all these individual 3rd party dependency jars that are being enumerated, we might be able to have a "storm-dependencies.jar" uber jar that just slaps all of the non-shaded dependencies into a single jar without relocating anything.

\[1] I had to slightly modify Bobby's script. {{ls}} on my computer shows a "*" suffix for executable files, so I used {{find}} instead.

was (Author: erikdw):
A few thoughts:

[~revans2]: I dug a bit into the kryo clashes and it's because of using {{carbonite}} but also having a direct dependency on {{kryo}}. Though carbonite's kryo dependency gets ignored because of our own kryo dependency existing, carbonite's dependency on {{chill-java}} pulls in {{kryo-shaded}}. kryo-shaded is just kryo with some reflectasm stuff shaded in. I imagine we can work around this problem by either:
# excluding kryo-shaded from the carbonite dependency in storm's pom.
- OR -
# excluding kryo from the carbonite dependency and adding an explicit dependency on kryo-shaded to storm's pom.

Notably, the general {{lib/}} and the {{lib-webapp/}} dependencies show another conflict when I run [Bobby's script from above|https://issues.apache.org/jira/browse/STORM-2191?focusedCommentId=15982927&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15982927] \[1]: {{javax/servlet}}.

[~revans2] & [~sriharsha]: I still am coming back to my doubt about the "full jar" expansion being any better than the wildcard. Both rely on running on the same host file system. The current mechanism of doing "full jar" expansion uses both Python and Java methods to get the list of files from the filesystem. My proposed mechanism of having the JVM do the wildcard expansion would switch this to doing whatever the JVM implementation does for listing files, which would presumably be just using the same filesystem primitives. I'm not seeing how the "full jar" expansion produces a better, more predictable ordering of the jars.

Finally, I had another wacky idea: is there any possibility of building an uber jar of storm's dependencies? i.e., instead of all these individual 3rd party dependency jars that are being enumerated, we might be able to have a "storm-dependencies.jar" uber jar that just slaps all of the non-shaded dependencies into a single jar without relocating anything.

\[1] I had to slightly modify Bobby's script. {{ls}} on my computer shows a "*" suffix for executable files, so I used {{find}} instead.

> shorten classpaths in worker and LogWriter commands
> ---
>
> Key: STORM-2191
>
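The kryo and {{javax/servlet}} clashes discussed above are cases of the same class file being present in more than one jar. Bobby's script is not reproduced in this part of the thread, so the following is an independent sketch of one way to detect such clashes, not that script. JarConflictScan and findConflicts are hypothetical names, and the default directory "lib" is only an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarConflictScan {
    /** Maps each .class entry found in more than one jar to the jars that contain it. */
    static Map<String, List<String>> findConflicts(Path libDir) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar);
            }
        }
        Collections.sort(jars); // directory listing order is unspecified, so sort for a stable report

        Map<String, List<String>> owners = new TreeMap<>();
        for (Path jar : jars) {
            try (ZipFile zip = new ZipFile(jar.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>())
                              .add(jar.getFileName().toString());
                    }
                }
            }
        }
        owners.values().removeIf(list -> list.size() < 2); // keep only real clashes
        return owners;
    }

    public static void main(String[] args) throws IOException {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        findConflicts(libDir).forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```

A check along these lines could back the CI test suggested elsewhere in this thread: fail the build when the returned map is non-empty. Whichever copy of a clashing class is loaded depends on classpath order, which is why clashes matter for both the expanded and the wildcard form.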
[jira] [Updated] (STORM-2490) Lambda support
[ https://issues.apache.org/jira/browse/STORM-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xin Wang updated STORM-2490:

Description:
In the past, if we wanted to print tuples, we needed to write the following code:
{code}
class PrintingBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        System.out.println(input);
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // nothing
    }
}
builder.setBolt("bolt2", new PrintingBolt());
{code}
Now, with this patch:
{code}
builder.setBolt("bolt2", tuple -> System.out.println(tuple));
{code}
The above is just the simplest demo. This patch provides some new methods in TopologyBuilder to allow you to use Java 8 lambda expressions:
{code}
setSpout(String id, SerializableSupplier supplier)
setSpout(String id, SerializableSupplier supplier, Number parallelism_hint)
// receiving tuple, and emitting to downstream
setBolt(String id, SerializableBiConsumer biConsumer, String... fields)
setBolt(String id, SerializableBiConsumer biConsumer, Number parallelism_hint, String... fields)
// receiving tuple, and never emitting to downstream
setBolt(String id, SerializableConsumer consumer)
setBolt(String id, SerializableConsumer consumer, Number parallelism_hint)
{code}
Here is another example that uses all three interfaces:
{code}
// example. spout1: generate random strings
// bolt1: get the first part of a string
// bolt2: output the tuple
builder.setSpout("spout1", () -> UUID.randomUUID().toString());
builder.setBolt("bolt1", (tuple, collector) -> {
    String[] parts = tuple.getStringByField("lambda").split("\\-");
    collector.emit(new Values(parts[0]));
}, "field").shuffleGrouping("spout1");
builder.setBolt("bolt2", tuple -> System.out.println(tuple)).shuffleGrouping("bolt1");
{code}

was:
In the past, if we wanted to print tuples, we needed to write the following code:
{code}
class PrintingBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        System.out.println(input);
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // nothing
    }
}
builder.setBolt("bolt2", new PrintingBolt());
{code}
Now, with this patch:
{code}
builder.setBolt("bolt2", tuple -> System.out.println(tuple));
{code}
The above is just the simplest demo. This patch provides some new methods in TopologyBuilder to allow you to use Java 8 lambda expressions:
{code}
setSpout(String id, SerializableSupplier supplier)
setSpout(String id, SerializableSupplier supplier, Number parallelism_hint)
// receiving tuple, and user can decide to emit tuple to downstream or not
setBolt(String id, SerializableBiConsumer biConsumer)
setBolt(String id, SerializableBiConsumer biConsumer, Number parallelism_hint)
// receiving tuple, and never emitting to downstream
setBolt(String id, SerializableConsumer consumer)
setBolt(String id, SerializableConsumer consumer, Number parallelism_hint)
{code}
Here is another example that uses all three interfaces:
{code}
// example. spout1: generate random strings
// bolt1: get the first part of a string
// bolt2: output the tuple
builder.setSpout("spout1", () -> UUID.randomUUID().toString());
builder.setBolt("bolt1", (tuple, collector) -> {
    String[] parts = tuple.getStringByField("lambda").split("\\-");
    collector.emit(new Values(parts[0]));
}).shuffleGrouping("spout1");
builder.setBolt("bolt2", tuple -> System.out.println(tuple)).shuffleGrouping("bolt1");
{code}

> Lambda support
> --
>
> Key: STORM-2490
> URL: https://issues.apache.org/jira/browse/STORM-2490
> Project: Apache Storm
> Issue Type: New Feature
> Components: storm-client
> Reporter: Xin Wang
> Assignee: Xin Wang
> Time Spent: 3h
> Remaining Estimate: 0h
>
> In the past, if we wanted to print tuples, we needed to write the following code:
> {code}
> class PrintingBolt extends BaseBasicBolt {
>     @Override
>     public void execute(Tuple input, BasicOutputCollector collector) {
>         System.out.println(input);
>     }
>     @Override
>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>         // nothing
>     }
> }
> builder.setBolt("bolt2", new PrintingBolt());
> {code}
> Now, with this patch:
> {code}
> builder.setBolt("bolt2", tuple -> System.out.println(tuple));
> {code}
> The above is just the simplest demo. This patch provides some new methods in
> TopologyBuilder to allow you to use Java 8 lambda expressions:
> {code}
> setSpout(S
[jira] [Comment Edited] (STORM-2191) shorten classpaths in worker and LogWriter commands
[ https://issues.apache.org/jira/browse/STORM-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984440#comment-15984440 ]

Erik Weathers edited comment on STORM-2191 at 4/26/17 9:25 AM:
---

A few thoughts:

[~revans2]: I dug a bit into the kryo clashes and it's because of using {{carbonite}} but also having a direct dependency on {{kryo}}. Though carbonite's kryo dependency gets ignored because of our own kryo dependency existing, carbonite's dependency on {{chill-java}} pulls in {{kryo-shaded}}. kryo-shaded is just kryo with some reflectasm stuff shaded in. I imagine we can work around this problem by either:
# excluding kryo-shaded from the carbonite dependency in storm's pom.
- OR -
# excluding kryo from the carbonite dependency and adding an explicit dependency on kryo-shaded to storm's pom.

Notably, the general {{lib/}} and the {{lib-webapp/}} dependencies show another conflict when I run [Bobby's script from above|https://issues.apache.org/jira/browse/STORM-2191?focusedCommentId=15982927&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15982927] \[1]: {{javax/servlet}}.

[~revans2] & [~sriharsha]: I still am coming back to my doubt about the "full jar" expansion being any better than the wildcard. Both rely on running on the same host file system. The current mechanism of doing "full jar" expansion uses both Python and Java methods to get the list of files from the filesystem. My proposed mechanism of having the JVM do the wildcard expansion would switch this to doing whatever the JVM implementation does for listing files, which would presumably be just using the same filesystem primitives. I'm not seeing how the "full jar" expansion produces a better, more predictable ordering of the jars.

Finally, I had another wacky idea: is there any possibility of building an uber jar of storm's dependencies? i.e., instead of all these individual 3rd party dependency jars that are being enumerated, we might be able to have a "storm-dependencies.jar" uber jar that just slaps all of the non-shaded dependencies into a single jar without relocating anything.

\[1] I had to slightly modify Bobby's script. {{ls}} on my computer shows a "*" suffix for executable files, so I used {{find}} instead.

was (Author: erikdw):
A few thoughts:

[~revans2]: I dug a bit into the kryo clashes and it's because of using {{carbonite}} but also having a direct dependency on {{kryo}}. Though carbonite's kryo dependency gets ignored because of our own kryo dependency existing, carbonite's dependency on {{chill-java}} pulls in {{kryo-shaded}}. kryo-shaded is just kryo with some reflectasm stuff shaded in. I imagine we can work around this problem by either:
# excluding kryo-shaded from the carbonite dependency in storm's pom.
- OR -
# excluding kryo from the carbonite dependency and adding an explicit dependency on kryo-shaded to storm's pom.

Notably, the general {{lib/}} and the {{lib-webapp/}} dependencies show another conflict when I run [Bobby's script from above|https://issues.apache.org/jira/browse/STORM-2191?focusedCommentId=15982927&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15982927] \[1]: {{javax/servlet}}.

[~revans2] & [~sriharsha]: I still am coming back to my doubt about the "full jar" expansion being any better than the wildcard. Both rely on running on the same host file system. The current mechanism of doing "full jar" expansion uses both Python and Java methods to get the list of files from the filesystem. The proposed (by me) mechanism of having the JVM do the wildcard expansion would switch this to doing whatever the JVM implementation does for listing files, which would presumably be just using the same filesystem primitives. I'm not seeing how the "full jar" expansion produces a better, more predictable ordering of the jars.

Finally, I had another wacky idea: is there any possibility of building an uber jar of storm's dependencies? i.e., instead of all these individual 3rd party dependency jars that are being enumerated, we might be able to have a "storm-dependencies.jar" uber jar that just slaps all of the non-shaded dependencies into a single jar without relocating anything.

\[1] I had to slightly modify Bobby's script. {{ls}} on my computer shows a "*" suffix for executable files, so I used {{find}} instead.

> shorten classpaths in worker and LogWriter commands
> ---
>
> Key: STORM-2191
> URL: https://issues.apache.org/jira/browse/STORM-2191
> Project: Apache Storm
> Issue Type: Task
> Components: storm-core
> Affects Versions: 1.0.2
> Reporter: Erik Weathers
> Priority: Minor
> Labels: cli, com
[jira] [Commented] (STORM-2191) shorten classpaths in worker and LogWriter commands
[ https://issues.apache.org/jira/browse/STORM-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984440#comment-15984440 ]

Erik Weathers commented on STORM-2191:
--

A few thoughts:

[~revans2]: I dug a bit into the kryo clashes and it's because of using {{carbonite}} but also having a direct dependency on {{kryo}}. Though carbonite's kryo dependency gets ignored because of our own kryo dependency existing, carbonite's dependency on {{chill-java}} pulls in {{kryo-shaded}}. kryo-shaded is just kryo with some reflectasm stuff shaded in. I imagine we can work around this problem by either:
# excluding kryo-shaded from the carbonite dependency in storm's pom.
- OR -
# excluding kryo from the carbonite dependency and adding an explicit dependency on kryo-shaded to storm's pom.

Notably, the general {{lib/}} and the {{lib-webapp/}} dependencies show another conflict when I run [Bobby's script from above|https://issues.apache.org/jira/browse/STORM-2191?focusedCommentId=15982927&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15982927] \[1]: {{javax/servlet}}.

[~revans2] & [~sriharsha]: I still am coming back to my doubt about the "full jar" expansion being any better than the wildcard. Both rely on running on the same host file system. The current mechanism of doing "full jar" expansion uses both Python and Java methods to get the list of files from the filesystem. The proposed (by me) mechanism of having the JVM do the wildcard expansion would switch this to doing whatever the JVM implementation does for listing files, which would presumably be just using the same filesystem primitives. I'm not seeing how the "full jar" expansion produces a better, more predictable ordering of the jars.

Finally, I had another wacky idea: is there any possibility of building an uber jar of storm's dependencies? i.e., instead of all these individual 3rd party dependency jars that are being enumerated, we might be able to have a "storm-dependencies.jar" uber jar that just slaps all of the non-shaded dependencies into a single jar without relocating anything.

\[1] I had to slightly modify Bobby's script. {{ls}} on my computer shows a "*" suffix for executable files, so I used {{find}} instead.

> shorten classpaths in worker and LogWriter commands
> ---
>
> Key: STORM-2191
> URL: https://issues.apache.org/jira/browse/STORM-2191
> Project: Apache Storm
> Issue Type: Task
> Components: storm-core
> Affects Versions: 1.0.2
> Reporter: Erik Weathers
> Priority: Minor
> Labels: cli, command-line
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When launching the worker daemon and its wrapping LogWriter daemon, the commands can become so long that they eclipse the default Linux limit of 4096 bytes. That results in commands that are cut off in {{ps}} output, and prevents easily inspecting the system to see even what processes are running.
> The specific scenario in which this problem can be easily triggered: *running Storm on Mesos*.
> h5. Details on why it happens:
> # using the default Mesos containerizer instead of Docker containers, which causes the storm-mesos package to be unpacked into the Mesos executor sandbox.
> # The ["expand all jars on classpath"|https://github.com/apache/storm/blob/6dc6407a01d032483edebb1c1b4d8b69a304d81c/bin/storm.py#L114-L140] functionality in the {{bin/storm.py}} script causes every one of the jars that storm bundles into its lib directory to be explicitly listed in the command.
> #* e.g., say the mesos work dir is {{/var/run/mesos/work_dir/}}
> #* and say that the original classpath argument in the supervisor cmd includes the following for the {{lib/}} dir in the binary storm package:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/*}}
> #* That leads to a hugely expanded classpath argument for the LogWriter and Worker daemons that get launched:
> #** {{/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/asm-4.0.jar:/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12/frameworks/20160509-084241-1086985738-5050-32231-/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib/carbonite-1.4.0.jar:...}}

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
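To make the length problem in the ticket concrete, here is a minimal Java sketch (not Storm code) that compares an explicitly expanded classpath with a single wildcard entry. The class `ClasspathLength`, its method names, the jar names `dep-N.jar`, and the jar count are all hypothetical; only the `lib/` directory is copied from the ticket's example. The one outside fact relied on is that the JVM itself expands a `dir/*` class path entry to the jar files in that directory, in an order the JVM documentation leaves unspecified.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: compares an expanded classpath with a wildcard entry. */
final class ClasspathLength {

    /** One absolute path per jar, joined with ':' as in the ticket's example. */
    static String expanded(String libDir, List<String> jarNames) {
        List<String> entries = new ArrayList<>();
        for (String jar : jarNames) {
            entries.add(libDir + "/" + jar);
        }
        return String.join(":", entries);
    }

    /** A single wildcard entry; the JVM expands it to the jars in libDir. */
    static String wildcard(String libDir) {
        return libDir + "/*";
    }

    public static void main(String[] args) {
        // lib/ directory taken from the ticket's example.
        String libDir = "/var/run/mesos/work_dir/slaves/2357b762-6653-4052-ab9e-f1354d78991b-S12"
                + "/frameworks/20160509-084241-1086985738-5050-32231-"
                + "/executors/STORM_TOPOLOGY_ID/runs/e6a1407e-73fd-4be4-8d00-e882117b3391"
                + "/storm-mesos-0.1.7-storm0.9.6-mesos0.28.2/lib";

        // Hypothetical jar names and count; the real lib/ contents are not listed in the ticket.
        List<String> jars = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            jars.add("dep-" + i + ".jar");
        }

        int limit = 4096; // the limit quoted in the ticket
        int expandedLength = expanded(libDir, jars).length();
        int wildcardLength = wildcard(libDir).length();
        System.out.println("expanded: " + expandedLength + " chars, over limit: " + (expandedLength > limit));
        System.out.println("wildcard: " + wildcardLength + " chars, over limit: " + (wildcardLength > limit));
    }
}
```

The expanded form grows with the directory prefix times the number of jars, while the wildcard form pays for the prefix once, which is why long Mesos sandbox paths make the difference so visible.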
[jira] [Commented] (STORM-2489) Overlap and data loss on WindowedBolt based on Duration
[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984426#comment-15984426 ]

Arun Mahadevan commented on STORM-2489:
---

[~wangkui], I made a minor tweak to process the tuples with some time offset; please pull the current changes and test.

> Overlap and data loss on WindowedBolt based on Duration
> ---
>
> Key: STORM-2489
> URL: https://issues.apache.org/jira/browse/STORM-2489
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.2
> Environment: windows 10, eclipse, jdk1.7
> Reporter: wangkui
> Assignee: Arun Mahadevan
> Attachments: TumblingWindowIssue.java
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The attachment is my test script, one of my test results is:
> ```
> expired=1...55
> get=56...4024
> new=56...4024
> Recived=3969,RecivedTotal=3969
> expired=56...4020
> get=4021...8191
> new=4025...8191
> Recived=4171,RecivedTotal=8140
> SendTotal=12175
> expired=4021...8188
> get=8189...12175
> new=8192...12175
> Recived=3987,RecivedTotal=12127
> ```
> This test result shows that some tuples appear in the expired list directly; we lose these data if we only use get() to get tuples. This is the first bug.
> The second: the tuples returned by get() overlap, while getNew() seems all right.
> The problem does not always occur; it may take several tries to reproduce.
> Actually, I'm a newbie with Storm, so I'm not sure whether this is indeed a bug or I am using it the wrong way.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2489) Overlap and data loss on WindowedBolt based on Duration
[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984429#comment-15984429 ]

wangkui commented on STORM-2489:

[~arunmahadevan] I pulled and tested again; yes, no tuples are lost now. Thanks.
[jira] [Commented] (STORM-2489) Overlap and data loss on WindowedBolt based on Duration
[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984369#comment-15984369 ]

Arun Mahadevan commented on STORM-2489:
---

[~wangkui] thanks for the update. I made some minor changes to the patch today. It would be great if you could pull the latest changes and run your tests a few times.
[jira] [Commented] (STORM-2489) Overlap and data loss on WindowedBolt based on Duration
[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984368#comment-15984368 ]

wangkui commented on STORM-2489:

[~arunmahadevan] It still loses data. The following is one test result; you can see that tuple 15073 is lost:

Timestamp_start-end=1493195565070-1493195570070
expired=
get=1...3757
new=1...3757
Recived=3757,RecivedTotal=3757
Timestamp_start-end=1493195570070-1493195575070
expired=1...3757
get=3758...7579
new=3758...7579
Recived=3822,RecivedTotal=7579
Timestamp_start-end=1493195575070-1493195580070
expired=3758...7579
get=7580...11276
new=7580...11276
Recived=3697,RecivedTotal=11276
Timestamp_start-end=1493195580070-1493195585070
expired=7580...11276
get=11277...15072
new=11277...15072
Recived=3796,RecivedTotal=15072
SendTotal=15162
Timestamp_start-end=1493195585070-1493195590070
expired=11277...15073
get=15074...15162
new=15074...15162
Recived=89,RecivedTotal=15161
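The comparison done by eye in the test output above can be sketched in Java. This is a hypothetical helper, not part of Storm or of the attached test script: the class `WindowGapCheck` and its `check` method are names invented for illustration. It assumes the spout numbers tuples sequentially, as the logs suggest, and it takes only the first and last id of each tumbling window's get() result.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: finds lost or overlapping tuple ids between consecutive windows. */
final class WindowGapCheck {

    /**
     * Each element of ranges is {firstId, lastId} of one window's get() result,
     * in the order the windows fired. Returns one message per gap or overlap.
     */
    static List<String> check(long[][] ranges) {
        List<String> problems = new ArrayList<>();
        for (int i = 1; i < ranges.length; i++) {
            long previousLast = ranges[i - 1][1];
            long first = ranges[i][0];
            if (first > previousLast + 1) {
                // Ids between the two windows never showed up in any get() result.
                problems.add("lost " + (previousLast + 1) + "..." + (first - 1));
            } else if (first <= previousLast) {
                // The new window starts inside the previous one.
                problems.add("overlap " + first + "..." + previousLast);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // get= ranges from the test result in this comment.
        long[][] afterPatch = {{1, 3757}, {3758, 7579}, {7580, 11276}, {11277, 15072}, {15074, 15162}};
        System.out.println(check(afterPatch));

        // get= ranges from the original issue description.
        long[][] originalReport = {{56, 4024}, {4021, 8191}, {8189, 12175}};
        System.out.println(check(originalReport));
    }
}
```

A check like this only looks at window boundaries, so it would not notice ids missing from the middle of a window; comparing the per-window counts against `SendTotal`, as the test script already does, covers that case.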
[jira] [Commented] (STORM-2489) Overlap and data loss on WindowedBolt based on Duration
[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984353#comment-15984353 ]

wangkui commented on STORM-2489:

[~arunmahadevan] I tested several times with your patch, and now it seems all right: no tuples overlap. Thank you!