[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296747#comment-14296747 ]

Gianmarco De Francisci Morales commented on STORM-632:
--

Hi, is there anything else needed to push this forward? Can we merge it?

New grouping for better load balancing
--
Key: STORM-632
URL: https://issues.apache.org/jira/browse/STORM-632
Project: Apache Storm
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales

Hi,

We have recently studied the problem of load balancing in Storm [1]. In particular, we focused on what happens under key grouping when the key distribution of the stream is skewed. We developed a new stream partitioning scheme, which we call Partial Key Grouping. It achieves better load balancing than key grouping while being more scalable than shuffle grouping in terms of memory.

In the paper we show a number of mining algorithms that are easy to implement with partial key grouping, and whose performance can benefit from it. We think that it might also be useful for a larger class of algorithms.

We don't have experience in Clojure; however, partial key grouping is very easy to implement: it requires just a few lines of Java code when implemented as a custom grouping in Storm [2]. We believe it should be very easy to port from Java.

For all these reasons, we believe it will be a nice addition to the standard groupings available in Storm. If the community thinks it's a good idea, we will be happy to offer support in the porting.

References:
[1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
[2] https://github.com/gdfm/partial-key-grouping

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
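To make the "few lines of Java" claim concrete, here is a minimal sketch of the idea behind partial key grouping as the paper's title ("the power of both choices") suggests. Each key gets two candidate downstream tasks from two hash functions, and the sender routes each tuple to whichever candidate it has sent fewer tuples to so far, using only a local count as its load estimate. The class name, the seeds, and the hash mixing are illustrative assumptions; this is not the code from the pull request or from the repository in [2].

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical sketch of partial key grouping: every key has two candidate
 * tasks (picked by two seeded hash functions), and this sender routes each
 * tuple to the candidate it has sent fewer tuples to so far.
 * Not the code from the STORM-632 pull request.
 */
class PartialKeyGroupingSketch {
    // Arbitrary distinct seeds for the two hash functions (an assumption).
    private static final int SEED_A = 0x2545F491;
    private static final int SEED_B = 0x6C8E9CF5;

    private final List<Integer> targetTasks;
    // Local load estimate: number of tuples this sender has routed to each target.
    private final long[] sentCounts;

    PartialKeyGroupingSketch(List<Integer> targetTasks) {
        if (targetTasks.isEmpty()) {
            throw new IllegalArgumentException("need at least one target task");
        }
        this.targetTasks = new ArrayList<>(targetTasks);
        this.sentCounts = new long[targetTasks.size()];
    }

    // Mixes the key's hashCode with a seed, then maps the result to a target index.
    private int candidate(Object key, int seed) {
        int h = Objects.hashCode(key) ^ seed;
        h *= 0x9E3779B1;
        h ^= h >>> 15;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return Math.floorMod(h, targetTasks.size());
    }

    /** Returns a one-element list holding the task that should receive this tuple. */
    List<Integer> chooseTasks(Object key) {
        int a = candidate(key, SEED_A);
        int b = candidate(key, SEED_B);
        // Send to the less-loaded of the two candidates; ties go to the first one.
        int chosen = sentCounts[a] <= sentCounts[b] ? a : b;
        sentCounts[chosen]++;
        return Collections.singletonList(targetTasks.get(chosen));
    }
}
```

In Storm itself, this logic would presumably sit inside an implementation of the `CustomStreamGrouping` interface (its `prepare` method receives the target task ids, and its `chooseTasks` method receives each tuple's values) and be wired in with `customGrouping` on the bolt's input declarer. The design choice explains both claims in the description: a hot key is split across two tasks instead of overloading one, as it would under key grouping, and each key's state lives on at most two tasks, rather than potentially on every task as under shuffle grouping.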
[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296909#comment-14296909 ]

Robert Joseph Evans commented on STORM-632:
---

We can merge it in. I have just been pulled off onto other things lately, and have not had the chance to get back to it.

New grouping for better load balancing
--
Key: STORM-632
URL: https://issues.apache.org/jira/browse/STORM-632
Project: Apache Storm
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296939#comment-14296939 ]

ASF GitHub Bot commented on STORM-632:
--

Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/395
[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289305#comment-14289305 ]

ASF GitHub Bot commented on STORM-632:
--

GitHub user gdfm opened a pull request:

https://github.com/apache/storm/pull/395

STORM-632: New grouping for better load balancing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gdfm/storm STORM-632

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/395.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #395

commit 2c181e9b57ad4f56f8ccca79ca2ceac574492bc1
Author: Gianmarco De Francisci Morales gdfm+git...@gdfm.me
Date: 2013-12-12T11:35:47Z

Update README.markdown

commit ab8a77614f26737427f2f8c69bf1e74e169c78a6
Author: Gianmarco De Francisci Morales gdfm+git...@gdfm.me
Date: 2013-12-12T12:03:40Z

add eclipse files to .gitingore

commit 1d9bfb38f9f49672df05657bd65935fbb346b588
Author: Gianmarco De Francisci Morales gdfm+git...@gdfm.me
Date: 2014-12-08T15:15:52Z

Merge branch 'master' of github.com:apache/incubator-storm

Conflicts:
.gitignore
README.markdown

commit 42398f6ffb7d4b0df14a127edb54cde62a81
Author: Gianmarco De Francisci Morales g...@apache.org
Date: 2015-01-23T11:48:46Z

Merge branch 'master' of github.com:apache/incubator-storm

commit 259c8c25ae7187b3a5fc735a111d14b77d6233c0
Author: Gianmarco De Francisci Morales g...@apache.org
Date: 2015-01-23T14:36:08Z

Java implementation of partial key grouping + test
[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289399#comment-14289399 ]

ASF GitHub Bot commented on STORM-632:
--

Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/395#issuecomment-71210952

+1

I also filed STORM-637 as a follow-on JIRA to finish the integration.
[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289422#comment-14289422 ]

ASF GitHub Bot commented on STORM-632:
--

Github user gdfm commented on the pull request:

https://github.com/apache/storm/pull/395#issuecomment-71212234

I always forget them :) Fixed.
[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285649#comment-14285649 ]

Gianmarco De Francisci Morales commented on STORM-632:
--

Hi [~revans2]. I can create a pull request for the current Java code very easily. Any preference on where to put it (which package)?

I would also like to have it integrated more tightly; however, I don't really know where to start. My knowledge of Clojure is zero and I am not too familiar with the internals of Storm. I'd rather let somebody else take care of that integration.
[jira] [Commented] (STORM-632) New grouping for better load balancing
[ https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284015#comment-14284015 ]

Robert Joseph Evans commented on STORM-632:
---

[~azaroth] I would love to see this pulled into Storm. If you want to put up a pull request based on the code in your branch, that would be great. My only comment is that it would be nice to have the partial key grouping match the fields grouping in how field names are passed in, but doing that cleanly would take some tighter integration with Storm. If you don't feel comfortable making those changes yourself, please let me know. I cannot promise I'll get to it any time soon, though.
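One way to read the request that partial key grouping "match the fields grouping in how field names are passed in" is that the grouping is configured with field names rather than being handed a precomputed key. Here is a minimal sketch of that idea. It assumes the grouping learns the upstream stream's declared output field names once at setup, resolves each grouping field name to an index, and then builds the key from each tuple's values. The class `FieldKeySelector` and its methods are hypothetical and are not Storm API. In Storm, the declared fields would presumably come from the topology context passed to the grouping's `prepare` method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: resolve grouping field names against the upstream
 * stream's declared output fields once, then extract the grouping key
 * from each tuple's values by index, in the spirit of fields grouping.
 */
class FieldKeySelector {
    private final List<String> groupingFields;
    private int[] indices; // positions of the grouping fields in each tuple

    FieldKeySelector(String... groupingFields) {
        this.groupingFields = Arrays.asList(groupingFields);
    }

    /** Maps each grouping field name to its position in the declared output fields. */
    void prepare(List<String> outputFields) {
        int[] resolved = new int[groupingFields.size()];
        for (int i = 0; i < resolved.length; i++) {
            int idx = outputFields.indexOf(groupingFields.get(i));
            if (idx < 0) {
                throw new IllegalArgumentException("unknown field: " + groupingFields.get(i));
            }
            resolved[i] = idx;
        }
        indices = resolved;
    }

    /** Builds the key (the selected values, in grouping-field order) for one tuple. */
    List<Object> selectKey(List<Object> values) {
        if (indices == null) {
            throw new IllegalStateException("prepare() has not been called");
        }
        List<Object> key = new ArrayList<>(indices.length);
        for (int idx : indices) {
            key.add(values.get(idx));
        }
        return key;
    }
}
```

Resolving names to indices once, rather than on every tuple, keeps the per-tuple cost to a few list lookups. Because `java.util.List` defines `equals` and `hashCode` element-wise, the returned key can be hashed like any single-field key when choosing candidate tasks.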