[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-29 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296747#comment-14296747
 ] 

Gianmarco De Francisci Morales commented on STORM-632:
--

Hi,
Is there anything else needed to push this forward? Can we merge it?

 New grouping for better load balancing
 --

 Key: STORM-632
 URL: https://issues.apache.org/jira/browse/STORM-632
 Project: Apache Storm
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales

 Hi,
 We have recently studied the problem of load balancing in Storm [1].
 In particular, we focused on what happens when the key distribution of the 
 stream is skewed when using key grouping.
 We developed a new stream partitioning scheme (which we call Partial Key 
 Grouping). It achieves better load balancing than key grouping while being 
 more scalable than shuffle grouping in terms of memory.
 In the paper we show a number of mining algorithms that are easy to implement 
 with partial key grouping, and whose performance can benefit from it. We 
 think that it might also be useful for a larger class of algorithms.
 We don't have experience in Clojure; however, partial key grouping is very 
 easy to implement: it requires just a few lines of code in Java when 
 implemented as a custom grouping in Storm [2].
 We believe it should be very easy to port from Java.
 For all these reasons, we believe it will be a nice addition to the standard 
 groupings available in Storm. If the community thinks it's a good idea, we 
 will be happy to offer support in the porting.
 References:
 [1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
 [2] https://github.com/gdfm/partial-key-grouping
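
 A minimal sketch of the idea described above, assuming (as the title of [1]
 suggests) that each key is hashed to two candidate tasks and every tuple is
 sent to whichever candidate has received fewer tuples from this sender so
 far. The class name, the hash function and the seeds are hypothetical; this
 is not the code from [2] or from the pull request, and it does not use the
 Storm API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: a standalone sketch, not Storm's implementation.
public class PartialKeyGroupingSketch {
    // Number of tuples this sender has routed to each downstream task.
    private final long[] load;

    public PartialKeyGroupingSketch(int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        this.load = new long[numTasks];
    }

    // Seeded FNV-1a style hash; two different seeds give two candidate tasks.
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Returns the index of the task chosen for this key and records the load.
    public int chooseTask(String key) {
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        int first = Math.floorMod(hash(raw, 13), load.length);
        int second = Math.floorMod(hash(raw, 17), load.length);
        // Pick the less loaded of the two candidates (ties go to the first).
        int chosen = load[first] <= load[second] ? first : second;
        load[chosen]++;
        return chosen;
    }

    // Copy of the per-task counters, for inspection.
    public long[] loads() {
        return load.clone();
    }

    public static void main(String[] args) {
        PartialKeyGroupingSketch grouping = new PartialKeyGroupingSketch(4);
        // A fully skewed stream: every tuple carries the same key.
        for (int i = 0; i < 1000; i++) {
            grouping.chooseTask("hot-key");
        }
        System.out.println(Arrays.toString(grouping.loads()));
    }
}
```

 With plain key grouping a single hot key always lands on one task; in this
 sketch it can be split across at most two, so a downstream operator has to
 merge at most two partial results per key.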



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-29 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296909#comment-14296909
 ] 

Robert Joseph Evans commented on STORM-632:
---

We can merge it in.  I have just been pulled off onto other things lately, and 
have not had the chance to get back to it.

 New grouping for better load balancing
 --

 Key: STORM-632
 URL: https://issues.apache.org/jira/browse/STORM-632
 Project: Apache Storm
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales



[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296939#comment-14296939
 ] 

ASF GitHub Bot commented on STORM-632:
--

Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/395




[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289305#comment-14289305
 ] 

ASF GitHub Bot commented on STORM-632:
--

GitHub user gdfm opened a pull request:

https://github.com/apache/storm/pull/395

STORM-632: New grouping for better load balancing



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gdfm/storm STORM-632

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #395


commit 2c181e9b57ad4f56f8ccca79ca2ceac574492bc1
Author: Gianmarco De Francisci Morales gdfm+git...@gdfm.me
Date:   2013-12-12T11:35:47Z

    Update README.markdown

commit ab8a77614f26737427f2f8c69bf1e74e169c78a6
Author: Gianmarco De Francisci Morales gdfm+git...@gdfm.me
Date:   2013-12-12T12:03:40Z

    add eclipse files to .gitingore

commit 1d9bfb38f9f49672df05657bd65935fbb346b588
Author: Gianmarco De Francisci Morales gdfm+git...@gdfm.me
Date:   2014-12-08T15:15:52Z

    Merge branch 'master' of github.com:apache/incubator-storm

    Conflicts:
        .gitignore
        README.markdown

commit 42398f6ffb7d4b0df14a127edb54cde62a81
Author: Gianmarco De Francisci Morales g...@apache.org
Date:   2015-01-23T11:48:46Z

    Merge branch 'master' of github.com:apache/incubator-storm

commit 259c8c25ae7187b3a5fc735a111d14b77d6233c0
Author: Gianmarco De Francisci Morales g...@apache.org
Date:   2015-01-23T14:36:08Z

    Java implementation of partial key grouping + test






[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289399#comment-14289399
 ] 

ASF GitHub Bot commented on STORM-632:
--

Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/395#issuecomment-71210952
  
+1. I also filed STORM-637 as a follow-on JIRA to finish the integration.




[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289422#comment-14289422
 ] 

ASF GitHub Bot commented on STORM-632:
--

Github user gdfm commented on the pull request:

https://github.com/apache/storm/pull/395#issuecomment-71212234
  
I always forget them :)
Fixed.




[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-21 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285649#comment-14285649
 ] 

Gianmarco De Francisci Morales commented on STORM-632:
--

Hi [~revans2]. I can create a pull request for the current Java code very 
easily.
Any preference on where to put it (which package)?
I would also like to have it integrated more tightly; however, I don't really 
know where to start.
My knowledge of Clojure is zero and I am not too familiar with the internals of 
Storm.
I'd rather let somebody else take care of that integration.



[jira] [Commented] (STORM-632) New grouping for better load balancing

2015-01-20 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284015#comment-14284015
 ] 

Robert Joseph Evans commented on STORM-632:
---

[~azaroth]
I would love to see this pulled into Storm.  If you want to put up a pull 
request based on the code in your branch, that would be great.  My only comment 
is that it would be nice to have the partial key grouping match the fields 
grouping in how field names are passed in, but doing that cleanly would take 
some tighter integration with Storm.  If you don't feel comfortable 
making those changes yourself, please let me know.  I cannot promise I'll get 
to it any time soon, though.
