[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user markus-h closed the pull request at: https://github.com/apache/flink/pull/598 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-172196328 Hey @markus-h, is there any news regarding this PR? If not, would you mind closing it? Thanks!
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-129223234 Hi @markus-h, I'm so sorry it took me so long to look into this. I agree with Stephan's comment, and it would be great if we could add this option to gather-sum-apply, too. Would you like to try to rebase?
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-101321028 I had a look at this, and it actually looks quite good. The basic idea seems to be that you emit the original vertex if no update happens. It would be nice to not have the `isLastCollected` flag in the user-facing classes. If you could have a dedicated vertex-centric bulk coGroup, with its own output collector, you can track this in the OutputCollector. I think that would be cleaner with respect to the user-facing API. Otherwise, I think this is a good addition...
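To illustrate the suggestion above, here is a minimal sketch of a collector wrapper that records whether the user code emitted anything, so that no flag has to live in the user-facing class. It uses plain Java rather than Flink's API; the names `Collector`, `VertexUpdate`, `TrackingCollector`, and `runUpdate` are hypothetical stand-ins and are not taken from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

final class TrackingCollectorSketch {

    /** Hypothetical stand-in for an engine's output collector. */
    interface Collector<T> {
        void collect(T record);
    }

    /** Hypothetical user-facing update function; it carries no bookkeeping flag. */
    interface VertexUpdate {
        void update(int currentValue, List<Integer> messages, Collector<Integer> out);
    }

    /** Forwards records downstream and remembers whether any were emitted. */
    static final class TrackingCollector<T> implements Collector<T> {
        private final Collector<T> downstream;
        private boolean collected;

        TrackingCollector(Collector<T> downstream) {
            this.downstream = downstream;
        }

        @Override
        public void collect(T record) {
            collected = true;
            downstream.collect(record);
        }

        boolean hasCollected() {
            return collected;
        }
    }

    /** Runs the update for one vertex and re-emits the old value if nothing was emitted. */
    static List<Integer> runUpdate(VertexUpdate fn, int currentValue, List<Integer> messages) {
        List<Integer> output = new ArrayList<>();
        TrackingCollector<Integer> tracker = new TrackingCollector<>(output::add);
        fn.update(currentValue, messages, tracker);
        if (!tracker.hasCollected()) {
            // No update happened, so keep the original vertex value.
            output.add(currentValue);
        }
        return output;
    }
}
```

With this shape, the user only implements `VertexUpdate`; whether an update happened is observed by the wrapper inside the (assumed) bulk coGroup driver.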
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-93985063 Interesting idea. Are there use cases that require that, or is that basically to allow for an easy comparison of the bulk vs delta performance?
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user markus-h commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-93987347 There is no specific use case, but when you try to process big graphs locally you often run out of memory with delta iterations. But the reason I needed this change is a different one. I am doing research on failure recovery methods in graph analysis. Most Pregel-like systems just do a full checkpointing of all vertices. This was way easier to implement with a bulk iteration than with delta iterations in Flink, so I decided to just provide Gelly with this mode.
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-94002207 Hi @markus-h! I see the point in having a bulk iteration in Gelly; however, I'm not sure I would add it as a mode in vertex-centric iteration. VertexCentricIteration implements the Pregel model and it might be confusing to change its semantics like this (or maybe not). Also, I am not sure whether the messaging-vertexUpdate abstraction is what you would like to have in a bulk graph iteration. It might be better to add a bulk graph iteration, where the gathering of neighborhoods is abstracted and the user only provides the step function, i.e. something like the neighborhood methods, but iterative. What do you think? In any case, I think that a use-case / example would really help motivate adding this :-)
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user markus-h commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-94008273 Hi @vasia, thanks for your comments! I thought about this extension in a different way. Whenever you have a graph that is too big to process with delta iterations, you could just turn on bulk mode to get the computation done. It will be a lot slower, but sometimes this might be better than not getting any results. I don't think a dedicated bulk operator would be very useful. People can just use plain Flink if they don't need the Pregel abstraction. And in most cases it would be much slower than using the current solution. You know Gelly and its use cases a lot better than I do. If you don't think that a mode like this might be useful, I am totally fine with that. It is a very small change anyway.
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/598#issuecomment-94013714 Aha I see! I totally misunderstood your intention :-) I'll take a look as soon as I finish a few more reviews. Thanks!
[GitHub] flink pull request: [FLINK-1885] [gelly] Added bulk execution mode...
GitHub user markus-h opened a pull request: https://github.com/apache/flink/pull/598

[FLINK-1885] [gelly] Added bulk execution mode to gellys vertex centric iterations

See https://issues.apache.org/jira/browse/FLINK-1885

I essentially replaced the delta iteration with a bulk iteration and turned the coGroup of the VertexUpdateUdf into a kind of outer join, so that the vertices that are not changed in one superstep are kept around in the next one.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markus-h/incubator-flink gellyBulkMode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/598.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #598

commit e3641c88ea260dbb533015adfb6ef44272a2e615
Author: Markus Holzemer markus.holze...@gmx.de
Date: 2015-04-13T15:55:03Z

    Added bulk execution mode to gellys vertex centric iterations
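The outer-join behaviour described in this pull request can be sketched as follows. This is only an illustration with plain Java collections, not the code from the patch and not Flink's DataSet API. The class `BulkSuperstepSketch`, the method `superstep`, and the minimum-value update rule are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BulkSuperstepSketch {

    /**
     * One superstep in bulk mode: vertices are matched with their incoming
     * messages by vertex id. A vertex without messages is carried over
     * unchanged (the outer-join side), so the result always contains the
     * complete vertex set rather than only the changed vertices.
     */
    static Map<Long, Integer> superstep(Map<Long, Integer> vertices,
                                        Map<Long, List<Integer>> messagesByTarget) {
        Map<Long, Integer> next = new LinkedHashMap<>();
        for (Map.Entry<Long, Integer> vertex : vertices.entrySet()) {
            List<Integer> messages = messagesByTarget.get(vertex.getKey());
            int value = vertex.getValue();
            if (messages != null) {
                for (int message : messages) {
                    // Example update rule (an assumption): keep the minimum value seen.
                    value = Math.min(value, message);
                }
            }
            next.put(vertex.getKey(), value);
        }
        return next;
    }
}
```

In a delta iteration, by contrast, only the changed vertices would be emitted and merged into a separately maintained solution set; here every superstep produces the full vertex set, which is what makes it a bulk iteration.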