[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-72359922 One more thing ;-) Did we collect ICLAs from all people contributing significant parts to Gelly? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...
Github user cebe commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-72366219 @fhueske why is that needed? [Gelly is Apache 2.0](https://github.com/project-flink/flink-graph/blob/master/LICENSE) licensed, and so is Flink: https://github.com/apache/flink/blob/master/LICENSE
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300209#comment-14300209 ] ASF GitHub Bot commented on FLINK-1201: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-72366621 This is not about the license of the software / code itself. The [ASF homepage](http://www.apache.org/licenses/#clas) says: "The ASF desires that all contributors of ideas, code, or documentation to the Apache projects complete, sign, and submit (via postal mail, fax or email) an Individual Contributor License Agreement (CLA). The purpose of this agreement is to clearly define the terms under which intellectual property has been contributed to the ASF and thereby allow us to defend the project should there be a legal dispute regarding the software at some future time. A signed CLA is required to be on file before an individual is given commit rights to an ASF project." An ICLA also prevents major parts of the code base from having to be removed or rewritten if a contributor decides to change the license in the future.

Graph API for Flink
Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri

This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300151#comment-14300151 ] ASF GitHub Bot commented on FLINK-1201: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-72359922 One more thing ;-) Did we collect ICLAs from all people contributing significant parts to Gelly?
[jira] [Commented] (FLINK-1462) Add documentation guide for the graph API
[ https://issues.apache.org/jira/browse/FLINK-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300263#comment-14300263 ] Henry Saputra commented on FLINK-1462: -- Would love to see some pictures to visualize how it works underneath. Graph processing is usually a bit hard to explain, especially with complex structures. I like how the Spark GraphX [1] docs explain how it works. [1] https://spark.apache.org/docs/latest/graphx-programming-guide.html

Add documentation guide for the graph API
Key: FLINK-1462 URL: https://issues.apache.org/jira/browse/FLINK-1462 Project: Flink Issue Type: Task Components: Documentation Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri Labels: documentation

We should write a guide for Gelly, describing what methods are provided and how they can be used. It should at least cover the following: graph creation, mutations, transformations, neighborhood functions, vertex-centric iteration, validation and the library methods. We can use a format similar to the Flink streaming guide, which I like a lot :)
[GitHub] flink pull request: Corrected some typos in comments (removed doub...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/352
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300784#comment-14300784 ] ASF GitHub Bot commented on FLINK-1419: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-72390974 +1, will merge. Thanks @zentol @tillrohrmann!

DistributedCache doesn't preserve files for subsequent operations
Key: FLINK-1419 URL: https://issues.apache.org/jira/browse/FLINK-1419 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Chesnay Schepler Assignee: Chesnay Schepler

When subsequent operations want to access the same files in the DistributedCache (DC), it frequently happens that the files are not recreated for the following operation. This is fairly odd, since the DC is supposed to either a) preserve files when another operation kicks in within a certain time window, or b) just recreate the deleted files. Neither happens, and increasing the time window had no effect.

I'd like to use this issue as a starting point for a more general discussion about the DistributedCache. Currently: 1. all files reside in a common job-specific directory, and 2. files are deleted during the job. One concern raised about Trait 1 is that it basically forbids modification of the files, concurrent access and all. Personally I'm not sure if this is a problem; changing it to a task-specific place solved the issue, though. I'm more concerned about Trait 2. Besides the mentioned issue, the deletion is realized with the scheduler, which adds a lot of complexity to the current code. (It really is a pain to work on...) If we moved the deletion to the end of the job, it could be done as a clean-up step in the TaskManager. With this we could reduce the DC to a cacheFile(String source) method, the delete method in the TM, and throw out everything else. Also, the current implementation implies that big files may be copied multiple times. This may be undesired, depending on how big the files are.
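The simplification proposed in the comment above (a single cacheFile method plus a TaskManager-side clean-up step at job end) could look roughly like this. This is only a sketch of the idea; the class and method names are hypothetical and are not Flink's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the simplified DistributedCache: files are copied
// once per job, kept for the whole job, and removed in one clean-up step
// instead of via scheduler-driven deletion timers.
public class SimpleDistributedCache {

    // Job-scoped map from source path to local copy; computeIfAbsent also
    // avoids copying the same big file multiple times.
    private final Map<String, String> localCopies = new ConcurrentHashMap<>();

    // The single public entry point proposed in the discussion above.
    public String cacheFile(String source) {
        return localCopies.computeIfAbsent(source, this::copyToLocalDir);
    }

    // Called once by the TaskManager when the job finishes.
    public void cleanup() {
        // A real implementation would delete the local files here.
        localCopies.clear();
    }

    // Placeholder for the actual copy step (HDFS -> local disk, etc.).
    private String copyToLocalDir(String source) {
        return "/tmp/job-cache/" + Integer.toHexString(source.hashCode());
    }

    public static void main(String[] args) {
        SimpleDistributedCache cache = new SimpleDistributedCache();
        String first = cache.cacheFile("hdfs:///data/input.csv");
        String second = cache.cacheFile("hdfs:///data/input.csv");
        System.out.println(first.equals(second)); // prints true: copy reused
        cache.cleanup(); // single clean-up step at end of job
    }
}
```

Because all state lives in one job-scoped map, subsequent operations reading the same source path always see the same local copy for the lifetime of the job, which is exactly the guarantee the current time-window-based deletion fails to provide.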
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300780#comment-14300780 ] ASF GitHub Bot commented on FLINK-1201: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-72390331 Good catch, Fabian. Yes, please do submit ICLAs to the secretary before we continue. On Sunday, February 1, 2015, Vasia Kalavri notificati...@github.com wrote: Right, thanks @fhueske for bringing this up! @balidani, @andralungu, @cebe: could you please complete and sign this form https://www.apache.org/licenses/icla.pdf (if you haven't already)? Thank you!
[jira] [Resolved] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1419. -- Resolution: Fixed Fix Version/s: 0.9, 0.8.1 Fixed in 563e546236217dace58a8031d56d08a27e08160b
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300788#comment-14300788 ] ASF GitHub Bot commented on FLINK-1419: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/339
[jira] [Commented] (FLINK-1465) GlobalBufferPool reports negative memory allocation
[ https://issues.apache.org/jira/browse/FLINK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300906#comment-14300906 ] Ufuk Celebi commented on FLINK-1465: I guess it is an overflow of some sort. The OoM exceptions are caught and rethrown to actually give better error messages. Should we keep this or just skip it? Do you know what the misconfiguration was? For the network buffer pool, only page size and number of pages should be relevant.

GlobalBufferPool reports negative memory allocation
Key: FLINK-1465 URL: https://issues.apache.org/jira/browse/FLINK-1465 Project: Flink Issue Type: Bug Components: Local Runtime, TaskManager Affects Versions: 0.9 Reporter: Robert Metzger

I've got this error message when starting Flink. It does not really help me. I suspect that my configuration files (which worked with 0.8) aren't working with 0.9 anymore. Still, the exception is reporting weird stuff:
{code}
11:41:02,516 INFO  org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1 - TaskManager successfully registered at JobManager akka.tcp://fl...@cloud-18.dima.tu-berlin.de:39674/user/jobmanager.
11:41:25,230 ERROR org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1 - Failed to instantiate network environment.
java.io.IOException: Failed to instantiate network buffer pool: Could not allocate enough memory segments for GlobalBufferPool (required (Mb): 0, allocated (Mb): -965, missing (Mb): 965).
	at org.apache.flink.runtime.io.network.NetworkEnvironment.<init>(NetworkEnvironment.java:81)
	at org.apache.flink.runtime.taskmanager.TaskManager.setupNetworkEnvironment(TaskManager.scala:508)
	at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$finishRegistration(TaskManager.scala:479)
	at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:226)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
	at org.apache.flink.yarn.YarnTaskManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnTaskManager.scala:32)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:41)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:78)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.OutOfMemoryError: Could not allocate enough memory segments for GlobalBufferPool (required (Mb): 0, allocated (Mb): -965, missing (Mb): 965).
	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.<init>(NetworkBufferPool.java:76)
	at org.apache.flink.runtime.io.network.NetworkEnvironment.<init>(NetworkEnvironment.java:78)
	... 23 more
{code}
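Ufuk's overflow hunch is plausible: if the total buffer memory is computed as the product of the page count and page size in 32-bit int arithmetic, a misconfigured (too large) page count wraps past Integer.MAX_VALUE and yields exactly this kind of negative megabyte figure. A minimal illustration of the effect; the numbers and variable names are made up for the demo and are not Flink's actual code:

```java
// Sketch of the suspected int-overflow bug behind the negative
// "allocated (Mb)" figure. 70_000 pages * 32 KiB = 2_293_760_000 bytes,
// which exceeds Integer.MAX_VALUE (2_147_483_647), so the 32-bit product
// wraps around to a negative number.
public class BufferPoolOverflow {
    public static void main(String[] args) {
        int numBuffers = 70_000;       // illustrative misconfiguration
        int segmentSize = 32 * 1024;   // 32 KiB network buffer page

        int wrongTotalBytes = numBuffers * segmentSize;          // overflows int
        long rightTotalBytes = (long) numBuffers * segmentSize;  // widen first

        System.out.println("int  Mb: " + wrongTotalBytes / (1024 * 1024));  // negative
        System.out.println("long Mb: " + rightTotalBytes / (1024 * 1024)); // 2187
    }
}
```

Casting one operand to long before multiplying (or validating the configured page count against the available memory up front) would avoid the wrap-around and let the error message report the real shortfall.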