[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-02-01 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-72359922
  
One more thing ;-)
Did we collect ICLAs from all people contributing significant parts to 
Gelly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-02-01 Thread cebe
Github user cebe commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-72366219
  
@fhueske why is that needed? [Gelly is Apache 
2.0](https://github.com/project-flink/flink-graph/blob/master/LICENSE) licensed, 
and so is Flink: https://github.com/apache/flink/blob/master/LICENSE




[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300209#comment-14300209
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-72366621
  
This is not about the license of the software / code itself. The [ASF 
homepage](http://www.apache.org/licenses/#clas) says

 The ASF desires that all contributors of ideas, code, or documentation to 
the Apache projects complete, sign, and submit (via postal mail, fax or email) 
an Individual Contributor License Agreement (1) (CLA) [ PDF form ]. The purpose 
of this agreement is to clearly define the terms under which intellectual 
property has been contributed to the ASF and thereby allow us to defend the 
project should there be a legal dispute regarding the software at some future 
time. A signed CLA is required to be on file before an individual is given 
commit rights to an ASF project.

An ICLA also prevents major parts of the code base from having to be removed 
or rewritten if a contributor decides to change the license in the future.



 Graph API for Flink 
 

 Key: FLINK-1201
 URL: https://issues.apache.org/jira/browse/FLINK-1201
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Vasia Kalavri

 This issue tracks the development of a Graph API/DSL for Flink.
 Until the code is pushed to the Flink repository, collaboration is happening 
 here: https://github.com/project-flink/flink-graph



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Commented] (FLINK-1462) Add documentation guide for the graph API

2015-02-01 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300263#comment-14300263
 ] 

Henry Saputra commented on FLINK-1462:
--

Would love to see some pictures to visualize how it works underneath. Graph 
processing is usually a bit hard to explain, especially with complex structures.

I like how the Spark GraphX [1] docs explain how it works.


[1] https://spark.apache.org/docs/latest/graphx-programming-guide.html

 Add documentation guide for the graph API
 -

 Key: FLINK-1462
 URL: https://issues.apache.org/jira/browse/FLINK-1462
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri
  Labels: documentation

 We should write a guide for Gelly, describing what methods are provided and 
 how they can be used.
 It should at least cover the following: graph creation, mutations, 
 transformations, neighborhood functions, vertex-centric iteration, validation 
 and the library methods.
 We can use a format similar to the Flink streaming guide, which I like a lot 
 :)





[GitHub] flink pull request: Corrected some typos in comments (removed doub...

2015-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/352




[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations

2015-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300784#comment-14300784
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/339#issuecomment-72390974
  
+1, will merge. Thanks @zentol and @tillrohrmann!


 DistributedCache doesn't preserver files for subsequent operations
 --

 Key: FLINK-1419
 URL: https://issues.apache.org/jira/browse/FLINK-1419
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8, 0.9
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler

 When subsequent operations want to access the same files in the DC, it 
 frequently happens that the files are no longer available for the following 
 operation.
 This is fairly odd, since the DC is supposed to either a) preserve files when 
 another operation kicks in within a certain time window, or b) just recreate 
 the deleted files. Neither happens.
 Increasing the time window had no effect.
 I'd like to use this issue as a starting point for a more general discussion 
 about the DistributedCache. 
 Currently:
 1. all files reside in a common job-specific directory, and
 2. all files are deleted during the job.
  
 One thing that was brought up about Trait #1 is that it basically forbids 
 modification of the files, concurrent access and all. Personally, I'm not sure 
 if this is a problem. Changing it to a task-specific place solved the issue, 
 though.
 I'm more concerned about Trait #2. Besides the mentioned issue, the deletion 
 is realized with the scheduler, which adds a lot of complexity to the current 
 code. (It really is a pain to work on...) 
 If we moved the deletion to the end of the job, it could be done as a clean-up 
 step in the TaskManager. With this we could reduce the DC to a 
 cacheFile(String source) method, the delete method in the TM, and throw out 
 everything else.
 Also, the current implementation implies that big files may be copied 
 multiple times. This may be undesired, depending on how big the files are.
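The reduced interface proposed above (a single cacheFile(String source) plus one
job-end cleanup step) could be sketched roughly as follows. This is a hypothetical
illustration of the idea, not Flink's actual DistributedCache implementation; the
class name SimpleFileCache and all details are invented:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each file is copied at most once into a
// job-specific directory; deletion happens only at job end.
public class SimpleFileCache {
    private final Path jobDir;
    private final Map<String, Path> cached = new HashMap<>();

    public SimpleFileCache(Path jobDir) throws IOException {
        this.jobDir = Files.createDirectories(jobDir);
    }

    // Copy the file once; later calls for the same source reuse the copy,
    // so subsequent operations still see the file until the job ends.
    public Path cacheFile(String source) throws IOException {
        Path copy = cached.get(source);
        if (copy == null) {
            Path src = Paths.get(source);
            copy = jobDir.resolve(src.getFileName().toString());
            Files.copy(src, copy, StandardCopyOption.REPLACE_EXISTING);
            cached.put(source, copy);
        }
        return copy;
    }

    // The only delete path: a clean-up step run once when the job finishes,
    // which would sit in the TaskManager in the proposal above.
    public void cleanup() throws IOException {
        for (Path p : cached.values()) {
            Files.deleteIfExists(p);
        }
        Files.deleteIfExists(jobDir);
        cached.clear();
    }
}
```

With deletion confined to cleanup(), no scheduler involvement is needed and a file
is never copied twice within one job.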





[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300780#comment-14300780
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-72390331
  
Good catch, Fabian. Yes, please do submit ICLAs to the secretary before we
continue.

On Sunday, February 1, 2015, Vasia Kalavri notificati...@github.com wrote:

 Right, thanks @fhueske https://github.com/fhueske for bringing this up!
 @balidani https://github.com/balidani, @andralungu
 https://github.com/andralungu, @cebe https://github.com/cebe: could
 you please complete and sign this form
 https://www.apache.org/licenses/icla.pdf (if you haven't already)?
 Thank you!

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/flink/pull/335#issuecomment-72372857.





[jira] [Resolved] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations

2015-02-01 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1419.
--
   Resolution: Fixed
Fix Version/s: 0.8.1
   0.9

Fixed in 563e546236217dace58a8031d56d08a27e08160b



[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations

2015-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300788#comment-14300788
 ] 

ASF GitHub Bot commented on FLINK-1419:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/339




[jira] [Commented] (FLINK-1465) GlobalBufferPool reports negative memory allocation

2015-02-01 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300906#comment-14300906
 ] 

Ufuk Celebi commented on FLINK-1465:


I guess it is an overflow of some sort. The OoM exceptions are caught and 
rethrown to give better error messages. Should we keep this or just 
skip it?

Do you know what the misconfiguration was? For the network buffer pool, only 
page size and number of pages should be relevant.
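A hypothetical illustration of the kind of overflow suspected above: if the total
buffer memory is computed as page count times page size in 32-bit int arithmetic,
the product can wrap around to a negative number. The values below are invented
for the sketch and are not Flink's actual configuration or code:

```java
public class BufferPoolMath {
    // Wraps past Integer.MAX_VALUE for large inputs, yielding a negative
    // byte count like the "-965 Mb" in the error message above.
    static int overflowingBytes(int numPages, int pageSizeBytes) {
        return numPages * pageSizeBytes;
    }

    // Widening one operand to long before multiplying avoids the wrap-around.
    static long safeBytes(int numPages, int pageSizeBytes) {
        return (long) numPages * pageSizeBytes;
    }
}
```

For example, 70,000 pages of 32 KiB is about 2.29 GB, which exceeds
Integer.MAX_VALUE (about 2.15 GB), so the int product comes out negative while
the long product is correct.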

 GlobalBufferPool reports negative memory allocation
 ---

 Key: FLINK-1465
 URL: https://issues.apache.org/jira/browse/FLINK-1465
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime, TaskManager
Affects Versions: 0.9
Reporter: Robert Metzger

 I've got this error message when starting Flink.
 It does not really help me. I suspect that my configuration files (which 
 worked with 0.8) aren't working with 0.9 anymore. Still, the exception is 
 reporting weird stuff:
 {code}
 11:41:02,516 INFO  
 org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1
   - TaskManager successfully registered at JobManager 
 akka.tcp://fl...@cloud-18.dima.tu-berlin.de:39674/user/jo
 bmanager.
 11:41:25,230 ERROR 
 org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1
   - Failed to instantiate network environment.
 java.io.IOException: Failed to instantiate network buffer pool: Could not 
 allocate enough memory segments for GlobalBufferPool (required (Mb): 0, 
 allocated (Mb): -965, missing (Mb): 965).
 at 
 org.apache.flink.runtime.io.network.NetworkEnvironment.init(NetworkEnvironment.java:81)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.setupNetworkEnvironment(TaskManager.scala:508)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$finishRegistration(TaskManager.scala:479)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:226)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
 at 
 org.apache.flink.yarn.YarnTaskManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnTaskManager.scala:32)
 at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:41)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
 at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:78)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
 at akka.dispatch.Mailbox.run(Mailbox.scala:221)
 at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.OutOfMemoryError: Could not allocate enough memory 
 segments for GlobalBufferPool (required (Mb): 0, allocated (Mb): -965, 
 missing (Mb): 965).
 at 
 org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.init(NetworkBufferPool.java:76)
 at 
 org.apache.flink.runtime.io.network.NetworkEnvironment.init(NetworkEnvironment.java:78)
 ... 23 more
 {code}


