[jira] [Commented] (GIRAPH-64) Create VertexRunner to make it easier to run users' computations

2011-10-26 Thread Owen O'Malley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136702#comment-13136702
 ] 

Owen O'Malley commented on GIRAPH-64:
-

I think that we should have a giraph script that works like:

{code}
% giraph -cp my-awesome.jar -i jazz_input -v my.vertex -o jazz_output
{code}

> Create VertexRunner to make it easier to run users' computations
> 
>
> Key: GIRAPH-64
> URL: https://issues.apache.org/jira/browse/GIRAPH-64
> Project: Giraph
> Issue Type: New Feature
> Reporter: Jakob Homan
>
> Currently, if a user wants to implement a Giraph algorithm by extending 
> {{Vertex}}, they must also write all the boilerplate around the {{Tool}} 
> interface and bundle it with the Giraph jar (or get Giraph onto the classpath 
> and playing nicely with the implementation).  For examples, see what is 
> included in the PageRankBenchmark and what Kohei has done: 
> https://github.com/smly/java-Giraph-LabelPropagation  It would be better if 
> we had a Vertex implementation to be subclassed that already included all 
> the standard tooling, so that all one had to run would be (assuming 
> the Giraph jar was already on the classpath):
> {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex -i jazz_input -o 
> jazz_output -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble 
> -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat} 
> This wouldn't work with every algorithm, but would be useful in a large 
> number of cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-64) Create VertexRunner to make it easier to run users' computations

2011-10-26 Thread Owen O'Malley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136704#comment-13136704
 ] 

Owen O'Malley commented on GIRAPH-64:
-

I should expand on that a bit. The standard giraph main should build up the 
job and submit it, providing just a CLI option to override the vertex. It 
should also automatically put both giraph.jar and my-awesome.jar into the 
distributed cache.
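
As a rough illustration of the option handling such a launcher would need, here is a minimal sketch in plain Java. It is not Giraph code: the class GiraphCliSketch, its Options holder, and the parse method are hypothetical names, and only the four flags from the example command (-cp, -i, -v, -o) are assumed. Building and submitting the Hadoop job and adding the jars to the distributed cache are deliberately left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed launcher's option parsing; not part of Giraph. */
final class GiraphCliSketch {
    // The four flags from the example command; all are treated as required here (an assumption).
    private static final List<String> FLAGS = Arrays.asList("-cp", "-i", "-v", "-o");

    /** Parsed launcher options; field names are illustrative. */
    static final class Options {
        final String userJar;      // -cp: jar containing the user's vertex
        final String inputPath;    // -i
        final String vertexClass;  // -v: overrides the vertex class
        final String outputPath;   // -o

        Options(Map<String, String> values) {
            this.userJar = values.get("-cp");
            this.inputPath = values.get("-i");
            this.vertexClass = values.get("-v");
            this.outputPath = values.get("-o");
        }
    }

    /** Parses "flag value" pairs and rejects unknown or missing flags. */
    static Options parse(String[] args) {
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            if (!FLAGS.contains(flag)) {
                throw new IllegalArgumentException("Unknown option: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            values.put(flag, args[i + 1]);
        }
        for (String flag : FLAGS) {
            if (!values.containsKey(flag)) {
                throw new IllegalArgumentException("Missing required option: " + flag);
            }
        }
        return new Options(values);
    }

    public static void main(String[] args) {
        Options o = parse(new String[] {
            "-cp", "my-awesome.jar", "-i", "jazz_input", "-v", "my.vertex", "-o", "jazz_output"});
        // A real launcher would now build the job, ship both jars, and submit it.
        System.out.println(o.vertexClass + " reads " + o.inputPath + " and writes " + o.outputPath);
    }
}
```

Keeping the parsing separate from job submission would let the same options feed either a shell wrapper or a Java main.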






[jira] [Commented] (GIRAPH-61) Worker's early failure will cause the whole system fail

2011-11-15 Thread Owen O'Malley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150652#comment-13150652
 ] 

Owen O'Malley commented on GIRAPH-61:
-

I don't think this is the right direction. Adding a second job is really 
expensive to get working right. (And worse, it will make us more sensitive to 
security vs. non-security.) It seems like a much better fix is to be able to 
use the initial state as a checkpoint.

It would also be good if we could handle worker failure more gracefully, but 
that is a much bigger project.

> Worker's early failure will cause the whole system fail
> ---
>
> Key: GIRAPH-61
> URL: https://issues.apache.org/jira/browse/GIRAPH-61
> Project: Giraph
> Issue Type: Bug
> Components: bsp
> Affects Versions: 0.70.0
> Reporter: Zhiwei Gu
> Priority: Critical
>
> When a worker fails early, the whole system fails.
> Observed failed worker:
>    State: Creating RPC threads failed
>    Result: The worker fails. However, the master has already 
> recorded and reserved these splits for this worker (identified by 
> InetAddress), so although Hadoop reschedules a mapper for this worker, the 
> master keeps waiting for the old worker's response and eventually 
> fails.
> [Failed worker logs:]
> 2011-10-24 18:19:51,051 INFO org.apache.giraph.graph.BspService: process: 
> vertexRangeAssignmentsReadyChanged (vertex ranges are assigned)
> 2011-10-24 18:19:51,060 INFO org.apache.giraph.graph.BspServiceWorker: 
> startSuperstep: Ready for computation on superstep 1 since worker selection 
> and vertex range assignments are done in 
> /_hadoopBsp/job_201108260911_842943/_applicationAttemptsDir/0/_superstepDir/1/_vertexRangeAssignments
> 2011-10-24 18:19:51,078 INFO org.apache.giraph.graph.BspServiceWorker: 
> getAggregatorValues: no aggregators in 
> /_hadoopBsp/job_201108260911_842943/_applicationAttemptsDir/0/_superstepDir/0/_mergedAggregatorDir
>  on superstep 1
> 2011-10-24 18:19:53,974 INFO org.apache.giraph.graph.GraphMapper: map: 
> totalMem=84213760 maxMem=2067988480 freeMem=65069808
> 2011-10-24 18:19:53,974 INFO org.apache.giraph.comm.BasicRPCCommunications: 
> flush: starting...
> 2011-10-24 18:19:54,022 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=102400 and 
> reduceRetainSize=102400
> 2011-10-24 18:19:54,023 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:597)
>   at java.lang.UNIXProcess$1.run(UNIXProcess.java:141)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.lang.UNIXProcess.<init>(UNIXProcess.java:103)
>   at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
>   at org.apache.hadoop.util.Shell.run(Shell.java:182)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
>   at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
>   at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:540)
>   at org.apache.hadoop.fs.RawLocalFileSystem.access$100(RawLocalFileSystem.java:37)
>   at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:417)
>   at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:400)
>   at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:275)
>   at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> [Failed master logs:]
> 2011-10-24 18:21:47,611 INFO org.apache.giraph.graph.BspServiceMaster: 
> checkWorkers: Only found 399 responses of 400 needed to start superstep 0.  
> Sleeping for 5000 msecs and used 1 of 60 attempts.
> 2011-10-24 18:21:47,615 INFO org.apache.giraph.graph.BspServiceMaster: 
> checkWorkers: No response from partition 279 (could be master)
> 2011-10-24 18:21:52,629 INFO org.apache.giraph.graph.BspServiceMaster: 
> checkWorkers: Only found 399 responses of 400 needed to start superstep 0.  
> Sleeping for 5000 msecs and used 2 of 60 attempts.
> 2011-10-24 1

[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

2011-11-15 Thread Owen O'Malley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150666#comment-13150666
 ] 

Owen O'Malley commented on GIRAPH-83:
-

I think that simplifying the interface is a great goal and it sounds like 
you're moving in the right direction. 

It seems a bit strange to define job-wide properties through Vertex 
(use/registerAggregator). I'd think that an object that defines the job would 
be more appropriate.

+1 to moving the implementation details out of Vertex.
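
To make the suggestion concrete, here is a minimal sketch of what a job-level object owning aggregator registration might look like. Everything in it is hypothetical: JobSpecSketch, JobSpec, and both methods are illustrative names, not Giraph's API, and aggregators are reduced to simple combiners over long values for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongBinaryOperator;

/** Hypothetical sketch; none of these names exist in Giraph. */
final class JobSpecSketch {
    /** Job-level object that owns aggregator registration instead of Vertex. */
    static final class JobSpec {
        private final Map<String, LongBinaryOperator> aggregators = new LinkedHashMap<>();

        /** Registers a named aggregator once per job; duplicate names are rejected. */
        JobSpec registerAggregator(String name, LongBinaryOperator combiner) {
            if (aggregators.putIfAbsent(name, combiner) != null) {
                throw new IllegalStateException("Aggregator already registered: " + name);
            }
            return this;
        }

        /** Looks up an aggregator by name, as a vertex would during compute. */
        LongBinaryOperator aggregator(String name) {
            LongBinaryOperator combiner = aggregators.get(name);
            if (combiner == null) {
                throw new IllegalArgumentException("Unknown aggregator: " + name);
            }
            return combiner;
        }
    }

    public static void main(String[] args) {
        // Job-wide setup happens once, outside any vertex.
        JobSpec spec = new JobSpec()
            .registerAggregator("sum", Long::sum)
            .registerAggregator("max", Math::max);
        // A vertex only uses what the job has already defined.
        System.out.println(spec.aggregator("sum").applyAsLong(3, 4));
    }
}
```

With this split, a vertex never registers anything itself; it only reads what the job definition has already set up.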

> Is Vertex correct yet?
> --
>
> Key: GIRAPH-83
> URL: https://issues.apache.org/jira/browse/GIRAPH-83
> Project: Giraph
> Issue Type: Improvement
> Reporter: Jakob Homan
>
> I'm seeing a number of people run into oddities with Vertex and am thinking 
> we may not have it quite correct yet...
