[jira] [Resolved] (GIRAPH-971) Simple Giraph Oozie Action module

2016-01-17 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman resolved GIRAPH-971.

Resolution: Won't Fix

Old ticket. I will come back to this if I end up needing it or anyone else shows 
interest; closing the ticket for now.

> Simple Giraph Oozie Action module
> -
>
> Key: GIRAPH-971
> URL: https://issues.apache.org/jira/browse/GIRAPH-971
> Project: Giraph
>  Issue Type: New Feature
>  Components: conf and scripts
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Trivial
> Attachments: GIRAPH-971-1.patch, GIRAPH-971-2.patch, 
> GIRAPH-971-3.patch
>
>
> Adds 'giraph-oozie' module which will build a JAR to be installed/configured 
> as an Oozie extension as well as added to Giraph runtime deps. Allows us to
> write Oozie workflow XML's that include a  Action node.
> Not well tested yet, but module builds fine in default Giraph profile against 
> Hadoop 1.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (GIRAPH-969) STATIC_SASL_SYMBOL munge results in compilation errors for yarn profile with hadoop > 2.3.0

2015-01-31 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-969:
---
Attachment: GIRAPH-969-1.patch

This should fix the YARN profile Hadoop-2.6.x build issues.  

> STATIC_SASL_SYMBOL munge results in compilation errors for yarn profile with 
> hadoop > 2.3.0
> ---
>
> Key: GIRAPH-969
> URL: https://issues.apache.org/jira/browse/GIRAPH-969
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: Hadoop 2.3.0 and higher
>Reporter: Philipp Nolte
> Attachments: GIRAPH-969-1.patch
>
>
> The SaslRpcServer.SASL_PROPS field was removed in Hadoop 2.3.0 (see 
> https://issues.apache.org/jira/browse/HADOOP-10451).
> The hadoop_yarn profile uses the STATIC_SASL munge symbol and makes Giraph 
> try to use the SASL_PROPS field.
> This results in a compilation error when running 
> {noformat}
> mvn clean package -Phadoop_yarn -Dhadoop.version=2.5.1
> {noformat}
> {noformat}
> [ERROR] 
> giraph-core/target/munged/main/org/apache/giraph/comm/netty/SaslNettyClient.java:[84,68]
> cannot find symbol
>   symbol:   variable SASL_PROPS
>   location: class org.apache.hadoop.security.SaslRpcServer
> {noformat}





[jira] [Updated] (GIRAPH-971) Simple Giraph Oozie Action module

2014-12-17 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-971:
---
Attachment: GIRAPH-971-3.patch

> Simple Giraph Oozie Action module
> -
>
> Key: GIRAPH-971
> URL: https://issues.apache.org/jira/browse/GIRAPH-971
> Project: Giraph
>  Issue Type: New Feature
>  Components: conf and scripts
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Trivial
> Attachments: GIRAPH-971-1.patch, GIRAPH-971-2.patch, 
> GIRAPH-971-3.patch
>
>
> Adds 'giraph-oozie' module which will build a JAR to be installed/configured 
> as an Oozie extension as well as added to Giraph runtime deps. Allows us to
> write Oozie workflow XML's that include a  Action node.
> Not well tested yet, but module builds fine in default Giraph profile against 
> Hadoop 1.2.1.





[jira] [Updated] (GIRAPH-971) Simple Giraph Oozie Action module

2014-12-16 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-971:
---
Attachment: GIRAPH-971-2.patch

Now passes 'mvn verify' on default build profile. Needs more love around 
example workflow and testing.

> Simple Giraph Oozie Action module
> -
>
> Key: GIRAPH-971
> URL: https://issues.apache.org/jira/browse/GIRAPH-971
> Project: Giraph
>  Issue Type: New Feature
>  Components: conf and scripts
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Trivial
> Attachments: GIRAPH-971-1.patch, GIRAPH-971-2.patch
>
>
> Adds 'giraph-oozie' module which will build a JAR to be installed/configured 
> as an Oozie extension as well as added to Giraph runtime deps. Allows us to
> write Oozie workflow XML's that include a  Action node.
> Not well tested yet, but module builds fine in default Giraph profile against 
> Hadoop 1.2.1.





[jira] [Updated] (GIRAPH-971) Simple Giraph Oozie Action module

2014-12-15 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-971:
---
Attachment: GIRAPH-971-1.patch

> Simple Giraph Oozie Action module
> -
>
> Key: GIRAPH-971
> URL: https://issues.apache.org/jira/browse/GIRAPH-971
> Project: Giraph
>  Issue Type: New Feature
>  Components: conf and scripts
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Trivial
> Attachments: GIRAPH-971-1.patch
>
>
> Adds 'giraph-oozie' module which will build a JAR to be installed/configured 
> as an Oozie extension as well as added to Giraph runtime deps. Allows us to
> write Oozie workflow XML's that include a  Action node.
> Not well tested yet, but module builds fine in default Giraph profile against 
> Hadoop 1.2.1.





[jira] [Created] (GIRAPH-971) Simple Giraph Oozie Action module

2014-12-15 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-971:
--

 Summary: Simple Giraph Oozie Action module
 Key: GIRAPH-971
 URL: https://issues.apache.org/jira/browse/GIRAPH-971
 Project: Giraph
  Issue Type: New Feature
  Components: conf and scripts
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Trivial


Adds 'giraph-oozie' module which will build a JAR to be installed/configured as 
an Oozie extension as well as added to Giraph runtime deps. Allows us to
write Oozie workflow XML's that include a  Action node.

Not well tested yet, but module builds fine in default Giraph profile against 
Hadoop 1.2.1.






[jira] [Commented] (GIRAPH-959) Giraph's 1.1.0 hadoop_yarn profile can no longer be built with hadoop 2.0.3-alpha

2014-12-15 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247459#comment-14247459
 ] 

Eli Reisman commented on GIRAPH-959:


The original hadoop_yarn profile ran on Hadoop 2.0.3-alpha or newer. As far as 
I know, LinkedIn folks more recently modified the hadoop_yarn profile in ways 
that made it dependent on Hadoop 2.2.0 or newer; it should build fine on those 
versions. There should be some JIRA tickets documenting the discussion around 
that, and there are threads on the mailing list that address it.


> Giraph's 1.1.0 hadoop_yarn profile can no longer be built with hadoop 
> 2.0.3-alpha
> -
>
> Key: GIRAPH-959
> URL: https://issues.apache.org/jira/browse/GIRAPH-959
> Project: Giraph
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.1.0
>Reporter: Philipp Nolte
>  Labels: build, dependencies, hadoop-version
>
> Trying to build giraph release 1.1.0-RC1 for hadoop 2.0.3-alpha with profile 
> hadoop_yarn
> {{$ git clone git://git.apache.org/giraph.git}}
> {{$ git checkout release-1.1.0-RC1}}
> {{$ mvn -Dhadoop.version=2.0.3-alpha -DskipTests -Phadoop_yarn clean package}}
> fails with lots of {{cannot find symbol}} errors:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) 
> on project giraph-core: Compilation failure: Compilation failure:
> [ERROR] 
> /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[49,41]
>  package org.apache.hadoop.yarn.client.api does not exist
> [ERROR] 
> /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[50,41]
>  package org.apache.hadoop.yarn.client.api does not exist
> [ERROR] 
> /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[52,41]
>  cannot find symbol
> [ERROR] symbol:   class YarnException
> [ERROR] location: package org.apache.hadoop.yarn.exceptions
> [ERROR] 
> /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[88,11]
>  cannot find symbol
> [ERROR] symbol:   class YarnClient
> [ERROR] location: class org.apache.giraph.yarn.GiraphYarnClient
> [ERROR] 
> /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[115,52]
>  cannot find symbol
> {noformat}
> This is probably due to missing dependencies in the hadoop_yarn profile. It 
> may also mean that Giraph's hadoop_yarn profile is no longer compatible with 
> hadoop 2.0.3-alpha, as hadoop-yarn-project version 2.0.3-alpha does not 
> include the package org.apache.hadoop.yarn.client.api, for example.
> In the latter case, the pom.xml comment stating that hadoop_yarn runs on 
> hadoop 2.0.3-alpha by default is outdated and should be removed to prevent 
> confusion.
> What versions of hadoop can I build giraph version 1.1.0 with the hadoop_yarn 
> profile with?





[jira] [Commented] (GIRAPH-811) Infinite ZooKeeper CleanUp

2014-04-19 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974922#comment-13974922
 ] 

Eli Reisman commented on GIRAPH-811:


This is a good solution. Are you certain this issue is fixed? I think I have 
seen some more recent reports of this problem since the transition to Hadoop 
2.2.0 YARN support. This (rather than the >= solution) seems like the right fix 
if we still need one.

[~aching] ^ if people are still reporting cleanup problems due to the "extra" 
master task in Giraph-on-YARN runs, I'd take a look at this patch or a 
variation of it.

> Infinite ZooKeeper CleanUp
> --
>
> Key: GIRAPH-811
> URL: https://issues.apache.org/jira/browse/GIRAPH-811
> Project: Giraph
>  Issue Type: Bug
>  Components: bsp, zookeeper
>Affects Versions: 1.1.0
>Reporter: Alexandre Fonseca
>  Labels: yarn
> Attachments: GIRAPH-811.patch
>
>
> While executing the SimpleShortestPaths example with Giraph 1.1.0-SNAPSHOT 
> compiled for Hadoop Yarn 2.2.0, I've noticed that the application would never 
> stop even after recognizing that all supersteps had completed and the output 
> had been written to the output directory.
> Looking at the logs, I found that the BspServiceMaster is stuck at the while 
> loop at the end of cleanUpZooKeeper() (BspServiceMaster.java:1729):
> {code}2013-12-08 03:51:21,698 INFO  [org.apache.giraph.master.MasterThread] 
> master.MasterThread (MasterThread.java:run(121)) - masterThread: Coordination 
> of superstep 3 took 0.433 seconds ended with state ALL_SUPERSTEPS_DONE and is 
> now on superstep 4
> 2013-12-08 03:51:21,699 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:setJobState(261)) - 
> setJobState: 
> {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on 
> superstep 4
> 2013-12-08 03:51:21,753 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanup(1836)) - cleanup: 
> Notifying master its okay to cleanup with 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir/0_master
> 2013-12-08 03:51:21,790 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1711)) - 
> cleanUpZooKeeper: Node 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir already 
> exists, no need to create.
> 2013-12-08 03:51:21,792 INFO  [org.apache.giraph.master.MasterThread] 
> bsp.BspInputFormat (BspInputFormat.java:getMaxTasks(64)) - getMaxTasks: Max 
> workers = 1, split master/worker = true, is YARN-only job = true, total max 
> tasks = 1
> 2013-12-08 03:51:21,792 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1735)) - 
> cleanUpZooKeeper: Got 2 of 1 desired children from 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir
> 2013-12-08 03:51:21,793 INFO  [org.apache.giraph.master.MasterThread] 
> master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1744)) - 
> cleanedUpZooKeeper: Waiting for the children of 
> /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir to 
> change since only got 2 nodes.{code}
> As the last 2 entries show, instead of registering just 1 task ending, it 
> registers 2 and thus it misses the condition on line 1740.
> One solution would be to change the == in line 1740 to a >=. However, the 
> actual issue seems to reside with the BspInputFormat.getMaxTasks() 
> (BspInputFormat.java:51). This function assumes that in a pure yarn execution 
> the total number of tasks will be equal to the maximum number of workers. 
> However, based on GiraphApplicationMaster:167, this is not the case. An extra 
> Master task is launched in addition to all the Worker tasks. 
> BspInputFormat.getMaxTasks() should then return maxWorkers + 1 in the case of 
> a pure yarn execution.
> Compilation:
> {code}mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests compile{code}
> Execution command:
> {code}$HADOOP_PREFIX/bin/hadoop jar 
> ~/Projects/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar
>  org.apache.giraph.GiraphRunner 
> org.apache.giraph.examples.SimpleShortestPathsComputation -vif 
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip 
> giraph/input/tiny_graph.txt -vof 
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op 
> giraph/output/shortestpahts -w 1 -ca giraph.zkList=localhost:2181 -yj 
> giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar{code}
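
The two fixes discussed in the report can be compared in a small Java sketch. 
This is not the Giraph source: the class and method names are hypothetical, 
and only the numbers (1 worker, 2 children under _cleanedUpDir) come from the 
log above. It assumes, as the report states, that a pure YARN run launches one 
master task in addition to the workers.

```java
// Sketch only: hypothetical names, not taken from BspServiceMaster or BspInputFormat.
final class CleanupCheckSketch {
  private CleanupCheckSketch() { }

  // Proposed pure-YARN task count: every worker plus the extra master task.
  static int yarnMaxTasks(int maxWorkers) {
    return maxWorkers + 1;
  }

  // Exit condition as described in the report: strict equality.
  static boolean doneWithEquals(int children, int maxTasks) {
    return children == maxTasks;
  }

  // Alternative mentioned in the report: tolerate extra children.
  static boolean doneWithAtLeast(int children, int maxTasks) {
    return children >= maxTasks;
  }

  public static void main(String[] args) {
    int children = 2;   // "Got 2 of 1 desired children"
    int maxWorkers = 1; // job was started with -w 1

    // Current behaviour: 2 == 1 is never true, so the loop keeps waiting.
    System.out.println(doneWithEquals(children, maxWorkers));               // false
    // Changing == to >= lets the loop exit but leaves the count wrong.
    System.out.println(doneWithAtLeast(children, maxWorkers));              // true
    // Counting the master task makes the strict check succeed.
    System.out.println(doneWithEquals(children, yarnMaxTasks(maxWorkers))); // true
  }
}
```

The third case is the one the reporter prefers: the expected count is 
corrected at its source, so the strict comparison can stay as it is.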





[jira] [Commented] (GIRAPH-747) BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete

2014-01-30 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887351#comment-13887351
 ] 

Eli Reisman commented on GIRAPH-747:


Had a chance to look again and my read is this breaks non-YARN. We might need 
to adjust this patch to use another method. I do think this is a real issue and 
we should get something in to fix it.

> BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers 
> to complete
> ---
>
> Key: GIRAPH-747
> URL: https://issues.apache.org/jira/browse/GIRAPH-747
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Fix For: 1.0.0
>
> Attachments: GIRAPH-747.v1.patch
>
>
> In BspServiceMaster, the function cleanUpZooKeeper should wait for the number 
> of workers and masters to complete. However, it appears that maxTasks only 
> takes workers into consideration. Consequently, a straggling worker may fail 
> to report to ZooKeeper because the path is removed too early. This causes a 
> "No lease on path File does not exist" exception at runtime.





[jira] [Commented] (GIRAPH-747) BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete

2014-01-30 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886989#comment-13886989
 ] 

Eli Reisman commented on GIRAPH-747:


Hey, reviewing this. I recall this issue; I thought I was shimming this number 
somewhere else? The reason is that BspServiceMaster is also used by non-YARN 
and I didn't want to break or alter the shared code.

Could another non-YARN Giraph committer take a look and see whether this change 
breaks anything? If not, we should definitely commit this. If so, maybe another 
(ugh) munge flag here will suffice?


> BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers 
> to complete
> ---
>
> Key: GIRAPH-747
> URL: https://issues.apache.org/jira/browse/GIRAPH-747
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Fix For: 1.0.0
>
> Attachments: GIRAPH-747.v1.patch
>
>
> In BspServiceMaster, the function cleanUpZooKeeper should wait for the number 
> of workers and masters to complete. However, it appears that maxTasks only 
> takes workers into consideration. Consequently, a straggling worker may fail 
> to report to ZooKeeper because the path is removed too early. This causes a 
> "No lease on path File does not exist" exception at runtime.





[jira] [Commented] (GIRAPH-747) BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete

2014-01-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886195#comment-13886195
 ] 

Eli Reisman commented on GIRAPH-747:


I'll review and commit this, thanks again!


> BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers 
> to complete
> ---
>
> Key: GIRAPH-747
> URL: https://issues.apache.org/jira/browse/GIRAPH-747
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Fix For: 1.0.0
>
> Attachments: GIRAPH-747.v1.patch
>
>
> In BspServiceMaster, the function cleanUpZooKeeper should wait for the number 
> of workers and masters to complete. However, it appears that maxTasks only 
> takes workers into consideration. Consequently, a straggling worker may fail 
> to report to ZooKeeper because the path is removed too early. This causes a 
> "No lease on path File does not exist" exception at runtime.





[jira] [Commented] (GIRAPH-819) Number of containers required for a job

2014-01-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886157#comment-13886157
 ] 

Eli Reisman commented on GIRAPH-819:


Thanks, and sorry it took so long. I will be happy to try this patch out 
tonight. Good catch. If Mohammed also approves, I will commit it.



> Number of containers required for a job
> ---
>
> Key: GIRAPH-819
> URL: https://issues.apache.org/jira/browse/GIRAPH-819
> Project: Giraph
>  Issue Type: Bug
>  Components: lib, mapreduce
>Affects Versions: 1.1.0
>Reporter: Rafal Wojdyla
>  Labels: patch
> Fix For: 1.1.0
>
> Attachments: GIRAPH-819.patch
>
>
> Java 1.6.x
> Giraph trunk - revert java 1.7 support.
> Hadoop 2.2.0.x
> Job submission fails due to:
> {noformat}
> 13/11/28 12:02:14 INFO yarn.GiraphYarnClient: Running Client
> 13/11/28 12:02:14 INFO client.RMProxy: Connecting to ResourceManager at 
> master/192.168.1.100:8045
> 13/11/28 12:02:15 INFO yarn.GiraphYarnClient: Got node report from ASM for, 
> nodeId=kreator:46477, nodeAddresskreator:8042, nodeRackName/default-rack, 
> nodeNumContainers7
> 13/11/28 12:02:15 INFO yarn.GiraphYarnClient: Got node report from ASM for, 
> nodeId=exotica:46645, nodeAddressexotica:8042, nodeRackName/default-rack, 
> nodeNumContainers8
> Exception in thread "main" java.lang.RuntimeException: Giraph job requires 2 
> containers to run; cluster only hosts 15
>   at 
> org.apache.giraph.yarn.GiraphYarnClient.checkPerNodeResourcesAvailable(GiraphYarnClient.java:230)
>   at 
> org.apache.giraph.yarn.GiraphYarnClient.run(GiraphYarnClient.java:125)
> {noformat}
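
The exception text above contradicts itself (2 containers required, 15 
hosted), which suggests the availability check compares the wrong way round. 
The following Java sketch shows the check as it presumably should behave. 
That reading is an assumption drawn from the log alone, and the class and 
method names are hypothetical rather than copied from GiraphYarnClient.

```java
// Sketch only: hypothetical stand-in for the availability check in the stack trace.
final class ContainerCheckSketch {
  private ContainerCheckSketch() { }

  // Fail only when the job needs more containers than the cluster hosts.
  static void checkContainersAvailable(int required, int hosted) {
    if (required > hosted) {
      throw new IllegalStateException("Giraph job requires " + required
          + " containers to run; cluster only hosts " + hosted);
    }
  }

  public static void main(String[] args) {
    // Values from the log: 7 + 8 = 15 containers hosted, 2 required.
    checkContainersAvailable(2, 15);
    System.out.println("2 of 15 containers: ok");

    try {
      checkContainersAvailable(16, 15);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```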





[jira] [Commented] (GIRAPH-794) add support for generic hadoop1 and hadoop2 profiles

2013-12-08 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842580#comment-13842580
 ] 

Eli Reisman commented on GIRAPH-794:


I like this idea; does anyone have a problem with this?

> add support for generic hadoop1 and hadoop2 profiles
> 
>
> Key: GIRAPH-794
> URL: https://issues.apache.org/jira/browse/GIRAPH-794
> Project: Giraph
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.0.0
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Fix For: 1.1.0
>
> Attachments: 
> 0001-GIRAPH-794.-add-support-for-generic-hadoop1-and-hado.patch
>
>
> I would like to propose that as part of Giraph 1.1.0 we introduce generic 
> hadoop1 and hadoop2 profiles that would be expected to track latest releases 
> on hadoop 1.x and hadoop 2.x codelines (currently these are 1.2.1 and 2.2.0). 
> These profiles will be the ones used to publish Giraph maven artifacts. 
> Following the convention established by HBase I propose that we bake hadoop1 
> and hadoop2 tokens into a version.
> Thus every release of Giraph starting from 1.1.0 will deploy the following 
> versions:
>*  (same as -hadoop1)
>* -hadoop1
>* -hadoop2
> Thoughts?





[jira] [Updated] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-11-05 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-730:
---

Attachment: GIRAPH-730-2-suggestion.patch

> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730-2-suggestion.patch, GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, the getTaskResourceMap function is not 
> thread-safe, which causes the application master to fail to distribute the 
> resources (jar, configuration file, etc.) to each container.





[jira] [Updated] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-11-05 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-730:
---

Attachment: (was: GIRAPH-730-2.patch)

> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, the getTaskResourceMap function is not 
> thread-safe, which causes the application master to fail to distribute the 
> resources (jar, configuration file, etc.) to each container.





[jira] [Commented] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API

2013-11-05 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814479#comment-13814479
 ] 

Eli Reisman commented on GIRAPH-737:


Committed, thanks so much Mohammad, great work!

> Giraph Application Master: move to new and stable YARN API
> --
>
> Key: GIRAPH-737
> URL: https://issues.apache.org/jira/browse/GIRAPH-737
> Project: Giraph
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: GIRAPH-737-2.patch, GIRAPH-737-3.patch, 
> GIRAPH-737.WIP.patch
>
>
> Giraph was the early adopter of Hadoop YARN AM! Eli successfully wrote a 
> Giraph AM based on Hadoop 2.0.x_alpha. However, in last few months, Yarn 
> significantly *overhauled* its APIs and associated coding patterns. The new 
> beta version is 2.1.x and I was told by Yarn-dev that current APIs will not 
> change much.
> In the above circumstances, we need to substantially overhaul Giraph AM as 
> well to accommodate with the new Yarn API. Moreover, in newer YARN API, 
> supporting kerberos security in AM becomes easier and more transparent.
> Potential impact:
> The upcoming Giraph AM will not work with earlier alpha Hadoop versions such 
> as 2.0.3. I'm not sure if anyone is using Giraph AM in production. However, 
> the more prevalent way of Giraph processing (MR-based) should continue to 
> work.
> 





[jira] [Updated] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API

2013-10-26 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-737:
---

Attachment: GIRAPH-737-2.patch

This is the most recent patch Mohammad uploaded to RB. I am posting it here for 
convenience.

When you build for the very first time using
{code}mvn -Phadoop_yarn -Dhadoop.version=2.1.1-SNAPSHOT clean package 
-Dtest=TestFilters -DfailIfNoTests=false{code}

the patch builds fine. Once that full build has completed, one can run a 
more vanilla
{code}mvn -Phadoop_yarn -Dhadoop.version=2.1.1-SNAPSHOT clean verify{code}

which will build flawlessly. The bad news: we still have 342 checkstyle issues 
to resolve. Once Mohammad uploads the 737-3 patch with the checkstyle issues 
fixed, we're ready to commit. Excited to get this checked in!

> Giraph Application Master: move to new and stable YARN API
> --
>
> Key: GIRAPH-737
> URL: https://issues.apache.org/jira/browse/GIRAPH-737
> Project: Giraph
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: GIRAPH-737-2.patch, GIRAPH-737.WIP.patch
>
>
> Giraph was the early adopter of Hadoop YARN AM! Eli successfully wrote a 
> Giraph AM based on Hadoop 2.0.x_alpha. However, in last few months, Yarn 
> significantly *overhauled* its APIs and associated coding patterns. The new 
> beta version is 2.1.x and I was told by Yarn-dev that current APIs will not 
> change much.
> In the above circumstances, we need to substantially overhaul Giraph AM as 
> well to accommodate with the new Yarn API. Moreover, in newer YARN API, 
> supporting kerberos security in AM becomes easier and more transparent.
> Potential impact:
> The upcoming Giraph AM will not work with earlier alpha Hadoop versions such 
> as 2.0.3. I'm not sure if anyone is using Giraph AM in production. However, 
> the more prevalent way of Giraph processing (MR-based) should continue to 
> work.
> 





[jira] [Commented] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API

2013-10-10 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792207#comment-13792207
 ] 

Eli Reisman commented on GIRAPH-737:


I'm +1 on moving forward with this, thanks Avery! I originally wanted to be as 
backwards compatible as possible, but the YARN APIs have evolved so much that I 
think moving forward with this will be a big win.

> Giraph Application Master: move to new and stable YARN API
> --
>
> Key: GIRAPH-737
> URL: https://issues.apache.org/jira/browse/GIRAPH-737
> Project: Giraph
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
>
> Giraph was the early adopter of Hadoop YARN AM! Eli successfully wrote a 
> Giraph AM based on Hadoop 2.0.x_alpha. However, in last few months, Yarn 
> significantly *overhauled* its APIs and associated coding patterns. The new 
> beta version is 2.1.x and I was told by Yarn-dev that current APIs will not 
> change much.
> In the above circumstances, we need to substantially overhaul Giraph AM as 
> well to accommodate with the new Yarn API. Moreover, in newer YARN API, 
> supporting kerberos security in AM becomes easier and more transparent.
> Potential impact:
> The upcoming Giraph AM will not work with earlier alpha Hadoop versions such 
> as 2.0.3. I'm not sure if anyone is using Giraph AM in production. However, 
> the more prevalent way of Giraph processing (MR-based) should continue to 
> work.
> 





[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-10-07 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788341#comment-13788341
 ] 

Eli Reisman commented on GIRAPH-730:


Oops -- comment above seems to have cut off the top of my original text. 
Missing part:

I am uploading a "suggestion patch" in case Chuan is working on something else 
now. It does what we talked about above -- just synchronize the creation of the 
LOCAL_RESOURCES map, not every call to get a reference to it.
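
For readers without the patch at hand, the idea can be sketched in Java as 
follows. This is an illustration under assumptions, not the contents of the 
attached patch: the class name is hypothetical, String values stand in for 
YARN LocalResource objects, and the map entries are placeholders. Only the 
creation of the map takes the lock; later calls just read a volatile 
reference.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch only: String values stand in for YARN LocalResource objects.
final class ResourceMapSketch {
  // volatile so that a fully built map is safely published to other threads.
  private static volatile Map<String, String> localResources;
  // Counts how often the map was built; guarded by the class lock.
  private static int buildCount;

  private ResourceMapSketch() { }

  static Map<String, String> getTaskResourceMap() {
    Map<String, String> map = localResources;
    if (map == null) {                       // fast path: no lock once built
      synchronized (ResourceMapSketch.class) {
        map = localResources;
        if (map == null) {                   // only one thread builds the map
          Map<String, String> built = new HashMap<>();
          built.put("job-jar", "placeholder-jar-location");
          built.put("job-conf", "placeholder-conf-location");
          buildCount++;
          map = Collections.unmodifiableMap(built);
          localResources = map;
        }
      }
    }
    return map;
  }

  static int buildCount() {
    synchronized (ResourceMapSketch.class) {
      return buildCount;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> getTaskResourceMap());
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.println("builds: " + buildCount()); // prints "builds: 1"
  }
}
```

Returning an unmodifiable map is a choice made for this sketch: once the map 
is published, no caller can change it, so readers need no further locking.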

> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730-2.patch, GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, getTaskResourceMap function is not 
> multi-thread safe, which causes the application master fail to distribute the 
> resources (jar, configuration file, etc.) to each container.





[jira] [Updated] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-10-07 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-730:
---

Attachment: GIRAPH-730-2.patch

I'll upload the patch here, but if Chuan is still working on this problem, I'm 
happy to leave it to him to fix. I am in agreement now that if we catch this at 
the getTaskResourceMap() level we can solve the problem for now. Great work, 
Chuan!

One issue: I'm having trouble building Giraph (under any profile, even default) 
right now to test this... is the build broken right now? I'm on a clean trunk 
repo...?


> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730-2.patch, GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, the getTaskResourceMap function is not 
> thread-safe, which causes the application master to fail to distribute the 
> resources (jar, configuration file, etc.) to each container.





[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-08-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755667#comment-13755667
 ] 

Eli Reisman commented on GIRAPH-730:


My question is this: the AM is running as a single thread, and then makes a 
request for all the containers it needs in one lump. In my tests, what happened 
was this callback (by the RM giving the local AM all the containers it was 
asked for) returned the whole bunch of containers at once, but this call is 
made asynchronously.

However, once the callback produced all the requested containers (always in a 
single asynchronous callback), the same single AM thread is what iterated 
through the collection of containers, one at a time, populating them with 
metadata and the resource map in buildContainerLaunchContext. So there was no 
concurrency issue.

BUT, I think now that you are running on a larger cluster and asking for more 
containers, they are being returned in smaller groups. Perhaps you ask for 500 
workers and instead you get two asynchronous callbacks from the RM, one with 
200 and one with 300 containers, and both of _those_ asynchronous calls 
returning the groups of containers are now racing into 
buildContainerLaunchContext (etc.), and this is where the concurrency issue 
arises?

YARN certainly does not guarantee you can get back all the containers you ask 
for at once, although in my tests I didn't see any behavior but this. If at 
your scale you are seeing this problem, we need to address it. Good catch!

If this is what is happening (you have logged one AM ask for X containers 
resulting in more than one asynchronous callback returning A, B, and C # of 
containers, where A+B+C = X) then we need to fix this.
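
To check the hypothesis above (one request for X containers answered by 
several callbacks), a small tally like the following could be logged from the 
allocation callback. This is a minimal sketch, not code from 
GiraphApplicationMaster: the class AllocationTally and its method names are 
hypothetical, and container IDs are represented as plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical tally of containers granted across several allocation callbacks. */
class AllocationTally {
  private final int requested;
  // Atomic counters, because callbacks may arrive on different threads.
  private final AtomicInteger granted = new AtomicInteger();
  private final AtomicInteger callbacks = new AtomicInteger();

  AllocationTally(int requested) {
    this.requested = requested;
  }

  /** Call once per callback with the batch it delivered; returns the running total. */
  int onBatch(List<String> containerIds) {
    callbacks.incrementAndGet();
    return granted.addAndGet(containerIds.size());
  }

  /** True once the batches seen so far add up to the original request. */
  boolean complete() {
    return granted.get() >= requested;
  }

  int callbackCount() {
    return callbacks.get();
  }

  public static void main(String[] args) {
    AllocationTally tally = new AllocationTally(5);
    tally.onBatch(Arrays.asList("c1", "c2"));       // first callback: A = 2
    tally.onBatch(Arrays.asList("c3", "c4", "c5")); // second callback: B = 3
    System.out.println("callbacks=" + tally.callbackCount()
        + " complete=" + tally.complete());
  }
}
```

If callbackCount() is greater than one by the time complete() turns true, the 
containers were delivered in several groups, which is the situation where the 
race could occur.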

But I do think we should not risk going with a partial solution. If what 
you're describing and what I am describing above match up, we really should 
just eliminate this risk now by protecting buildContainerLaunchContext; 
otherwise concurrent attempts to populate the container launch contexts with 
IDs etc. could overwrite each other, and containers could be lost on the AM 
side this way.

It doesn't mean we have to just slap a "synchronized" block on to 
buildContainerLaunchContext, maybe something more subtle could work. But we 
probably should address the problem so that all the risk is gone.

What do you think? Maybe try another patch that addresses all possible race 
risks here? Also, if the race you're seeing is not as I have described here, 
please let me know what the real concern is, maybe I missed your point.

Thanks, great work!

> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, the getTaskResourceMap function is not 
> thread safe, which causes the application master to fail to distribute the 
> resources (jar, configuration file, etc.) to each container.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API

2013-08-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755664#comment-13755664
 ] 

Eli Reisman commented on GIRAPH-737:


Originally, the concept was to be compatible with as much of the 2.0.x-alpha 
Hadoop line as possible. To this end, I attempted to use as much of the "old" 
YARN API as I could get away with, figuring we could update later and end 
backward compatibility if we ever wanted to.

Now that the 2.1-beta line is out, I think it makes a lot of sense to 
reevaluate this and move forward, refactoring to the newer YARN API and perhaps 
even abandoning 2.0.x Hadoop in favor of the 2.1 beta line.

Need to see some code (of course) but I'm fully +1 on the idea. Anyone else 
want to chime in here?


> Giraph Application Master: move to new and stable YARN API
> --
>
> Key: GIRAPH-737
> URL: https://issues.apache.org/jira/browse/GIRAPH-737
> Project: Giraph
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
>
> Giraph was an early adopter of the Hadoop YARN AM! Eli successfully wrote a 
> Giraph AM based on Hadoop 2.0.x_alpha. However, in the last few months, YARN 
> significantly *overhauled* its APIs and associated coding patterns. The new 
> beta version is 2.1.x and I was told by Yarn-dev that the current APIs will 
> not change much.
> Given the above, we need to substantially overhaul the Giraph AM as 
> well to accommodate the new YARN API. Moreover, in the newer YARN API, 
> supporting Kerberos security in the AM becomes easier and more transparent.
> Potential impact:
> The upcoming Giraph AM will not work with earlier alpha Hadoop versions such 
> as 2.0.3. I'm not sure if anyone is using the Giraph AM in production. However, 
> the more prevalent way of Giraph processing (MR-based) should continue to 
> work.
> 



[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-08-23 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749300#comment-13749300
 ] 

Eli Reisman commented on GIRAPH-730:


Let me put it another way: I thought that the single thread of the AM is the 
only one that calls getTaskResourceMap, or buildContainerLaunchContext for that 
matter. What are these other threads?

If getTaskResourceMap is not thread safe, how are the other values set in 
buildContainerLaunchContext not subject to the same race condition? Should we be 
synchronizing there?



> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, the getTaskResourceMap function is not 
> thread safe, which causes the application master to fail to distribute the 
> resources (jar, configuration file, etc.) to each container.



[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-08-23 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749294#comment-13749294
 ] 

Eli Reisman commented on GIRAPH-730:


Hey Chuan, that makes a lot of sense, thank you. I think the part I was confused 
about is that I don't think getTaskResourceMap is called asynchronously. I 
thought the first call (the one that actually initializes the map) was returned 
and completed before the first thread is launched, and the remaining calls 
would be reading an immutable object (not by declaration but by convention 
only) so would be essentially thread safe.

If you guys are sure I'm wrong, that's good enough for me, we can commit this. I 
have not been keeping up with the mailing list as I should -- have you noticed 
anyone else reproducing this problem? Did this patch solve the problem for you?

If so, let me know and we should move forward with it. Good catch!


> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, the getTaskResourceMap function is not 
> thread safe, which causes the application master to fail to distribute the 
> resources (jar, configuration file, etc.) to each container.



[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading

2013-08-23 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749010#comment-13749010
 ] 

Eli Reisman commented on GIRAPH-730:


Hi Chuan,

I'm having a bit of trouble finding the concurrency issue here. The 
LOCAL_RESOURCES is a common resource map that is initialized once per AM 
instance (which is hopefully a singleton at this point!) and reused, unchanged, 
for each task launched. It is this object that is returned from 
getTaskResourceMap to buildContainerLaunchContext, which returns the launch 
context, populating the ContainerLaunchContext before any threads are run or 
submitted. From then on, the method is just returning a reference to the same 
map each call.

If there is a concurrency issue, it might be more likely attributed to 
buildContainerLaunchContext. But I'm not really seeing one.

If you are certain this is a concurrency issue and the synchronization fix is 
the only thing verified to work, I'd try this:

I think the null check at the top of getTaskResourceMap is atomic by nature, 
you could just add a synchronization block around the map construction portion. 
I think returning the unchanging (and essentially immutable) map singleton in 
the loop of containers after that will be thread safe.
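
The suggestion above can be sketched as follows. This is an illustration only, 
not the contents of the attached patch: the class LazyResourceMap, the 
String-to-String map, and the placeholder path are assumptions. One caveat 
under the Java memory model: for the unlocked null check to be safe, the field 
generally needs to be volatile, and the check has to be repeated inside the 
lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

/** Hypothetical stand-in for the lazily built, shared task resource map. */
class LazyResourceMap {
  // volatile: a thread that skips the lock must still see a fully built map.
  private static volatile Map<String, String> resources;
  // Counts how often the map was constructed; only written while holding the lock.
  static int builds = 0;

  static Map<String, String> getTaskResourceMap() {
    Map<String, String> local = resources;
    if (local == null) {                       // cheap check, no lock taken
      synchronized (LazyResourceMap.class) {
        local = resources;
        if (local == null) {                   // re-check: another thread may have won
          Map<String, String> built = new HashMap<>();
          // Placeholder entry; the real map would hold the jar, conf file, etc.
          built.put("giraph.jar", "hdfs:///placeholder/giraph.jar");
          builds++;
          local = Collections.unmodifiableMap(built);
          resources = local;                   // publish only after construction
        }
      }
    }
    return local;
  }

  public static void main(String[] args) {
    // Many concurrent callers, as if several container groups arrived at once.
    IntStream.range(0, 64).parallel().forEach(i -> getTaskResourceMap());
    System.out.println("map built " + builds + " time(s)");
  }
}
```

Wrapping the result in Collections.unmodifiableMap makes the "immutable by 
convention" property explicit, so a later caller cannot modify the shared map 
by accident.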



> GiraphApplicationMaster race condition in resource loading
> --
>
> Key: GIRAPH-730
> URL: https://issues.apache.org/jira/browse/GIRAPH-730
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Giraph with Yarn
>Reporter: Chuan Lei
>Assignee: Chuan Lei
> Attachments: GIRAPH-730.v1.patch
>
>
> In GiraphApplicationMaster.java, the getTaskResourceMap function is not 
> thread safe, which causes the application master to fail to distribute the 
> resources (jar, configuration file, etc.) to each container.



[jira] [Commented] (GIRAPH-717) HiveJythonRunner with support for pure Jython value types.

2013-07-26 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721195#comment-13721195
 ] 

Eli Reisman commented on GIRAPH-717:


+1

> HiveJythonRunner with support for pure Jython value types.
> --
>
> Key: GIRAPH-717
> URL: https://issues.apache.org/jira/browse/GIRAPH-717
> Project: Giraph
>  Issue Type: Bug
>Reporter: Nitay Joffe
>Assignee: Nitay Joffe
>
> This adds support for pure Jython jobs. Currently this runner is hooked up to 
> work with Hive. I'll make it more generic later.
> Running a Jython job is simply:
> HIVE_HOME=
> HADOOP_HOME=
> $HIVE_HOME/bin/hive --service jar  
> org.apache.giraph.hive.jython.HiveJythonRunner jython1.py [jython2.py] ...
> You can pass in any number of scripts. They will be parsed in order and sent 
> to all the workers using DistributedCache.
> There are examples and tests in the diff. Here is one example:
> launcher: https://gist.github.com/nitay/a62e0a5d369a5e701fa3
> worker: https://gist.github.com/nitay/7834fd2b059527e65a36
> There are a few pieces to a Jython job, I'll go over each part here.
> The HiveJythonRunner will call a function called "prepare(job)" from the 
> Jython scripts. This is the entry point for configuring your job.
> In this configuration you set up everything, such as your graph types (those 
> IVEMM writables) and the Hive vertex/edge inputs and output. Each 
> graph type is one of the following:
> 1) A Java type. For example the user can specify simply IntWritable
> 2) A Jython type that implements Writable. In the example above the message 
> value implements Writable.
> 3) A pure Jython type. The Java code will wrap these objects in a Writable 
> wrapper that serializes Jython values using Pickle (jython IO framework).
> Your computation must implement JythonComputation. Note that this does not 
> actually implement Computation, but rather is a separate class so that we can 
> wrap all the types passed in with a wrapper that implements Writable. The 
> methods are named the same so that the user does not notice anything.
> For Hive usage - if your value type is a primitive e.g. IntWritable or 
> LongWritable, then you need not do anything. The Java code will automatically 
> read/write the Hive table specified and convert between Hive types and the 
> primitive Writable. The vertex_id type in the example works like this.
> If your value is a custom Jython type, you must create classes which 
> implement JythonHiveReader/JythonHiveWriter (or JythonHiveIO which is both). 
> These objects read/write Jython types from Hive. There are wrappers in the 
> Java code which take HiveIO data normally used in giraph-hive and turn it 
> into Jython types. This means, for example, that getMap() will return a 
> Jython dictionary instead of a Java Map.
> There is also a PageRankBenchmark (from previous diff) implemented in Jython. 
> Here's a run for comparison / sanity check:
> PageRankBenchmark with 10 workers, 100M vertices, 10B edges, 10 compute 
> threads
> trunk:
>   https://gist.github.com/nitay/3170fa3b575d4d2e22a9
>   total time: 302466
> with this diff:
>   https://gist.github.com/nitay/a52b6d1d64e50ab9829e
>   total time: 306517
> in jython:
>   https://gist.github.com/nitay/3f2e758b2933c3521727
>   total time: 434730
> So we see that existing things are not affected (is there something else I 
> should test?) and that Jython has around 40% overhead.
> ReviewBoard: https://reviews.apache.org/r/12543/ (Sorry it's a big one, hard 
> to split up :/)
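
The overhead figures quoted above can be re-derived from the three reported 
totals. The snippet below is a small worked calculation, not part of the 
patch; the class and method names are hypothetical, and the time unit is 
whatever the benchmark reported. With the quoted numbers the diff comes out 
at roughly 1.3% over trunk and the Jython run at roughly 43.7%, consistent 
with "around 40%".

```java
/** Hypothetical helper that re-derives the relative overhead from the quoted totals. */
class OverheadCheck {
  /** Relative slowdown of candidate versus baseline, in percent. */
  static double overheadPercent(long baseline, long candidate) {
    return 100.0 * (candidate - baseline) / baseline;
  }

  public static void main(String[] args) {
    long trunk = 302466L;     // total time on trunk
    long withDiff = 306517L;  // total time with this diff
    long jython = 434730L;    // total time for the Jython implementation
    System.out.println("diff vs trunk:   " + overheadPercent(trunk, withDiff) + " %");
    System.out.println("jython vs trunk: " + overheadPercent(trunk, jython) + " %");
  }
}
```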



[jira] [Commented] (GIRAPH-719) Typo fixes for strings in GiraphYarnClient.java

2013-07-26 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721193#comment-13721193
 ] 

Eli Reisman commented on GIRAPH-719:


+1 on this, thank you! 

> Typo fixes for strings in GiraphYarnClient.java
> ---
>
> Key: GIRAPH-719
> URL: https://issues.apache.org/jira/browse/GIRAPH-719
> Project: Giraph
>  Issue Type: Bug
>Reporter: Nicholas Karkoulias
>Priority: Trivial
> Attachments: GIRAPH-719-1.patch
>
>
> Two trivial fixes in Strings with user messages (file GiraphYarnClient.java).
> I'm attaching a patch that can be applied to the current trunk (commit 
> 4caffaf2b0).
> First-time JIRA user, so tell me if I'm doing anything wrong. :)



[jira] [Commented] (GIRAPH-706) Hybrid management of configuration options

2013-07-04 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700269#comment-13700269
 ] 

Eli Reisman commented on GIRAPH-706:


This is a great idea, and would make a great newbie JIRA for someone who wants 
to get involved with Giraph.

> Hybrid management of configuration options
> --
>
> Key: GIRAPH-706
> URL: https://issues.apache.org/jira/browse/GIRAPH-706
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Armando Miraglia
>
> While checking the source code (especially under the formats package in 
> giraph-core) I realized that many configuration options are managed using 
> the Hadoop Configuration instead of the more appropriate *ConfOption classes. 
> As a result, such options are missing from the documentation, and 
> configuration is managed in a hybrid way in the source code.
> I think that the project should be reviewed to make all the configuration use 
> the common *ConfOption API.
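
To illustrate the difference the description points at, here is a minimal 
sketch of a typed option object compared with raw string keys. It is not 
Giraph's actual *ConfOption API: IntOption, OptionDemo, and the key 
"example.format.maxLineLength" are all hypothetical, and a plain Map stands 
in for the Hadoop Configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical typed option: key, default, and description live in one place. */
final class IntOption {
  final String key;
  final int defaultValue;
  final String description;

  IntOption(String key, int defaultValue, String description) {
    this.key = key;
    this.defaultValue = defaultValue;
    this.description = description;
  }

  /** Reads the value, falling back to the declared default when the key is unset. */
  int get(Map<String, String> conf) {
    String raw = conf.get(key);
    return raw == null ? defaultValue : Integer.parseInt(raw.trim());
  }

  void set(Map<String, String> conf, int value) {
    conf.put(key, Integer.toString(value));
  }

  /** One line of generated documentation for this option. */
  String describe() {
    return key + " (default " + defaultValue + "): " + description;
  }
}

class OptionDemo {
  // Declared once; every reader and writer shares the same key and default.
  static final IntOption MAX_LINE_LENGTH =
      new IntOption("example.format.maxLineLength", 1024, "Longest input line accepted");

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();   // stand-in for a Configuration
    System.out.println(MAX_LINE_LENGTH.get(conf)); // unset, so the default is used
    MAX_LINE_LENGTH.set(conf, 10);
    System.out.println(MAX_LINE_LENGTH.get(conf));
    System.out.println(MAX_LINE_LENGTH.describe());
  }
}
```

With raw string keys, the default and the description are repeated (or 
omitted) at every call site, which is why such options tend to go missing 
from generated documentation.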



[jira] [Created] (GIRAPH-707) Giraph could probably support Hadoop 2.0.x-alpha line using a single build profile

2013-07-04 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-707:
--

 Summary: Giraph could probably support Hadoop 2.0.x-alpha line 
using a single build profile
 Key: GIRAPH-707
 URL: https://issues.apache.org/jira/browse/GIRAPH-707
 Project: Giraph
  Issue Type: Improvement
Reporter: Eli Reisman
Priority: Trivial


The title says it all. Other than switching the "hadoop.version" Maven property 
setting, these profiles basically all do the same stuff from the build 
perspective and are starting to clutter up our POM.xml.

On the other hand, this adds verbosity and another layer of complexity to our 
build command line. Instead of:

{code}mvn -Phadoop_2.0.3 clean install{code}

we would have:

{code}mvn -Dhadoop.version=2.0.3-alpha -Phadoop_2_alpha clean install{code}

as the user would still need to pick out a Hadoop-2.0.x to build against. 

Alternately, we could just make the decision "it's an alpha release" and always 
point -Phadoop_2_alpha to the newest release. This could cause some confusion 
among users during a new Hadoop-2.0.x release, but then all Hadoop-2.x builds 
would look like:

{code}mvn -Phadoop_2_alpha clean install{code}

If anyone cares, please post your opinions or a patch according to your 
particular inclination. This will be an easy fix, whatever we decide.

Or we can do nothing. That's fine too.





[jira] [Commented] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha

2013-07-04 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700249#comment-13700249
 ] 

Eli Reisman commented on GIRAPH-687:


Ping. Can I get someone to take a peek at this? Folks are asking to build 
against 2.0.5-alpha :)


> Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
> ---
>
> Key: GIRAPH-687
> URL: https://issues.apache.org/jira/browse/GIRAPH-687
> Project: Giraph
>  Issue Type: New Feature
>  Components: build
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-687-1.patch
>
>
> Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes:
> mvn -Phadoop_2.0.4 clean verify
> mvn -Phadoop_2.0.5 clean verify



[jira] [Commented] (GIRAPH-704) Specialized message stores

2013-07-04 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700245#comment-13700245
 ] 

Eli Reisman commented on GIRAPH-704:


This is great, +1 on the patch. It's big, but the changes go together and this 
has been well tested. Great work!



> Specialized message stores
> --
>
> Key: GIRAPH-704
> URL: https://issues.apache.org/jira/browse/GIRAPH-704
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-704.patch
>
>
> I was investigating where the time/CPU is going in some applications, 
> and receiving messages on server side turned out to be one of the most 
> expensive things we do. We should provide better implementations using 
> primitive maps whenever that's possible.
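
As a rough illustration of what "primitive maps" buys here, the sketch below 
sums incoming double messages per long vertex ID using plain arrays and open 
addressing, so no Long or Double objects are allocated per message. It is not 
the implementation in GIRAPH-704.patch: the class LongDoubleSumStore and the 
sum-combining behaviour are assumptions made for the example.

```java
/** Hypothetical long -> double store that sums messages per vertex ID without boxing. */
class LongDoubleSumStore {
  private long[] keys;
  private double[] values;
  private boolean[] used;
  private int size;

  LongDoubleSumStore(int expected) {
    int cap = 16;
    while (cap < expected * 2) {
      cap <<= 1;                       // power-of-two capacity, at most half full
    }
    keys = new long[cap];
    values = new double[cap];
    used = new boolean[cap];
  }

  /** Scrambles the key bits so sequential IDs spread over the table. */
  private static long mix(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    return k;
  }

  /** Linear probing: returns the slot holding the key, or the free slot where it belongs. */
  private int slot(long key) {
    int mask = keys.length - 1;
    int i = (int) mix(key) & mask;
    while (used[i] && keys[i] != key) {
      i = (i + 1) & mask;
    }
    return i;
  }

  /** Adds a message for the vertex, combining with earlier ones by summation. */
  void add(long vertexId, double message) {
    int i = slot(vertexId);
    if (!used[i]) {
      if ((size + 1) * 2 > keys.length) {
        grow();
        i = slot(vertexId);
      }
      used[i] = true;
      keys[i] = vertexId;
      values[i] = 0.0;
      size++;
    }
    values[i] += message;
  }

  /** Returns the combined message for the vertex, or 0.0 if none was received. */
  double get(long vertexId) {
    int i = slot(vertexId);
    return used[i] ? values[i] : 0.0;
  }

  int size() {
    return size;
  }

  /** Doubles the table and re-inserts every occupied slot. */
  private void grow() {
    long[] oldKeys = keys;
    double[] oldValues = values;
    boolean[] oldUsed = used;
    keys = new long[oldKeys.length * 2];
    values = new double[oldKeys.length * 2];
    used = new boolean[oldKeys.length * 2];
    for (int j = 0; j < oldKeys.length; j++) {
      if (oldUsed[j]) {
        int i = slot(oldKeys[j]);
        used[i] = true;
        keys[i] = oldKeys[j];
        values[i] = oldValues[j];
      }
    }
  }

  public static void main(String[] args) {
    LongDoubleSumStore store = new LongDoubleSumStore(4);
    store.add(42L, 1.0);
    store.add(42L, 0.5);
    store.add(7L, 2.0);
    System.out.println(store.get(42L) + " " + store.get(7L) + " " + store.size());
  }
}
```

A HashMap<Long, Double> would do the same job but boxes every key and value, 
which is the kind of per-message cost the description is trying to avoid.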



[jira] [Commented] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha

2013-06-26 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694094#comment-13694094
 ] 

Eli Reisman commented on GIRAPH-687:


Tried this again today, it works for -Phadoop_yarn builds as well as standard 
-Phadoop_2.0.x builds for use w/MRv2 interface.


> Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
> ---
>
> Key: GIRAPH-687
> URL: https://issues.apache.org/jira/browse/GIRAPH-687
> Project: Giraph
>  Issue Type: New Feature
>  Components: build
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-687-1.patch
>
>
> Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes:
> mvn -Phadoop_2.0.4 clean verify
> mvn -Phadoop_2.0.5 clean verify



[jira] [Updated] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line

2013-06-26 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-688:
---

Description: 
This makes the hadoop-yarn branch build again against all compatible Hadoop 
versions, warns (in a crude but accurate way) what to do if user did not set 
hadoop.version at the mvn command line...and passes mvn clean verify etc.

I have removed a hardcoded version setting and replaced it with the 
destined-to-fail warning to allow/force folks to stay on top of which version 
they will build against (the 2.x Hadoop line is growing quickly!)

The correct way (thanks Eugene!) to build our YARN branch against any 
compatible Hadoop, as of now, is this:

mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install

Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
POM.XML files to see the various profiles we support for newer Hadoops, and 
select the hadoop.version you see in your favorite to build, as shown above.

That's it. Enjoy.


  was:
This makes the hadoop-yarn branch build again against all compatible Hadoop 
versions, warns (in a crude but accurate way) what to do if user did not set 
hadoop.version at the mvn command line...and passes mvn clean verify etc.

I have removed a hardcoded version setting and replaced it with the 
destined-to-fail warning to allow/force folks to stay on top of which version 
they will build against (the 2.x Hadoop line is growing quickly!)

The correct way (thanks Eugene!) to build our YARN branch against any 
compatible Hadoop, as of now, is this:

mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install

Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
POM.XML files to see the various profiles we support for newer Hadoops, and 
select the hadoop.version you see in your favorite to build, as shown above.

That's it. Enjoy.



> Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, 
> warns if none set, works w/new 1.1.0 line
> --
>
> Key: GIRAPH-688
> URL: https://issues.apache.org/jira/browse/GIRAPH-688
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-688-1.patch
>
>
> This makes the hadoop-yarn branch build again against all compatible Hadoop 
> versions, warns (in a crude but accurate way) what to do if user did not set 
> hadoop.version at the mvn command line...and passes mvn clean verify etc.
> I have removed a hardcoded version setting and replaced it with the 
> destined-to-fail warning to allow/force folks to stay on top of which version 
> they will build against (the 2.x Hadoop line is growing quickly!)
> The correct way (thanks Eugene!) to build our YARN branch against any 
> compatible Hadoop, as of now, is this:
> mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install
> Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
> POM.XML files to see the various profiles we support for newer Hadoops, and 
> select the hadoop.version you see in your favorite to build, as shown above.
> That's it. Enjoy.



[jira] [Updated] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line

2013-06-26 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-688:
---

Description: 
This makes the hadoop-yarn branch build again against all compatible Hadoop 
versions, warns (in a crude but accurate way) what to do if user did not set 
hadoop.version at the mvn command line...and passes mvn clean verify etc.

I have removed a hardcoded version setting and replaced it with the 
destined-to-fail warning to allow/force folks to stay on top of which version 
they will build against (the 2.x Hadoop line is growing quickly!)

The correct way (thanks Eugene!) to build our YARN branch against any 
compatible Hadoop, as of now, is this:

{code}mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install{code}

Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
POM.XML files to see the various profiles we support for newer Hadoops, and 
select the hadoop.version you see in your favorite to build, as shown above.

That's it. Enjoy.


  was:
This makes the hadoop-yarn branch build again against all compatible Hadoop 
versions, warns (in a crude but accurate way) what to do if user did not set 
hadoop.version at the mvn command line...and passes mvn clean verify etc.

I have removed a hardcoded version setting and replaced it with the 
destined-to-fail warning to allow/force folks to stay on top of which version 
they will build against (the 2.x Hadoop line is growing quickly!)

The correct way (thanks Eugene!) to build our YARN branch against any 
compatible Hadoop, as of now, is this:

mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install

Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
POM.XML files to see the various profiles we support for newer Hadoops, and 
select the hadoop.version you see in your favorite to build, as shown above.

That's it. Enjoy.



> Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, 
> warns if none set, works w/new 1.1.0 line
> --
>
> Key: GIRAPH-688
> URL: https://issues.apache.org/jira/browse/GIRAPH-688
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-688-1.patch
>
>
> This makes the hadoop-yarn branch build again against all compatible Hadoop 
> versions, warns (in a crude but accurate way) what to do if user did not set 
> hadoop.version at the mvn command line...and passes mvn clean verify etc.
> I have removed a hardcoded version setting and replaced it with the 
> destined-to-fail warning to allow/force folks to stay on top of which version 
> they will build against (the 2.x Hadoop line is growing quickly!)
> The correct way (thanks Eugene!) to build our YARN branch against any 
> compatible Hadoop, as of now, is this:
> {code}mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install{code}
> Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
> POM.XML files to see the various profiles we support for newer Hadoops, and 
> select the hadoop.version you see in your favorite to build, as shown above.
> That's it. Enjoy.



[jira] [Resolved] (GIRAPH-631) Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with a more flexible Maven config

2013-06-25 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman resolved GIRAPH-631.


Resolution: Fixed
  Assignee: Eli Reisman

I think we're good here at this point. Several patches are up for review that 
make 2.0.4 and 2.0.5 as well as variable YARN versions + YARN Giraph profile 
work. I'm resolving this one.

> Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with 
> a more flexible Maven config
> -
>
> Key: GIRAPH-631
> URL: https://issues.apache.org/jira/browse/GIRAPH-631
> Project: Giraph
>  Issue Type: Improvement
>  Components: conf and scripts
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
> Fix For: 1.0.0, 1.1.0
>
>
> Currently, Giraph's YARN profile is hardcoded to Version 2.0.3-alpha of 
> Hadoop. This is because of two problems:
> 1. Simply creating profiles that can "coexist" such as Hadoop's own 
> -Pdist,native type mvn calls is not possible for us since we use munging and 
> excludes in Maven to prevent compilation of the YARN code where the deps are 
> not included (many profiles) and these excludes don't seem overridable. This 
> has been documented online as a Maven "feature" already.
> 2. Simply resetting hadoop.version for the Maven build using a -D option 
> should work and should probably be the right fix for us but in the brief time 
> I played with it (and with our versioning story that affects backporting not 
> decided yet) I did not get it to work myself for Giraph-13 (this is all 
> documented there)
> Option 2 will look like:
> {code}
> mvn -Phadoop_yarn -Dhadoop.version=YOUR_FAVORITE_YARNY_HADOOP clean install 
> {code}



[jira] [Commented] (GIRAPH-693) Giraph-Hive check user code as soon as possible

2013-06-25 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693574#comment-13693574
 ] 

Eli Reisman commented on GIRAPH-693:


I am not well versed in Hive I/O but this is very straightforward and a good 
idea so after reading the patch carefully I'm going to say +1 on this. Thanks 
Nitay! I assume it builds etc?


> Giraph-Hive check user code as soon as possible
> ---
>
> Key: GIRAPH-693
> URL: https://issues.apache.org/jira/browse/GIRAPH-693
> Project: Giraph
>  Issue Type: Bug
>Reporter: Nitay Joffe
>Assignee: Nitay Joffe
>
> We have a lot of cases of users running long jobs and then failing at the 
> Hive output step because of some misconfigured schema or type mismatch. We'd 
> like to surface these errors as early as possible.
> To make this happen I am adding checkput methods to the 
> HiveTo and VertexToHive API and letting the user do their 
> checks. Look at the diff for examples and tests.
> https://reviews.apache.org/r/12080/
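
The fail-fast idea in the quoted description can be sketched in plain Java. This is an illustration only: `SchemaCheck`, `checkOutput`, and the column-name-to-type maps are hypothetical stand-ins, not the Giraph-Hive API from the patch. The point is just that the user's check runs at submission time, before any superstep does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: validate an output schema up front, not at the output step. */
final class SchemaCheck {
  private SchemaCheck() { }

  /**
   * Throws if the table lacks a column the job intends to write,
   * or if a column's type differs from what the job expects.
   */
  static void checkOutput(Map<String, String> tableSchema,
                          Map<String, String> expectedColumns) {
    for (Map.Entry<String, String> e : expectedColumns.entrySet()) {
      String actualType = tableSchema.get(e.getKey());
      if (actualType == null) {
        throw new IllegalArgumentException("Missing column: " + e.getKey());
      }
      if (!actualType.equals(e.getValue())) {
        throw new IllegalArgumentException("Column " + e.getKey() + " has type "
            + actualType + ", expected " + e.getValue());
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> table = new LinkedHashMap<>();
    table.put("vertex_id", "bigint");
    table.put("rank", "string"); // misconfigured: the job writes doubles

    Map<String, String> expected = new LinkedHashMap<>();
    expected.put("vertex_id", "bigint");
    expected.put("rank", "double");

    try {
      checkOutput(table, expected); // runs before the long job is launched
      System.out.println("schema ok, launching job");
    } catch (IllegalArgumentException ex) {
      System.out.println("rejected before launch: " + ex.getMessage());
    }
  }
}
```

With a check like this the type mismatch is reported in seconds rather than after the job has already run for hours.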



[jira] [Commented] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line

2013-06-25 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693569#comment-13693569
 ] 

Eli Reisman commented on GIRAPH-688:


Hey all,

Sorry I haven't checked in lately, I'll peek in on this tomorrow too. Could I 
grab a quick review from someone? It's a very small patch. I only care because 
I'm presenting about Giraph + YARN at the end of the week :)

Thanks, I promise I'll sit down and review some patches too!


> Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, 
> warns if none set, works w/new 1.1.0 line
> --
>
> Key: GIRAPH-688
> URL: https://issues.apache.org/jira/browse/GIRAPH-688
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-688-1.patch
>
>
> This makes the hadoop-yarn branch build again against all compatible Hadoop 
> versions, warns (in a crude but accurate way) what to do if user did not set 
> hadoop.version at the mvn command line...and passes mvn clean verify etc.
> I have removed a hardcoded version setting and replaced it with the 
> destined-to-fail warning to allow/force folks to stay on top of which version 
> they will build against (the 2.x Hadoop line is growing quickly!)
> The correct way (thanks Eugene!) to build our YARN branch against any 
> compatible Hadoop, as of now, is this:
> mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install
> Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
> POM.XML files to see the various profiles we support for newer Hadoops, and 
> select the hadoop.version you see in your favorite to build, as shown above.
> That's it. Enjoy.



[jira] [Commented] (GIRAPH-683) Jython for Computation

2013-06-13 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682420#comment-13682420
 ] 

Eli Reisman commented on GIRAPH-683:


This is fantastic work, excellent!

> Jython for Computation
> --
>
> Key: GIRAPH-683
> URL: https://issues.apache.org/jira/browse/GIRAPH-683
> Project: Giraph
>  Issue Type: Bug
>Reporter: Nitay Joffe
>Assignee: Nitay Joffe
>
> Support for writing Computation code in Python. We add Jython bindings so 
> that the Python computation code can communicate back with the Java Giraph 
> classes.
> To make this work I had to change a few parts of Giraph:
> 1) The Jython computation is not known until we read the script and create a 
> Computation object for it at runtime. This has to be done on each worker 
> separately after the job has launched. Because of this, there is no 
> Computation class set at the beginning. I suspect other scripting languages 
> will have similar issue. To fix this I created a ComputationFactory interface 
> which is responsible for creating the Computation, with a default that just 
> grabs the class from the Configuration and creates it.
> 2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a 
> lot of repetitive code around these things so centralizing it all in one 
> place made things a lot cleaner.
> 3) I added some more helpers like isDefaultValue() to our conf options.
> To use Jython all the user has to do is call Jython#init(...) somewhere in 
> his initialization.
> This patch contains our page rank benchmark implementation in Jython. I added 
> an option (--jython) which chooses whether to run the default or the jython 
> version.
> Here is the initial PageRankBenchmark comparison (4 workers, 10M vertices, 25 
> edges per vertex):
> Java:
> Total (milliseconds)              104,388  0    104,388
> Superstep 3 (milliseconds)         16,750  0     16,750
> Setup (milliseconds)                2,895  0      2,895
> Shutdown (milliseconds)                50  0         50
> Superstep 0 (milliseconds)         15,838  0     15,838
> Superstep 4 (milliseconds)         19,088  0     19,088
> Input superstep (milliseconds)      8,700  0      8,700
> Superstep 5 (milliseconds)          3,550  0      3,550
> Superstep 2 (milliseconds)         17,905  0     17,905
> Superstep 1 (milliseconds)         19,608  0     19,608
> Jython:
> Total (milliseconds)              244,965  0    244,965
> Superstep 3 (milliseconds)         43,405  0     43,405
> Setup (milliseconds)                3,735  0      3,735
> Shutdown (milliseconds)               117  0        117
> Superstep 0 (milliseconds)         36,962  0     36,962
> Superstep 4 (milliseconds)         46,088  0     46,088
> Input superstep (milliseconds)      8,551  0      8,551
> Superstep 5 (milliseconds)         22,040  0     22,040
> Superstep 2 (milliseconds)         42,329  0     42,329
> Superstep 1 (milliseconds)         41,737  0     41,737
> Overhead of Jython vs Java = about 2.35x.
> However at scale things get better (200 workers, 1B vertices, 200 edges per 
> vertex):
> Java:
> Total (milliseconds)            1,702,429  0  1,702,429
> Superstep 3 (milliseconds)        316,844  0    316,844
> Setup (milliseconds)               13,226  0     13,226
> Shutdown (milliseconds)               113  0        113
> Superstep 0 (milliseconds)        300,950  0    300,950
> Superstep 4 (milliseconds)        318,627  0    318,627
> Input superstep (milliseconds)    114,673  0    114,673
> Superstep 5 (milliseconds)          7,898  0      7,898
> Superstep 2 (milliseconds)        312,152  0    312,152
> Superstep 1 (milliseconds)        317,942  0    317,942
> Jython:
> Total (milliseconds)            2,123,228  0  2,123,228
> Superstep 3 (milliseconds)        406,422  0    406,422
> Setup (milliseconds)                7,159  0      7,159
> Shutdown (milliseconds)               131  0        131
> Superstep 0 (milliseconds)        347,732  0    347,732
> Superstep 4 (milliseconds)        405,696  0    405,696
> Input superstep (milliseconds)    112,645  0    112,645
> Superstep 5 (milliseconds)         46,687  0     46,687
> Superstep 2 (milliseconds)        410,349  0    410,349
> Superstep 1 (milliseconds)        386,404  0    386,404
> That's a mere 25% overhead.
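
For reference, both overhead figures can be recomputed from the "Total (milliseconds)" rows quoted above. The snippet is only a convenience sketch: the class and method names are made up for illustration, and the inputs are the four Total values from the benchmark output.

```java
/** Recomputes Jython-vs-Java overhead from the "Total (milliseconds)" rows. */
final class JythonOverhead {
  private JythonOverhead() { }

  /** Ratio of Jython total time to Java total time. */
  static double ratio(long jythonMillis, long javaMillis) {
    return (double) jythonMillis / javaMillis;
  }

  /** Extra time taken by Jython relative to Java, as a percentage. */
  static double overheadPercent(long jythonMillis, long javaMillis) {
    return (ratio(jythonMillis, javaMillis) - 1.0) * 100.0;
  }

  public static void main(String[] args) {
    // 4 workers, 10M vertices, 25 edges per vertex
    System.out.printf("small run: %.2fx%n", ratio(244_965L, 104_388L));
    // 200 workers, 1B vertices, 200 edges per vertex
    System.out.printf("large run: %.1f%% overhead%n",
        overheadPercent(2_123_228L, 1_702_429L));
  }
}
```

The Setup and Input superstep rows are nearly identical between the two implementations, so the gap comes from the compute supersteps.
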
> Take a look at the reviewboard for latest patch: 
> https://reviews.apache.org/r/11709/
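
The ComputationFactory idea from point 1 of the quoted description can be sketched roughly as follows. Everything here is a simplified assumption for illustration: the one-method `Computation` interface, the `Map`-based configuration, the key name, and `PageRankSketch` are stand-ins, not the types or signatures in the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for a Giraph computation; the real interface is much richer. */
interface Computation {
  String name();
}

/** Creates the Computation at runtime, e.g. after a worker has read a script. */
interface ComputationFactory {
  Computation createComputation(Map<String, String> conf);
}

/** Default behaviour: read a class name from the configuration and instantiate it. */
final class DefaultComputationFactory implements ComputationFactory {
  /** Hypothetical configuration key, chosen for this sketch only. */
  static final String COMPUTATION_CLASS_KEY = "sketch.computation.class";

  @Override
  public Computation createComputation(Map<String, String> conf) {
    String className = conf.get(COMPUTATION_CLASS_KEY);
    if (className == null) {
      throw new IllegalStateException("No computation class configured");
    }
    try {
      return Class.forName(className)
          .asSubclass(Computation.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create " + className, e);
    }
  }
}

/** Example computation with a no-argument constructor. */
final class PageRankSketch implements Computation {
  @Override
  public String name() {
    return "page-rank";
  }
}

final class ComputationFactoryDemo {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(DefaultComputationFactory.COMPUTATION_CLASS_KEY,
        PageRankSketch.class.getName());
    ComputationFactory factory = new DefaultComputationFactory();
    System.out.println(factory.createComputation(conf).name());
  }
}
```

A script-backed factory would implement the same interface but build its Computation from the script contents on each worker, which is why the job no longer needs a Computation class to be set up front.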



[jira] [Updated] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line

2013-06-12 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-688:
---

Summary: Make sure Giraph builds against all compatible YARN-enabled Hadoop 
versions, warns if none set, works w/new 1.1.0 line  (was: Make sure YARN 
builds against all compatible Giraph versions, warns if none set, works w/new 
1.1.0 line)

> Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, 
> warns if none set, works w/new 1.1.0 line
> --
>
> Key: GIRAPH-688
> URL: https://issues.apache.org/jira/browse/GIRAPH-688
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-688-1.patch
>
>
> This makes the hadoop-yarn branch build again against all compatible Hadoop 
> versions, warns (in a crude but accurate way) what to do if user did not set 
> hadoop.version at the mvn command line...and passes mvn clean verify etc.
> I have removed a hardcoded version setting and replaced it with the 
> destined-to-fail warning to allow/force folks to stay on top of which version 
> they will build against (the 2.x Hadoop line is growing quickly!)
> The correct way (thanks Eugene!) to build our YARN branch against any 
> compatible Hadoop, as of now, is this:
> mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install
> Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
> POM.XML files to see the various profiles we support for newer Hadoops, and 
> select the hadoop.version you see in your favorite to build, as shown above.
> That's it. Enjoy.



[jira] [Updated] (GIRAPH-688) Make sure YARN builds against all compatible Giraph versions, warns if none set, works w/new 1.1.0 line

2013-06-12 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-688:
---

Attachment: GIRAPH-688-1.patch

Here it is, sorry it took too long. Job/book/life...grr you get the idea :)

> Make sure YARN builds against all compatible Giraph versions, warns if none 
> set, works w/new 1.1.0 line
> ---
>
> Key: GIRAPH-688
> URL: https://issues.apache.org/jira/browse/GIRAPH-688
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-688-1.patch
>
>
> This makes the hadoop-yarn branch build again against all compatible Hadoop 
> versions, warns (in a crude but accurate way) what to do if user did not set 
> hadoop.version at the mvn command line...and passes mvn clean verify etc.
> I have removed a hardcoded version setting and replaced it with the 
> destined-to-fail warning to allow/force folks to stay on top of which version 
> they will build against (the 2.x Hadoop line is growing quickly!)
> The correct way (thanks Eugene!) to build our YARN branch against any 
> compatible Hadoop, as of now, is this:
> mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install
> Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
> POM.XML files to see the various profiles we support for newer Hadoops, and 
> select the hadoop.version you see in your favorite to build, as shown above.
> That's it. Enjoy.



[jira] [Created] (GIRAPH-688) Make sure YARN builds against all compatible Giraph versions, warns if none set, works w/new 1.1.0 line

2013-06-12 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-688:
--

 Summary: Make sure YARN builds against all compatible Giraph 
versions, warns if none set, works w/new 1.1.0 line
 Key: GIRAPH-688
 URL: https://issues.apache.org/jira/browse/GIRAPH-688
 Project: Giraph
  Issue Type: Bug
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Minor


This makes the hadoop-yarn branch build again against all compatible Hadoop 
versions, warns (in a crude but accurate way) what to do if user did not set 
hadoop.version at the mvn command line...and passes mvn clean verify etc.

I have removed a hardcoded version setting and replaced it with the 
destined-to-fail warning to allow/force folks to stay on top of which version 
they will build against (the 2.x Hadoop line is growing quickly!)

The correct way (thanks Eugene!) to build our YARN branch against any 
compatible Hadoop, as of now, is this:

mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install

Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our 
POM.XML files to see the various profiles we support for newer Hadoops, and 
select the hadoop.version you see in your favorite to build, as shown above.
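
For anyone scripting this build, the same rule (no default version, fail loudly when none is given) can be sketched in Java. This is an illustration only and not how the patch itself does it: `YarnBuildCommand` and its error text are hypothetical, and only the mvn arguments come from the command shown above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the YARN build command shown above. */
final class YarnBuildCommand {
  private YarnBuildCommand() { }

  /** Returns the mvn argument list, refusing to guess a Hadoop version. */
  static List<String> forHadoopVersion(String hadoopVersion) {
    if (hadoopVersion == null || hadoopVersion.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "hadoop.version is not set; pass e.g. -Dhadoop.version=2.0.3-alpha");
    }
    return Arrays.asList(
        "mvn",
        "-Phadoop_yarn",
        "-Dhadoop.version=" + hadoopVersion.trim(),
        "clean",
        "install");
  }

  public static void main(String[] args) {
    // Example value taken from the command above; any compatible version works.
    System.out.println(String.join(" ", forHadoopVersion("2.0.3-alpha")));
  }
}
```

The returned list could be handed to `ProcessBuilder` as-is; the check simply makes a forgotten version an immediate error instead of a silent fallback.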

That's it. Enjoy.




[jira] [Updated] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha

2013-06-12 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-687:
---

Description: 
Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes:
mvn -Phadoop_2.0.4 clean verify
mvn -Phadoop_2.0.5 clean verify


  was:
Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes mvn -Phadoop_2.0.{4,5} 
clean verify



> Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
> ---
>
> Key: GIRAPH-687
> URL: https://issues.apache.org/jira/browse/GIRAPH-687
> Project: Giraph
>  Issue Type: Bug
>  Components: build
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-687-1.patch
>
>
> Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes:
> mvn -Phadoop_2.0.4 clean verify
> mvn -Phadoop_2.0.5 clean verify



[jira] [Updated] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha

2013-06-12 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-687:
---

Attachment: GIRAPH-687-1.patch

Next up, take Eugene's advice for building YARN module and try to make it 
easier to select one of these...!

> Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
> ---
>
> Key: GIRAPH-687
> URL: https://issues.apache.org/jira/browse/GIRAPH-687
> Project: Giraph
>  Issue Type: Bug
>  Components: build
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-687-1.patch
>
>
> Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes mvn -Phadoop_2.0.{4,5} 
> clean verify



[jira] [Updated] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha

2013-06-12 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-687:
---

Issue Type: New Feature  (was: Bug)

> Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
> ---
>
> Key: GIRAPH-687
> URL: https://issues.apache.org/jira/browse/GIRAPH-687
> Project: Giraph
>  Issue Type: New Feature
>  Components: build
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Minor
> Attachments: GIRAPH-687-1.patch
>
>
> Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes:
> mvn -Phadoop_2.0.4 clean verify
> mvn -Phadoop_2.0.5 clean verify



[jira] [Created] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha

2013-06-12 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-687:
--

 Summary: Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
 Key: GIRAPH-687
 URL: https://issues.apache.org/jira/browse/GIRAPH-687
 Project: Giraph
  Issue Type: Bug
  Components: build
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Minor


Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes mvn -Phadoop_2.0.{4,5} 
clean verify




[jira] [Commented] (GIRAPH-629) YARN profile is broken when compiled against hadoop-2.0.4

2013-06-12 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681797#comment-13681797
 ] 

Eli Reisman commented on GIRAPH-629:


Awesome! I was out of the loop for a while and hadn't seen this, thanks Eugene. 
I'm putting up a patch to add 2.0.4 and 2.0.5 support now. The "alpha" makes 
sense because that's how it's stated in the POM for hadoop.version; it's only 
hadoop_2.0.4 in our profile names.


> YARN profile is broken when compiled against hadoop-2.0.4
> -
>
> Key: GIRAPH-629
> URL: https://issues.apache.org/jira/browse/GIRAPH-629
> Project: Giraph
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.0.0
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>
> {noformat}
> $ mvn -Phadoop_yarn -DskipTests -Dhadoop.version=2.0.4-SNAPSHOT clean package
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Giraph Parent .. SUCCESS [1.359s]
> [INFO] Apache Giraph Core  FAILURE [15.319s]
> [INFO] Apache Giraph Hive I/O  SKIPPED
> [INFO] Apache Giraph Examples  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 17.374s
> [INFO] Finished at: Fri Apr 12 17:21:11 PDT 2013
> [INFO] Final Memory: 39M/481M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) 
> on project giraph-core: Compilation failure: Compilation failure:
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[46,42]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: package org.apache.hadoop.yarn.api.records
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[206,42]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[291,47]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[368,11]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[398,35]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[178,7]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[255,26]
>  cannot find symbol
> [ERROR] symbol  : method getAMResponse()
> [ERROR] location: interface 
> org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[296,37]
>  cannot find symbol
> [ERROR] symbol  : method getAMResponse()
> [ERROR] location: interface 
> org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[327,49]
>  cannot find symbol
> [ERROR] symbol  : method getState()
> [ERROR] location: interface org.apache.hadoop.yarn.api.records.Container
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[349,46]
>  cannot find symbol
> [ERROR] symbol  : method getAMResponse()
> [ERROR] location: interface 
> org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[353,42]
>  cannot find symbol
> [ERROR] symbol  : method getAMResponse()
> [ERROR] location: interface 
> org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[379,7]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.a

[jira] [Commented] (GIRAPH-624) ByteArrayPartition reports 0 aggregate edges when used with DiskBackedPartitionStore

2013-04-20 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637268#comment-13637268
 ] 

Eli Reisman commented on GIRAPH-624:


Sorry I didn't get to this sooner. I am not going to have a lot of time to 
review patches right now but will try when I can. Sebastian is right, you 
pretty much want to close any IO goodies in a finally block whenever you can, 
IOE handled or not.
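
A minimal sketch of that pattern, using only standard `java.io` classes. The class and method names are illustrative and are not taken from the patch under review.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Sketch: close an I/O resource on every path, whether or not the IOException is handled. */
final class CloseInFinally {
  private CloseInFinally() { }

  /** Classic form: the reader is closed in finally, even if readLine() throws. */
  static String firstLine(Reader source) throws IOException {
    BufferedReader reader = new BufferedReader(source);
    try {
      return reader.readLine();
    } finally {
      reader.close();
    }
  }

  /** Similar try-with-resources form (Java 7+); close() is called automatically. */
  static String firstLineWithResources(Reader source) throws IOException {
    try (BufferedReader reader = new BufferedReader(source)) {
      return reader.readLine();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(firstLine(new StringReader("vertex 1\nvertex 2")));
    System.out.println(firstLineWithResources(new StringReader("edge 1\nedge 2")));
  }
}
```

In both forms the method still propagates the IOException to the caller; the finally block only guarantees the stream is released first.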

> ByteArrayPartition reports 0 aggregate edges when used with 
> DiskBackedPartitionStore 
> -
>
> Key: GIRAPH-624
> URL: https://issues.apache.org/jira/browse/GIRAPH-624
> Project: Giraph
>  Issue Type: Bug
>Reporter: Claudio Martella
>Assignee: Claudio Martella
> Attachments: GIRAPH-624.diff, GIRAPH-624.diff, GIRAPH-624.diff
>
>
> ByteArrayPartition reports the correct number of edges when run in-memory or 
> with checkpointing, but reports 0 edges when used OOC. OOC runs fine with 
> SimplePartition.



[jira] [Commented] (GIRAPH-596) Single uber-jar

2013-04-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631365#comment-13631365
 ] 

Eli Reisman commented on GIRAPH-596:


Great, great idea! I also noticed during GIRAPH-13 building on newer Hadoops 
that not all these subprojects were getting built under every profile. If that 
is close enough to fall under this JIRA too, then hey, bonus.


> Single uber-jar
> ---
>
> Key: GIRAPH-596
> URL: https://issues.apache.org/jira/browse/GIRAPH-596
> Project: Giraph
>  Issue Type: Bug
>Reporter: Nitay Joffe
>
> Right now we build a fatjar (with all the deps) for giraph-hbase, 
> giraph-hive, giraph-accumulo, and so on.
> We should just build one single uber-jar at top level that contains 
> everything.
> This should not affect the regular per-module jars built for each module.



[jira] [Created] (GIRAPH-632) YARN dependencies add a fair amount of size to the fat jar and may be subject to simplification

2013-04-14 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-632:
--

 Summary: YARN dependencies add a fair amount of size to the fat 
jar and may be subject to simplification
 Key: GIRAPH-632
 URL: https://issues.apache.org/jira/browse/GIRAPH-632
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 1.0, 1.1
Reporter: Eli Reisman
Priority: Minor
 Fix For: 1.0, 1.1


The hadoop_yarn profile requires some new package dependencies that the rest of 
our build does not. They add size to the built projects and, because our YARN 
implementation "rides the fence" between the old API (for our 
ApplicationMaster) and the new API (for our Client), they were simply "what 
worked for me at the time."

On the Maven repos there are a number of "api" versions of some of these libs 
that are lighter weight. Someone (maybe me someday but not yet, sorry) could 
experiment with just replacing some of our current YARN deps in the Maven 
profile that "cast a very wide net" with a bunch of smaller, lighter weight 
packages that cover the same API we need. See also Maven repos and the YARN 
dependencies listed there.

My very brief experiments with this didn't yield anything, but again I ran out 
of time and barely played with this before having other responsibilities take 
priority. Seems like this would not be hard to tune up and could be good for 
our build.





[jira] [Commented] (GIRAPH-592) YourKit profiling API for easy profiling of giraph

2013-04-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631363#comment-13631363
 ] 

Eli Reisman commented on GIRAPH-592:


Hey man where's the patch? +1

> YourKit profiling API for easy profiling of giraph
> --
>
> Key: GIRAPH-592
> URL: https://issues.apache.org/jira/browse/GIRAPH-592
> Project: Giraph
>  Issue Type: Bug
>Reporter: Nitay Joffe
>Assignee: Nitay Joffe
>
> Adds YourKit API with helpers to Giraph, to make it easy to profile with 
> YourKit. No more having to attach to processes and have the user time things 
> by hand. This allows us to profile specific parts of the code very easily.
> As an example this diff adds profiling to edge input loading.
> To use YourKit with Hadoop jobs you need to set parameters as follows:
> {code}
> -Dmapred.task.profile=true \
> -Dmapred.task.profile.maps=0-${numWorkers} \ 
> -Dmapred.task.profile.params=-agentpath:
> {code}
> Note if the YourKit agent is not passed in (not profiling), the calls I've 
> added here have negligible effect.
> https://reviews.apache.org/r/10147/



[jira] [Commented] (GIRAPH-631) Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with a more flexible Maven config

2013-04-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631362#comment-13631362
 ] 

Eli Reisman commented on GIRAPH-631:


I'm not going to have time to attack this right now. Here's what I know:

In the Maven profiles for various Hadoops, we hardcode a "hadoop.version" Maven 
property for each. So for instance "hadoop_2.0.3" profile hardcodes 
hadoop.version to be "2.0.3-alpha" so that the Maven repo deps can be properly 
downloaded and resolved.

Of course this can all be overridden at the command line with -D options. But 
something about the way our profiles interact (and/or the way the subprojects 
like giraph-examples and the IO subprojects for Hive etc. are set up) still 
breaks, and I didn't have time to investigate why.
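
As a toy model of the intended behaviour (not of the breakage, and not Maven's own resolution code), the precedence described above can be written down in a few lines of Java. `HadoopVersionResolver` is a hypothetical name; the single profile entry is the one mentioned in this comment, and other profiles would follow the same pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: a profile's hardcoded hadoop.version versus a -D override. */
final class HadoopVersionResolver {
  /** Profile name -> hardcoded hadoop.version, as described in the comment above. */
  private static final Map<String, String> PROFILE_DEFAULTS = new LinkedHashMap<>();
  static {
    PROFILE_DEFAULTS.put("hadoop_2.0.3", "2.0.3-alpha");
  }

  private HadoopVersionResolver() { }

  /** A command-line value (may be null) wins over the profile's hardcoded default. */
  static String effectiveVersion(String profile, String commandLineVersion) {
    if (commandLineVersion != null && !commandLineVersion.isEmpty()) {
      return commandLineVersion;
    }
    String profileDefault = PROFILE_DEFAULTS.get(profile);
    if (profileDefault == null) {
      throw new IllegalArgumentException("Unknown profile: " + profile);
    }
    return profileDefault;
  }

  public static void main(String[] args) {
    // No -D given: the profile's hardcoded value is used.
    System.out.println(effectiveVersion("hadoop_2.0.3", null));
    // -Dhadoop.version=2.0.4-SNAPSHOT given: the override should win everywhere.
    System.out.println(effectiveVersion("hadoop_2.0.3", "2.0.4-SNAPSHOT"));
  }
}
```

The open question in this ticket is why the real build does not behave like this model in every subproject.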

If you decide to dive in and try this, I'm happy to help, ping me and I will 
attempt to guide you or advise. But in the end there will be some changes to 
our Maven set up.

There is always the chance that there is a more radical approach such as making 
giraph-yarn a subproject but for various reasons I rejected this as a less 
natural fit, especially when I saw that we could stitch the YARN code into our 
real Giraph code with so little munging, and that this munging was required 
anyway to get the profile to integrate with our MapReduce-based profiles at 
all. If you want to pursue a more involved solution like this, I'm not against 
it and am also happy to help where I can.

Finally, if no one cares about this and you leave it to collect dust, I'll come 
back at some point and do it myself ;)


> Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with 
> a more flexible Maven config
> -
>
> Key: GIRAPH-631
> URL: https://issues.apache.org/jira/browse/GIRAPH-631
> Project: Giraph
>  Issue Type: Improvement
>  Components: conf and scripts
>Affects Versions: 1.0, 1.1
>Reporter: Eli Reisman
> Fix For: 1.0, 1.1
>
>
> Currently, Giraph's YARN profile is hardcoded to Version 2.0.3-alpha of 
> Hadoop. This is because of two problems:
> 1. Simply creating profiles that can "coexist" such as Hadoop's own 
> -Pdist,native type mvn calls is not possible for us since we use munging and 
> excludes in Maven to prevent compilation of the YARN code where the deps are 
> not included (many profiles) and these excludes don't seem overridable. This 
> has been documented online as a Maven "feature" already.
> 2. Simply resetting hadoop.version for the Maven build using a -D option 
> should work and should probably be the right fix for us but in the brief time 
> I played with it (and with our versioning story that affects backporting not 
> decided yet) I did not get it to work myself for Giraph-13 (this is all 
> documented there)
> Option 2 will look like:
> {code}
> mvn -Phadoop_yarn -Dhadoop.version=YOUR_FAVORITE_YARNY_HADOOP clean install 
> {code}



[jira] [Created] (GIRAPH-631) Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with a more flexible Maven config

2013-04-14 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-631:
--

 Summary: Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from 
YARN and replace with a more flexible Maven config
 Key: GIRAPH-631
 URL: https://issues.apache.org/jira/browse/GIRAPH-631
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 1.0, 1.1
Reporter: Eli Reisman
 Fix For: 1.0, 1.1


Currently, Giraph's YARN profile is hardcoded to Version 2.0.3-alpha of Hadoop. 
This is because of two problems:

1. Simply creating profiles that can "coexist" such as Hadoop's own 
-Pdist,native type mvn calls is not possible for us since we use munging and 
excludes in Maven to prevent compilation of the YARN code where the deps are 
not included (many profiles) and these excludes don't seem overridable. This 
has been documented online as a Maven "feature" already.

2. Simply resetting hadoop.version for the Maven build using a -D option should 
work and should probably be the right fix for us but in the brief time I played 
with it (and with our versioning story that affects backporting not decided 
yet) I did not get it to work myself for Giraph-13 (this is all documented 
there)

Option 2 will look like:

{code}
mvn -Phadoop_yarn -Dhadoop.version=YOUR_FAVORITE_YARNY_HADOOP clean install 
{code}





[jira] [Resolved] (GIRAPH-629) YARN profile is broken when compiled against hadoop-2.0.4

2013-04-14 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman resolved GIRAPH-629.


Resolution: Won't Fix

As stated in the final GIRAPH-13 discussion, the Hadoop version is currently 
hardcoded to 2.0.3, and while we can upgrade this by hand in the POMs, there is 
no setup to change it without editing them yet. I would suggest a new JIRA to 
reconfigure the POM to accept this.

My "dream" was to have the build do something on the order of:

{code}
mvn -Phadoop_yarn,HADOOPVERSION clean install
{code}

where "HADOOPVERSION" would be the name of one of our other profiles, so that 
you could pick at the command line and have it blow up if the versioning made 
no sense (as 0.20.x would, etc.). But due to strange behavior in the filtering 
(I can't try to compile the YARN code against Hadoop versions that do not 
supply it), this was not possible.

So option two (which might be possible) is something like this:

{code}
mvn -Phadoop_yarn -Dhadoop.version=HADOOPVERSION clean install
{code}

However, early attempts to do this indicate that the POM's hadoop.version is 
at times being overridden in our build by sub-project POMs, or is not 
propagating correctly to allow it. I ran out of time at HW to handle this, but 
if I didn't put up a JIRA for it already (I think I did) then we should have 
one. This seems like it could be done and would work.

Anyway, because this behavior (only allowing 2.0.3) is "normal" for now, I'm 
resolving this particular JIRA as "won't fix".

> YARN profile is broken when compiled against hadoop-2.0.4
> -
>
> Key: GIRAPH-629
> URL: https://issues.apache.org/jira/browse/GIRAPH-629
> Project: Giraph
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.0
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>
> {noformat}
> $ mvn -Phadoop_yarn -DskipTests -Dhadoop.version=2.0.4-SNAPSHOT clean package
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Giraph Parent .. SUCCESS [1.359s]
> [INFO] Apache Giraph Core  FAILURE [15.319s]
> [INFO] Apache Giraph Hive I/O  SKIPPED
> [INFO] Apache Giraph Examples  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 17.374s
> [INFO] Finished at: Fri Apr 12 17:21:11 PDT 2013
> [INFO] Final Memory: 39M/481M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) 
> on project giraph-core: Compilation failure: Compilation failure:
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[46,42]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: package org.apache.hadoop.yarn.api.records
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[206,42]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[291,47]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[368,11]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[398,35]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[178,7]
>  cannot find symbol
> [ERROR] symbol  : class AMResponse
> [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[255,26]
>  cannot find symbol
> [ERROR] symbol  : method getAMResponse()
> [ERROR] location: interface 
> org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
> [ERROR] 
> /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[296,37]
>  cannot find symbol
> [ERROR] symbol  : method getAMResponse()
> [ERROR] location: interface 
> org.apache.hadoop.

[jira] [Created] (GIRAPH-608) Spelling error in Combiner.java

2013-04-07 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-608:
--

 Summary: Spelling error in Combiner.java
 Key: GIRAPH-608
 URL: https://issues.apache.org/jira/browse/GIRAPH-608
 Project: Giraph
  Issue Type: Bug
Reporter: Eli Reisman
Priority: Trivial


In line 35, the variable name "originalMessage" is misspelled in one spot. Good 
newbie issue for figuring out how to contribute.




[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark with 6 or more workers on a pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest

2013-04-07 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624979#comment-13624979
 ] 

Eli Reisman commented on GIRAPH-601:


To clarify point one: YARN adds that "little extra", not you, so it's sort of a 
grey area. Just keep in mind that if your cluster offers 10 gigs of available 
resources, doing -w 8 to account for a gig for the master and a gig for the app 
master is not good enough. You need to leave some extra container "overhead" 
resources unused for YARN jobs, because they will each also take up some extra.

To clarify about yarn-site: there is more than one resource setting in 
yarn-site. Make sure they are all set the way you need, or bad things like this 
happen with little error reporting.


Hope it's going well, good luck with this.


> Exception when running pagerank benchmark with 6 or more workers on a 
> pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest
> 
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eugene Koontz
> Attachments: instrumentation.patch, print_addresses.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests  -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
>  $HADOOP_RUNTIME/bin/hadoop jar $JAR \
>  org.apache.giraph.benchmark.PageRankBenchmark \
> -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in  
> /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_02/
>  :
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished 
> worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], 
> size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=0, port=30010)], size = 6 from 
> /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] 
> org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: 
> Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException: 
> org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to 
> org.apache.giraph.comm.requests.MasterRequest
>   at 
> org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
>   at 
> org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>   at 
> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
>   at 
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
>   at 
> org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>   at java.lang.Thread.run(Thread.java:680)
> {code}



[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark with 6 or more workers on a pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest

2013-04-07 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624978#comment-13624978
 ] 

Eli Reisman commented on GIRAPH-601:


There are several yarn-site resource settings. I had this type of problem when 
I asked for too much without knowing it, and the cluster isn't always good at 
telling you that's what happened. Two things:

1. Each YARN task needs the heap you choose for it, plus "a little extra" for 
the container itself. So keep that in mind.

2. You always have to "pay" for your app master too, which is not part of the 
Giraph API. So if you want "-w 5" you are getting one app master with some 
amount of YARN resources, one master task, and 5 worker tasks (with the master 
taking a share of heap equal to what each of the 5 workers gets).

The point being, it's extremely easy to overpower a small cluster on a local 
machine without knowing it ;)
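
The breakdown above can be put into rough numbers. This is only a 
back-of-the-envelope sketch, not anything taken from the Giraph or YARN code: 
the class and method names are made up, and the 1024 MB heap, 256 MB 
per-container overhead, and 1024 MB app master are assumed values chosen for 
illustration.

```java
// Rough memory budget for a Giraph-on-YARN job started with "-w <workers>".
// All sizes are illustrative assumptions, not values read from any config.
final class YarnBudgetSketch {

    /** Total MB requested: one app master, one master task, and the workers. */
    static long requiredMb(int workers, long heapMb, long overheadMb, long appMasterMb) {
        // The Giraph master task is sized like a worker, so count workers + 1.
        long giraphTasks = workers + 1L;
        // Each task container needs its heap plus "a little extra" overhead.
        return appMasterMb + giraphTasks * (heapMb + overheadMb);
    }

    /** True if the whole job fits in the cluster's available memory. */
    static boolean fits(long clusterMb, int workers, long heapMb, long overheadMb, long appMasterMb) {
        return requiredMb(workers, heapMb, overheadMb, appMasterMb) <= clusterMb;
    }

    public static void main(String[] args) {
        long clusterMb = 10L * 1024; // a cluster offering 10 GB
        // "-w 5" with 1 GB heaps and an assumed 256 MB overhead per container.
        System.out.println(requiredMb(5, 1024, 256, 1024));   // 8704
        // "-w 8" only fits if the per-container overhead is ignored.
        System.out.println(fits(clusterMb, 8, 1024, 0, 1024));   // true
        System.out.println(fits(clusterMb, 8, 1024, 256, 1024)); // false
    }
}
```

With zero overhead, "-w 8" exactly fills a 10 GB cluster on paper. Once any 
per-container overhead is counted, the request no longer fits, which is the 
situation where the cluster may not tell you clearly what went wrong.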


> Exception when running pagerank benchmark with 6 or more workers on a 
> pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest
> 
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eugene Koontz
> Attachments: instrumentation.patch, print_addresses.patch
>
>



[jira] [Commented] (GIRAPH-527) readVertexInputSplit is always reporting 0 vertices and 0 edges

2013-04-04 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623389#comment-13623389
 ] 

Eli Reisman commented on GIRAPH-527:


Nice catch Maja!


> readVertexInputSplit is always reporting 0 vertices and 0 edges
> ---
>
> Key: GIRAPH-527
> URL: https://issues.apache.org/jira/browse/GIRAPH-527
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Claudio Martella
>Assignee: Nitay Joffe
>
> readVertexInputSplit is reporting in the status always 0 vertices and 0 edges 
> loaded.



[jira] [Commented] (GIRAPH-536) Clean up configuration options

2013-04-04 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623387#comment-13623387
 ] 

Eli Reisman commented on GIRAPH-536:


This has waited a long time. Great one! Absolutely needed for the release. 
Everyone gets confused about this stuff, and it leaves a frustrated impression 
before folks have the chance to really see what Giraph can do for them.

These details matter! Thanks for getting to this!!!

> Clean up configuration options
> --
>
> Key: GIRAPH-536
> URL: https://issues.apache.org/jira/browse/GIRAPH-536
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Alessandro Presta
>Assignee: Alessandro Presta
> Attachments: GIRAPH-536.patch, GIRAPH-536.patch, GIRAPH-536.patch
>
>
> Option names are all over the place, and I think they should be rationalized 
> before we cut the 0.2 release.
> Some examples:
> 1) Options that don't start with "giraph.*", like "partition.*".
> 2) Ambiguous naming: "giraph.numInputSplitsThreads" refers to worker input 
> threads, "giraph.inputSplitThreadCount" refers to threads used by the master 
> to write splits to ZooKeeper.
> 3) Some options are defined in GiraphConstants, some other ones in the 
> classes that use them. We can find all of them by searching for "static final 
> String".
> 4) "giraph.zKForceSync" and "giraph.ZkSkipAcl" use "yes"/"no" instead of 
> true/false, just because they are later used to write ZK configuration (which 
> requires "yes"/"no"). I think we should stick to true/false since these are 
> Giraph options regardless.
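
Point 4 could be handled by keeping the Giraph-side options as plain booleans 
and translating to "yes"/"no" only at the moment the ZooKeeper configuration is 
written. A minimal sketch, assuming hypothetical helper names (ZkOptionSketch, 
toZkFlag, zkConfig) and using "forceSync" and "skipACL" as stand-in ZooKeeper 
keys; this is not the code from the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: Giraph options stay true/false, ZooKeeper output gets yes/no.
final class ZkOptionSketch {

    /** Translate a boolean into the "yes"/"no" form used in the ZK config. */
    static String toZkFlag(boolean value) {
        return value ? "yes" : "no";
    }

    /** Build the ZK-side settings from boolean Giraph-side options. */
    static Map<String, String> zkConfig(boolean forceSync, boolean skipAcl) {
        // LinkedHashMap keeps insertion order so the written file is stable.
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("forceSync", toZkFlag(forceSync)); // stand-in key name
        cfg.put("skipACL", toZkFlag(skipAcl));     // stand-in key name
        return cfg;
    }

    public static void main(String[] args) {
        // User-facing values parsed as ordinary booleans ("true"/"false").
        boolean forceSync = Boolean.parseBoolean("false");
        boolean skipAcl = Boolean.parseBoolean("true");
        System.out.println(zkConfig(forceSync, skipAcl)); // {forceSync=no, skipACL=yes}
    }
}
```

The conversion lives in one place, so users only ever see true/false in Giraph 
options.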



[jira] [Commented] (GIRAPH-604) Clean up benchmarks

2013-04-04 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623380#comment-13623380
 ] 

Eli Reisman commented on GIRAPH-604:


Nice!!! +1 from me!




> Clean up benchmarks
> ---
>
> Key: GIRAPH-604
> URL: https://issues.apache.org/jira/browse/GIRAPH-604
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-604.patch
>
>
> Benchmark classes have a lot of duplicate options and duplicate code which 
> handles CommandLine.



[jira] [Resolved] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles

2013-03-29 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman resolved GIRAPH-599.


Resolution: Fixed

Maja just committed this. Thanks Maja and Nitay!

> Hive IO dependency issues with some Hadoop profiles
> ---
>
> Key: GIRAPH-599
> URL: https://issues.apache.org/jira/browse/GIRAPH-599
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
> Fix For: 0.2.0
>
> Attachments: GIRAPH-599.patch
>
>
> Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now 
> this happens:
> {code}
> [INFO] 
> 
> [INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT
> [INFO] 
> 
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloaded: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
>  (15 KB at 20.2 KB/sec)
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> [WARNING] The POM for com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is 
> missing, no dependency information available
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloaded: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
>  (201 KB at 194.6 KB/sec)
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Giraph Parent .. SUCCESS [0.717s]
> [INFO] Apache Giraph Core  SUCCESS [2:58.276s]
> [INFO] Apache Giraph Hive I/O  FAILURE [6.455s]
> [INFO] Apache Giraph Examples  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 3:05.779s
> [INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013
> [INFO] Final Memory: 48M/352M
> [INFO] 
> 
> [ERROR] Failed to execute goal on project giraph-hive: Could not resolve 
> dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: 
> Could not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in 
> central (http://repo1.maven.org/maven2) -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug

[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617939#comment-13617939
 ] 

Eli Reisman commented on GIRAPH-601:


Oh hey did you set your yarn-site and core-site stuff that is not well
doc'd? Does wordcount or pi work on your YARN cluster?





> Exception when running pagerank benchmark: SendVertexRequest cannot be cast 
> to MasterRequest
> 
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eugene Koontz
> Attachments: instrumentation.patch
>
>



[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617940#comment-13617940
 ] 

Eli Reisman commented on GIRAPH-13:
---

I'll wait a few days for folks to point out problems (and maybe see what
happens with GIRAPH-601) and then commit if no other review issues crop up.
Thanks!






> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Eli Reisman
> Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, 
> GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, 
> GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, 
> GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, 
> GIRAPH-13-9-r5.patch, GIRAPH-13-9-r6.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.
> Review Board link (ready for review now): https://reviews.apache.org/r/9811/



[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617938#comment-13617938
 ] 

Eli Reisman commented on GIRAPH-601:


So masterCount is part of the problem, forcing us to have a "task 0" to be
below the masterCount value of 1? What's up with masterCount?

Did you possibly ask for more workers than your YARN cluster has resources
for? Check out your YARN web UI. Could be MRv2 is waiting until the cluster
has enough mem to launch all of your PR tasks, and that moment never comes
in time? Not sure how (or how well) MRv2 wraps these problems.

Also, did you see in one of the earlier dumps that YarnClientImpl is
hitting an IOE on security tokens? Is that normal? I thought you had auth on
SIMPLE, so that should work as-is?






> Exception when running pagerank benchmark: SendVertexRequest cannot be cast 
> to MasterRequest
> 
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eugene Koontz
> Attachments: instrumentation.patch
>
>



[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617927#comment-13617927
 ] 

Eli Reisman commented on GIRAPH-13:
---

Thanks! I learned a ton doing it! I'll give this a day or two for folks to play 
with it if they want, or to ask for changes. I'll be checking Review Board for 
any such requests, and will commit in a few days if there are none.

I am hoping it's clear (and the low-hanging fruit ripe for improvement well 
marked) so others can dive in and play with it and get comfortable extending 
it. There are a lot of fun new possibilities if we choose to flesh this out.


> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Eli Reisman
> Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, 
> GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, 
> GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, 
> GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, 
> GIRAPH-13-9-r5.patch, GIRAPH-13-9-r6.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.
> Review Board link (ready for review now): https://reviews.apache.org/r/9811/



[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617911#comment-13617911
 ] 

Eli Reisman commented on GIRAPH-601:


Awesome, thanks Maja! I did not keep good notes during that part of the YARN 
patch, and what I remember is that the requirement for tasks to start from (or 
at least include? don't know) a "taskId 0" came from MapReduce and IO code. 
Don't know what the deal is.

When we have this all straightened out I can update the YARN patch. The 
solution I used was "stable for now", but YARN does not guarantee that taskIds 
will stay contiguous in the future, or that task 2 will always be our first 
non-app-master task issued, etc., so being able to just use the IDs YARN gives 
us without alteration will be a good idea.


> Exception when running pagerank benchmark: SendVertexRequest cannot be cast 
> to MasterRequest
> 
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eugene Koontz
> Attachments: instrumentation.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests  -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
>  $HADOOP_RUNTIME/bin/hadoop jar $JAR \
>  org.apache.giraph.benchmark.PageRankBenchmark \
> -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in  
> /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_02/
>  :
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished 
> worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], 
> size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=0, port=30010)], size = 6 from 
> /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] 
> org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: 
> Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException: 
> org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to 
> org.apache.giraph.comm.requests.MasterRequest
>   at 
> org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
>   at 
> org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>   at 
> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
>   at 
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
>   at 
> org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>   at java.lang.Thread.run(Thread.java:680)
> {code}



[jira] [Commented] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617885#comment-13617885
 ] 

Eli Reisman commented on GIRAPH-599:


@Nitay: this worked for me on trunk, thanks! +1

> Hive IO dependency issues with some Hadoop profiles
> ---
>
> Key: GIRAPH-599
> URL: https://issues.apache.org/jira/browse/GIRAPH-599
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
> Fix For: 0.2.0
>
> Attachments: GIRAPH-599.patch
>
>
> Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now 
> this happens:
> {code}
> [INFO] 
> 
> [INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT
> [INFO] 
> 
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloaded: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
>  (15 KB at 20.2 KB/sec)
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> [WARNING] The POM for com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is 
> missing, no dependency information available
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloaded: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
>  (201 KB at 194.6 KB/sec)
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Giraph Parent .. SUCCESS [0.717s]
> [INFO] Apache Giraph Core  SUCCESS [2:58.276s]
> [INFO] Apache Giraph Hive I/O  FAILURE [6.455s]
> [INFO] Apache Giraph Examples  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 3:05.779s
> [INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013
> [INFO] Final Memory: 48M/352M
> [INFO] 
> 
> [ERROR] Failed to execute goal on project giraph-hive: Could not resolve 
> dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: 
> Could not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in 
> central (http://repo1.maven.org/maven2) -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the

[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617871#comment-13617871
 ] 

Eli Reisman commented on GIRAPH-601:


Nice! See how the containers for our tasks in YARN MRv2 start from "2" and 
go up? This is the problem I had with the YARN patch. The first YARN task is 
always the app master (there is no MRv1 analogue for this), so our first 
task to run Giraph code is always task 2 or higher. I had to adjust this to 
start handing IDs into Giraph starting at 0. If you guys figure out where our 
taskId dependencies are, I'd love to know.

Ideally, I'd like to see Giraph not care internally what the taskIds are, 
where the numbering starts, or whether they are contiguous, as long as they are 
unique.
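
For anyone following along, here is a minimal sketch of the kind of adjustment 
described above: taking whatever unique IDs the framework hands out (for example 
2, 3, 4, ...) and mapping them to contiguous IDs starting at 0. TaskIdRemapper 
and remap are hypothetical names used only for illustration; they are not Giraph 
or YARN APIs, and the actual YARN patch may do this differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not a Giraph or YARN class. */
final class TaskIdRemapper {
  private TaskIdRemapper() { }

  /**
   * Maps each distinct externally assigned ID to a dense index 0..n-1,
   * in ascending order of the external IDs.
   */
  static Map<Integer, Integer> remap(List<Integer> externalIds) {
    Map<Integer, Integer> dense = new LinkedHashMap<>();
    // TreeSet drops duplicates and iterates in ascending order.
    for (Integer externalId : new TreeSet<>(externalIds)) {
      dense.put(externalId, dense.size());
    }
    return dense;
  }

  public static void main(String[] args) {
    // Assumption: the app master took an earlier ID, so workers start at 2.
    System.out.println(remap(Arrays.asList(2, 3, 5, 4)));
  }
}
```

The mapping only relies on the external IDs being unique, which is the one 
property the comment above would like Giraph to depend on.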

> Exception when running pagerank benchmark: SendVertexRequest cannot be cast 
> to MasterRequest
> 
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eugene Koontz
> Attachments: instrumentation.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests  -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
>  $HADOOP_RUNTIME/bin/hadoop jar $JAR \
>  org.apache.giraph.benchmark.PageRankBenchmark \
> -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in  
> /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_02/
>  :
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished 
> worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], 
> size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=0, port=30010)], size = 6 from 
> /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] 
> org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: 
> Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException: 
> org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to 
> org.apache.giraph.comm.requests.MasterRequest
>   at 
> org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
>   at 
> org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>   at 
> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
>   at 
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
>   at 
> org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>   at java.lang.Thread.run(Thread.java:680)
> {code}



[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617857#comment-13617857
 ] 

Eli Reisman commented on GIRAPH-601:


When doing the YARN patch I puzzled over some of this SplitMasterWorker logic. I 
think this could be another case where some of this code has evolved 
quickly and isn't doing what it used to any more.

> Exception when running pagerank benchmark: SendVertexRequest cannot be cast 
> to MasterRequest
> 
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eugene Koontz
> Attachments: instrumentation.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests  -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
>  $HADOOP_RUNTIME/bin/hadoop jar $JAR \
>  org.apache.giraph.benchmark.PageRankBenchmark \
> -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in  
> /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_02/
>  :
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished 
> worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], 
> size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=0, port=30010)], size = 6 from 
> /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] 
> org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: 
> Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException: 
> org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to 
> org.apache.giraph.comm.requests.MasterRequest
>   at 
> org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
>   at 
> org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
>   at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>   at 
> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
>   at 
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
>   at 
> org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>   at java.lang.Thread.run(Thread.java:680)
> {code}



[jira] [Commented] (GIRAPH-362) Address master task id for communication for master (known issue from GIRAPH-211)

2013-03-29 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617852#comment-13617852
 ] 

Eli Reisman commented on GIRAPH-362:


This is interesting because I had trouble in the YARN patch with the taskid 
stuff too. I noticed in a recent patch Maja removed a hardcoded setting of the 
master task id and set it with getTaskPartitionId type calls.

Does anyone know exactly where the task id dependencies in Giraph are, what 
they are, etc? Are there any Giraph tasks that need a certain task id for a job 
to run? How about Hadoop or MR dependencies in the IO formats needing this? 
Thanks!


> Address master task id for communication for master (known issue from 
> GIRAPH-211)
> -
>
> Key: GIRAPH-362
> URL: https://issues.apache.org/jira/browse/GIRAPH-362
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>
> There is a workaround from GIRAPH-211 to handle requests a little differently 
> due to issues communicating to the master.  We should fix this to be a 
> regular request in the future.
> {code}
>   public void sendWritableRequest(Integer destWorkerId,
>   InetSocketAddress remoteServer,
>   WritableRequest request) {
> if (clientRequestIdRequestInfoMap.isEmpty()) {
>   byteCounter.resetAll();
> }
> boolean registerRequest = true;
> /*if[HADOOP_NON_SECURE]
> else[HADOOP_NON_SECURE]*/
> if (request.getType() == RequestType.SASL_TOKEN_MESSAGE_REQUEST) {
>   registerRequest = false;
> }
> /*end[HADOOP_NON_SECURE]*/
> Channel channel = getNextChannel(remoteServer);
> RequestInfo newRequestInfo = new RequestInfo(remoteServer, request);
> if (registerRequest) {
>   request.setClientId(clientId);
>   request.setRequestId(
> addressRequestIdGenerator.getNextRequestId(remoteServer));
>   ClientRequestId clientRequestId =
> new ClientRequestId(destWorkerId, request.getRequestId());
>   RequestInfo oldRequestInfo = clientRequestIdRequestInfoMap.putIfAbsent(
> clientRequestId, newRequestInfo);
>   if (oldRequestInfo != null) {
> throw new IllegalStateException("sendWritableRequest: Impossible to " 
> +
>   "have a previous request id = " + request.getRequestId() + ", " +
>   "request info of " + oldRequestInfo);
>   }
> }
> ChannelFuture writeFuture = channel.write(request);
> newRequestInfo.setWriteFuture(writeFuture);
> if (limitNumberOfOpenRequests &&
> clientRequestIdRequestInfoMap.size() > maxNumberOfOpenRequests) {
>   waitSomeRequests(maxNumberOfOpenRequests);
> }
>   }
> {code}



[jira] [Commented] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles

2013-03-28 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616759#comment-13616759
 ] 

Eli Reisman commented on GIRAPH-599:


I am assuming the problem is somewhere in the giraph-hive dependencies: we are 
using {code}hadoop.version{code} where we cannot safely do so to choose the 
right Facebook Hive IO jar. Thanks!


> Hive IO dependency issues with some Hadoop profiles
> ---
>
> Key: GIRAPH-599
> URL: https://issues.apache.org/jira/browse/GIRAPH-599
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
> Fix For: 0.2.0
>
>
> Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now 
> this happens:
> {code}
> [INFO] 
> 
> [INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT
> [INFO] 
> 
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
> Downloaded: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
>  (15 KB at 20.2 KB/sec)
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
> [WARNING] The POM for com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is 
> missing, no dependency information available
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
> Downloading: 
> https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
> Downloaded: 
> https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
>  (201 KB at 194.6 KB/sec)
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Giraph Parent .. SUCCESS [0.717s]
> [INFO] Apache Giraph Core  SUCCESS [2:58.276s]
> [INFO] Apache Giraph Hive I/O  FAILURE [6.455s]
> [INFO] Apache Giraph Examples  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 3:05.779s
> [INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013
> [INFO] Final Memory: 48M/352M
> [INFO] 
> 
> [ERROR] Failed to execute goal on project giraph-hive: Could not resolve 
> dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: 
> Could not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in 
> central (http://repo1.maven.org/maven2) -> [Help 1]
> [ERROR] 
> [ERROR] To see 

[jira] [Updated] (GIRAPH-13) Port Giraph to YARN

2013-03-28 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-13:
--

Attachment: GIRAPH-13-9-r6.patch

Just another rebase. Not to hurry anyone, I know everyone's busy, but starting 
in a week or two I will have a lot less time to fix issues that reviewers put 
up.

So...if anyone has a chance to peek at it over the next few days, I will be 
available to respond quickly to reviews, for now. If not...I understand! Thanks 
again!

I will update this on RB too, where comments on the last couple iterations of 
the patch contain good command lines for building and running it on the cluster.



> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Eli Reisman
> Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, 
> GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, 
> GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, 
> GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, 
> GIRAPH-13-9-r5.patch, GIRAPH-13-9-r6.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.
> Review Board link (ready for review now): https://reviews.apache.org/r/9811/



[jira] [Created] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles

2013-03-28 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-599:
--

 Summary: Hive IO dependency issues with some Hadoop profiles
 Key: GIRAPH-599
 URL: https://issues.apache.org/jira/browse/GIRAPH-599
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Eli Reisman
 Fix For: 0.2.0


Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now this 
happens:

{code}
[INFO] 
[INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT
[INFO] 
Downloading: 
http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
Downloading: 
https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
Downloading: 
https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
Downloaded: 
https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom
 (15 KB at 20.2 KB/sec)
Downloading: 
http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
Downloading: 
https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
Downloading: 
https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom
[WARNING] The POM for com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is 
missing, no dependency information available
Downloading: 
http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
Downloading: 
http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
Downloading: 
https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
Downloading: 
https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
Downloading: 
https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
Downloading: 
https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
Downloading: 
https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar
Downloaded: 
https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar
 (201 KB at 194.6 KB/sec)
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Giraph Parent .. SUCCESS [0.717s]
[INFO] Apache Giraph Core  SUCCESS [2:58.276s]
[INFO] Apache Giraph Hive I/O  FAILURE [6.455s]
[INFO] Apache Giraph Examples  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 3:05.779s
[INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013
[INFO] Final Memory: 48M/352M
[INFO] 
[ERROR] Failed to execute goal on project giraph-hive: Could not resolve 
dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: Could 
not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in central 
(http://repo1.maven.org/maven2) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :giraph-hive
{code}

I was building the YARN profile, which used to cause giraph-hive not to build 
(incompatible profiles o

[jira] [Commented] (GIRAPH-582) Create a generic option for determining the number of supersteps that a job runs for

2013-03-26 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614461#comment-13614461
 ] 

Eli Reisman commented on GIRAPH-582:


Sorry to jump in late after the patch was put up, but upon further 
reflection I do think we might want, in a future JIRA, to change the name from 
giraph.maxSuperstep to something that clearly maps to setting the end of the 
job, like giraph.finishJobOnSuperstep, or something even clearer (to reflect that 
the superstep number we give is never actually executed).

I still think it's a great idea, and an option we should have had for a while 
now!
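
To illustrate the naming concern, here is a minimal sketch, assuming (per the 
comment above, not verified against the patch) that the configured value acts as 
an exclusive bound: supersteps 0 through N-1 run, and superstep N never does. 
SuperstepLimitSketch and executedSupersteps are hypothetical names, not Giraph 
code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of an exclusive superstep limit. */
final class SuperstepLimitSketch {
  private SuperstepLimitSketch() { }

  /**
   * Returns the superstep numbers that would execute if the job stops as
   * soon as the current superstep number reaches maxSuperstep.
   */
  static List<Long> executedSupersteps(long maxSuperstep) {
    List<Long> executed = new ArrayList<>();
    // The loop condition is strict, so maxSuperstep itself is never run.
    for (long superstep = 0; superstep < maxSuperstep; superstep++) {
      executed.add(superstep);
    }
    return executed;
  }

  public static void main(String[] args) {
    // With a limit of 3, supersteps 0, 1 and 2 run; superstep 3 does not.
    System.out.println(executedSupersteps(3));
  }
}
```

Under this reading a name along the lines of "finish job on superstep" 
describes the behavior more directly than "max superstep", which could be taken 
to mean the last superstep that runs.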

> Create a generic option for determining the number of supersteps that a job 
> runs for
> 
>
> Key: GIRAPH-582
> URL: https://issues.apache.org/jira/browse/GIRAPH-582
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-582.patch, GIRAPH-582.patch.2
>
>
> Lots of applications just run for a fixed number of iterations.  We can make 
> the code simpler if we make this feature part of the infrastructure.



[jira] [Updated] (GIRAPH-13) Port Giraph to YARN

2013-03-24 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-13:
--

Attachment: GIRAPH-13-9-r5.patch

Just a rebase. Also available on RB (see link here in Description)

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Eli Reisman
> Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, 
> GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, 
> GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, 
> GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, 
> GIRAPH-13-9-r5.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.
> Review Board link (ready for review now): https://reviews.apache.org/r/9811/



[jira] [Commented] (GIRAPH-579) Make it possible to use different out-edges data structures for input and computation

2013-03-24 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612196#comment-13612196
 ] 

Eli Reisman commented on GIRAPH-579:


Really clever idea. Patch looks good. +1 on idea, will build when Hive-Io maven 
repo issues are fixed and I can verify the patch. ;)

> Make it possible to use different out-edges data structures for input and 
> computation
> -
>
> Key: GIRAPH-579
> URL: https://issues.apache.org/jira/browse/GIRAPH-579
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Alessandro Presta
>Assignee: Alessandro Presta
> Attachments: GIRAPH-579.patch, GIRAPH-579.patch
>
>
> In some cases, the properties we want in the VertexEdges implementation 
> during input may differ from the ones we want during computation.
> Two examples:
> 1) During input, we want to keep only the top K edges according to weight, so 
> we use a fixed-size min-heap. During computation, our algorithm needs fast 
> random access, so we use a hash-map.
> 2) We have a VertexEdges implementation that's optimized for space and/or 
> iteration speed, but has slow insertion. We can then use a different data 
> structure that has fast insertion during input.
> We can add an option to specify a different VertexEdges class to be used in 
> EdgeStore during input.
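
The first example in the description (top K edges in a min-heap during input, a hash map during computation) can be sketched roughly as follows. This is only an illustration: TopKEdges, Edge, and toHashMap are made-up names, not Giraph's VertexEdges API or the option this issue proposes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Giraph: shows the two-phase idea only.
final class TopKEdges {
    /** One weighted out-edge: target vertex id plus weight. */
    static final class Edge {
        final long target;
        final double weight;

        Edge(long target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    private final int k;
    // Min-heap ordered by weight, so the lightest retained edge sits at the head.
    private final PriorityQueue<Edge> heap;

    TopKEdges(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.heap = new PriorityQueue<Edge>(k, (a, b) -> Double.compare(a.weight, b.weight));
    }

    /** Input phase: keep only the k heaviest edges seen so far. */
    void add(long target, double weight) {
        if (heap.size() < k) {
            heap.add(new Edge(target, weight));
        } else if (heap.peek().weight < weight) {
            heap.poll();                        // evict the current lightest edge
            heap.add(new Edge(target, weight));
        }
    }

    /** Computation phase: copy the survivors into a hash map for fast random access. */
    Map<Long, Double> toHashMap() {
        Map<Long, Double> byTarget = new HashMap<>();
        for (Edge e : heap) {
            byTarget.put(e.target, e.weight);
        }
        return byTarget;
    }

    public static void main(String[] args) {
        TopKEdges input = new TopKEdges(2);
        input.add(1L, 0.5);
        input.add(2L, 3.0);
        input.add(3L, 1.5);
        System.out.println(input.toHashMap());
    }
}
```

The heap bounds memory to K edges per vertex while loading; the one-time copy into a map is paid once, before the first superstep.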



[jira] [Commented] (GIRAPH-581) More flexible Hive output

2013-03-24 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612181#comment-13612181
 ] 

Eli Reisman commented on GIRAPH-581:


Hey folks. I was just going to commit this patch, downloaded fresh trunk, 
applied 581, etc. And this happens again:

{code}
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Giraph Parent .. SUCCESS [1.573s]
[INFO] Apache Giraph Core  SUCCESS [2:42.075s]
[INFO] Apache Giraph Hive I/O  FAILURE [1:17.718s]
[INFO] Apache Giraph Examples  SKIPPED
[INFO] Apache Giraph Accumulo I/O  SKIPPED
[INFO] Apache Giraph HBase I/O ... SKIPPED
[INFO] Apache Giraph HCatalog I/O  SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 4:01.839s
[INFO] Finished at: Sun Mar 24 11:07:10 PDT 2013
[INFO] Final Memory: 43M/348M
[INFO] 
[ERROR] Failed to execute goal on project giraph-hive: Could not resolve 
dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: Failed 
to collect dependencies for 
[com.facebook.giraph.hive:hive-io-experimental:jar:0.4-SNAPSHOT (compile), 
com.fasterxml.jackson.core:jackson-core:jar:2.1.0 (compile), 
com.fasterxml.jackson.core:jackson-databind:jar:2.1.0 (compile), 
com.github.spullara.cli-parser:cli-parser:jar:1.1 (compile), 
org.apache.giraph:giraph-core:jar:0.2-SNAPSHOT (compile), 
org.apache.hive:hive-metastore:jar:0.10.0 (compile), 
org.apache.giraph:giraph-core:jar:tests:0.2-SNAPSHOT (test), 
commons-net:commons-net:jar:3.1 (provided), 
org.apache.hadoop:hadoop-core:jar:0.20.203.0 (provided)]: Failed to read 
artifact descriptor for 
com.facebook.giraph.hive:hive-io-experimental:jar:0.4-SNAPSHOT: Could not 
transfer artifact 
com.facebook.giraph.hive:hive-io-experimental:pom:0.4-SNAPSHOT from/to 
sonatypereleases (https://oss.sonatype.org/content/groups/public/): Connection 
to https://oss.sonatype.org refused: Connection timed out -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :giraph-hive
{code}

I don't think it's this patch, but something in the dependencies with the hive-io 
jars is still not right, I think?


> More flexible Hive output
> -
>
> Key: GIRAPH-581
> URL: https://issues.apache.org/jira/browse/GIRAPH-581
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-581.patch, GIRAPH-581.patch
>
>
> Currently with Hive output formats it's only possible to write single row per 
> vertex. We should support variable number of rows per vertex (zero or 
> multiple).



[jira] [Commented] (GIRAPH-582) Create a generic option for determining the number of supersteps that a job runs for

2013-03-24 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612165#comment-13612165
 ] 

Eli Reisman commented on GIRAPH-582:


Great Idea!

> Create a generic option for determining the number of supersteps that a job 
> runs for
> 
>
> Key: GIRAPH-582
> URL: https://issues.apache.org/jira/browse/GIRAPH-582
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>
> Lots of applications just run for a fixed number of iterations.  We can make 
> the code simpler if we make this feature part of the infrastructure.
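
As a rough sketch of what "part of the infrastructure" could mean: the driver loop enforces the cap, so the application no longer has to count iterations itself. FixedSuperstepLoop, Step, and maxSupersteps are hypothetical names for illustration, not an existing Giraph option.

```java
// Hypothetical driver loop, not Giraph code: the cap lives in the framework.
final class FixedSuperstepLoop {
    /** One superstep of user code; returns true while any vertex is still active. */
    interface Step {
        boolean compute(long superstep);
    }

    /** Runs until the computation goes idle or the cap is hit; returns supersteps executed. */
    static long run(Step step, long maxSupersteps) {
        long superstep = 0;
        while (superstep < maxSupersteps) {
            boolean stillActive = step.compute(superstep);
            superstep++;
            if (!stillActive) {
                break;
            }
        }
        return superstep;
    }

    public static void main(String[] args) {
        // A computation that never halts on its own is stopped by the cap.
        System.out.println(run(s -> true, 5));
    }
}
```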



[jira] [Commented] (GIRAPH-583) Problem with authentication on Hadoop 0.23

2013-03-24 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612163#comment-13612163
 ] 

Eli Reisman commented on GIRAPH-583:


Hi folks. I just wanted to mention the issue I saw with RWR was also an 
IOException, but it was in the tests where InternalVertexRunner was not finding 
an output file (which is what often happens when IVR hits a problem while 
running an integration test.)

I have still not figured out what was doing it, or if the matter is resolved, 
or there is some configuration problem. I was running 2.0.3-alpha on trunk and 
it was only happening every 5-6 attempted builds. Very odd. Haven't seen it 
lately, but haven't built Giraph too much this week either.

Hope you guys get this figured out. Looks like an authentication issue. I think 
Eugene will be the guy with the answers here.

> Problem with authentication on Hadoop 0.23
> --
>
> Key: GIRAPH-583
> URL: https://issues.apache.org/jira/browse/GIRAPH-583
> Project: Giraph
>  Issue Type: Bug
>Reporter: Gianmarco De Francisci Morales
>
> Hi,
> I am trying to run the RWR code on trunk and Hadoop 0.23 with Kerberos 
> authentication, but I get this exception:
> {code}
> 13/03/23 17:32:36 ERROR security.UserGroupInformation: 
> PriviledgedActionException as:gdfm (auth:KERBEROS) 
> cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 13/03/23 17:32:36 WARN ipc.Client: Exception encountered while connecting to 
> the server : javax.security.sasl.SaslException: GSS initiate failed [Caused 
> by GSSException: No valid credentials provided (Mechanism level: Failed to 
> find any Kerberos tgt)]
> 13/03/23 17:32:36 ERROR security.UserGroupInformation: 
> PriviledgedActionException as:gdfm (auth:KERBEROS) cause:java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> 13/03/23 17:32:36 ERROR security.UserGroupInformation: 
> PriviledgedActionException as:gdfm (auth:KERBEROS) cause:java.io.IOException: 
> Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]; Host Details : local host is: 
> "gwta3005.tan.ygrid.yahoo.com/98.138.127.244"; destination host is: 
> ""tiberiumtan-nn1.tan.ygrid.yahoo.com":8020; 
> Exception in thread "main" java.io.IOException: Failed on local exception: 
> java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tgt)]; Host Details : local host is: 
> "gwta3005.tan.ygrid.yahoo.com/98.138.127.244"; destination host is: 
> ""tiberiumtan-nn1.tan.ygrid.yahoo.com":8020; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:738)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1092)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195)
>   at $Proxy6.getDelegationToken(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67)
>   at $Proxy6.getDelegationToken(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:603)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:786)
>   at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:466)
>   at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:444)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:122)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:101)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
>   at 
> org.apache.giraph.io.formats.TextVertexOutputFormat.checkOutputSpecs(TextVertexOutputFormat.j

[jira] [Commented] (GIRAPH-510) Remove HBase Cruft

2013-03-21 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609198#comment-13609198
 ] 

Eli Reisman commented on GIRAPH-510:


As far as miniclusters, having just seen this on GIRAPH-13, let me pipe in 
again: pick dir names and ports that will not collide with 
InternalVertexRunner, the various Hadoop mini cluster impls, or MiniYARNCluster, as 
they all run tests in parallel and can conflict in confusing ways when their 
dirs or ports collide, including as tests that only fail once in a while. Be 
careful out there!
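
One common way to avoid hardcoded port collisions, offered here as a sketch rather than what any Giraph test currently does: ask the OS for an unused port by binding to port 0. FreePort is a made-up helper name; ServerSocket is the standard JDK class.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical test helper, not part of Giraph.
final class FreePort {
    /** Binds to port 0 so the OS picks an unused TCP port, then releases it. */
    static int find() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("port for this test run: " + find());
    }
}
```

There is still a small window between releasing the port and the mini cluster binding it, so this lowers the odds of a collision rather than ruling it out.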


> Remove HBase Cruft
> --
>
> Key: GIRAPH-510
> URL: https://issues.apache.org/jira/browse/GIRAPH-510
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Nitay Joffe
>Priority: Minor
>  Labels: easy, newbie
> Attachments: GIRAPH-510.patch, GIRAPH-510-v2.patch
>
>
> The HBase tests appear to leave around lots of cruft, namely graph.csv, 
> .graph.csv in the giraph folders and -ROOT-, simple_graph, hbase.version in 
> the user home directory. We should remove these (or better yet not create 
> them in the first place).



[jira] [Commented] (GIRAPH-510) Remove HBase Cruft

2013-03-21 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609195#comment-13609195
 ] 

Eli Reisman commented on GIRAPH-510:


Thanks Alessandro! I didn't comment here. I had piped in about 
System.getProperty("java.io.tmpdir") or FileUtils in commons-io (which uses 
the same thing as a base dir I think when creating a test directory?) that 
seems to work out well for this sort of thing. But yes my home dir is filling 
up with hbase.version and other charming junk so I'm all for this happening!
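
For reference, a minimal sketch of the temp-dir approach using only the JDK (java.nio.file), assuming the tests can be pointed at an arbitrary base directory. TestScratchDir and the "giraph-hbase-test-" prefix are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical test helper, not part of Giraph.
final class TestScratchDir {
    /** Creates a fresh, uniquely named directory under java.io.tmpdir. */
    static Path create(String prefix) throws IOException {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"));
        return Files.createTempDirectory(base, prefix);
    }

    /** Deletes the directory and everything under it, deepest entries first. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = create("giraph-hbase-test-");
        Files.createFile(dir.resolve("graph.csv"));   // stand-in for test output
        deleteRecursively(dir);
        System.out.println("cleaned up: " + !Files.exists(dir));
    }
}
```

Calling deleteRecursively from the test's teardown keeps the home directory and the source tree clean even when the test itself fails.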

> Remove HBase Cruft
> --
>
> Key: GIRAPH-510
> URL: https://issues.apache.org/jira/browse/GIRAPH-510
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Nitay Joffe
>Priority: Minor
>  Labels: easy, newbie
> Attachments: GIRAPH-510.patch, GIRAPH-510-v2.patch
>
>
> The HBase tests appear to leave around lots of cruft, namely graph.csv, 
> .graph.csv in the giraph folders and -ROOT-, simple_graph, hbase.version in 
> the user home directory. We should remove these (or better yet not create 
> them in the first place).



[jira] [Commented] (GIRAPH-577) Create a testing framework that doesn't require I/O formats

2013-03-20 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608267#comment-13608267
 ] 

Eli Reisman commented on GIRAPH-577:


Thanks for the contribution. Could you re-submit your diff using "git diff 
--no-prefix trunk" to strip the a/ and b/ directory headings?

So this is not meant to generate graph data, or to perform a no-op job, but to 
construct a small, hardcoded graph for reuse in small tests?

One thing along these lines we really need is someone to convert GIRAPH-26 from 
Colt to Mahout math libraries so we can generate interesting synthetic graph 
data as well, if you're curious.

> Create a testing framework that doesn't require I/O formats
> ---
>
> Key: GIRAPH-577
> URL: https://issues.apache.org/jira/browse/GIRAPH-577
> Project: Giraph
>  Issue Type: New Feature
>Affects Versions: 0.2.0
>Reporter: Alessandro Presta
>Assignee: Veselin Stoyanov
>  Labels: patch
> Attachments: GIRAPH-577.patch
>
>
> Create a TestGraph class to conveniently build graphs stored in memory.
> Add appropriate input/output formats to be used in InternalVertexRunner.
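
To make the idea concrete, here is a rough sketch of what an in-memory test graph builder might look like. It is not the TestGraph class from the attached patch; InMemoryTestGraph and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the TestGraph class from GIRAPH-577.patch.
final class InMemoryTestGraph {
    // Vertex id -> out-neighbors, kept sorted by id so test output is deterministic.
    private final Map<Long, List<Long>> outEdges = new TreeMap<>();

    InMemoryTestGraph addVertex(long id) {
        outEdges.computeIfAbsent(id, k -> new ArrayList<>());
        return this;
    }

    /** Adds a directed edge, creating either endpoint if it is missing. */
    InMemoryTestGraph addEdge(long source, long target) {
        addVertex(source);
        addVertex(target);
        outEdges.get(source).add(target);
        return this;
    }

    int vertexCount() {
        return outEdges.size();
    }

    List<Long> neighbors(long id) {
        List<Long> targets = outEdges.get(id);
        return targets == null ? Collections.<Long>emptyList() : Collections.unmodifiableList(targets);
    }

    public static void main(String[] args) {
        InMemoryTestGraph g = new InMemoryTestGraph().addEdge(1, 2).addEdge(1, 3).addVertex(4);
        System.out.println(g.vertexCount() + " vertices; 1 -> " + g.neighbors(1));
    }
}
```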



[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2013-03-18 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605408#comment-13605408
 ] 

Eli Reisman commented on GIRAPH-13:
---

Hey Eugene, a better command line is on the current revision of this patch on 
RB (marked r5 there, it's r4 in the patch here...sorry) in the explanation. 
Forgot to post it here. And yes, there are several yarn-site.xml values you 
need set I can pass along that are not well doc'ed that make the cluster happy 
if you run into trouble.

So far, this version works well for me.

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Eli Reisman
> Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, 
> GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, 
> GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, 
> GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.
> Review Board link (ready for review now): https://reviews.apache.org/r/9811/



[jira] [Created] (GIRAPH-574) Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile

2013-03-16 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-574:
--

 Summary: Move Giraph Master node functionality to AppMaster or 
launch directly from AppMaster in YARN profile
 Key: GIRAPH-574
 URL: https://issues.apache.org/jira/browse/GIRAPH-574
 Project: Giraph
  Issue Type: Improvement
Reporter: Eli Reisman
Priority: Minor


As folks read the Giraph on YARN code it is inevitable it will occur to someone 
"Well, if the job fails when the ApplicationMaster fails, could we move some or 
all of our Master task functions there and just call it master?"

Yes. In two ways.

One, we launch a dedicated master process marked as such with setup 
responsibilities, and we assess from the app master how the launch went. We 
keep launching "masters" until one takes. Then, we launch the workers.

Another is to simply run MasterThread and associated stuff from the App Master 
directly, and when we know it's up and running properly, only then does app 
master launch the workers.
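
Both options share the same ordering: get a master up first, then launch workers. A minimal sketch of that ordering, with the actual container launches abstracted away; MasterFirstLauncher and its parameters are hypothetical and no YARN API is used here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical control flow for an app master; container launching is abstracted away.
final class MasterFirstLauncher {
    /**
     * Tries to start a master up to maxAttempts times.
     * Returns the attempt number that succeeded, or -1 if none did.
     */
    static int launchMaster(BooleanSupplier tryLaunchMaster, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryLaunchMaster.getAsBoolean()) {
                return attempt;
            }
        }
        return -1;
    }

    /** Launches workers only after a master has come up; returns false if none did. */
    static boolean runJob(BooleanSupplier tryLaunchMaster, int maxAttempts, Runnable launchWorkers) {
        if (launchMaster(tryLaunchMaster, maxAttempts) < 0) {
            return false;
        }
        launchWorkers.run();
        return true;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated master that only "takes" on the third launch.
        boolean ok = runJob(() -> ++calls[0] >= 3, 5, () -> System.out.println("launching workers"));
        System.out.println("job started: " + ok + " after " + calls[0] + " master attempts");
    }
}
```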

The YARN app master can be rebooted and is designed to be a place for 
fault-tolerant "master node" stuff to happen. However, I think a larger purpose 
is to act as a meta-master for launching a DAG of jobs within the run of a 
single app master lifecycle. Or the app master can act as any of these things, 
or something else I haven't thought of. The architecture is fairly malleable.

This is not a requirement for us, and maybe not a good idea at all. This is 
just a placeholder JIRA to discuss and collect ideas since as I said above 
someone is going to bring it up ;)

Thank you for reading.




[jira] [Updated] (GIRAPH-574) Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile?

2013-03-16 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-574:
---

Summary: Move Giraph Master node functionality to AppMaster or launch 
directly from AppMaster in YARN profile?  (was: Move Giraph Master node 
functionality to AppMaster or launch directly from AppMaster in YARN profile)

> Move Giraph Master node functionality to AppMaster or launch directly from 
> AppMaster in YARN profile?
> -
>
> Key: GIRAPH-574
> URL: https://issues.apache.org/jira/browse/GIRAPH-574
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Eli Reisman
>Priority: Minor
>
> As folks read the Giraph on YARN code it is inevitable it will occur to 
> someone "Well, if the job fails when the ApplicationMaster fails, could we 
> move some or all of our Master task functions there and just call it master?"
> Yes. In two ways.
> One, we launch a dedicated master process marked as such with setup 
> responsibilities, and we assess from the app master how the launch went. We 
> keep launching "masters" until one takes. Then, we launch the workers.
> Another is to simply run MasterThread and associated stuff from the App 
> Master directly, and when we know it's up and running properly, only then does 
> app master launch the workers.
> The YARN app master can be rebooted and is designed to be a place for 
> fault-tolerant "master node" stuff to happen. However, I think a larger 
> purpose is to act as a meta-master for launching a DAG of jobs within the run 
> of a single app master lifecycle. Or the app master can act as any of these 
> things, or something else I haven't thought of. The architecture is fairly 
> malleable.
> This is not a requirement for us, and maybe not a good idea at all. This is 
> just a placeholder JIRA to discuss and collect ideas since as I said above 
> someone is going to bring it up ;)
> Thank you for reading.



[jira] [Commented] (GIRAPH-574) Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile

2013-03-16 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604468#comment-13604468
 ] 

Eli Reisman commented on GIRAPH-574:


Think aggregators, master compute, other things? also...


> Move Giraph Master node functionality to AppMaster or launch directly from 
> AppMaster in YARN profile
> 
>
> Key: GIRAPH-574
> URL: https://issues.apache.org/jira/browse/GIRAPH-574
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Eli Reisman
>Priority: Minor
>
> As folks read the Giraph on YARN code it is inevitable it will occur to 
> someone "Well, if the job fails when the ApplicationMaster fails, could we 
> move some or all of our Master task functions there and just call it master?"
> Yes. In two ways.
> One, we launch a dedicated master process marked as such with setup 
> responsibilities, and we assess from the app master how the launch went. We 
> keep launching "masters" until one takes. Then, we launch the workers.
> Another is to simply run MasterThread and associated stuff from the App 
> Master directly, and when we know it's up and running properly, only then does 
> app master launch the workers.
> The YARN app master can be rebooted and is designed to be a place for 
> fault-tolerant "master node" stuff to happen. However, I think a larger 
> purpose is to act as a meta-master for launching a DAG of jobs within the run 
> of a single app master lifecycle. Or the app master can act as any of these 
> things, or something else I haven't thought of. The architecture is fairly 
> malleable.
> This is not a requirement for us, and maybe not a good idea at all. This is 
> just a placeholder JIRA to discuss and collect ideas since as I said above 
> someone is going to bring it up ;)
> Thank you for reading.



[jira] [Created] (GIRAPH-573) Giraph is ready for port to Mesos or other cluster frameworks

2013-03-16 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-573:
--

 Summary: Giraph is ready for port to Mesos or other cluster 
frameworks
 Key: GIRAPH-573
 URL: https://issues.apache.org/jira/browse/GIRAPH-573
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Eli Reisman
Priority: Minor
 Fix For: 0.2.0


The refactors and general approach that worked with YARN set up a template that 
could be adapted easily to other cluster management platforms like Mesos. Or 
take-your-pick. I am not saying this is a priority or even desirable, I leave 
that to the community.

But it would be easy now, if we want to. Ideas and opinions can be posted here.




[jira] [Updated] (GIRAPH-572) The o.a.g.yarn package could be the top-level of a source tree of packages that mirrors core

2013-03-16 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-572:
---

Summary: The o.a.g.yarn package could be the top-level of a source tree of 
packages that mirrors core  (was: The o.a.g.yarn package could be the top-level 
of a source tree of packages that miorrors core)

> The o.a.g.yarn package could be the top-level of a source tree of packages 
> that mirrors core
> 
>
> Key: GIRAPH-572
> URL: https://issues.apache.org/jira/browse/GIRAPH-572
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>
> This might be a bad idea. But here goes:
> There are possibilities to move all sorts of functionality out of the 
> Giraph/BSP parts of the code and into the YARN AppMaster, or into 
> separately-managed containers launched from the AppMaster.
> For each functionality we decide to re-implement in YARN, it will need to 
> live in the yarn package tree to be selectively compiled and to use YARN-only 
> imports.
> One possibility to begin doing this is to use GIRAPH-13's 
> Configuration#isPureYarnJob. We will use the isPureYarnJob in Giraph to 
> selectively "no-op" each functionality we replace. Then, we re-implement the 
> YARN way in our yarn package tree.
> If we do this, we should begin early by mirroring the core source tree in 
> subpackages of yarn. So if we moved a functionality out of o.a.g.graph 
> package we would reimplement it in o.a.g.yarn.graph package.
> I don't suggest doing it all at once, but as we add files to o.a.g.yarn, just 
> to get the idea out there before the files start to pile up. Anything that 
> uses YARN imports will have to choose between munge flags and being in the 
> o.a.g.yarn package, one way or another.
> If we don't like this idea, mark it won't fix. I'm not attached to it, just 
> an idea.



[jira] [Created] (GIRAPH-572) The o.a.g.yarn package could be the top-level of a source tree of packages that miorrors core

2013-03-16 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-572:
--

 Summary: The o.a.g.yarn package could be the top-level of a source 
tree of packages that miorrors core
 Key: GIRAPH-572
 URL: https://issues.apache.org/jira/browse/GIRAPH-572
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eli Reisman


This might be a bad idea. But here goes:

There are possibilities to move all sorts of functionality out of the 
Giraph/BSP parts of the code and into the YARN AppMaster, or into 
separately-managed containers launched from the AppMaster.

For each functionality we decide to re-implement in YARN, it will need to live 
in the yarn package tree to be selectively compiled and to use YARN-only 
imports.


One possibility to begin doing this is to use GIRAPH-13's 
Configuration#isPureYarnJob. We will use the isPureYarnJob in Giraph to 
selectively "no-op" each functionality we replace. Then, we re-implement the 
YARN way in our yarn package tree.
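
Roughly, the gating could look like the sketch below. Only the isPureYarnJob name comes from GIRAPH-13 as described above; PureYarnGate, JobConf, and runCoreStep are invented here, and the real check would live on Giraph's configuration class.

```java
// Hypothetical sketch: only the isPureYarnJob name is taken from the ticket.
final class PureYarnGate {
    /** Stand-in for the configuration object that exposes the flag. */
    interface JobConf {
        boolean isPureYarnJob();
    }

    /**
     * Runs a core-side step unless the job is pure YARN, in which case the step
     * is a no-op and the yarn package tree is expected to provide it instead.
     * Returns true if the core implementation ran.
     */
    static boolean runCoreStep(JobConf conf, Runnable coreImplementation) {
        if (conf.isPureYarnJob()) {
            return false;
        }
        coreImplementation.run();
        return true;
    }

    public static void main(String[] args) {
        boolean ran = runCoreStep(() -> true, () -> System.out.println("core step"));
        System.out.println("core step ran: " + ran);
    }
}
```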

If we do this, we should begin early by mirroring the core source tree in 
subpackages of yarn. So if we moved a functionality out of o.a.g.graph package 
we would reimplement it in o.a.g.yarn.graph package.

I don't suggest doing it all at once, but as we add files to o.a.g.yarn, just 
to get the idea out there before the files start to pile up. Anything that uses 
YARN imports will have to choose between munge flags and being in the 
o.a.g.yarn package, one way or another.

If we don't like this idea, mark it won't fix. I'm not attached to it, just an 
idea.





[jira] [Created] (GIRAPH-571) Giraph on YARN could launch a job-local ZK instance from the AppMaster

2013-03-16 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-571:
--

 Summary: Giraph on YARN could launch a job-local ZK instance from 
the AppMaster
 Key: GIRAPH-571
 URL: https://issues.apache.org/jira/browse/GIRAPH-571
 Project: Giraph
  Issue Type: Improvement
  Components: zookeeper
Affects Versions: 0.2.0
Reporter: Eli Reisman


Once GIRAPH-13 is in, we can think differently about a lot of things if we 
choose to.

For one thing, we have had problems launching job-local ZK instances. We could 
(for YARN) move that functionality to the App Master, having it launch a 
container just for ZK and populating the Configuration's giraph.zkList setting 
so when the MRv1 ZK manager code sees the Conf, it will think we already have a 
non-job-local ZK at zkList's host and port, and will just connect instead of 
starting another local instance, making the whole affair transparent to 
existing Giraph code.
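
The handoff described above comes down to one setting. A sketch of that contract, using a plain Map in place of the Hadoop Configuration; the giraph.zkList key is the one named above, while ZkBootstrap and its methods are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the job Configuration.
final class ZkBootstrap {
    static final String ZK_LIST = "giraph.zkList";

    /** True when a quorum is already configured, so no job-local ZK needs starting. */
    static boolean hasExternalZk(Map<String, String> conf) {
        String zkList = conf.get(ZK_LIST);
        return zkList != null && !zkList.trim().isEmpty();
    }

    /** What the app master would do after its ZK container is up: publish host:port. */
    static void publishZk(Map<String, String> conf, String host, int port) {
        conf.put(ZK_LIST, host + ":" + port);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("before app master: external ZK? " + hasExternalZk(conf));
        publishZk(conf, "zk-container-host", 2181);   // host name is a placeholder
        System.out.println("after app master:  external ZK? " + hasExternalZk(conf));
    }
}
```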

Not important, but the YARN patch is currently defaulting to only execute jobs 
with a non-local ZK instance already running, and giraph.zkList populated with 
its host:port.

It's quite possible when we get our MRv1 job local ZK working again, we can 
remove this and it will work right out of the box, there's no reason it won't. 
But managing extraneous services (especially those that hold up the job setup 
like launching a ZK) is what the YARN AppMaster is all about anyway. I haven't 
been able to get our local ZK instance to launch outside of test cases for a 
while now.




[jira] [Created] (GIRAPH-570) Create YARN RPC Records using BuilderUtils instead of populating them by hand

2013-03-16 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-570:
--

 Summary: Create YARN RPC Records using BuilderUtils instead of 
populating them by hand
 Key: GIRAPH-570
 URL: https://issues.apache.org/jira/browse/GIRAPH-570
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eli Reisman
Priority: Minor
 Fix For: 0.2.0


Good newbie JIRA. Might need to check and see how far back in the Hadoop-2.0.x 
line the BuilderUtils exist so we know if we are cutting ourselves off from a 
future backport, but if we don't care, this can happen:

Instead of creating and hand-populating each RPC record Giraph uses to request 
resources from YARN like:

{code}
Record x = Records.newRecord( className );
x.setField(blah);
x.setOtherField(blahblah);
// ...and so on
{code}

we can use BuilderUtils:

{code}
Record readyToSend = BuilderUtils.MakeMyNewRecord( blah, blah );
{code}

anyway you get the drill.
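
For anyone picking this up, the difference between the two styles is shown below in a self-contained form. This deliberately does not use the real YARN Records or BuilderUtils classes (their available factory methods depend on the Hadoop version, as noted above); ContainerAsk, byHand, and AskBuilders are stand-ins invented for the example.

```java
// Hypothetical stand-ins for a YARN RPC record and a BuilderUtils-style factory.
final class RecordStyles {
    /** Mutable record populated through setters, like a freshly created RPC record. */
    static final class ContainerAsk {
        private int memoryMb;
        private int priority;

        void setMemoryMb(int memoryMb) { this.memoryMb = memoryMb; }
        void setPriority(int priority) { this.priority = priority; }
        int getMemoryMb() { return memoryMb; }
        int getPriority() { return priority; }
    }

    /** Hand-populated style: create the record, then set each field at the call site. */
    static ContainerAsk byHand(int memoryMb, int priority) {
        ContainerAsk ask = new ContainerAsk();
        ask.setMemoryMb(memoryMb);
        ask.setPriority(priority);
        return ask;
    }

    /** Builder-utility style: one static factory call returns a ready-to-send record. */
    static final class AskBuilders {
        static ContainerAsk newContainerAsk(int memoryMb, int priority) {
            return byHand(memoryMb, priority);
        }
    }

    public static void main(String[] args) {
        ContainerAsk readyToSend = AskBuilders.newContainerAsk(1024, 10);
        System.out.println(readyToSend.getMemoryMb() + " MB at priority " + readyToSend.getPriority());
    }
}
```

The payoff is at the call sites: each request shrinks to one line, and a field added later only has to be handled in the factory.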




[jira] [Created] (GIRAPH-569) Decided what the versioning story should be for Giraph on YARN

2013-03-16 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-569:
--

 Summary: Decided what the versioning story should be for Giraph on 
YARN
 Key: GIRAPH-569
 URL: https://issues.apache.org/jira/browse/GIRAPH-569
 Project: Giraph
  Issue Type: Task
Affects Versions: 0.2.0
Reporter: Eli Reisman
Priority: Minor


Right now, Giraph straddles the fence between a new and old YARN API. The place 
we're starting is a good compromise but we will need to make some decisions if 
we want to backport.

Pros:

Service as many versions of YARN as possible, going back potentially to 2.0.1 or 
2.0.0.

Cons:

I would like to provide the slickest, most up-to-date example of how to run a 
framework like Giraph with a YARN cluster so that others can take an example 
from us. I have been told by folks who know that these newer API's are more 
concise and more robust. But this is currently looking like supporting 
2.0.3-alpha at the very oldest, and newer versions up to trunk, and that's it. 
This sort of sucks because we have legitimate, working profiles for the whole 
2.0.x line and there may be some expectations there.

On the other hand, by not backporting, we could go the other direction and adopt 
some of the newest 2.0.4-alpha API and just assume YARN is maturing and folks 
using it now would update with each alpha release right away anyhow. Adding the 
new API's to the whole YARN impl (especially the GiraphApplicationMaster) would 
make the implementation a real nice example of how to use the new API's and 
would make the profile more robust in job runs.

Opinions?




[jira] [Created] (GIRAPH-568) Giraph on YARN will need a WebUI to display job stats, Yammer metrics, whatever

2013-03-16 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-568:
--

 Summary: Giraph on YARN will need a WebUI to display job stats, 
Yammer metrics, whatever
 Key: GIRAPH-568
 URL: https://issues.apache.org/jira/browse/GIRAPH-568
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eli Reisman
 Fix For: 0.2.0


In YARN, the Client is the driver program when you run a job at the command 
line. This launches the application master, which is like an uber-master that 
manages the job lifecycle for all the Giraph worker/master tasks that actually 
run the BSP job.

The Application Master can register an RPC Port and a Tracking URL with the 
YARN system (ResourceManager) which will be published on the YARN cluster WebUI 
in case folks running a Giraph job want to see detailed formatted web info such 
as Hadoop has. Previously we have hijacked Hadoop's counters and web ui. Now, 
we can start to think fresh about how to read logs, view job and node status, 
memory use, disk spills, Yammer metrics, whatever.

Someone could get very creative with this. If someone is feeling up to it, I 
can show you where the YARN bits you will want to interface with are. The rest 
can really go any way you want it to.




[jira] [Commented] (GIRAPH-560) Input filtering

2013-03-16 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604357#comment-13604357
 ] 

Eli Reisman commented on GIRAPH-560:


Yes. Great idea. The current ways of forcing this are unfortunate.

> Input filtering
> ---
>
> Key: GIRAPH-560
> URL: https://issues.apache.org/jira/browse/GIRAPH-560
> Project: Giraph
>  Issue Type: Bug
>Reporter: Nitay Joffe
>Assignee: Nitay Joffe
>
> Add some simple filtering for user to be able to drop edges / vertices at 
> input time.



[jira] [Updated] (GIRAPH-13) Port Giraph to YARN

2013-03-16 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-13:
--

Attachment: GIRAPH-13-9-r4.patch

Thanks Eugene! This will be a bear to review so take your time. But make sure 
and use this copy, the integration tests would occasionally fail on the last 
one because tests that run InternalVertexRunner were occasionally stealing each 
other's test dirs and ports. All fixed here. I have run a bunch of jobs on this 
today and it's running well now (I hope!)

I'll put this on RB too.

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Eli Reisman
> Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, 
> GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, 
> GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, 
> GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.
> Review Board link (ready for review now): https://reviews.apache.org/r/9811/



[jira] [Updated] (GIRAPH-567) Tests on trunk are failing for giraph-examples at RandomWalk

2013-03-15 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-567:
---

Summary: Tests on trunk are failing for giraph-examples at RandomWalk  
(was: Tests on trunk are failing for giraph-examples at RandomWalks)

> Tests on trunk are failing for giraph-examples at RandomWalk
> 
>
> Key: GIRAPH-567
> URL: https://issues.apache.org/jira/browse/GIRAPH-567
> Project: Giraph
>  Issue Type: Bug
>Reporter: Eli Reisman
>
> Seems to be something has upset the tests in examples here, this is the 
> surefire report from "mvn verify" on trunk tonight:
> {code}
> ---
> Test set: org.apache.giraph.examples.RandomWalkWithRestartVertexTest
> ---
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.085 sec <<< 
> FAILURE!
> testWeightedGraph(org.apache.giraph.examples.RandomWalkWithRestartVertexTest) 
>  Time elapsed: 1.038 sec  <<< ERROR!
> java.io.FileNotFoundException: 
> /var/folders/wq/rrrp5_8s3wgby3ybwn87z5lcgn/T/giraph-RandomWalkWithRestartVertex-1996932221558672384/output/part-m-0
>  (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.<init>(FileInputStream.java:120)
> at com.google.common.io.Files$1.getInput(Files.java:110)
> at com.google.common.io.Files$1.getInput(Files.java:107)
> at com.google.common.io.CharStreams$2.getInput(CharStreams.java:93)
> at com.google.common.io.CharStreams$2.getInput(CharStreams.java:90)
> at com.google.common.io.CharStreams.readLines(CharStreams.java:310)
> at com.google.common.io.Files.readLines(Files.java:544)
> at 
> org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:208)
> at 
> org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:77)
> at 
> org.apache.giraph.examples.RandomWalkWithRestartVertexTest.testWeightedGraph(RandomWalkWithRestartVertexTest.java:108)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> at 
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
> at 
> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:120)
> at 
> org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:103)
> at org.apache.maven.surefire.Surefire.run(Surefire.java:169)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
> at 
> org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
> {code}



[jira] [Commented] (GIRAPH-554) Set PartitionContext in InternalVertexRunner

2013-03-15 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604105#comment-13604105
 ] 

Eli Reisman commented on GIRAPH-554:


Did we ever rerun this? Where's the SUCCESS log?

> Set PartitionContext in InternalVertexRunner
> 
>
> Key: GIRAPH-554
> URL: https://issues.apache.org/jira/browse/GIRAPH-554
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>Priority: Minor
> Attachments: GIRAPH-554.patch
>
>




[jira] [Created] (GIRAPH-567) Tests on trunk are failing for giraph-examples at RandomWalks

2013-03-15 Thread Eli Reisman (JIRA)
Eli Reisman created GIRAPH-567:
--

 Summary: Tests on trunk are failing for giraph-examples at 
RandomWalks
 Key: GIRAPH-567
 URL: https://issues.apache.org/jira/browse/GIRAPH-567
 Project: Giraph
  Issue Type: Bug
Reporter: Eli Reisman


Seems to be something has upset the tests in examples here, this is the 
surefire report from "mvn verify" on trunk tonight:

{code}
---
Test set: org.apache.giraph.examples.RandomWalkWithRestartVertexTest
---
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.085 sec <<< 
FAILURE!
testWeightedGraph(org.apache.giraph.examples.RandomWalkWithRestartVertexTest)  
Time elapsed: 1.038 sec  <<< ERROR!
java.io.FileNotFoundException: 
/var/folders/wq/rrrp5_8s3wgby3ybwn87z5lcgn/T/giraph-RandomWalkWithRestartVertex-1996932221558672384/output/part-m-0
 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at com.google.common.io.Files$1.getInput(Files.java:110)
at com.google.common.io.Files$1.getInput(Files.java:107)
at com.google.common.io.CharStreams$2.getInput(CharStreams.java:93)
at com.google.common.io.CharStreams$2.getInput(CharStreams.java:90)
at com.google.common.io.CharStreams.readLines(CharStreams.java:310)
at com.google.common.io.Files.readLines(Files.java:544)
at 
org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:208)
at 
org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:77)
at 
org.apache.giraph.examples.RandomWalkWithRestartVertexTest.testWeightedGraph(RandomWalkWithRestartVertexTest.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
at 
org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:120)
at 
org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:103)
at org.apache.maven.surefire.Surefire.run(Surefire.java:169)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
at 
org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
{code}



[jira] [Commented] (GIRAPH-565) Make an easy way to gather some logs from workers on master

2013-03-15 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604101#comment-13604101
 ] 

Eli Reisman commented on GIRAPH-565:


This is great. Cool idea! +1


> Make an easy way to gather some logs from workers on master
> ---
>
> Key: GIRAPH-565
> URL: https://issues.apache.org/jira/browse/GIRAPH-565
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-565.patch
>
>
> When debugging jobs with a lot of workers, it's really useful to be able to 
> have some information from any of the workers at a single place, and not to 
> have to go through each worker's logs to find what you are looking for.
> Every time I do this I find myself implementing some aggregator to gather 
> those logs from all the workers on the master, so might as well make this 
> aggregator an easy option for everyone.
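
As a rough illustration of the idea described above, the following self-contained Java sketch gathers tagged log lines from several workers into one list on the master. LogAggregator and gatherOnMaster are hypothetical names made up for this example; they are not Giraph's aggregator API and not the contents of GIRAPH-565.patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WorkerLogAggregatorDemo {

  /** Hypothetical aggregator that concatenates log lines; not Giraph's API. */
  static final class LogAggregator {
    private final List<String> lines = new ArrayList<>();

    /** Called on a worker to record one message, tagged with the worker id. */
    void aggregate(int workerId, String message) {
      lines.add("worker-" + workerId + ": " + message);
    }

    /** Folds another aggregator's lines into this one, preserving order. */
    void merge(LogAggregator other) {
      lines.addAll(other.lines);
    }

    /** Read-only view of everything gathered so far. */
    List<String> getAggregatedValue() {
      return Collections.unmodifiableList(lines);
    }
  }

  /** Master side: merge every worker's partial result into a single list. */
  static List<String> gatherOnMaster(List<LogAggregator> workerPartials) {
    LogAggregator master = new LogAggregator();
    for (LogAggregator partial : workerPartials) {
      master.merge(partial);
    }
    return master.getAggregatedValue();
  }

  public static void main(String[] args) {
    LogAggregator w0 = new LogAggregator();
    w0.aggregate(0, "loaded 10 vertices");
    LogAggregator w1 = new LogAggregator();
    w1.aggregate(1, "spilled partition 3 to disk");

    List<LogAggregator> partials = new ArrayList<>();
    partials.add(w0);
    partials.add(w1);
    for (String line : gatherOnMaster(partials)) {
      System.out.println(line);
    }
  }
}
```

Tagging each line with its worker id at aggregation time is what lets the master-side view replace a trip through each worker's individual logs.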



[jira] [Commented] (GIRAPH-564) Input formats should provide GiraphContext

2013-03-15 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603558#comment-13603558
 ] 

Eli Reisman commented on GIRAPH-564:


@Avery: yes this will be a big help. On the other hand, I have gotten the YARN 
impl to work without it, and in a stable way, so we're free to let this take a 
while if we want. Replacing the Mapper#Context connection that has been exposed 
now in GraphMapper/GraphTaskManager will get rid of the umbilical cord to 
Hadoop MRv1. On the other hand, this also means a big refactor to the IO 
formats since they depend on various Task-related objects handed off to us by 
the Mapper#Context; our Configuration is the "easy" one to deal with.

@Alessandro: I like this idea, and I like the simplification. I'm thinking 
there were some places outside IO where the Immutable version is non-negotiable 
to get the generics plumbing to work on that reference down the road. So just 
placing the wrapper in the IO might not work. I think there were at least 2 
places just internal to the YARN setup code and ConfigurationUtils where I had 
to wrap the class to keep the generics working. Other times it doesn't seem to 
matter.

A cleaner solution here is inevitable soon. The Mapper#Context is the key to 
the whole thing. I was actually going to put up this JIRA this week myself ;)

In the GiraphYarnTask in GIRAPH-13 you can see what stuff the Mapper#Context 
replacement will need to carry in for the engine to turn over on the Giraph 
side. So it could be simpler than Mapper#Context in a bunch of ways also.


> Input formats should provide GiraphContext
> --
>
> Key: GIRAPH-564
> URL: https://issues.apache.org/jira/browse/GIRAPH-564
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Avery Ching
>
> Context is a MapReduce Context that input classes have to explicitly create a 
> ImmutableGiraphClassesConfiguration from (which is not intuitive).  It would 
> be better to provide a GiraphContext that would provide a 
> ImmutableGiraphClassesConfiguration directly for the user, while still 
> providing the user access to the MapReduce Context if really necessary.  This 
> might also help with the YARN port?  Not sure.



[jira] [Commented] (GIRAPH-547) Allow in-place modification of edges

2013-03-15 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603536#comment-13603536
 ] 

Eli Reisman commented on GIRAPH-547:


Thanks!


On Thu, Mar 14, 2013 at 4:52 PM, Alessandro Presta (JIRA)



> Allow in-place modification of edges
> 
>
> Key: GIRAPH-547
> URL: https://issues.apache.org/jira/browse/GIRAPH-547
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Alessandro Presta
>Assignee: Alessandro Presta
> Attachments: GIRAPH-547.patch
>
>
> This is a somewhat long term item.
> Because of some optimized edge storage implementations (byte array, primitive 
> array), we have a contract with the user that Edge objects returned by 
> getEdges() are read-only.
> One concrete example where in-place modification would be useful: in the 
> weighted version of PageRank, you can store the weight sum and normalize each 
> message sent, or you could more efficiently normalize the out-edges once in 
> superstep 0.
> The Pregel paper describes an OutEdgeIterator that allows for in-place 
> modification of edges. I can see how that would be easy to implement in C++, 
> where there is no need to reuse objects.
> Giraph "unofficially" supports this if one is using generic collections to 
> represent edges (e.g. ArrayList or HashMap).
> It may be trickier in some optimized implementations, but in principle it 
> should be doable.
> One way would be to have some special MutableEdge implementation which calls 
> back to the edge data structure in order to save modifications:
> {code}
> for (Edge edge : getEdges()) {
>   edge.setValue(newValue);
> }
> {code}
> Another option would be to add a special set() method to our edge iterator, 
> where one can replace the current edge:
> {code}
> for (EdgeIterator it = getEdges().iterator(); it.hasNext();) {
>   Edge edge = it.next();
>   edge.setValue(newValue);
>   it.set(edge);
> }
> {code}
> We could actually implement the first version as syntactic sugar on top of 
> the second version (the special MutableEdge would need a reference to the 
> iterator in order to call set(this)).
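
A minimal sketch of the last idea in the description above, assuming edges are stored in primitive arrays and the iterator hands out one reused edge object whose setValue() calls back into the iterator. All names here (ArrayEdges, MutableEdge, normalize) are hypothetical and are not taken from GIRAPH-547.patch; the iterator's set() takes the new value directly rather than an Edge, purely to keep the example short.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class MutableEdgeDemo {

  /** Hypothetical edge view whose setter writes through to the storage. */
  interface MutableEdge {
    long getTargetId();
    double getValue();
    void setValue(double newValue);
  }

  /** Hypothetical primitive-array edge storage; one edge object is reused. */
  static final class ArrayEdges implements Iterable<MutableEdge> {
    private final long[] targetIds;
    private final double[] values;

    ArrayEdges(long[] targetIds, double[] values) {
      this.targetIds = targetIds.clone();
      this.values = values.clone();
    }

    double valueAt(int i) { return values[i]; }

    @Override
    public Iterator<MutableEdge> iterator() { return new EdgeIterator(); }

    private final class EdgeIterator implements Iterator<MutableEdge> {
      private int index = -1;

      // Single reused edge; it always reflects the iterator's current position.
      private final MutableEdge reusable = new MutableEdge() {
        @Override public long getTargetId() { return targetIds[index]; }
        @Override public double getValue() { return values[index]; }
        // Syntactic sugar: delegate to the iterator's set().
        @Override public void setValue(double v) { EdgeIterator.this.set(v); }
      };

      @Override public boolean hasNext() { return index + 1 < targetIds.length; }

      @Override public MutableEdge next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        index++;
        return reusable;
      }

      /** Replaces the value of the current edge in the backing array. */
      void set(double newValue) { values[index] = newValue; }
    }
  }

  /** Normalizes out-edge weights in place, as in the weighted PageRank case. */
  static void normalize(ArrayEdges edges) {
    double sum = 0;
    for (MutableEdge edge : edges) {
      sum += edge.getValue();
    }
    if (sum == 0) {
      return; // nothing to normalize
    }
    for (MutableEdge edge : edges) {
      edge.setValue(edge.getValue() / sum);
    }
  }

  public static void main(String[] args) {
    ArrayEdges edges = new ArrayEdges(new long[] {7L, 9L}, new double[] {1.0, 3.0});
    normalize(edges);
    System.out.println(edges.valueAt(0) + " " + edges.valueAt(1));
  }
}
```

Because the edge object is reused, callers must not hold on to it across calls to next(); that is the same read-only-view caveat the description mentions, relaxed only for the value of the current edge.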


