[jira] [Resolved] (GIRAPH-971) Simple Giraph Oozie Action module
[ https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman resolved GIRAPH-971. Resolution: Won't Fix. Old ticket; I will come back to this if I end up needing it or anyone else shows interest. Closing the ticket for now. > Simple Giraph Oozie Action module > - > > Key: GIRAPH-971 > URL: https://issues.apache.org/jira/browse/GIRAPH-971 > Project: Giraph > Issue Type: New Feature > Components: conf and scripts >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Trivial > Attachments: GIRAPH-971-1.patch, GIRAPH-971-2.patch, > GIRAPH-971-3.patch > > > Adds a 'giraph-oozie' module which will build a JAR to be installed/configured > as an Oozie extension as well as added to Giraph runtime deps. Allows us to > write Oozie workflow XMLs that include a Action node. > Not well tested yet, but the module builds fine in the default Giraph profile against > Hadoop 1.2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-969) STATIC_SASL_SYMBOL munge results in compilation errors for yarn profile with hadoop > 2.3.0
[ https://issues.apache.org/jira/browse/GIRAPH-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-969: --- Attachment: GIRAPH-969-1.patch This should fix the YARN profile Hadoop-2.6.x build issues. > STATIC_SASL_SYMBOL munge results in compilation errors for yarn profile with > hadoop > 2.3.0 > --- > > Key: GIRAPH-969 > URL: https://issues.apache.org/jira/browse/GIRAPH-969 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: Hadoop 2.3.0 and higher >Reporter: Philipp Nolte > Attachments: GIRAPH-969-1.patch > > > The SaslRpcServer.SASL_PROPS field was removed in Hadoop 2.3.0 (see > https://issues.apache.org/jira/browse/HADOOP-10451). > The hadoop_yarn profile uses the STATIC_SASL munge symbol and makes Giraph > try to use the SASL_PROPS field. > This results in a compilation error when running > {noformat} > mvn clean package -Phadoop_yarn -Dhadoop.version=2.5.1 > {noformat} > {noformat} > [ERROR] > giraph-core/target/munged/main/org/apache/giraph/comm/netty/SaslNettyClient.java:[84,68] > cannot find symbol > symbol: variable SASL_PROPS > location: class org.apache.hadoop.security.SaslRpcServer > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-971) Simple Giraph Oozie Action module
[ https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-971: --- Attachment: GIRAPH-971-3.patch > Simple Giraph Oozie Action module > - > > Key: GIRAPH-971 > URL: https://issues.apache.org/jira/browse/GIRAPH-971 > Project: Giraph > Issue Type: New Feature > Components: conf and scripts >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Trivial > Attachments: GIRAPH-971-1.patch, GIRAPH-971-2.patch, > GIRAPH-971-3.patch > > > Adds a 'giraph-oozie' module which will build a JAR to be installed/configured > as an Oozie extension as well as added to Giraph runtime deps. Allows us to > write Oozie workflow XMLs that include a Action node. > Not well tested yet, but the module builds fine in the default Giraph profile against > Hadoop 1.2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-971) Simple Giraph Oozie Action module
[ https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-971: --- Attachment: GIRAPH-971-2.patch Now passes 'mvn verify' on the default build profile. Needs more love around the example workflow and testing. > Simple Giraph Oozie Action module > - > > Key: GIRAPH-971 > URL: https://issues.apache.org/jira/browse/GIRAPH-971 > Project: Giraph > Issue Type: New Feature > Components: conf and scripts >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Trivial > Attachments: GIRAPH-971-1.patch, GIRAPH-971-2.patch > > > Adds a 'giraph-oozie' module which will build a JAR to be installed/configured > as an Oozie extension as well as added to Giraph runtime deps. Allows us to > write Oozie workflow XMLs that include a Action node. > Not well tested yet, but the module builds fine in the default Giraph profile against > Hadoop 1.2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-971) Simple Giraph Oozie Action module
[ https://issues.apache.org/jira/browse/GIRAPH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-971: --- Attachment: GIRAPH-971-1.patch > Simple Giraph Oozie Action module > - > > Key: GIRAPH-971 > URL: https://issues.apache.org/jira/browse/GIRAPH-971 > Project: Giraph > Issue Type: New Feature > Components: conf and scripts >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Trivial > Attachments: GIRAPH-971-1.patch > > > Adds a 'giraph-oozie' module which will build a JAR to be installed/configured > as an Oozie extension as well as added to Giraph runtime deps. Allows us to > write Oozie workflow XMLs that include a Action node. > Not well tested yet, but the module builds fine in the default Giraph profile against > Hadoop 1.2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-971) Simple Giraph Oozie Action module
Eli Reisman created GIRAPH-971: -- Summary: Simple Giraph Oozie Action module Key: GIRAPH-971 URL: https://issues.apache.org/jira/browse/GIRAPH-971 Project: Giraph Issue Type: New Feature Components: conf and scripts Reporter: Eli Reisman Assignee: Eli Reisman Priority: Trivial Adds a 'giraph-oozie' module which will build a JAR to be installed/configured as an Oozie extension as well as added to Giraph runtime deps. Allows us to write Oozie workflow XMLs that include a Action node. Not well tested yet, but the module builds fine in the default Giraph profile against Hadoop 1.2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-959) Giraph's 1.1.0 hadoop_yarn profile can no longer be built with hadoop 2.0.3-alpha
[ https://issues.apache.org/jira/browse/GIRAPH-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247459#comment-14247459 ] Eli Reisman commented on GIRAPH-959: The original hadoop_yarn profile ran on Hadoop 2.0.3-alpha or newer. As far as I know, LinkedIn folks more recently modified the hadoop_yarn profile in a way that made it dependent on Hadoop 2.2.0 or newer; it should build fine on those versions. There should be some JIRA tickets documenting the discussion around that, and there are threads on the mailing list that address it. > Giraph's 1.1.0 hadoop_yarn profile can no longer be built with hadoop > 2.0.3-alpha > - > > Key: GIRAPH-959 > URL: https://issues.apache.org/jira/browse/GIRAPH-959 > Project: Giraph > Issue Type: Bug > Components: build >Affects Versions: 1.1.0 >Reporter: Philipp Nolte > Labels: build, dependencies, hadoop-version > > Trying to build giraph release 1.1.0-RC1 for hadoop 2.0.3-alpha with profile > hadoop_yarn > {{$ git clone git://git.apache.org/giraph.git}} > {{$ git checkout release-1.1.0-RC1}} > {{$ mvn -Dhadoop.version=2.0.3-alpha -DskipTests -Phadoop_yarn clean package}} > fails with lots of {{cannot find symbol}} errors: > {noformat} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) > on project giraph-core: Compilation failure: Compilation failure: > [ERROR] > /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[49,41] > package org.apache.hadoop.yarn.client.api does not exist > [ERROR] > /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[50,41] > package org.apache.hadoop.yarn.client.api does not exist > [ERROR] > /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[52,41] > cannot find symbol > [ERROR] symbol: class YarnException > [ERROR] location: package 
org.apache.hadoop.yarn.exceptions > [ERROR] > /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[88,11] > cannot find symbol > [ERROR] symbol: class YarnClient > [ERROR] location: class org.apache.giraph.yarn.GiraphYarnClient > [ERROR] > /Users/philipp/Code/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphYarnClient.java:[115,52] > cannot find symbol > {noformat} > This is probably due to missing dependencies in the hadoop_yarn profile. It > may also mean that Giraph's hadoop_yarn profile is no longer compatible with > hadoop 2.0.3-alpha, as hadoop-yarn-project version 2.0.3-alpha does not > include the package org.apache.hadoop.yarn.client.api, for example. > In the latter case, the pom.xml comment stating that hadoop_yarn runs on hadoop > 2.0.3-alpha by default is outdated and should be removed to prevent > confusion. > Which versions of Hadoop can Giraph 1.1.0 be built against with the hadoop_yarn > profile? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-811) Infinite ZooKeeper CleanUp
[ https://issues.apache.org/jira/browse/GIRAPH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974922#comment-13974922 ] Eli Reisman commented on GIRAPH-811: This is a good solution. Are you certain this issue is fixed? Since the transition to Hadoop 2.2.0 YARN support, I think I have seen some more recent reports of this problem. This (rather than the >= solution) seems like the right fix if we still need one. [~aching] ^ if people are still reporting cleanup problems due to the "extra" master task in Giraph-on-YARN runs, I'd take a look at this patch or a variation of it. > Infinite ZooKeeper CleanUp > -- > > Key: GIRAPH-811 > URL: https://issues.apache.org/jira/browse/GIRAPH-811 > Project: Giraph > Issue Type: Bug > Components: bsp, zookeeper >Affects Versions: 1.1.0 >Reporter: Alexandre Fonseca > Labels: yarn > Attachments: GIRAPH-811.patch > > > While executing the SimpleShortestPaths example with Giraph 1.1.0-SNAPSHOT > compiled for Hadoop Yarn 2.2.0, I've noticed that the application would never > stop even after recognizing that all supersteps had completed and the output > had been written to the output directory. 
> Looking at the logs, I found that the BspServiceMaster is stuck at the while > loop at the end of cleanUpZooKeeper() (BspServiceMaster.java:1729): > {code}2013-12-08 03:51:21,698 INFO [org.apache.giraph.master.MasterThread] > master.MasterThread (MasterThread.java:run(121)) - masterThread: Coordination > of superstep 3 took 0.433 seconds ended with state ALL_SUPERSTEPS_DONE and is > now on superstep 4 > 2013-12-08 03:51:21,699 INFO [org.apache.giraph.master.MasterThread] > master.BspServiceMaster (BspServiceMaster.java:setJobState(261)) - > setJobState: > {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on > superstep 4 > 2013-12-08 03:51:21,753 INFO [org.apache.giraph.master.MasterThread] > master.BspServiceMaster (BspServiceMaster.java:cleanup(1836)) - cleanup: > Notifying master its okay to cleanup with > /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir/0_master > 2013-12-08 03:51:21,790 INFO [org.apache.giraph.master.MasterThread] > master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1711)) - > cleanUpZooKeeper: Node > /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir already > exists, no need to create. 
> 2013-12-08 03:51:21,792 INFO [org.apache.giraph.master.MasterThread] > bsp.BspInputFormat (BspInputFormat.java:getMaxTasks(64)) - getMaxTasks: Max > workers = 1, split master/worker = true, is YARN-only job = true, total max > tasks = 1 > 2013-12-08 03:51:21,792 INFO [org.apache.giraph.master.MasterThread] > master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1735)) - > cleanUpZooKeeper: Got 2 of 1 desired children from > /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir > 2013-12-08 03:51:21,793 INFO [org.apache.giraph.master.MasterThread] > master.BspServiceMaster (BspServiceMaster.java:cleanUpZooKeeper(1744)) - > cleanedUpZooKeeper: Waiting for the children of > /_hadoopBsp/giraph_yarn_application_1386468390622_0005/_cleanedUpDir to > change since only got 2 nodes.{code} > As the last 2 entries show, instead of registering just 1 task ending, it > registers 2 and thus it misses the condition on line 1740. > One solution would be to change the == in line 1740 to a >=. However, the > actual issue seems to reside with the BspInputFormat.getMaxTasks() > (BspInputFormat.java:51). This function assumes that in a pure yarn execution > the total number of tasks will be equal to the maximum number of workers. > However, based on GiraphApplicationMaster:167, this is not the case. An extra > Master task is launched in addition to all the Worker tasks. > BspInputFormat.getMaxTasks() should then return maxWorkers + 1 in the case of > a pure yarn execution. 
> Compilation: > {code}mvn -Phadoop_yarn -Dhadoop.version=2.2.0 -DskipTests compile{code} > Execution command: > {code}$HADOOP_PREFIX/bin/hadoop jar > ~/Projects/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar > org.apache.giraph.GiraphRunner > org.apache.giraph.examples.SimpleShortestPathsComputation -vif > org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip > giraph/input/tiny_graph.txt -vof > org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op > giraph/output/shortestpahts -w 1 -ca giraph.zkList=localhost:2181 -yj > giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
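The counting mismatch reported in GIRAPH-811 above can be reproduced in isolation. The following is only a minimal sketch, not Giraph's BspInputFormat or BspServiceMaster code: the class and method names are hypothetical, and it models only the pure-YARN case the report describes (expected task count equal to maxWorkers versus maxWorkers + 1, checked with a strict equality).

```java
// Hypothetical model of the GIRAPH-811 report; none of these names exist in Giraph.
final class CleanupCountSketch {

    // Behavior as described in the report: a pure-YARN run expects maxWorkers tasks.
    static int expectedTasksAsReported(int maxWorkers) {
        return maxWorkers;
    }

    // Correction proposed in the report: also count the extra master task.
    static int expectedTasksProposed(int maxWorkers) {
        return maxWorkers + 1;
    }

    // Strict-equality exit condition, as the report describes for the cleanup loop.
    static boolean cleanupDone(int childrenSeen, int expectedTasks) {
        return childrenSeen == expectedTasks;
    }

    public static void main(String[] args) {
        int maxWorkers = 1;   // "-w 1" in the execution command
        int childrenSeen = 2; // "Got 2 of 1 desired children" in the log

        // Reported behavior: 2 == 1 never holds, so the loop keeps waiting.
        System.out.println(cleanupDone(childrenSeen, expectedTasksAsReported(maxWorkers))); // false
        // Proposed behavior: 2 == 2 holds, so cleanup can finish.
        System.out.println(cleanupDone(childrenSeen, expectedTasksProposed(maxWorkers)));   // true
    }
}
```

Relaxing the comparison to >= would also let the loop exit here, but as the comment on the ticket notes, correcting the expected count addresses the cause rather than the symptom.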
[jira] [Commented] (GIRAPH-747) BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete
[ https://issues.apache.org/jira/browse/GIRAPH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887351#comment-13887351 ] Eli Reisman commented on GIRAPH-747: Had a chance to look again and my read is this breaks non-YARN. We might need to adjust this patch to use another method. I do think this is a real issue and we should get something in to fix it. > BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers > to complete > --- > > Key: GIRAPH-747 > URL: https://issues.apache.org/jira/browse/GIRAPH-747 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chuan Lei >Assignee: Chuan Lei > Fix For: 1.0.0 > > Attachments: GIRAPH-747.v1.patch > > > In BspServiceMaster, the function cleanUpZooKeeper should wait for all > workers and masters to complete. However, it appears that maxTasks only > takes workers into consideration. Consequently, a straggling worker may fail > to report to ZooKeeper because the path is removed too early. This > causes a "No lease on path File does not exist" exception at runtime. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (GIRAPH-747) BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete
[ https://issues.apache.org/jira/browse/GIRAPH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886989#comment-13886989 ] Eli Reisman commented on GIRAPH-747: Hey, reviewing this. I recall this issue; I thought I was shimming this number somewhere else? The reason is that BspServiceMaster is also used by non-YARN and I didn't want to break or alter the shared code. Could another non-YARN Giraph committer take a look and see if this change is safe? If not we should def commit this. If so, maybe another (ugh) munge flag here will suffice? > BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers > to complete > --- > > Key: GIRAPH-747 > URL: https://issues.apache.org/jira/browse/GIRAPH-747 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chuan Lei >Assignee: Chuan Lei > Fix For: 1.0.0 > > Attachments: GIRAPH-747.v1.patch > > > In BspServiceMaster, the function cleanUpZooKeeper should wait for all > workers and masters to complete. However, it appears that maxTasks only > takes workers into consideration. Consequently, a straggling worker may fail > to report to ZooKeeper because the path is removed too early. This > causes a "No lease on path File does not exist" exception at runtime. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (GIRAPH-747) BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete
[ https://issues.apache.org/jira/browse/GIRAPH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886195#comment-13886195 ] Eli Reisman commented on GIRAPH-747: I'll review and commit this, thanks again! > BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers > to complete > --- > > Key: GIRAPH-747 > URL: https://issues.apache.org/jira/browse/GIRAPH-747 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Chuan Lei >Assignee: Chuan Lei > Fix For: 1.0.0 > > Attachments: GIRAPH-747.v1.patch > > > In BspServiceMaster, the function cleanUpZooKeeper should wait for all > workers and masters to complete. However, it appears that maxTasks only > takes workers into consideration. Consequently, a straggling worker may fail > to report to ZooKeeper because the path is removed too early. This > causes a "No lease on path File does not exist" exception at runtime. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (GIRAPH-819) Number of containers required for a job
[ https://issues.apache.org/jira/browse/GIRAPH-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886157#comment-13886157 ] Eli Reisman commented on GIRAPH-819: Thanks, and sorry it took so long. I will be happy to try this patch out tonight. Good catch. If Mohammed also approves, I will commit it. > Number of containers required for a job > --- > > Key: GIRAPH-819 > URL: https://issues.apache.org/jira/browse/GIRAPH-819 > Project: Giraph > Issue Type: Bug > Components: lib, mapreduce >Affects Versions: 1.1.0 >Reporter: Rafal Wojdyla > Labels: patch > Fix For: 1.1.0 > > Attachments: GIRAPH-819.patch > > > Java 1.6.x > Giraph trunk - revert java 1.7 support. > Hadoop 2.2.0.x > Job submission fails due to: > {noformat} > 13/11/28 12:02:14 INFO yarn.GiraphYarnClient: Running Client > 13/11/28 12:02:14 INFO client.RMProxy: Connecting to ResourceManager at > master/192.168.1.100:8045 > 13/11/28 12:02:15 INFO yarn.GiraphYarnClient: Got node report from ASM for, > nodeId=kreator:46477, nodeAddresskreator:8042, nodeRackName/default-rack, > nodeNumContainers7 > 13/11/28 12:02:15 INFO yarn.GiraphYarnClient: Got node report from ASM for, > nodeId=exotica:46645, nodeAddressexotica:8042, nodeRackName/default-rack, > nodeNumContainers8 > Exception in thread "main" java.lang.RuntimeException: Giraph job requires 2 > containers to run; cluster only hosts 15 > at > org.apache.giraph.yarn.GiraphYarnClient.checkPerNodeResourcesAvailable(GiraphYarnClient.java:230) > at > org.apache.giraph.yarn.GiraphYarnClient.run(GiraphYarnClient.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
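The exception in GIRAPH-819 above is raised even though the job needs 2 containers and the two node reports add up to 7 + 8 = 15. The ticket does not show the comparison inside GiraphYarnClient, so the following is only a sketch of the direction such a check would presumably take (fail only when the requirement exceeds what the cluster hosts). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a container availability check; not GiraphYarnClient code.
final class ContainerCheckSketch {

    // Sum the per-node container counts taken from the node reports.
    static int totalHosted(List<Integer> perNodeContainers) {
        int total = 0;
        for (int count : perNodeContainers) {
            total += count;
        }
        return total;
    }

    // The job can run when the requirement does not exceed what the cluster hosts.
    static boolean hasEnoughContainers(int required, List<Integer> perNodeContainers) {
        return required <= totalHosted(perNodeContainers);
    }

    public static void main(String[] args) {
        List<Integer> nodeReports = Arrays.asList(7, 8); // nodeNumContainers from the log
        System.out.println(totalHosted(nodeReports));             // 15
        System.out.println(hasEnoughContainers(2, nodeReports));  // true: 2 <= 15
        System.out.println(hasEnoughContainers(16, nodeReports)); // false: 16 > 15
    }
}
```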
[jira] [Commented] (GIRAPH-794) add support for generic hadoop1 and hadoop2 profiles
[ https://issues.apache.org/jira/browse/GIRAPH-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842580#comment-13842580 ] Eli Reisman commented on GIRAPH-794: I like this idea. Does anyone have a problem with this? > add support for generic hadoop1 and hadoop2 profiles > > > Key: GIRAPH-794 > URL: https://issues.apache.org/jira/browse/GIRAPH-794 > Project: Giraph > Issue Type: Improvement > Components: build >Affects Versions: 1.0.0 >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik > Fix For: 1.1.0 > > Attachments: > 0001-GIRAPH-794.-add-support-for-generic-hadoop1-and-hado.patch > > > I would like to propose that as part of Giraph 1.1.0 we introduce generic > hadoop1 and hadoop2 profiles that would be expected to track the latest releases > on the hadoop 1.x and hadoop 2.x codelines (currently these are 1.2.1 and 2.2.0). > These profiles will be the ones used to publish Giraph maven artifacts. > Following the convention established by HBase, I propose that we bake hadoop1 > and hadoop2 tokens into the version. > Thus every release of Giraph starting from 1.1.0 will deploy the following > versions: >* (same as -hadoop1) >* -hadoop1 >* -hadoop2 > Thoughts? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-730: --- Attachment: GIRAPH-730-2-suggestion.patch > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730-2-suggestion.patch, GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, the getTaskResourceMap function is not > thread-safe, which causes the application master to fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-730: --- Attachment: (was: GIRAPH-730-2.patch) > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, the getTaskResourceMap function is not > thread-safe, which causes the application master to fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API
[ https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814479#comment-13814479 ] Eli Reisman commented on GIRAPH-737: Committed, thanks so much Muhammad, great work! > Giraph Application Master: move to new and stable YARN API > -- > > Key: GIRAPH-737 > URL: https://issues.apache.org/jira/browse/GIRAPH-737 > Project: Giraph > Issue Type: New Feature > Components: mapreduce >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: GIRAPH-737-2.patch, GIRAPH-737-3.patch, > GIRAPH-737.WIP.patch > > > Giraph was an early adopter of the Hadoop YARN AM! Eli successfully wrote a > Giraph AM based on Hadoop 2.0.x_alpha. However, in the last few months, YARN > significantly *overhauled* its APIs and associated coding patterns. The new > beta version is 2.1.x and I was told by Yarn-dev that the current APIs will not > change much. > In the above circumstances, we need to substantially overhaul the Giraph AM as > well to accommodate the new YARN API. Moreover, in the newer YARN API, > supporting Kerberos security in the AM becomes easier and more transparent. > Potential impact: > The upcoming Giraph AM will not work with earlier alpha Hadoop versions such > as 2.0.3. I'm not sure if anyone is using the Giraph AM in production. However, > the more prevalent way of Giraph processing (MR-based) should continue to > work. > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API
[ https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-737: --- Attachment: GIRAPH-737-2.patch This is the most recent patch Muhammad uploaded to RB. I am posting it here for convenience. When you build the very first time using {code}mvn -Phadoop_yarn -Dhadoop.version=2.1.1-SNAPSHOT clean package -Dtest=TestFilters -DfailIfNoTests=false{code} the patch builds fine. Once that full build has completed, a more vanilla {code}mvn -Phadoop_yarn -Dhadoop.version=2.1.1-SNAPSHOT clean verify{code} will build flawlessly. The bad news: we still have 342 checkstyle issues to resolve. Once Muhammad uploads the 737-3 patch with the checkstyle issues fixed, we're ready to commit. Excited to get this checked in! > Giraph Application Master: move to new and stable YARN API > -- > > Key: GIRAPH-737 > URL: https://issues.apache.org/jira/browse/GIRAPH-737 > Project: Giraph > Issue Type: New Feature > Components: mapreduce >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Attachments: GIRAPH-737-2.patch, GIRAPH-737.WIP.patch > > > Giraph was an early adopter of the Hadoop YARN AM! Eli successfully wrote a > Giraph AM based on Hadoop 2.0.x_alpha. However, in the last few months, YARN > significantly *overhauled* its APIs and associated coding patterns. The new > beta version is 2.1.x and I was told by Yarn-dev that the current APIs will not > change much. > In the above circumstances, we need to substantially overhaul the Giraph AM as > well to accommodate the new YARN API. Moreover, in the newer YARN API, > supporting Kerberos security in the AM becomes easier and more transparent. > Potential impact: > The upcoming Giraph AM will not work with earlier alpha Hadoop versions such > as 2.0.3. I'm not sure if anyone is using the Giraph AM in production. However, > the more prevalent way of Giraph processing (MR-based) should continue to > work. 
[jira] [Commented] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API
[ https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792207#comment-13792207 ] Eli Reisman commented on GIRAPH-737: I'm +1 on moving forward with this, thanks Avery! I originally wanted to be as backwards compatible as possible, but the YARN APIs have evolved so much that I think moving forward with this will be a big win. > Giraph Application Master: move to new and stable YARN API > -- > > Key: GIRAPH-737 > URL: https://issues.apache.org/jira/browse/GIRAPH-737 > Project: Giraph > Issue Type: New Feature > Components: mapreduce >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > > Giraph was an early adopter of the Hadoop YARN AM! Eli successfully wrote a > Giraph AM based on Hadoop 2.0.x_alpha. However, in the last few months, YARN > significantly *overhauled* its APIs and associated coding patterns. The new > beta version is 2.1.x and I was told by Yarn-dev that the current APIs will not > change much. > In the above circumstances, we need to substantially overhaul the Giraph AM as > well to accommodate the new YARN API. Moreover, in the newer YARN API, > supporting Kerberos security in the AM becomes easier and more transparent. > Potential impact: > The upcoming Giraph AM will not work with earlier alpha Hadoop versions such > as 2.0.3. I'm not sure if anyone is using the Giraph AM in production. However, > the more prevalent way of Giraph processing (MR-based) should continue to > work. > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788341#comment-13788341 ] Eli Reisman commented on GIRAPH-730: Oops -- the comment above seems to have cut off the top of my original text. Missing part: in case Chuan is working on something else now, I am uploading a "suggestion patch" that does what we talked about above: synchronize only the creation of the LOCAL_RESOURCES map, not every call to get a reference to it. > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730-2.patch, GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, the getTaskResourceMap function is not > thread-safe, which causes the application master to fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message was sent by Atlassian JIRA (v6.1#6144)
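The idea in the comment above (synchronize only the creation of the shared resource map, not every read of it) is commonly written as double-checked locking on a volatile field. The sketch below assumes that reading of the comment; it is not the content of the suggestion patch, which is not shown in this thread. The class name, the String-valued map, and the file names are placeholders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not GiraphApplicationMaster code.
final class ResourceMapSketch {
    // volatile so that a fully built map is visible to every thread that reads the field
    private static volatile Map<String, String> localResources;
    private static int buildCount = 0; // only modified while holding the class lock

    static Map<String, String> getTaskResourceMap() {
        Map<String, String> result = localResources;
        if (result == null) {                        // fast path: no lock once the map exists
            synchronized (ResourceMapSketch.class) {
                result = localResources;
                if (result == null) {                // re-check under the lock
                    result = Collections.unmodifiableMap(buildResources());
                    localResources = result;
                }
            }
        }
        return result;
    }

    // Stand-in for the expensive step of registering jars and configuration files.
    private static Map<String, String> buildResources() {
        buildCount++;
        Map<String, String> resources = new HashMap<>();
        resources.put("example-job.jar", "placeholder");   // hypothetical entry
        resources.put("example-conf.xml", "placeholder");  // hypothetical entry
        return resources;
    }

    static synchronized int buildCount() {
        return buildCount;
    }

    // Calls getTaskResourceMap() from several threads and reports how often the map was built.
    static int callFromThreads(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> getTaskResourceMap());
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return buildCount();
    }

    public static void main(String[] args) {
        System.out.println(callFromThreads(8)); // 1: the map is built exactly once
    }
}
```

Because the published map is unmodifiable, callers that only read it need no further locking.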
[jira] [Updated] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-730: --- Attachment: GIRAPH-730-2.patch I'll upload the patch here, but if Chuan is still working on this problem, I'm happy to leave the fix to him. I am in agreement now that if we catch this at the getTaskResourceMap() level we can solve the problem for now. Great work, Chuan! One issue: I'm having trouble building Giraph (under any profile, even default) right now to test this. Is the build broken right now? I'm on a clean trunk repo. > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730-2.patch, GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, the getTaskResourceMap function is not > thread-safe, which causes the application master to fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755667#comment-13755667 ] Eli Reisman commented on GIRAPH-730: My question is this: the AM is running as a single thread, and then makes a request for all the containers it needs in one lump. In my tests, what happened was that the callback (the RM giving the local AM all the containers it asked for) returned the whole bunch of containers at once, though this call is made asynchronously. However, once the callback produced all the requested containers (always in a single asynchronous callback), the same single AM thread is what iterated through the collection of containers, one at a time, populating them with metadata and the resource map in buildContainerLaunchContext. So there was no concurrency issue. BUT, I think now that you are running on a larger cluster and asking for more containers, they are being returned in smaller groups. Perhaps you ask for 500 workers and instead you get two asynchronous callbacks from the RM, one with 200 and one with 300 containers, and both of _those_ asynchronous calls returning the groups of containers are now racing into buildContainerLaunchContext (etc) and this is where the concurrency issue arises? YARN certainly does not guarantee you can get back all the containers you ask for at once, although in my tests I didn't see any behavior but this. If at your scale you are seeing this problem, we need to address it. Good catch! If this is what is happening (you have logged one AM ask for X containers resulting in more than one asynchronous callback returning A, B, and C # of containers, where A+B+C = X) then we need to fix this. But, I do think we should not risk going with a partial solution. 
If what you're describing and what I am describing above match up, we really should just eliminate this risk now by protecting buildContainerLaunchContext, or concurrent attempts to populate the container launch contexts with IDs etc. could overwrite each other, and containers could be lost on the AM side this way. It doesn't mean we have to just slap a "synchronized" block on to buildContainerLaunchContext, maybe something more subtle could work. But we probably should address the problem so that all the risk is gone. What do you think? Maybe try another patch that addresses all possible race risks here? Also, if the race you're seeing is not as I have described here, please let me know what the real concern is, maybe I missed your point. Thanks, great work! > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, getTaskResourceMap function is not > multi-thread safe, which causes the application master fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
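Editor's sketch of the scenario in the comment above: two allocation callbacks (200 and 300 containers) arriving on different threads, with per-container task IDs handed out so that neither callback can overwrite the other's bookkeeping. This is not Giraph code. The class ContainerBookkeeping, its methods, and the string container IDs are all hypothetical names chosen for illustration; it only shows one way to avoid a blanket "synchronized" by using an atomic counter and a concurrent map.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for AM-side bookkeeping; not part of Giraph. */
final class ContainerBookkeeping {
  /** Next task ID to hand out; atomic so concurrent callbacks never reuse one. */
  private final AtomicInteger nextTaskId = new AtomicInteger();
  /** Container ID to task ID; safe to update from several threads. */
  private final Map<String, Integer> taskIdByContainer = new ConcurrentHashMap<>();

  /** May be invoked from several callback threads at the same time. */
  void onContainersAllocated(List<String> containerIds) {
    for (String containerId : containerIds) {
      // computeIfAbsent runs the mapping function at most once per key.
      taskIdByContainer.computeIfAbsent(
          containerId, id -> nextTaskId.getAndIncrement());
    }
  }

  int assignedCount() {
    return taskIdByContainer.size();
  }

  int distinctTaskIds() {
    return new HashSet<>(taskIdByContainer.values()).size();
  }

  /** Simulates two allocation callbacks racing on separate threads. */
  static ContainerBookkeeping simulateTwoCallbacks(int first, int second) {
    ContainerBookkeeping bk = new ContainerBookkeeping();
    Thread t1 = new Thread(() -> bk.onContainersAllocated(ids("a", first)));
    Thread t2 = new Thread(() -> bk.onContainersAllocated(ids("b", second)));
    t1.start();
    t2.start();
    try {
      t1.join();
      t2.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return bk;
  }

  /** Builds n made-up container IDs with the given prefix. */
  private static List<String> ids(String prefix, int n) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      out.add(prefix + "-" + i);
    }
    return out;
  }
}
```

Calling ContainerBookkeeping.simulateTwoCallbacks(200, 300) should leave 500 containers recorded, each with its own task ID, regardless of how the two threads interleave.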
[jira] [Commented] (GIRAPH-737) Giraph Application Master: move to new and stable YARN API
[ https://issues.apache.org/jira/browse/GIRAPH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755664#comment-13755664 ] Eli Reisman commented on GIRAPH-737: Originally, the concept was to be compatible with as much of the 2.0.x-alpha Hadoop line as possible. To this end, I attempted to use as much of the "old" YARN API as I could get away with, figuring we could update later and end back compatibility if we ever wanted to. Now that the 2.1-beta line is out, I think it makes a lot of sense to reevaluate this and move forward, refactoring to the newer YARN API and perhaps even abandon 2.0.x Hadoop in favor of the 2.1 beta line. Need to see some code (of course) but I'm fully +1 on the idea. Anyone else want to chime in here? > Giraph Application Master: move to new and stable YARN API > -- > > Key: GIRAPH-737 > URL: https://issues.apache.org/jira/browse/GIRAPH-737 > Project: Giraph > Issue Type: New Feature > Components: mapreduce >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > > Giraph was the early adopter of Hadoop YARN AM! Eli successfully wrote a > Giraph AM based on Hadoop 2.0.x_alpha. However, in last few months, Yarn > significantly *overhauled* its APIs and associated coding patterns. The new > beta version is 2.1.x and I was told by Yarn-dev that current APIs will not > change much. > In the above circumstances, we need to substantially overhaul Giraph AM as > well to accommodate with the new Yarn API. Moreover, in newer YARN API, > supporting kerberos security in AM becomes easier and more transparent. > Potential impact: > The upcoming Girpah AM will not work with earlier alpha Hadoop versions such > as 2.0.3. I'm not sure if anyone is using Giraph AM in production. However, > the more prevalent way of Giraph processing (MR-based) should continue to > work. > -- This message is automatically generated by JIRA. 
[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749300#comment-13749300 ] Eli Reisman commented on GIRAPH-730: Let me put it another way: I thought that the single thread of the AM is the only one that calls getTaskResourceMap, or buildContainerLaunchContext for that matter. What are these other threads? If getTaskResourceMap is not thread safe, how are the other values set in buildContainerLaunchContext not subject to the same race condition? Should we be synchronizing there? > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, getTaskResourceMap function is not > multi-thread safe, which causes the application master fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749294#comment-13749294 ] Eli Reisman commented on GIRAPH-730: Hey Chuan, that makes a lot of sense, thank you. The part I was confused about is that I don't think getTaskResourceMap is called asynchronously. I thought the first call (the one that actually initializes the map) returned and completed before the first thread is launched, and the remaining calls would be reading an immutable object (not by declaration but by convention only), so would be essentially thread safe. If you guys are sure I'm wrong, that's good enough for me, and we can commit this. I have not been keeping up with the mailing list as I should -- have you noticed anyone else reproducing this problem? Did this patch solve the problem for you? If so, let me know and we should move forward with it. Good catch! > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, getTaskResourceMap function is not > multi-thread safe, which causes the application master fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-730) GiraphApplicationMaster race condition in resource loading
[ https://issues.apache.org/jira/browse/GIRAPH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749010#comment-13749010 ] Eli Reisman commented on GIRAPH-730: Hi Chuan, I'm having a bit of trouble finding the concurrency issue here. LOCAL_RESOURCES is a common resource map that is initialized once per AM instance (which is hopefully a singleton at this point!) and reused, unchanged, for each task launched. It is this object that is returned from getTaskResourceMap to buildContainerLaunchContext, which returns the launch context, populating the ContainerLaunchContext before any threads are run or submitted. From then on, the method is just returning a reference to the same map on each call. If there is a concurrency issue, it is more likely attributable to buildContainerLaunchContext. But I'm not really seeing one. If you are certain this is a concurrency issue and the synchronization fix is the only thing verified to work, I'd try this: I think the null check at the top of getTaskResourceMap is atomic by nature, so you could just add a synchronization block around the map construction portion. I think returning the unchanging (and essentially immutable) map singleton in the loop of containers after that will be thread safe. > GiraphApplicationMaster race condition in resource loading > -- > > Key: GIRAPH-730 > URL: https://issues.apache.org/jira/browse/GIRAPH-730 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Giraph with Yarn >Reporter: Chuan Lei >Assignee: Chuan Lei > Attachments: GIRAPH-730.v1.patch > > > In GiraphApplicationMaster.java, getTaskResourceMap function is not > multi-thread safe, which causes the application master fail to distribute the > resources (jar, configuration file, etc.) to each container. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
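Editor's sketch of the suggestion in the comment above (null check first, synchronization only around map construction). This is not the GIRAPH-730 patch; LazyResourceMap, its placeholder entries, and BUILD_COUNT are hypothetical and exist only for illustration. One assumption worth stating: for the unlocked null check to be safe in Java, the field has to be volatile and the check has to be repeated inside the lock (double-checked locking); a plain field with a single check can still build the map twice or publish it half-built.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the AM's shared resource map; not Giraph code. */
final class LazyResourceMap {
  /** volatile so a fully built map is safely published to other threads. */
  private static volatile Map<String, String> resources;
  /** Counts how often the map was actually built (illustration only). */
  static final AtomicInteger BUILD_COUNT = new AtomicInteger();

  private LazyResourceMap() { }

  /** Returns the shared map, building it at most once. */
  static Map<String, String> getTaskResourceMap() {
    Map<String, String> local = resources;
    if (local == null) {                          // first check, no lock taken
      synchronized (LazyResourceMap.class) {
        local = resources;
        if (local == null) {                      // second check, under the lock
          Map<String, String> built = new HashMap<>();
          // Placeholder entries; real code would register jars and conf files.
          built.put("giraph.jar", "hdfs:///placeholder/giraph.jar");
          built.put("giraph-conf.xml", "hdfs:///placeholder/giraph-conf.xml");
          BUILD_COUNT.incrementAndGet();
          local = Collections.unmodifiableMap(built);
          resources = local;                      // publish only when complete
        }
      }
    }
    return local;
  }
}
```

After the first call, every caller gets the same unmodifiable map without taking the lock, which matches the "essentially immutable singleton" reading in the comment.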
[jira] [Commented] (GIRAPH-717) HiveJythonRunner with support for pure Jython value types.
[ https://issues.apache.org/jira/browse/GIRAPH-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721195#comment-13721195 ] Eli Reisman commented on GIRAPH-717: +1 > HiveJythonRunner with support for pure Jython value types. > -- > > Key: GIRAPH-717 > URL: https://issues.apache.org/jira/browse/GIRAPH-717 > Project: Giraph > Issue Type: Bug >Reporter: Nitay Joffe >Assignee: Nitay Joffe > > This adds support for pure Jython jobs. Currently this runner is hooked up to > work with Hive. I'll make it more generic later. > Running a Jython job is simply: > HIVE_HOME= > HADOOP_HOME= > $HIVE_HOME/bin/hive --service jar > org.apache.giraph.hive.jython.HiveJythonRunner jython1.py [jython2.py] ... > You can pass in any number of scripts. They will be parsed in order and sent > to all the workers using DistributedCache. > There are examples and tests in the diff. Here is one example: > launcher: https://gist.github.com/nitay/a62e0a5d369a5e701fa3 > worker: https://gist.github.com/nitay/7834fd2b059527e65a36 > There are a few pieces to a Jython job, I'll go over each part here. > The HiveJythonRunner will call a function called "prepare(job)" from the > Jython scripts. This is the entry point for configuring your job. > In this configuration you setup everything, such as your graph types (those > IVEMM writables) and sets up the Hive vertex/edge inputs and output. Each > graph type is one of the following: > 1) A Java type. For example the user can specify simply IntWritable > 2) A Jython type that implements Writable. In the example above the message > value implements Writable. > 3) A pure Jython type. The Java code will wrap these objects in a Writable > wrapper that serializes Jython values using Pickle (jython IO framework). > Your computation must implement JythonComputation. 
Note that this does not > actually implement Computation, but rather is a separate class so that we can > wrap all the types passed in with a wrapper that implements Writable. The > methods are named the same so that the user does not notice anything. > For Hive usage - if your value type is a primitive e.g. IntWritable or > LongWritable, then you need not do anything. The Java code will automatically > read/write the Hive table specified and convert between Hive types and the > primitive Writable. The vertex_id type in the example works like this. > If your value is a custom Jython type, you must create classes which > implement JythonHiveReader/JythonHiveWriter (or JythonHiveIO which is both). > These objects read/write Jython types from Hive. There are wrappers in the > Java code which take HiveIO data normally used in giraph-hive and turns them > into Jython types. This means, for example, that getMap() will return a > Jython dictionary instead of a Java Map. > There is also a PageRankBenchmark (from previous diff) implemented in Jython. > Here's a run for comparison / sanity check: > PageRankBenchmark with 10 workers, 100M vertices, 10B edges, 10 compute > threads > trunk: > https://gist.github.com/nitay/3170fa3b575d4d2e22a9 > total time: 302466 > with this diff: > https://gist.github.com/nitay/a52b6d1d64e50ab9829e > total time: 306517 > in jython: > https://gist.github.com/nitay/3f2e758b2933c3521727 > total time: 434730 > So we see that existing things are not affected (is there something else I > should test?) and that Jython has around 40% overhead. > ReviewBoard: https://reviews.apache.org/r/12543/ (Sorry it's a big one, hard > to split up :/) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-719) Typo fixes for strings in GiraphYarnClient.java
[ https://issues.apache.org/jira/browse/GIRAPH-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721193#comment-13721193 ] Eli Reisman commented on GIRAPH-719: +1 on this, thank you! > Typo fixes for strings in GiraphYarnClient.java > --- > > Key: GIRAPH-719 > URL: https://issues.apache.org/jira/browse/GIRAPH-719 > Project: Giraph > Issue Type: Bug >Reporter: Nicholas Karkoulias >Priority: Trivial > Attachments: GIRAPH-719-1.patch > > > Two trivial fixes in Strings with user messages (file GiraphYarnClient.java). > I'm attaching a patch that can be applied to the current trunk (commit > 4caffaf2b0). > First-time JIRA user, so tell me if I'm doing anything wrong. :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-706) Hybrid management of configuration options
[ https://issues.apache.org/jira/browse/GIRAPH-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700269#comment-13700269 ] Eli Reisman commented on GIRAPH-706: This is a great idea, and would make a great newbie JIRA for someone who wants to get involved with Giraph. > Hybrid management of configuration options > -- > > Key: GIRAPH-706 > URL: https://issues.apache.org/jira/browse/GIRAPH-706 > Project: Giraph > Issue Type: Improvement >Reporter: Armando Miraglia > > While checking the source code (specially under the formats package in > giraph-core) I realized that many configuration options are managed using > hadoop Configuration instead of the more appropriate *ConfOption classes. > This causes the unavailability of such configuration in the documentation as > well as an hybrid management of the configurations in the source code. > I think that the project should be reviewed to make all the configuration use > the common *ConfOption API. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-707) Giraph could probably support Hadoop 2.0.x-alpha line using a single build profile
Eli Reisman created GIRAPH-707: -- Summary: Giraph could probably support Hadoop 2.0.x-alpha line using a single build profile Key: GIRAPH-707 URL: https://issues.apache.org/jira/browse/GIRAPH-707 Project: Giraph Issue Type: Improvement Reporter: Eli Reisman Priority: Trivial The title says it all. Other than switching the "hadoop.version" Maven property setting, these profiles all do basically the same thing from the build perspective and are starting to clutter up our POM.xml. On the other hand, this adds verbosity and another layer of complexity to our build command line. Instead of: {code}mvn -Phadoop_2.0.3 clean install{code} we would have: {code}mvn -Dhadoop.version=2.0.3-alpha -Phadoop_2_alpha clean install{code} as the user would still need to pick out a Hadoop-2.0.x to build against. Alternatively, we could just make the decision "it's an alpha release" and always point -Phadoop_2_alpha to the newest release. This could cause some confusion among users during a new Hadoop-2.0.x release, but then all Hadoop-2.x builds would look like: {code}mvn -Phadoop_2_alpha clean install{code} If anyone cares, please post your opinions or a patch according to your particular inclination. This will be an easy fix, whatever we decide. Or we can do nothing. That's fine too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
[ https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700249#comment-13700249 ] Eli Reisman commented on GIRAPH-687: Ping. Can I get someone to take a peek at this, folks are asking to build against 2.0.5-alpha :) > Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha > --- > > Key: GIRAPH-687 > URL: https://issues.apache.org/jira/browse/GIRAPH-687 > Project: Giraph > Issue Type: New Feature > Components: build >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-687-1.patch > > > Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes: > mvn -Phadoop_2.0.4 clean verify > mvn -Phadoop_2.0.5 clean verify -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-704) Specialized message stores
[ https://issues.apache.org/jira/browse/GIRAPH-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700245#comment-13700245 ] Eli Reisman commented on GIRAPH-704: This is great, +1 on the patch. It's big, but the changes go together and this has been well tested. Great work! > Specialized message stores > -- > > Key: GIRAPH-704 > URL: https://issues.apache.org/jira/browse/GIRAPH-704 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > Attachments: GIRAPH-704.patch > > > I was investigating where the time/CPU is going in some applications, > and receiving messages on server side turned out to be one of the most > expensive things we do. We should provide better implementations using > primitive maps whenever that's possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
[ https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694094#comment-13694094 ] Eli Reisman commented on GIRAPH-687: Tried this again today, it works for -Phadoop_yarn builds as well as standard -Phadoop_2.0.x builds for use w/MRv2 interface. > Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha > --- > > Key: GIRAPH-687 > URL: https://issues.apache.org/jira/browse/GIRAPH-687 > Project: Giraph > Issue Type: New Feature > Components: build >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-687-1.patch > > > Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes: > mvn -Phadoop_2.0.4 clean verify > mvn -Phadoop_2.0.5 clean verify -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line
[ https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-688: --- Description: This makes the hadoop-yarn branch build again against all compatible Hadoop versions, warns (in a crude but accurate way) what to do if user did not set hadoop.version at the mvn command line...and passes mvn clean verify etc. I have removed a hardcoded version setting and replaced it with the destined-to-fail warning to allow/force folks to stay on top of which version they will build against (the 2.x Hadoop line is growing quickly!) The correct way (thanks Eugene!) to build our YARN branch against any compatible Hadoop, as of now, is this: mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our POM.XML files to see the various profiles we support for newer Hadoops, and select the hadoop.version you see in your favorite to build, as shown above. Thats it. Enjoy. was: This makes the hadoop-yarn branch build again against all compatible Hadoop versions, warns (in a crude but accurate way) what to do if user did not set hadoop.version at the mvn command line...and passes mvn clean verify etc. I have removed a hardcoded version setting and replaced it with the destined-to-fail warning to allow/force folks to stay on top of which version they will build against (the 2.x Hadoop line is growing quickly!) The correct way (thanks Eugene!) to build our YARN branch against any compatible Hadoop, as of now, is this: mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our POM.XML files to see the various profiles we support for newer Hadoops, and select the hadoop.version you see in your favorite to build, as shown above. Thats it. Enjoy. 
> Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, > warns if none set, works w/new 1.1.0 line > -- > > Key: GIRAPH-688 > URL: https://issues.apache.org/jira/browse/GIRAPH-688 > Project: Giraph > Issue Type: Bug >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-688-1.patch > > > This makes the hadoop-yarn branch build again against all compatible Hadoop > versions, warns (in a crude but accurate way) what to do if user did not set > hadoop.version at the mvn command line...and passes mvn clean verify etc. > I have removed a hardcoded version setting and replaced it with the > destined-to-fail warning to allow/force folks to stay on top of which version > they will build against (the 2.x Hadoop line is growing quickly!) > The correct way (thanks Eugene!) to build our YARN branch against any > compatible Hadoop, as of now, is this: > mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install > Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our > POM.XML files to see the various profiles we support for newer Hadoops, and > select the hadoop.version you see in your favorite to build, as shown above. > Thats it. Enjoy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line
[ https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-688: --- Description: This makes the hadoop-yarn branch build again against all compatible Hadoop versions, warns (in a crude but accurate way) what to do if user did not set hadoop.version at the mvn command line...and passes mvn clean verify etc. I have removed a hardcoded version setting and replaced it with the destined-to-fail warning to allow/force folks to stay on top of which version they will build against (the 2.x Hadoop line is growing quickly!) The correct way (thanks Eugene!) to build our YARN branch against any compatible Hadoop, as of now, is this: {code}mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install{code} Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our POM.XML files to see the various profiles we support for newer Hadoops, and select the hadoop.version you see in your favorite to build, as shown above. Thats it. Enjoy. was: This makes the hadoop-yarn branch build again against all compatible Hadoop versions, warns (in a crude but accurate way) what to do if user did not set hadoop.version at the mvn command line...and passes mvn clean verify etc. I have removed a hardcoded version setting and replaced it with the destined-to-fail warning to allow/force folks to stay on top of which version they will build against (the 2.x Hadoop line is growing quickly!) The correct way (thanks Eugene!) to build our YARN branch against any compatible Hadoop, as of now, is this: mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our POM.XML files to see the various profiles we support for newer Hadoops, and select the hadoop.version you see in your favorite to build, as shown above. Thats it. Enjoy. 
> Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, > warns if none set, works w/new 1.1.0 line > -- > > Key: GIRAPH-688 > URL: https://issues.apache.org/jira/browse/GIRAPH-688 > Project: Giraph > Issue Type: Bug >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-688-1.patch > > > This makes the hadoop-yarn branch build again against all compatible Hadoop > versions, warns (in a crude but accurate way) what to do if user did not set > hadoop.version at the mvn command line...and passes mvn clean verify etc. > I have removed a hardcoded version setting and replaced it with the > destined-to-fail warning to allow/force folks to stay on top of which version > they will build against (the 2.x Hadoop line is growing quickly!) > The correct way (thanks Eugene!) to build our YARN branch against any > compatible Hadoop, as of now, is this: > {code}mvn -Dhadoop.version=2.0.3-alpha -Phadoop_yarn clean install{code} > Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our > POM.XML files to see the various profiles we support for newer Hadoops, and > select the hadoop.version you see in your favorite to build, as shown above. > Thats it. Enjoy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (GIRAPH-631) Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with a more flexible Maven config
[ https://issues.apache.org/jira/browse/GIRAPH-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman resolved GIRAPH-631. Resolution: Fixed Assignee: Eli Reisman I think we're good here at this point. Several patches are up for review that make 2.0.4 and 2.0.5 as well as variable YARN versions + YARN Giraph profile work. I'm resolving this one. > Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with > a more flexible Maven config > - > > Key: GIRAPH-631 > URL: https://issues.apache.org/jira/browse/GIRAPH-631 > Project: Giraph > Issue Type: Improvement > Components: conf and scripts >Affects Versions: 1.0.0, 1.1.0 >Reporter: Eli Reisman >Assignee: Eli Reisman > Fix For: 1.0.0, 1.1.0 > > > Currently, Giraph's YARN profile is hardcoded to Version 2.0.3-alpha of > Hadoop. This is because of two problems: > 1. Simply creating profiles that can "coexist" such as Hadoop's own > -Pdist,native type mvn calls is not possible for us since we use munging and > excludes in Maven to prevent compilation of the YARN code where the deps are > not included (many profiles) and these excludes don't seem overridable. This > has been documented online as a Maven "feature" already. > 2. Simply resetting hadoop.version for the Maven build using a -D option > should work and should probably be the right fix for us but in the brief time > I played with it (and with our versioning story that affects backporting not > decided yet) I did not get it to work myself for Giraph-13 (this is all > documented there) > Option 2 will look like: > {code} > mvn -Phadoop_yarn -Dhadoop.version=YOUR_FAVORITE_YARNY_HADOOP clean install > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-693) Giraph-Hive check user code as soon as possible
[ https://issues.apache.org/jira/browse/GIRAPH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693574#comment-13693574 ] Eli Reisman commented on GIRAPH-693: I am not well versed in Hive I/O but this is very straightforward and a good idea so after reading the patch carefully I'm going to say +1 on this. Thanks Nitay! I assume it builds etc? > Giraph-Hive check user code as soon as possible > --- > > Key: GIRAPH-693 > URL: https://issues.apache.org/jira/browse/GIRAPH-693 > Project: Giraph > Issue Type: Bug >Reporter: Nitay Joffe >Assignee: Nitay Joffe > > We have a lot of cases of users running long jobs and then failing at the > Hive output step because of some misconfigured schema or type mismatch. We'd > like to move these errors as soon as possible. > To make this happen I am adding checkput methods to the > HiveTo and VertexToHive API and letting the user do their > checks. Look at the diff for examples and tests. > https://reviews.apache.org/r/12080/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line
[ https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693569#comment-13693569 ] Eli Reisman commented on GIRAPH-688: Hey all, Sorry I haven't checked in lately, I'll peek in on this tomorrow too. Could I grab a quick review from someone, its a very small patch. I only care because I'm presenting about Giraph + YARN at the end of the week :) Thanks, I promise I'll sit down and review some patches too! > Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, > warns if none set, works w/new 1.1.0 line > -- > > Key: GIRAPH-688 > URL: https://issues.apache.org/jira/browse/GIRAPH-688 > Project: Giraph > Issue Type: Bug >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-688-1.patch > > > This makes the hadoop-yarn branch build again against all compatible Hadoop > versions, warns (in a crude but accurate way) what to do if user did not set > hadoop.version at the mvn command line...and passes mvn clean verify etc. > I have removed a hardcoded version setting and replaced it with the > destined-to-fail warning to allow/force folks to stay on top of which version > they will build against (the 2.x Hadoop line is growing quickly!) > The correct way (thanks Eugene!) to build our YARN branch against any > compatible Hadoop, as of now, is this: > mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install > Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our > POM.XML files to see the various profiles we support for newer Hadoops, and > select the hadoop.version you see in your favorite to build, as shown above. > Thats it. Enjoy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-683) Jython for Computation
[ https://issues.apache.org/jira/browse/GIRAPH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682420#comment-13682420 ] Eli Reisman commented on GIRAPH-683: This is fantastic work, excellent! > Jython for Computation > -- > > Key: GIRAPH-683 > URL: https://issues.apache.org/jira/browse/GIRAPH-683 > Project: Giraph > Issue Type: Bug >Reporter: Nitay Joffe >Assignee: Nitay Joffe > > Support for writing Computation code in Python. We add Jython bindings so > that the Python computation code can communicate back with the Java Giraph > classes. > To make this work I had to change a few parts of Giraph: > 1) The Jython computation is not known until we read the script and create a > Computation object for it at runtime. This has to be done on each worker > separately after the job has launched. Because of this, there is no > Computation class set at the beginning. I suspect other scripting languages > will have similar issue. To fix this I created a ComputationFactory interface > which is responsible for creating the Computation, with a default that just > grabs the class from the Configuration and creates it. > 2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a > lot of repetitive code around these things so centralizing it all in one > place made things a lot cleaner. > 3) I added some more helpers like isDefaultValue() to our conf options. > To use Jython all the user has to do is call Jython#init(...) somewhere in > his initialization. > This patch contains our page rank benchmark implementation in Jython. I added > an option (--jython) which chooses whether to run the default or the jython > version. 
> Here is the initial PageRankBenchmark comparison (4 workers, 10M vertices, 25 > edges per vertex): > Java: > Total (milliseconds) 104,388 0 104,388 > Superstep 3 (milliseconds)16,750 0 16,750 > Setup (milliseconds) 2,895 0 2,895 > Shutdown (milliseconds) 50 0 50 > Superstep 0 (milliseconds)15,838 0 15,838 > Superstep 4 (milliseconds)19,088 0 19,088 > Input superstep (milliseconds)8,700 0 8,700 > Superstep 5 (milliseconds)3,550 0 3,550 > Superstep 2 (milliseconds)17,905 0 17,905 > Superstep 1 (milliseconds)19,608 0 19,608 > Jython: > Total (milliseconds) 244,965 0 244,965 > Superstep 3 (milliseconds)43,405 0 43,405 > Setup (milliseconds) 3,735 0 3,735 > Shutdown (milliseconds) 117 0 117 > Superstep 0 (milliseconds)36,962 0 36,962 > Superstep 4 (milliseconds)46,088 0 46,088 > Input superstep (milliseconds)8,551 0 8,551 > Superstep 5 (milliseconds)22,040 0 22,040 > Superstep 2 (milliseconds)42,329 0 42,329 > Superstep 1 (milliseconds)41,737 0 41,737 > Overhead of Jython vs Java = 2.5x. > However at scale things get better (200 workers, 1B vertices, 200 edges per > vertex): > Java: > Total (milliseconds) 1,702,429 0 1,702,429 > Superstep 3 (milliseconds)316,844 0 316,844 > Setup (milliseconds) 13,226 0 13,226 > Shutdown (milliseconds) 113 0 113 > Superstep 0 (milliseconds)300,950 0 300,950 > Superstep 4 (milliseconds)318,627 0 318,627 > Input superstep (milliseconds)114,673 0 114,673 > Superstep 5 (milliseconds)7,898 0 7,898 > Superstep 2 (milliseconds)312,152 0 312,152 > Superstep 1 (milliseconds)317,942 0 317,942 > Jython: > Total (milliseconds) 2,123,228 0 2,123,228 > Superstep 3 (milliseconds)406,422 0 406,422 > Setup (milliseconds) 7,159 0 7,159 > Shutdown (milliseconds) 131 0 131 > Superstep 0 (milliseconds)347,732 0 347,732 > Superstep 4 (milliseconds)405,696 0 405,696 > Input superstep (milliseconds)112,645 0 112,645 > Superstep 5 (milliseconds)46,687 0 46,687 > Superstep 2 (milliseconds)410,349 0 410,349 > Superstep 1 (milliseconds)386,404 0 386,404 > That's 
a mere 25% overhead. > Take a look at the reviewboard for latest patch: > https://reviews.apache.org/r/11709/
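The overhead figures quoted in the description can be re-derived from the two pairs of Total rows. What follows is a minimal sketch in Java; the class and method names are made up for illustration, and only the four totals are taken from the benchmark output.

```java
import java.util.Locale;

public class JythonOverhead {
    /** Ratio of the Jython total runtime to the Java total runtime. */
    public static double ratio(long jythonMillis, long javaMillis) {
        return (double) jythonMillis / javaMillis;
    }

    public static void main(String[] args) {
        // Totals (milliseconds) copied from the benchmark output in the description.
        double small = ratio(244_965L, 104_388L);     // 4 workers, 10M vertices
        double large = ratio(2_123_228L, 1_702_429L); // 200 workers, 1B vertices

        // prints "small run: 2.35x"
        System.out.println(String.format(Locale.ROOT, "small run: %.2fx", small));
        // prints "large run: 1.25x (25% overhead)"
        System.out.println(String.format(Locale.ROOT,
                "large run: %.2fx (%.0f%% overhead)", large, (large - 1) * 100));
    }
}
```

On the totals alone the small run comes out at roughly 2.35x, a little under the 2.5x quoted. The quoted figure may have been taken from individual supersteps instead (superstep 3, for instance, is about 2.6x); the description does not say.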
[jira] [Updated] (GIRAPH-688) Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line
[ https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-688: --- Summary: Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, warns if none set, works w/new 1.1.0 line (was: Make sure YARN builds against all compatible Giraph versions, warns if none set, works w/new 1.1.0 line) > Make sure Giraph builds against all compatible YARN-enabled Hadoop versions, > warns if none set, works w/new 1.1.0 line > -- > > Key: GIRAPH-688 > URL: https://issues.apache.org/jira/browse/GIRAPH-688 > Project: Giraph > Issue Type: Bug >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-688-1.patch > > > This makes the hadoop-yarn branch build again against all compatible Hadoop > versions, warns (in a crude but accurate way) what to do if the user did not set > hadoop.version at the mvn command line...and passes mvn clean verify etc. > I have removed a hardcoded version setting and replaced it with the > destined-to-fail warning to allow/force folks to stay on top of which version > they will build against (the 2.x Hadoop line is growing quickly!) > The correct way (thanks Eugene!) to build our YARN branch against any > compatible Hadoop, as of now, is this: > mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install > Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our > POM.XML files to see the various profiles we support for newer Hadoops, and > select the hadoop.version you see in your favorite to build, as shown above. > That's it. Enjoy.
[jira] [Updated] (GIRAPH-688) Make sure YARN builds against all compatible Giraph versions, warns if none set, works w/new 1.1.0 line
[ https://issues.apache.org/jira/browse/GIRAPH-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-688: --- Attachment: GIRAPH-688-1.patch Here it is, sorry it took too long. Job/book/life...grr you get the idea :) > Make sure YARN builds against all compatible Giraph versions, warns if none > set, works w/new 1.1.0 line > --- > > Key: GIRAPH-688 > URL: https://issues.apache.org/jira/browse/GIRAPH-688 > Project: Giraph > Issue Type: Bug >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-688-1.patch > > > This makes the hadoop-yarn branch build again against all compatible Hadoop > versions, warns (in a crude but accurate way) what to do if the user did not set > hadoop.version at the mvn command line...and passes mvn clean verify etc. > I have removed a hardcoded version setting and replaced it with the > destined-to-fail warning to allow/force folks to stay on top of which version > they will build against (the 2.x Hadoop line is growing quickly!) > The correct way (thanks Eugene!) to build our YARN branch against any > compatible Hadoop, as of now, is this: > mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install > Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our > POM.XML files to see the various profiles we support for newer Hadoops, and > select the hadoop.version you see in your favorite to build, as shown above. > That's it. Enjoy.
[jira] [Created] (GIRAPH-688) Make sure YARN builds against all compatible Giraph versions, warns if none set, works w/new 1.1.0 line
Eli Reisman created GIRAPH-688: -- Summary: Make sure YARN builds against all compatible Giraph versions, warns if none set, works w/new 1.1.0 line Key: GIRAPH-688 URL: https://issues.apache.org/jira/browse/GIRAPH-688 Project: Giraph Issue Type: Bug Reporter: Eli Reisman Assignee: Eli Reisman Priority: Minor This makes the hadoop-yarn branch build again against all compatible Hadoop versions, warns (in a crude but accurate way) what to do if the user did not set hadoop.version at the mvn command line...and passes mvn clean verify etc. I have removed a hardcoded version setting and replaced it with the destined-to-fail warning to allow/force folks to stay on top of which version they will build against (the 2.x Hadoop line is growing quickly!) The correct way (thanks Eugene!) to build our YARN branch against any compatible Hadoop, as of now, is this: mvn -Phadoop_yarn -Dhadoop.version=2.0.3-alpha clean install Where 2.0.3 can be any 2.0.x line, or Hadoop trunk if you like. Consult our POM.XML files to see the various profiles we support for newer Hadoops, and select the hadoop.version you see in your favorite to build, as shown above. That's it. Enjoy.
[jira] [Updated] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
[ https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-687: --- Description: Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes: mvn -Phadoop_2.0.4 clean verify mvn -Phadoop_2.0.5 clean verify was: Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes mvn -Phadoop_2.0.{4,5} clean verify > Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha > --- > > Key: GIRAPH-687 > URL: https://issues.apache.org/jira/browse/GIRAPH-687 > Project: Giraph > Issue Type: Bug > Components: build >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-687-1.patch > > > Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes: > mvn -Phadoop_2.0.4 clean verify > mvn -Phadoop_2.0.5 clean verify
[jira] [Updated] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
[ https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-687: --- Attachment: GIRAPH-687-1.patch Next up, take Eugene's advice for building YARN module and try to make it easier to select one of these...! > Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha > --- > > Key: GIRAPH-687 > URL: https://issues.apache.org/jira/browse/GIRAPH-687 > Project: Giraph > Issue Type: Bug > Components: build >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-687-1.patch > > > Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes mvn -Phadoop_2.0.{4,5} > clean verify
[jira] [Updated] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
[ https://issues.apache.org/jira/browse/GIRAPH-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-687: --- Issue Type: New Feature (was: Bug) > Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha > --- > > Key: GIRAPH-687 > URL: https://issues.apache.org/jira/browse/GIRAPH-687 > Project: Giraph > Issue Type: New Feature > Components: build >Reporter: Eli Reisman >Assignee: Eli Reisman >Priority: Minor > Attachments: GIRAPH-687-1.patch > > > Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes: > mvn -Phadoop_2.0.4 clean verify > mvn -Phadoop_2.0.5 clean verify
[jira] [Created] (GIRAPH-687) Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha
Eli Reisman created GIRAPH-687: -- Summary: Lets add support for Hadoop 2.0.4-alpha and 2.0.5-alpha Key: GIRAPH-687 URL: https://issues.apache.org/jira/browse/GIRAPH-687 Project: Giraph Issue Type: Bug Components: build Reporter: Eli Reisman Assignee: Eli Reisman Priority: Minor Just boilerplate to bring 2.0.4 and 2.0.5 in. Passes mvn -Phadoop_2.0.{4,5} clean verify
[jira] [Commented] (GIRAPH-629) YARN profile is broken when compiled against hadoop-2.0.4
[ https://issues.apache.org/jira/browse/GIRAPH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681797#comment-13681797 ] Eli Reisman commented on GIRAPH-629: Awesome! I was out of the loop for a while and hadn't seen this, thanks Eugene. I'm putting up a patch to add 2.0.4 and 2.0.5 support now. The "alpha" makes sense because that's how it's stated in the POM for hadoop.version; it's only hadoop_2.0.4 in our profile names. > YARN profile is broken when compiled against hadoop-2.0.4 > - > > Key: GIRAPH-629 > URL: https://issues.apache.org/jira/browse/GIRAPH-629 > Project: Giraph > Issue Type: Bug > Components: build >Affects Versions: 1.0.0 >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik > > {noformat} > $ mvn -Phadoop_yarn -DskipTests -Dhadoop.version=2.0.4-SNAPSHOT clean package > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Giraph Parent .. SUCCESS [1.359s] > [INFO] Apache Giraph Core FAILURE [15.319s] > [INFO] Apache Giraph Hive I/O SKIPPED > [INFO] Apache Giraph Examples SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 17.374s > [INFO] Finished at: Fri Apr 12 17:21:11 PDT 2013 > [INFO] Final Memory: 39M/481M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) > on project giraph-core: Compilation failure: Compilation failure: > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[46,42] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: package org.apache.hadoop.yarn.api.records > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[206,42] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[291,47] >
cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[368,11] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[398,35] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[178,7] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[255,26] > cannot find symbol > [ERROR] symbol : method getAMResponse() > [ERROR] location: interface > org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[296,37] > cannot find symbol > [ERROR] symbol : method getAMResponse() > [ERROR] location: interface > org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[327,49] > cannot find symbol > [ERROR] symbol : method getState() > [ERROR] location: interface org.apache.hadoop.yarn.api.records.Container > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[349,46] > cannot find symbol > [ERROR] symbol : method getAMResponse() > [ERROR] location: interface > org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse > [ERROR] > 
/tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[353,42] > cannot find symbol > [ERROR] symbol : method getAMResponse() > [ERROR] location: interface > org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[379,7] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.a
[jira] [Commented] (GIRAPH-624) ByteArrayPartition reports 0 aggregate edges when used with DiskBackedPartitionStore
[ https://issues.apache.org/jira/browse/GIRAPH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637268#comment-13637268 ] Eli Reisman commented on GIRAPH-624: Sorry I didn't get to this sooner. I am not going to have a lot of time to review patches right now but will try when I can. Sebastian is right, you pretty much want to close any IO goodies in a finally block whenever you can, IOE handled or not. > ByteArrayPartition reports 0 aggregate edges when used with > DiskBackedPartitionStore > - > > Key: GIRAPH-624 > URL: https://issues.apache.org/jira/browse/GIRAPH-624 > Project: Giraph > Issue Type: Bug >Reporter: Claudio Martella >Assignee: Claudio Martella > Attachments: GIRAPH-624.diff, GIRAPH-624.diff, GIRAPH-624.diff > > > ByteArrayPartition reports the correct number of edges when run in-memory or > with checkpointing, but reports 0 edges when used OOC. OOC runs fine with > SimplePartition.
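The advice about closing I/O in a finally block can be sketched in Java as follows. This is not the code from the GIRAPH-624 patch: `readFirstByte` and `TrackingStream` are hypothetical names, used only to show that close() runs whether or not the read throws.

```java
import java.io.IOException;
import java.io.InputStream;

public class FinallyClose {
    /** Reads one byte and always closes the stream, even if read() throws. */
    public static int readFirstByte(InputStream in) throws IOException {
        try {
            return in.read();
        } finally {
            // Runs on both the normal and the exceptional path.
            in.close();
        }
    }

    /** Hypothetical stream that records whether close() was called. */
    public static class TrackingStream extends InputStream {
        public boolean closed = false;
        private final boolean failOnRead;

        public TrackingStream(boolean failOnRead) {
            this.failOnRead = failOnRead;
        }

        @Override
        public int read() throws IOException {
            if (failOnRead) {
                throw new IOException("simulated read failure");
            }
            return 42;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        TrackingStream ok = new TrackingStream(false);
        TrackingStream bad = new TrackingStream(true);
        try {
            readFirstByte(ok);
            readFirstByte(bad);
        } catch (IOException e) {
            // Expected for the failing stream; it has still been closed.
        }
        System.out.println(ok.closed + " " + bad.closed); // prints "true true"
    }
}
```

On Java 7 and later, try-with-resources gives the same guarantee with less code.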
[jira] [Commented] (GIRAPH-596) Single uber-jar
[ https://issues.apache.org/jira/browse/GIRAPH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631365#comment-13631365 ] Eli Reisman commented on GIRAPH-596: Great, great idea! I also noticed during GIRAPH-13 building on newer Hadoops that not all these subprojects were getting built under every profile. If that is close enough to fall under this JIRA too, then hey, bonus. > Single uber-jar > --- > > Key: GIRAPH-596 > URL: https://issues.apache.org/jira/browse/GIRAPH-596 > Project: Giraph > Issue Type: Bug >Reporter: Nitay Joffe > > Right now we build a fatjar (with all the deps) for giraph-hbase, > giraph-hive, giraph-accumulo, and so on. > We should just build one single uber-jar at top level that contains > everything. > This should not affect the regular per-module jars built for each module.
[jira] [Created] (GIRAPH-632) YARN dependencies add a fair amount of size to the fat jar and may be subject to simplification
Eli Reisman created GIRAPH-632: -- Summary: YARN dependencies add a fair amount of size to the fat jar and may be subject to simplification Key: GIRAPH-632 URL: https://issues.apache.org/jira/browse/GIRAPH-632 Project: Giraph Issue Type: Improvement Components: build Affects Versions: 1.0, 1.1 Reporter: Eli Reisman Priority: Minor Fix For: 1.0, 1.1 The hadoop_yarn profile requires some new package dependencies that the rest of our build does not. They add size to the build projects. Because our YARN implementation "rides the fence" between the old API (for our ApplicationMaster) and the new API (for our Client), the dependencies chosen were "what worked for me at the time". On the Maven repos there are a number of "api" versions of some of these libs that are lighter weight. Someone (maybe me someday but not yet, sorry) could experiment with just replacing some of our current YARN deps in the Maven profile that "cast a very wide net" with a bunch of smaller, lighter weight packages that cover the same API we need. See also Maven repos and the YARN dependencies listed there. My very brief experiments with this didn't yield anything, but again I ran out of time and barely played with this before having other responsibilities take priority. Seems like this would not be hard to tune up and could be good for our build.
[jira] [Commented] (GIRAPH-592) YourKit profiling API for easy profiling of giraph
[ https://issues.apache.org/jira/browse/GIRAPH-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631363#comment-13631363 ] Eli Reisman commented on GIRAPH-592: Hey man where's the patch? +1 > YourKit profiling API for easy profiling of giraph > -- > > Key: GIRAPH-592 > URL: https://issues.apache.org/jira/browse/GIRAPH-592 > Project: Giraph > Issue Type: Bug >Reporter: Nitay Joffe >Assignee: Nitay Joffe > > Adds YourKit API with helpers to Giraph, to make it easy to profile with > YourKit. No more having to attach to processes and have the user time things > by hand. This allows us to profile specific parts of the code very easily. > As an example this diff adds profiling to edge input loading. > To use YourKit with Hadoop jobs you need to set parameters as follows: > {code} > -Dmapred.task.profile=true \ > -Dmapred.task.profile.maps=0-${numWorkers} \ > -Dmapred.task.profile.params=-agentpath: > {code} > Note if the YourKit agent is not passed in (not profiling), the calls I've > added here have negligible effect. > https://reviews.apache.org/r/10147/
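For anyone scripting job submission, the three profiling options in the description can be assembled programmatically. The sketch below is only an illustration: `YourKitProfileArgs` and `build` are hypothetical names, and the agent path is left as a caller-supplied argument because the description does not give the value that follows `-agentpath:`.

```java
import java.util.Arrays;
import java.util.List;

public class YourKitProfileArgs {
    /**
     * Builds the three -D options listed in the description.
     * agentPath is an assumption supplied by the caller; the original
     * message leaves the value after "-agentpath:" unspecified.
     */
    public static List<String> build(int numWorkers, String agentPath) {
        return Arrays.asList(
                "-Dmapred.task.profile=true",
                "-Dmapred.task.profile.maps=0-" + numWorkers,
                "-Dmapred.task.profile.params=-agentpath:" + agentPath);
    }

    public static void main(String[] args) {
        // "/path/to/agent" is a placeholder, not a real YourKit location.
        for (String arg : build(4, "/path/to/agent")) {
            System.out.println(arg);
        }
    }
}
```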
[jira] [Commented] (GIRAPH-631) Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with a more flexible Maven config
[ https://issues.apache.org/jira/browse/GIRAPH-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631362#comment-13631362 ] Eli Reisman commented on GIRAPH-631: I'm not going to have time to attack this right now. Here's what I know: In the Maven profiles for various Hadoops, we hardcode a "hadoop.version" Maven property for each. So for instance "hadoop_2.0.3" profile hardcodes hadoop.version to be "2.0.3-alpha" so that the Maven repo deps can be properly downloaded and resolved. Of course this can all be overridden at the command line with -D options. But something about the way our profiles interact (and/or the way the subprojects like giraph-examples and the IO subprojects for Hive etc.) breaks still and I didn't have time to investigate why. If you decide to dive in and try this, I'm happy to help, ping me and I will attempt to guide you or advise. But in the end there will be some changes to our Maven setup. There is always the chance that there is a more radical approach such as making giraph-yarn a subproject but for various reasons I rejected this as a less natural fit, especially when I saw that we could stitch the YARN code into our real Giraph code with so little munging, and that this munging was required anyway to get the profile to integrate with our MapReduce-based profiles at all. If you want to pursue a more involved solution like this, I'm not against it and am also happy to help where I can.
Finally, if no one cares about this and you leave it to collect dust, I'll come back at some point and do it myself ;) > Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with > a more flexible Maven config > - > > Key: GIRAPH-631 > URL: https://issues.apache.org/jira/browse/GIRAPH-631 > Project: Giraph > Issue Type: Improvement > Components: conf and scripts >Affects Versions: 1.0, 1.1 >Reporter: Eli Reisman > Fix For: 1.0, 1.1 > > > Currently, Giraph's YARN profile is hardcoded to Version 2.0.3-alpha of > Hadoop. This is because of two problems: > 1. Simply creating profiles that can "coexist" such as Hadoop's own > -Pdist,native type mvn calls is not possible for us since we use munging and > excludes in Maven to prevent compilation of the YARN code where the deps are > not included (many profiles) and these excludes don't seem overridable. This > has been documented online as a Maven "feature" already. > 2. Simply resetting hadoop.version for the Maven build using a -D option > should work and should probably be the right fix for us but in the brief time > I played with it (and with our versioning story that affects backporting not > decided yet) I did not get it to work myself for Giraph-13 (this is all > documented there) > Option 2 will look like: > {code} > mvn -Phadoop_yarn -Dhadoop.version=YOUR_FAVORITE_YARNY_HADOOP clean install > {code}
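The precedence described in the comment above (each profile hardcodes a default hadoop.version, and a -D option on the command line is meant to win over it) can be modelled in a few lines of Java. This is only an illustration of the intended behaviour, not how Maven resolves properties internally. `HadoopVersionResolver`, `resolve`, and the map are hypothetical; only the pairing of hadoop_2.0.3 with 2.0.3-alpha comes from the comment.

```java
import java.util.HashMap;
import java.util.Map;

public class HadoopVersionResolver {
    // Profile name -> hardcoded default. This single pairing is taken from
    // the comment; a real POM would list one entry per profile.
    private static final Map<String, String> PROFILE_DEFAULTS = new HashMap<>();

    static {
        PROFILE_DEFAULTS.put("hadoop_2.0.3", "2.0.3-alpha");
    }

    /**
     * Returns the command-line override if one is given, otherwise the
     * profile's default. Fails fast when neither is available.
     */
    public static String resolve(String profile, String cliOverride) {
        if (cliOverride != null && !cliOverride.isEmpty()) {
            return cliOverride;
        }
        String profileDefault = PROFILE_DEFAULTS.get(profile);
        if (profileDefault == null) {
            throw new IllegalStateException(
                    "No hadoop.version for profile " + profile
                    + "; pass -Dhadoop.version=... on the command line");
        }
        return profileDefault;
    }

    public static void main(String[] args) {
        System.out.println(resolve("hadoop_2.0.3", null));          // prints "2.0.3-alpha"
        System.out.println(resolve("hadoop_2.0.3", "2.0.4-alpha")); // prints "2.0.4-alpha"
    }
}
```

A profile with no default and no override fails immediately in this sketch, which is roughly the "destined-to-fail warning" that GIRAPH-688 describes for the hadoop_yarn profile.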
[jira] [Created] (GIRAPH-631) Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with a more flexible Maven config
Eli Reisman created GIRAPH-631: -- Summary: Remove Hardcoded Dependency on Hadoop-2.0.3-alpha from YARN and replace with a more flexible Maven config Key: GIRAPH-631 URL: https://issues.apache.org/jira/browse/GIRAPH-631 Project: Giraph Issue Type: Improvement Components: conf and scripts Affects Versions: 1.0, 1.1 Reporter: Eli Reisman Fix For: 1.0, 1.1 Currently, Giraph's YARN profile is hardcoded to Version 2.0.3-alpha of Hadoop. This is because of two problems: 1. Simply creating profiles that can "coexist" such as Hadoop's own -Pdist,native type mvn calls is not possible for us since we use munging and excludes in Maven to prevent compilation of the YARN code where the deps are not included (many profiles) and these excludes don't seem overridable. This has been documented online as a Maven "feature" already. 2. Simply resetting hadoop.version for the Maven build using a -D option should work and should probably be the right fix for us but in the brief time I played with it (and with our versioning story that affects backporting not decided yet) I did not get it to work myself for Giraph-13 (this is all documented there) Option 2 will look like: {code} mvn -Phadoop_yarn -Dhadoop.version=YOUR_FAVORITE_YARNY_HADOOP clean install {code}
[jira] [Resolved] (GIRAPH-629) YARN profile is broken when compiled against hadoop-2.0.4
[ https://issues.apache.org/jira/browse/GIRAPH-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman resolved GIRAPH-629. Resolution: Won't Fix As is stated in the final Giraph-13 stuff, The hadoop version is currently hardcoded to 2.0.3 and while we can upgrade this by hand in the POMs there is no setup to move it from code yet. I would suggest a new JIRA to reconfigure the POM to accept this. My "dream" was to have the build do something on the order of: {code} mvn -Phadoop_yarn,HADOOPVERSION clean install {code} where "HADOOPVERSION" would be the name of one of our other profiles, so that you could pick at the command line and have it blow up if the versioning made no sense (as 0.20.x would etc) but due to strange behavior in the filtering (I can't try to compile the YARN code against hadoop versions that do not supply it) this was not possible. so option two (which might be possible) is something like this: {code} mvn -Phadoop_yarn -Dhadoop.version=HADOOPVERSION clean install {code} however early attempts to do this indicate that the POM hadoop-version is being overridden at times in our build by sub-project POM's or not propagating correctly to allow it. I ran out of time at HW to handle this, but if I didn't put up a JIRA for it already (I think I did) then we should have one, this seems like it could be done and would work. Anyway, because this behavior (only allowing 2.0.3) is "normal" for now, I'm resolving this particular JIRA as "won't fix" > YARN profile is broken when compiled against hadoop-2.0.4 > - > > Key: GIRAPH-629 > URL: https://issues.apache.org/jira/browse/GIRAPH-629 > Project: Giraph > Issue Type: Bug > Components: build >Affects Versions: 1.0 >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik > > {noformat} > $ mvn -Phadoop_yarn -DskipTests -Dhadoop.version=2.0.4-SNAPSHOT clean package > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Giraph Parent .. 
SUCCESS [1.359s] > [INFO] Apache Giraph Core FAILURE [15.319s] > [INFO] Apache Giraph Hive I/O SKIPPED > [INFO] Apache Giraph Examples SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 17.374s > [INFO] Finished at: Fri Apr 12 17:21:11 PDT 2013 > [INFO] Final Memory: 39M/481M > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) > on project giraph-core: Compilation failure: Compilation failure: > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[46,42] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: package org.apache.hadoop.yarn.api.records > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[206,42] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[291,47] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[368,11] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[398,35] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[178,7] > cannot find symbol > [ERROR] symbol : class AMResponse > [ERROR] location: class org.apache.giraph.yarn.GiraphApplicationMaster > [ERROR] > 
/tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[255,26] > cannot find symbol > [ERROR] symbol : method getAMResponse() > [ERROR] location: interface > org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse > [ERROR] > /tmp/giraph/giraph-core/target/munged/main/org/apache/giraph/yarn/GiraphApplicationMaster.java:[296,37] > cannot find symbol > [ERROR] symbol : method getAMResponse() > [ERROR] location: interface > org.apache.hadoop.
[jira] [Created] (GIRAPH-608) Spelling error in Combiner.java
Eli Reisman created GIRAPH-608: -- Summary: Spelling error in Combiner.java Key: GIRAPH-608 URL: https://issues.apache.org/jira/browse/GIRAPH-608 Project: Giraph Issue Type: Bug Reporter: Eli Reisman Priority: Trivial In line 35, the variable name "originalMessage" is misspelled in one spot. Good newbie issue for figuring out how to contribute. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark with 6 or more workers on a pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest
[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624979#comment-13624979 ] Eli Reisman commented on GIRAPH-601: To clarify point one: YARN adds that "little extra", not you, so it's sort of a grey area. Just keep in mind that if your cluster offers 10 gigs of available resources, running with -w 8 to leave a gig for the master and a gig for the app master is not good enough. You need to leave some extra container "overhead" unused for YARN jobs, because each container will also use a little extra. To clarify about yarn-site: there is more than one resource setting in yarn-site, so make sure they are all set the way you need, or bad things like this happen with little error reporting. Hope it's going well, good luck with this. > Exception when running pagerank benchmark with 6 or more workers on a > pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest > > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug >Reporter: Eugene Koontz > Attachments: instrumentation.patch, print_addresses.patch > > > Building Giraph with: > {code} > mvn -DskipTests -Phadoop_2.0.3 clean compile > {code} > Running pagerank like this: > {code} > $HADOOP_RUNTIME/bin/hadoop jar $JAR \ > org.apache.giraph.benchmark.PageRankBenchmark \ > -e 10 -s 10 -v -V 10 -w 6 > {code} > I see this in > /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_02/ > : > {code} > 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished > worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], > size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=4, port=30004),
Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=0, port=30010)], size = 6 from > /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir > 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] > org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: > Channel failed with remote address /172.16.175.1:56236 > java.lang.ClassCastException: > org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to > org.apache.giraph.comm.requests.MasterRequest > at > org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27) > at > org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71) > at > org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45) > at > org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:680) > {code}
[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark with 6 or more workers on a pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest
[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624978#comment-13624978 ] Eli Reisman commented on GIRAPH-601:
There are several yarn-site resource settings. I had this type of problem when I asked for too much without knowing it, and the cluster isn't always good at telling you that's the deal. Two things:
1. Each YARN task needs the heap you choose for it, plus "a little extra" for the container itself. So keep that in mind.
2. You always have to "pay" for your app master too, which is not part of the Giraph API, so if you want "-w 5" you are getting: one app master with some amount of YARN resources, one master task, and 5 worker tasks (with the master taking a share of heap equal to what each of the 5 workers gets).
Point being, it's extremely easy to overpower a small cluster on a local machine without knowing it ;)
> Exception when running pagerank benchmark with 6 or more workers on a > pseudodistributed setup: SendVertexRequest cannot be cast to MasterRequest > > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug >Reporter: Eugene Koontz > Attachments: instrumentation.patch, print_addresses.patch
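The resource accounting described in the two GIRAPH-601 comments above can be sketched as a few lines of Java. This is only a rough illustration: the class and method names are hypothetical, and the per-container overhead and app master size are made-up inputs, not values taken from YARN or Giraph. The one thing taken from the comments is the shape of the sum: "-w N" costs N workers, one master task with the same heap as a worker, and one app master, each with some container overhead on top.

```java
/** Hypothetical helper: rough memory budget for a Giraph job on YARN. */
final class YarnMemoryBudget {
  private YarnMemoryBudget() { }

  /** Containers used by "-w workers": the workers, one master task, one app master. */
  static int containersNeeded(int workers) {
    return workers + 2;
  }

  /**
   * Total memory (MB) the job asks for.
   *
   * @param workers     value passed to -w
   * @param heapMb      heap chosen per Giraph task (master and workers get the same)
   * @param overheadMb  assumed "little extra" YARN adds per container
   * @param appMasterMb assumed size of the app master container
   */
  static long totalMbNeeded(int workers, long heapMb, long overheadMb, long appMasterMb) {
    long giraphTasks = workers + 1L; // workers plus the master task
    return giraphTasks * (heapMb + overheadMb) + (appMasterMb + overheadMb);
  }

  /** True if the whole job fits into the memory the cluster offers. */
  static boolean fits(long clusterMb, int workers, long heapMb, long overheadMb,
      long appMasterMb) {
    return totalMbNeeded(workers, heapMb, overheadMb, appMasterMb) <= clusterMb;
  }

  public static void main(String[] args) {
    long clusterMb = 10L * 1024; // the "10 gigs" cluster from the comment
    // With zero overhead, -w 8 looks like it fits exactly...
    System.out.println(fits(clusterMb, 8, 1024, 0, 1024));
    // ...but any per-container overhead (128 MB is an assumed figure) breaks it.
    System.out.println(fits(clusterMb, 8, 1024, 128, 1024));
  }
}
```

With these assumed numbers, a request that looks exactly right on paper stops fitting as soon as any overhead is counted, which is the failure mode the comments warn about.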
[jira] [Commented] (GIRAPH-527) readVertexInputSplit is always reporting 0 vertices and 0 edges
[ https://issues.apache.org/jira/browse/GIRAPH-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623389#comment-13623389 ] Eli Reisman commented on GIRAPH-527: Nice catch, Maja! > readVertexInputSplit is always reporting 0 vertices and 0 edges > --- > > Key: GIRAPH-527 > URL: https://issues.apache.org/jira/browse/GIRAPH-527 > Project: Giraph > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Claudio Martella >Assignee: Nitay Joffe > > readVertexInputSplit always reports 0 vertices and 0 edges loaded in its > status.
[jira] [Commented] (GIRAPH-536) Clean up configuration options
[ https://issues.apache.org/jira/browse/GIRAPH-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623387#comment-13623387 ] Eli Reisman commented on GIRAPH-536: This has waited a long time. Great one! Absolutely needed for the release. Everyone gets confused about this stuff, and it leaves a frustrated impression before folks have the chance to really see what Giraph can do for them. These details matter! Thanks for getting to this!!! > Clean up configuration options > -- > > Key: GIRAPH-536 > URL: https://issues.apache.org/jira/browse/GIRAPH-536 > Project: Giraph > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Alessandro Presta >Assignee: Alessandro Presta > Attachments: GIRAPH-536.patch, GIRAPH-536.patch, GIRAPH-536.patch > > > Option names are all over the place, and I think they should be rationalized > before we cut the 0.2 release. > Some examples: > 1) Options that don't start with "giraph.*", like "partition.*". > 2) Ambiguous naming: "giraph.numInputSplitsThreads" refers to worker input > threads, "giraph.inputSplitThreadCount" refers to threads used by the master > to write splits to ZooKeeper. > 3) Some options are defined in GiraphConstants, some other ones in the > classes that use them. We can find all of them by searching for "static final > String". > 4) "giraph.zKForceSync" and "giraph.ZkSkipAcl" use "yes"/"no" instead of > true/false, just because they are later used to write ZK configuration (which > requires "yes"/"no"). I think we should stick to true/false since these are > Giraph options regardless.
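One way to read point 4 of the description above is to keep the Giraph-facing options as true/false and translate to "yes"/"no" only where the ZooKeeper configuration is written. The sketch below is an illustration under stated assumptions, not the contents of the attached patch: a plain Map stands in for the Hadoop Configuration, the class and method names are hypothetical, the defaults are assumed to be false, and the ZooKeeper-side keys are assumed names. Only the two Giraph option names are taken from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical translator from boolean Giraph options to ZooKeeper "yes"/"no". */
final class ZkOptionTranslator {
  // Option names as quoted in the issue description.
  static final String FORCE_SYNC = "giraph.zKForceSync";
  static final String SKIP_ACL = "giraph.ZkSkipAcl";

  private ZkOptionTranslator() { }

  /** ZooKeeper-style rendering of a boolean. */
  static String toYesNo(boolean value) {
    return value ? "yes" : "no";
  }

  /** Reads a true/false option, falling back to the default when unset. */
  static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
    String raw = conf.get(key);
    return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
  }

  /** Builds the ZooKeeper-side settings; the target keys are assumed names. */
  static Map<String, String> zkProperties(Map<String, String> conf) {
    Map<String, String> zk = new HashMap<>();
    zk.put("zookeeper.forceSync", toYesNo(getBoolean(conf, FORCE_SYNC, false)));
    zk.put("zookeeper.skipACL", toYesNo(getBoolean(conf, SKIP_ACL, false)));
    return zk;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(FORCE_SYNC, "true"); // the user writes true/false, never yes/no
    System.out.println(zkProperties(conf));
  }
}
```

Keeping the conversion in one place means users only ever see true/false, while the ZooKeeper-specific spelling stays an implementation detail.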
[jira] [Commented] (GIRAPH-604) Clean up benchmarks
[ https://issues.apache.org/jira/browse/GIRAPH-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623380#comment-13623380 ] Eli Reisman commented on GIRAPH-604: Nice!!! +1 from me! > Clean up benchmarks > --- > > Key: GIRAPH-604 > URL: https://issues.apache.org/jira/browse/GIRAPH-604 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > Attachments: GIRAPH-604.patch > > > Benchmark classes have a lot of duplicate options and duplicate code which > handles CommandLine.
[jira] [Resolved] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles
[ https://issues.apache.org/jira/browse/GIRAPH-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman resolved GIRAPH-599. Resolution: Fixed Maja just committed this. Thanks Maja and Nitay! > Hive IO dependency issues with some Hadoop profiles > --- > > Key: GIRAPH-599 > URL: https://issues.apache.org/jira/browse/GIRAPH-599 > Project: Giraph > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Eli Reisman > Fix For: 0.2.0 > > Attachments: GIRAPH-599.patch > > > Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now > this happens: > {code} > [INFO] > > [INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT > [INFO] > > Downloading: > http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloaded: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > (15 KB at 20.2 KB/sec) > Downloading: > http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > [WARNING] The POM for 
com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is > missing, no dependency information available > Downloading: > http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloaded: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > (201 KB at 194.6 KB/sec) > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Giraph Parent .. 
SUCCESS [0.717s] > [INFO] Apache Giraph Core SUCCESS [2:58.276s] > [INFO] Apache Giraph Hive I/O FAILURE [6.455s] > [INFO] Apache Giraph Examples SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 3:05.779s > [INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013 > [INFO] Final Memory: 48M/352M > [INFO] > > [ERROR] Failed to execute goal on project giraph-hive: Could not resolve > dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: > Could not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in > central (http://repo1.maven.org/maven2) -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug
[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest
[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617939#comment-13617939 ] Eli Reisman commented on GIRAPH-601: Oh hey, did you set the yarn-site and core-site options that are not well documented? Does wordcount or pi work on your YARN cluster? > Exception when running pagerank benchmark: SendVertexRequest cannot be cast > to MasterRequest > > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug >Reporter: Eugene Koontz > Attachments: instrumentation.patch
[jira] [Commented] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617940#comment-13617940 ] Eli Reisman commented on GIRAPH-13: --- I'll wait a few days for folks to point out problems (and maybe see what happens with GIRAPH-601) and then commit if no other review issues crop up. Thanks! > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan >Assignee: Eli Reisman > Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, > GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, > GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, > GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, > GIRAPH-13-9-r5.patch, GIRAPH-13-9-r6.patch > > > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop > trunk, we should think about what it would take to separate out the graph > processing bits of Giraph from the MR1-specific code so as to take advantage > of the less-MR centric aspects of YARN, while still supporting both over the > medium term. > Review Board link (ready for review now): https://reviews.apache.org/r/9811/
[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest
[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617938#comment-13617938 ] Eli Reisman commented on GIRAPH-601: So masterCount is part of the problem, forcing us to have a "task 0" to be below the masterCount value of 1? What's up with masterCount? Did you possibly ask for more workers than your YARN cluster has resources for? Check out your YARN web UI. Could it be that MRv2 is waiting until the cluster has enough memory to launch all of your PageRank tasks, and that moment never comes in time? Not sure how (or how well) MRv2 wraps these problems. Also, did you see in one of the earlier dumps that YarnClientImpl is hitting an IOException on security tokens? Is that normal? I did see you had auth on SIMPLE, so that should work as-is? > Exception when running pagerank benchmark: SendVertexRequest cannot be cast > to MasterRequest > > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug >Reporter: Eugene Koontz > Attachments: instrumentation.patch
[jira] [Commented] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617927#comment-13617927 ] Eli Reisman commented on GIRAPH-13: Thanks! I learned a ton doing it! I'll give this a day or two for folks to play with it or ask for changes. I'll be checking Review Board for any such requests and will commit in a few days if there are none. I am hoping it's clear (and the low-hanging fruit ripe for improvement well marked) so others can dive in, play with it, and get comfortable extending it. There are a lot of fun new possibilities if we choose to flesh this out. > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan >Assignee: Eli Reisman > Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, > GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, > GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, > GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, > GIRAPH-13-9-r5.patch, GIRAPH-13-9-r6.patch
[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest
[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617911#comment-13617911 ] Eli Reisman commented on GIRAPH-601: Awesome, thanks Maja! I did not keep good notes during that part of the YARN patch, and what I remember is that the requirement for tasks to start from (or at least include? don't know) a "taskId 0" came from MapReduce and IO code. Don't know what the deal is. When we have this all straightened out I can update the YARN patch. The solution I used was "stable for now", but YARN does not guarantee that future taskIds will be contiguous, or that task 2 will always be our first non-app-master task issued, etc., so being able to just use the IDs YARN gives us without alteration will be a good idea. > Exception when running pagerank benchmark: SendVertexRequest cannot be cast > to MasterRequest > > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug >Reporter: Eugene Koontz > Attachments: instrumentation.patch
[jira] [Commented] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles
[ https://issues.apache.org/jira/browse/GIRAPH-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617885#comment-13617885 ] Eli Reisman commented on GIRAPH-599: @Nitay: this worked for me on trunk, thanks! +1 > Hive IO dependency issues with some Hadoop profiles > --- > > Key: GIRAPH-599 > URL: https://issues.apache.org/jira/browse/GIRAPH-599 > Project: Giraph > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Eli Reisman > Fix For: 0.2.0 > > Attachments: GIRAPH-599.patch > > > Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now > this happens: > {code} > [INFO] > > [INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT > [INFO] > > Downloading: > http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloaded: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > (15 KB at 20.2 KB/sec) > Downloading: > http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > [WARNING] 
The POM for com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is > missing, no dependency information available > Downloading: > http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloaded: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > (201 KB at 194.6 KB/sec) > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Giraph Parent .. 
SUCCESS [0.717s] > [INFO] Apache Giraph Core SUCCESS [2:58.276s] > [INFO] Apache Giraph Hive I/O FAILURE [6.455s] > [INFO] Apache Giraph Examples SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 3:05.779s > [INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013 > [INFO] Final Memory: 48M/352M > [INFO] > > [ERROR] Failed to execute goal on project giraph-hive: Could not resolve > dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: > Could not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in > central (http://repo1.maven.org/maven2) -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the
[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest
[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617871#comment-13617871 ] Eli Reisman commented on GIRAPH-601: Nice! See how the containers for our tasks in YARN (MRv2) start from "2" and go up? This is the problem I had with the YARN patch. The first YARN task is always the app master (there is no MRv1 analogue for this), so the first task to run Giraph code is always task 2 or higher. I had to adjust this to start handing IDs into Giraph starting at 0. If you guys figure out where our taskId dependencies are, I'd love to know. Ideally, I'd like to see Giraph not care internally what the taskIds are, where the numbering starts, or whether they are contiguous, as long as they are unique. > Exception when running pagerank benchmark: SendVertexRequest cannot be cast > to MasterRequest > > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug >Reporter: Eugene Koontz > Attachments: instrumentation.patch > > > Building Giraph with: > {code} > mvn -DskipTests -Phadoop_2.0.3 clean compile > {code} > Running pagerank like this: > {code} > $HADOOP_RUNTIME/bin/hadoop jar $JAR \ > org.apache.giraph.benchmark.PageRankBenchmark \ > -e 10 -s 10 -v -V 10 -w 6 > {code} > I see this in > /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_02/ > : > {code} > 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished > worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], > size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=3, port=30003), 
Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=0, port=30010)], size = 6 from > /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir > 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] > org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: > Channel failed with remote address /172.16.175.1:56236 > java.lang.ClassCastException: > org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to > org.apache.giraph.comm.requests.MasterRequest > at > org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27) > at > org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71) > at > org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45) > at > org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:680) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
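The wish in the comment above (task IDs that only need to be unique, not contiguous or zero-based) can be illustrated with a small sketch. This is not Giraph code: the class `DenseTaskIndex` and its methods are hypothetical names made up for this example. It assumes every participant sees the same set of IDs, so sorting them yields the same 0-based index everywhere.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper (not part of Giraph): maps arbitrary unique task IDs to dense 0-based indices. */
final class DenseTaskIndex {
  private final Map<Integer, Integer> indexByTaskId = new HashMap<>();

  DenseTaskIndex(Collection<Integer> taskIds) {
    // Sorting makes the mapping deterministic for anyone holding the same ID set.
    TreeSet<Integer> sorted = new TreeSet<>(taskIds);
    if (sorted.size() != taskIds.size()) {
      throw new IllegalArgumentException("task IDs must be unique: " + taskIds);
    }
    int next = 0;
    for (Integer taskId : sorted) {
      indexByTaskId.put(taskId, next++);
    }
  }

  /** Returns the dense index for a known task ID, or throws if the ID was never registered. */
  int indexOf(int taskId) {
    Integer index = indexByTaskId.get(taskId);
    if (index == null) {
      throw new IllegalArgumentException("unknown task ID: " + taskId);
    }
    return index;
  }

  int size() {
    return indexByTaskId.size();
  }
}
```

With container IDs such as 2, 5 and 7 (the app master having taken the first slot), internal code would only ever see indices 0, 1 and 2, regardless of where the external numbering starts.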
[jira] [Commented] (GIRAPH-601) Exception when running pagerank benchmark: SendVertexRequest cannot be cast to MasterRequest
[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617857#comment-13617857 ] Eli Reisman commented on GIRAPH-601: When doing the YARN patch I puzzled over some of this SplitMasterWorker logic. I think this could be another case where some of this code has evolved quickly and isn't doing what it used to anymore. > Exception when running pagerank benchmark: SendVertexRequest cannot be cast > to MasterRequest > > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug >Reporter: Eugene Koontz > Attachments: instrumentation.patch > > > Building Giraph with: > {code} > mvn -DskipTests -Phadoop_2.0.3 clean compile > {code} > Running pagerank like this: > {code} > $HADOOP_RUNTIME/bin/hadoop jar $JAR \ > org.apache.giraph.benchmark.PageRankBenchmark \ > -e 10 -s 10 -v -V 10 -w 6 > {code} > I see this in > /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_02/ > : > {code} > 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished > worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], > size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=0, port=30010)], size = 6 from > /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir > 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] > org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: > Channel failed with remote address /172.16.175.1:56236 > 
java.lang.ClassCastException: > org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to > org.apache.giraph.comm.requests.MasterRequest > at > org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27) > at > org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71) > at > org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45) > at > org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:680) > {code}
[jira] [Commented] (GIRAPH-362) Address master task id for communication for master (known issue from GIRAPH-211)
[ https://issues.apache.org/jira/browse/GIRAPH-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617852#comment-13617852 ] Eli Reisman commented on GIRAPH-362: This is interesting because I had trouble in the YARN patch with the taskid stuff too. I noticed in a recent patch Maja removed a hardcoded setting of the master task id and set it with getTaskPartitionId type calls. Does anyone know exactly where the task id dependencies in Giraph are, what they are, etc? Are there any Giraph tasks that need a certain task id for a job to run? How about Hadoop or MR dependencies in the IO formats needing this? Thanks! > Address master task id for communication for master (known issue from > GIRAPH-211) > - > > Key: GIRAPH-362 > URL: https://issues.apache.org/jira/browse/GIRAPH-362 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching > > There is a workaround from GIRAPH-211 to handle requests a little differently > due to issues communicating to the master. We should fix this to be a > regular request in the future. 
> {code} > public void sendWritableRequest(Integer destWorkerId, > InetSocketAddress remoteServer, > WritableRequest request) { > if (clientRequestIdRequestInfoMap.isEmpty()) { > byteCounter.resetAll(); > } > boolean registerRequest = true; > /*if[HADOOP_NON_SECURE] > else[HADOOP_NON_SECURE]*/ > if (request.getType() == RequestType.SASL_TOKEN_MESSAGE_REQUEST) { > registerRequest = false; > } > /*end[HADOOP_NON_SECURE]*/ > Channel channel = getNextChannel(remoteServer); > RequestInfo newRequestInfo = new RequestInfo(remoteServer, request); > if (registerRequest) { > request.setClientId(clientId); > request.setRequestId( > addressRequestIdGenerator.getNextRequestId(remoteServer)); > ClientRequestId clientRequestId = > new ClientRequestId(destWorkerId, request.getRequestId()); > RequestInfo oldRequestInfo = clientRequestIdRequestInfoMap.putIfAbsent( > clientRequestId, newRequestInfo); > if (oldRequestInfo != null) { > throw new IllegalStateException("sendWritableRequest: Impossible to " > + > "have a previous request id = " + request.getRequestId() + ", " + > "request info of " + oldRequestInfo); > } > } > ChannelFuture writeFuture = channel.write(request); > newRequestInfo.setWriteFuture(writeFuture); > if (limitNumberOfOpenRequests && > clientRequestIdRequestInfoMap.size() > maxNumberOfOpenRequests) { > waitSomeRequests(maxNumberOfOpenRequests); > } > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles
[ https://issues.apache.org/jira/browse/GIRAPH-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616759#comment-13616759 ] Eli Reisman commented on GIRAPH-599: I am assuming the problem is somewhere in the giraph-hive dependencies, we are using {code}hadoop.version{code} where we cannot safely do so to choose the right facebook hive io jar. Thanks! > Hive IO dependency issues with some Hadoop profiles > --- > > Key: GIRAPH-599 > URL: https://issues.apache.org/jira/browse/GIRAPH-599 > Project: Giraph > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Eli Reisman > Fix For: 0.2.0 > > > Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now > this happens: > {code} > [INFO] > > [INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT > [INFO] > > Downloading: > http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > Downloaded: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom > (15 KB at 20.2 KB/sec) > Downloading: > http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > Downloading: > 
https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom > [WARNING] The POM for com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is > missing, no dependency information available > Downloading: > http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > Downloading: > https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar > Downloaded: > https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar > (201 KB at 194.6 KB/sec) > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Giraph Parent .. 
SUCCESS [0.717s] > [INFO] Apache Giraph Core SUCCESS [2:58.276s] > [INFO] Apache Giraph Hive I/O FAILURE [6.455s] > [INFO] Apache Giraph Examples SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 3:05.779s > [INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013 > [INFO] Final Memory: 48M/352M > [INFO] > > [ERROR] Failed to execute goal on project giraph-hive: Could not resolve > dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: > Could not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in > central (http://repo1.maven.org/maven2) -> [Help 1] > [ERROR] > [ERROR] To see
[jira] [Updated] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-13: -- Attachment: GIRAPH-13-9-r6.patch Just another rebase. Not to hurry anyone, I know everyone's busy, but starting in a week or two I will have a lot less time to fix issues that reviewers put up. So...if anyone has a chance to peek at it over the next few days, I will be available to respond quickly to reviews, for now. If not...I understand! Thanks again! I will update this on RB too, where comments on the last couple iterations of the patch contain good command lines for building and running it on the cluster. > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan >Assignee: Eli Reisman > Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, > GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, > GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, > GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, > GIRAPH-13-9-r5.patch, GIRAPH-13-9-r6.patch > > > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop > trunk, we should think about what it would take to separate out the graph > processing bits of Giraph from the MR1-specific code so as to take advantage > of the less-MR centric aspects of YARN, while still supporting both over the > medium term. > Review Board link (ready for review now): https://reviews.apache.org/r/9811/
[jira] [Created] (GIRAPH-599) Hive IO dependency issues with some Hadoop profiles
Eli Reisman created GIRAPH-599: -- Summary: Hive IO dependency issues with some Hadoop profiles Key: GIRAPH-599 URL: https://issues.apache.org/jira/browse/GIRAPH-599 Project: Giraph Issue Type: Bug Affects Versions: 0.2.0 Reporter: Eli Reisman Fix For: 0.2.0 Hey folks. I was rebasing GIRAPH-13 for all the new changes today and now this happens: {code} [INFO] [INFO] Building Apache Giraph Hive I/O 0.2-SNAPSHOT [INFO] Downloading: http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom Downloading: https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom Downloading: https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom Downloaded: https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.pom (15 KB at 20.2 KB/sec) Downloading: http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom Downloading: https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom Downloading: https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.pom [WARNING] The POM for com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha is missing, no dependency information available Downloading: http://repo1.maven.org/maven2/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar Downloading: 
http://repo1.maven.org/maven2/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar Downloading: https://repository.apache.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar Downloading: https://repository.apache.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar Downloading: https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar Downloading: https://oss.sonatype.org/content/groups/public/com/facebook/hadoop/hadoop-core/2.0.3-alpha/hadoop-core-2.0.3-alpha.jar Downloaded: https://oss.sonatype.org/content/groups/public/com/facebook/giraph/hive/hive-io-experimental/0.5/hive-io-experimental-0.5.jar (201 KB at 194.6 KB/sec) [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Giraph Parent .. SUCCESS [0.717s] [INFO] Apache Giraph Core SUCCESS [2:58.276s] [INFO] Apache Giraph Hive I/O FAILURE [6.455s] [INFO] Apache Giraph Examples SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 3:05.779s [INFO] Finished at: Thu Mar 28 14:40:17 PDT 2013 [INFO] Final Memory: 48M/352M [INFO] [ERROR] Failed to execute goal on project giraph-hive: Could not resolve dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: Could not find artifact com.facebook.hadoop:hadoop-core:jar:2.0.3-alpha in central (http://repo1.maven.org/maven2) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :giraph-hive {code} I was building the YARN profile, which used to cause giraph-hive not to build (incompatible profiles o
[jira] [Commented] (GIRAPH-582) Create a generic option for determining the number of supersteps that a job runs for
[ https://issues.apache.org/jira/browse/GIRAPH-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614461#comment-13614461 ] Eli Reisman commented on GIRAPH-582: Sorry to jump in late after the patch was put up, but on further reflection I think we might want, in a future JIRA, to change the name from giraph.maxSuperstep to something that clearly maps to setting the end of the job, like giraph.finishJobOnSuperstep or something even clearer (to reflect that the superstep number we give is never actually executed). I still think it's a great idea, and an option we should have had for a while now! > Create a generic option for determining the number of supersteps that a job > runs for > > > Key: GIRAPH-582 > URL: https://issues.apache.org/jira/browse/GIRAPH-582 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching >Assignee: Avery Ching > Attachments: GIRAPH-582.patch, GIRAPH-582.patch.2 > > > Lots of applications just run for a fixed number of iterations. We can make > the code simpler if we make this feature part of the infrastructure.
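The naming concern in the comment above comes down to an off-by-one reading of the option. A minimal sketch, assuming the semantics described there (the job finishes when the superstep counter reaches the configured value, so that superstep itself never runs); `SuperstepLimitSketch` and `executedSupersteps` are hypothetical names for illustration, not Giraph APIs:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: which supersteps run under the "stop on reaching the limit" reading. */
final class SuperstepLimitSketch {
  /** Returns the superstep numbers that execute when the job halts on reaching maxSuperstep. */
  static List<Long> executedSupersteps(long maxSuperstep) {
    List<Long> executed = new ArrayList<>();
    // Supersteps are numbered from 0, so the limit itself is excluded.
    for (long superstep = 0; superstep < maxSuperstep; superstep++) {
      executed.add(superstep);
    }
    return executed;
  }
}
```

Under this reading a limit of 3 runs supersteps 0, 1 and 2, which is why a name such as giraph.finishJobOnSuperstep may describe the behavior more directly than giraph.maxSuperstep.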
[jira] [Updated] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-13: -- Attachment: GIRAPH-13-9-r5.patch Just a rebase. Also available on RB (see link here in Description) > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan >Assignee: Eli Reisman > Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, > GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, > GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, > GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch, > GIRAPH-13-9-r5.patch > > > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop > trunk, we should think about what it would take to separate out the graph > processing bits of Giraph from the MR1-specific code so as to take advantage > of the less-MR centric aspects of YARN, while still supporting both over the > medium term. > Review Board link (ready for review now): https://reviews.apache.org/r/9811/
[jira] [Commented] (GIRAPH-579) Make it possible to use different out-edges data structures for input and computation
[ https://issues.apache.org/jira/browse/GIRAPH-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612196#comment-13612196 ] Eli Reisman commented on GIRAPH-579: Really clever idea. Patch looks good. +1 on the idea; I will build once the Hive IO Maven repo issues are fixed and I can verify the patch. ;) > Make it possible to use different out-edges data structures for input and > computation > - > > Key: GIRAPH-579 > URL: https://issues.apache.org/jira/browse/GIRAPH-579 > Project: Giraph > Issue Type: New Feature >Reporter: Alessandro Presta >Assignee: Alessandro Presta > Attachments: GIRAPH-579.patch, GIRAPH-579.patch > > > In some cases, the properties we want in the VertexEdges implementation > during input may differ from the ones we want during computation. > Two examples: > 1) During input, we want to keep only the top K edges according to weight, so > we use a fixed-size min-heap. During computation, our algorithm needs fast > random access, so we use a hash-map. > 2) We have a VertexEdges implementation that's optimized for space and/or > iteration speed, but has slow insertion. We can then use a different data > structure that has fast insertion during input. > We can add an option to specify a different VertexEdges class to be used in > EdgeStore during input.
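Example 1 in the description above (a fixed-size min-heap during input, a hash map during computation) can be sketched in plain Java. This is a standalone illustration, not the VertexEdges or EdgeStore API from the patch: `TopKEdgeInput`, `Edge` and `toComputationEdges` are hypothetical names, and the sketch assumes each target vertex appears at most once per source vertex.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustration only: keep the K heaviest out-edges while loading, then switch to a hash map. */
final class TopKEdgeInput {
  /** A weighted out-edge; a stand-in for this example, not a Giraph type. */
  static final class Edge {
    final long target;
    final double weight;

    Edge(long target, double weight) {
      this.target = target;
      this.weight = weight;
    }
  }

  private final int k;
  // Min-heap ordered by weight: the lightest retained edge sits at the head.
  private final PriorityQueue<Edge> heap;

  TopKEdgeInput(int k) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be at least 1");
    }
    this.k = k;
    this.heap = new PriorityQueue<>(k, Comparator.comparingDouble((Edge e) -> e.weight));
  }

  /** Input phase: O(log k) per edge, never holding more than k edges. */
  void add(long target, double weight) {
    if (heap.size() < k) {
      heap.add(new Edge(target, weight));
    } else if (weight > heap.peek().weight) {
      heap.poll(); // evict the lightest edge to make room for a heavier one
      heap.add(new Edge(target, weight));
    }
  }

  /** Computation phase: copy the surviving edges into a map for fast lookup by target. */
  Map<Long, Double> toComputationEdges() {
    Map<Long, Double> edges = new HashMap<>();
    for (Edge e : heap) {
      edges.put(e.target, e.weight);
    }
    return edges;
  }
}
```

The conversion happens once per vertex after input completes, so the heap's slow lookup never affects the computation phase and the map's weaker ordering never affects edge selection.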
[jira] [Commented] (GIRAPH-581) More flexible Hive output
[ https://issues.apache.org/jira/browse/GIRAPH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612181#comment-13612181 ] Eli Reisman commented on GIRAPH-581: Hey folks. I was just going to commit this patch, downloaded fresh trunk, applied 581, etc. And this happens again: {code} [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Giraph Parent .. SUCCESS [1.573s] [INFO] Apache Giraph Core SUCCESS [2:42.075s] [INFO] Apache Giraph Hive I/O FAILURE [1:17.718s] [INFO] Apache Giraph Examples SKIPPED [INFO] Apache Giraph Accumulo I/O SKIPPED [INFO] Apache Giraph HBase I/O ... SKIPPED [INFO] Apache Giraph HCatalog I/O SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 4:01.839s [INFO] Finished at: Sun Mar 24 11:07:10 PDT 2013 [INFO] Final Memory: 43M/348M [INFO] [ERROR] Failed to execute goal on project giraph-hive: Could not resolve dependencies for project org.apache.giraph:giraph-hive:jar:0.2-SNAPSHOT: Failed to collect dependencies for [com.facebook.giraph.hive:hive-io-experimental:jar:0.4-SNAPSHOT (compile), com.fasterxml.jackson.core:jackson-core:jar:2.1.0 (compile), com.fasterxml.jackson.core:jackson-databind:jar:2.1.0 (compile), com.github.spullara.cli-parser:cli-parser:jar:1.1 (compile), org.apache.giraph:giraph-core:jar:0.2-SNAPSHOT (compile), org.apache.hive:hive-metastore:jar:0.10.0 (compile), org.apache.giraph:giraph-core:jar:tests:0.2-SNAPSHOT (test), commons-net:commons-net:jar:3.1 (provided), org.apache.hadoop:hadoop-core:jar:0.20.203.0 (provided)]: Failed to read artifact descriptor for com.facebook.giraph.hive:hive-io-experimental:jar:0.4-SNAPSHOT: Could not transfer artifact com.facebook.giraph.hive:hive-io-experimental:pom:0.4-SNAPSHOT from/to sonatypereleases (https://oss.sonatype.org/content/groups/public/): Connection to https://oss.sonatype.org refused: Connection timed out -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. 
[ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :giraph-hive {code} I don't think it's this patch, but I think something in the hive-io jar dependencies is still not right? > More flexible Hive output > - > > Key: GIRAPH-581 > URL: https://issues.apache.org/jira/browse/GIRAPH-581 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > Attachments: GIRAPH-581.patch, GIRAPH-581.patch > > > Currently with Hive output formats it's only possible to write single row per > vertex. We should support variable number of rows per vertex (zero or > multiple).
[jira] [Commented] (GIRAPH-582) Create a generic option for determining the number of supersteps that a job runs for
[ https://issues.apache.org/jira/browse/GIRAPH-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612165#comment-13612165 ] Eli Reisman commented on GIRAPH-582: Great Idea! > Create a generic option for determining the number of supersteps that a job > runs for > > > Key: GIRAPH-582 > URL: https://issues.apache.org/jira/browse/GIRAPH-582 > Project: Giraph > Issue Type: Improvement >Reporter: Avery Ching > > Lots of applications just run for a fixed number of iterations. We can make > the code simpler if we make this feature part of the infrastructure.
[jira] [Commented] (GIRAPH-583) Problem with authentication on Hadoop 0.23
[ https://issues.apache.org/jira/browse/GIRAPH-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612163#comment-13612163 ] Eli Reisman commented on GIRAPH-583: Hi folks. I just wanted to mention the issue I saw with RWR was also an IOException, but it was in the tests where InternalVertexRunner was not finding an output file (which is what often happens when IVR hits a problem while running an integration test.) I have still not figured out what was doing it, or if the matter is resolved, or there is some configuration problem. I was running 2.0.3-alpha on trunk and it was only happening every 5-6 attempted builds. Very odd. Haven't seen it lately, but haven't built Giraph too much this week either. Hope you guys get this figured out. Looks like an authentication issue. I think Eugene will be the guy with the answers here. > Problem with authentication on Hadoop 0.23 > -- > > Key: GIRAPH-583 > URL: https://issues.apache.org/jira/browse/GIRAPH-583 > Project: Giraph > Issue Type: Bug >Reporter: Gianmarco De Francisci Morales > > Hi, > I am trying to run the RWR code on trunk and Hadoop 0.23 with Kerberos > authentication, but I get this exception: > {code} > 13/03/23 17:32:36 ERROR security.UserGroupInformation: > PriviledgedActionException as:gdfm (auth:KERBEROS) > cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > 13/03/23 17:32:36 WARN ipc.Client: Exception encountered while connecting to > the server : javax.security.sasl.SaslException: GSS initiate failed [Caused > by GSSException: No valid credentials provided (Mechanism level: Failed to > find any Kerberos tgt)] > 13/03/23 17:32:36 ERROR security.UserGroupInformation: > PriviledgedActionException as:gdfm (auth:KERBEROS) cause:java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid 
credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > 13/03/23 17:32:36 ERROR security.UserGroupInformation: > PriviledgedActionException as:gdfm (auth:KERBEROS) cause:java.io.IOException: > Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)]; Host Details : local host is: > "gwta3005.tan.ygrid.yahoo.com/98.138.127.244"; destination host is: > ""tiberiumtan-nn1.tan.ygrid.yahoo.com":8020; > Exception in thread "main" java.io.IOException: Failed on local exception: > java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos tgt)]; Host Details : local host is: > "gwta3005.tan.ygrid.yahoo.com/98.138.127.244"; destination host is: > ""tiberiumtan-nn1.tan.ygrid.yahoo.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:738) > at org.apache.hadoop.ipc.Client.call(Client.java:1092) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:195) > at $Proxy6.getDelegationToken(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:102) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:67) > at $Proxy6.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:603) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:786) > at > 
org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:466) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:444) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:122) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:101) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137) > at > org.apache.giraph.io.formats.TextVertexOutputFormat.checkOutputSpecs(TextVertexOutputFormat.j
[jira] [Commented] (GIRAPH-510) Remove HBase Cruft
[ https://issues.apache.org/jira/browse/GIRAPH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609198#comment-13609198 ] Eli Reisman commented on GIRAPH-510: As far as miniclusters go, having just seen this on GIRAPH-13, let me pipe in again: pick dir names and ports that will not collide with InternalVertexRunner, the various Hadoop mini cluster impls, or MiniYARNCluster, as they all run tests in parallel and can conflict in confusing ways when their dirs or ports collide. That includes tests that only fail once in a while. Be careful out there! > Remove HBase Cruft > -- > > Key: GIRAPH-510 > URL: https://issues.apache.org/jira/browse/GIRAPH-510 > Project: Giraph > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Nitay Joffe >Priority: Minor > Labels: easy, newbie > Attachments: GIRAPH-510.patch, GIRAPH-510-v2.patch > > > The HBase tests appear to leave around lots of cruft, namely graph.csv, > .graph.csv in the giraph folders and -ROOT-, simple_graph, hbase.version in > the user home directory. We should remove these (or better yet not create > them in the first place). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
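One way to follow the port advice in the comment above is to let the OS hand out an ephemeral port instead of hardcoding one. The sketch below is only an illustration: `FreePorts` is a hypothetical helper, not part of Giraph or Hadoop, and the port it returns could in principle be taken by another process between the probe and the real bind.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/** Hypothetical test helper; not part of Giraph or Hadoop. */
final class FreePorts {
  private FreePorts() { }

  /**
   * Binds to port 0 so the OS picks a currently unused ephemeral port,
   * then releases it and returns the number for the test to use.
   */
  static int pick() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test would call `FreePorts.pick()` once per service it starts and pass the result into that service's configuration, rather than sharing a fixed port across test classes.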
[jira] [Commented] (GIRAPH-510) Remove HBase Cruft
[ https://issues.apache.org/jira/browse/GIRAPH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609195#comment-13609195 ] Eli Reisman commented on GIRAPH-510: Thanks Alessandro! I didn't comment here. I had piped in about System.getProperty("java.io.tmpdir") or FileUtils in commons-io (which uses the same thing as a base dir I think when creating a test directory?) that seems to work out well for this sort of thing. But yes my home dir is filling up with hbase.version and other charming junk so I'm all for this happening! > Remove HBase Cruft > -- > > Key: GIRAPH-510 > URL: https://issues.apache.org/jira/browse/GIRAPH-510 > Project: Giraph > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Nitay Joffe >Priority: Minor > Labels: easy, newbie > Attachments: GIRAPH-510.patch, GIRAPH-510-v2.patch > > > The HBase tests appear to leave around lots of cruft, namely graph.csv, > .graph.csv in the giraph folders and -ROOT-, simple_graph, hbase.version in > the user home directory. We should remove these (or better yet not create > them in the first place).
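As a rough illustration of the temp-directory suggestion in the comment above, the sketch below creates a uniquely named directory under the standard `java.io.tmpdir` system property and removes it afterwards. `TestDirs` and the `giraph-test-` prefix are made up for this example; this uses plain JDK calls only and is not how the HBase tests are actually written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical test helper; not part of Giraph. */
final class TestDirs {
  private TestDirs() { }

  /** Creates a fresh, uniquely named directory under java.io.tmpdir. */
  static Path create(String prefix) {
    Path base = Paths.get(System.getProperty("java.io.tmpdir"));
    try {
      return Files.createTempDirectory(base, prefix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Deletes the directory and everything below it, children first. */
  static void deleteRecursively(Path dir) {
    try (Stream<Path> walk = Files.walk(dir)) {
      walk.sorted(Comparator.reverseOrder()).forEach(path -> {
        try {
          Files.delete(path);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling `create` in test setup and `deleteRecursively` in teardown keeps files like hbase.version out of the user home directory, assuming the code under test can be pointed at the returned path.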
[jira] [Commented] (GIRAPH-577) Create a testing framework that doesn't require I/O formats
[ https://issues.apache.org/jira/browse/GIRAPH-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608267#comment-13608267 ] Eli Reisman commented on GIRAPH-577: Thanks for the contribution. Could you re-submit your diff using "git diff --no-prefix trunk" to strip the a/ and b/ directory headings? So this is not meant to generate graph data, or to perform a no-op job, but to construct a small, hardcoded graph for reuse in small tests? One thing along these lines we really need is someone to convert GIRAPH-26 from Colt to Mahout math libraries so we can generate interesting synthetic graph data as well, if you're curious. > Create a testing framework that doesn't require I/O formats > --- > > Key: GIRAPH-577 > URL: https://issues.apache.org/jira/browse/GIRAPH-577 > Project: Giraph > Issue Type: New Feature >Affects Versions: 0.2.0 >Reporter: Alessandro Presta >Assignee: Veselin Stoyanov > Labels: patch > Attachments: GIRAPH-577.patch > > > Create a TestGraph class to conveniently build graphs stored in memory. > Add appropriate input/output formats to be used in InternalVertexRunner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
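To make the "small, hardcoded graph for reuse in small tests" idea from the comment above concrete, here is a minimal sketch. `TinyGraph` is a hypothetical class invented for this note; it is not the TestGraph class from the attached patch and uses no Giraph types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory graph builder; not the patch's TestGraph. */
final class TinyGraph {
  // Vertex id -> outgoing neighbor ids, kept in insertion order.
  private final Map<Long, List<Long>> adjacency = new LinkedHashMap<>();

  /** Adds a vertex if it is not present yet. */
  TinyGraph addVertex(long id) {
    adjacency.computeIfAbsent(id, key -> new ArrayList<>());
    return this;
  }

  /** Adds a directed edge, creating both endpoints when needed. */
  TinyGraph addEdge(long source, long target) {
    addVertex(source);
    addVertex(target);
    adjacency.get(source).add(target);
    return this;
  }

  int vertexCount() {
    return adjacency.size();
  }

  /** Returns a read-only view of the outgoing neighbors of a vertex. */
  List<Long> neighbors(long id) {
    List<Long> out = adjacency.get(id);
    return out == null
        ? Collections.<Long>emptyList()
        : Collections.unmodifiableList(out);
  }
}
```

A test can then build its fixture inline, for example `new TinyGraph().addEdge(1, 2).addEdge(2, 3)`, without writing input files or an input format.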
[jira] [Commented] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605408#comment-13605408 ] Eli Reisman commented on GIRAPH-13: --- Hey Eugene, a better command line is on the current revision of this patch on RB (marked r5 there, it's r4 in the patch here...sorry) in the explanation. Forgot to post it here. And yes, there are several poorly documented yarn-site.xml values you need to set to make the cluster happy; I can pass them along if you run into trouble. So far, this version works well for me. > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan >Assignee: Eli Reisman > Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, > GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, > GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, > GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch > > > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop > trunk, we should think about what it would take to separate out the graph > processing bits of Giraph from the MR1-specific code so as to take advantage > of the less-MR centric aspects of YARN, while still supporting both over the > medium term. > Review Board link (ready for review now): https://reviews.apache.org/r/9811/
[jira] [Created] (GIRAPH-574) Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile
Eli Reisman created GIRAPH-574: -- Summary: Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile Key: GIRAPH-574 URL: https://issues.apache.org/jira/browse/GIRAPH-574 Project: Giraph Issue Type: Improvement Reporter: Eli Reisman Priority: Minor As folks read the Giraph on YARN code it is inevitable it will occur to someone "Well, if the job fails when the ApplicationMaster fails, could we move some or all of our Master task functions there and just call it master?" Yes. In two ways. One, we launch a dedicated master process marked as such with setup responsibilities, and we assess from the app master how the launch went. We keep launching "masters" until one takes. Then, we launch the workers. Another is to simply run MasterThread and associated stuff from the App Master directly, and when we know it's up and running properly, only then does the app master launch the workers. The YARN app master can be rebooted and is designed to be a place for fault-tolerant "master node" stuff to happen. However, I think a larger purpose is to act as a meta-master for launching a DAG of jobs within the run of a single app master lifecycle. Or the app master can act as any of these things, or something else I haven't thought of. The architecture is fairly malleable. This is not a requirement for us, and maybe not a good idea at all. This is just a placeholder JIRA to discuss and collect ideas since as I said above someone is going to bring it up ;) Thank you for reading.
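The first option in the ticket above ("keep launching masters until one takes, then launch the workers") could be sketched roughly as follows. Everything here is hypothetical: `LaunchSequence`, the retry cap, and the two callbacks are assumptions for illustration, and no real YARN or Giraph API is used.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the launch ordering; not Giraph code. */
final class LaunchSequence {
  private LaunchSequence() { }

  /**
   * Tries to start a master up to maxAttempts times. Workers are only
   * launched after a master attempt reports success.
   *
   * @return the 1-based attempt that succeeded, or -1 if none did
   */
  static int launchMasterThenWorkers(BooleanSupplier launchMaster,
                                     Runnable launchWorkers,
                                     int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (launchMaster.getAsBoolean()) {
        launchWorkers.run();
        return attempt;
      }
    }
    return -1; // No master came up, so no workers were started.
  }
}
```

The retry cap is an addition not mentioned in the ticket; without some bound, an app master that can never start a master would loop indefinitely.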
[jira] [Updated] (GIRAPH-574) Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile?
[ https://issues.apache.org/jira/browse/GIRAPH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-574: --- Summary: Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile? (was: Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile) > Move Giraph Master node functionality to AppMaster or launch directly from > AppMaster in YARN profile? > - > > Key: GIRAPH-574 > URL: https://issues.apache.org/jira/browse/GIRAPH-574 > Project: Giraph > Issue Type: Improvement >Reporter: Eli Reisman >Priority: Minor > > As folks read the Giraph on YARN code it is inevitable it will occur to > someone "Well, if the job fails when the ApplicationMaster fails, could we > move some or all of our Master task functions there and just call it master?" > Yes. In two ways. > One, we launch a dedicated master process marked as such with setup > responsibilities, and we assess from the app master how the launch went. We > keep launching "masters" until one takes. Then, we launch the workers. > Another is to simply run MasterThread and associated stuff from the App > Master directly, and when we know it's up and running properly, only then does > the app master launch the workers. > The YARN app master can be rebooted and is designed to be a place for > fault-tolerant "master node" stuff to happen. However, I think a larger > purpose is to act as a meta-master for launching a DAG of jobs within the run > of a single app master lifecycle. Or the app master can act as any of these > things, or something else I haven't thought of. The architecture is fairly > malleable. > This is not a requirement for us, and maybe not a good idea at all. This is > just a placeholder JIRA to discuss and collect ideas since as I said above > someone is going to bring it up ;) > Thank you for reading.
[jira] [Commented] (GIRAPH-574) Move Giraph Master node functionality to AppMaster or launch directly from AppMaster in YARN profile
[ https://issues.apache.org/jira/browse/GIRAPH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604468#comment-13604468 ] Eli Reisman commented on GIRAPH-574: Think aggregators, master compute, other things? also... > Move Giraph Master node functionality to AppMaster or launch directly from > AppMaster in YARN profile > > > Key: GIRAPH-574 > URL: https://issues.apache.org/jira/browse/GIRAPH-574 > Project: Giraph > Issue Type: Improvement >Reporter: Eli Reisman >Priority: Minor > > As folks read the Giraph on YARN code it is inevitable it will occur to > someone "Well, if the job fails when the ApplicationMaster fails, could we > move some or all of our Master task functions there and just call it master?" > Yes. In two ways. > One, we launch a dedicated master process marked as such with setup > responsibilities, and we assess from the app master how the launch went. We > keep launching "masters" until one takes. Then, we launch the workers. > Another is to simply run MasterThread and associated stuff from the App > Master directly, and when we know it's up and running properly, only then does > the app master launch the workers. > The YARN app master can be rebooted and is designed to be a place for > fault-tolerant "master node" stuff to happen. However, I think a larger > purpose is to act as a meta-master for launching a DAG of jobs within the run > of a single app master lifecycle. Or the app master can act as any of these > things, or something else I haven't thought of. The architecture is fairly > malleable. > This is not a requirement for us, and maybe not a good idea at all. This is > just a placeholder JIRA to discuss and collect ideas since as I said above > someone is going to bring it up ;) > Thank you for reading.
[jira] [Created] (GIRAPH-573) Giraph is ready for port to Mesos or other cluster frameworks
Eli Reisman created GIRAPH-573: -- Summary: Giraph is ready for port to Mesos or other cluster frameworks Key: GIRAPH-573 URL: https://issues.apache.org/jira/browse/GIRAPH-573 Project: Giraph Issue Type: Bug Affects Versions: 0.2.0 Reporter: Eli Reisman Priority: Minor Fix For: 0.2.0 The refactors and general approach that worked with YARN set up a template that could be adapted easily to other cluster management platforms like Mesos. Or take-your-pick. I am not saying this is a priority or even desirable, I leave that to the community. But it would be easy now, if we want to. Ideas and opinions can be posted here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-572) The o.a.g.yarn package could be the top-level of a source tree of packages that mirrors core
[ https://issues.apache.org/jira/browse/GIRAPH-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-572: --- Summary: The o.a.g.yarn package could be the top-level of a source tree of packages that mirrors core (was: The o.a.g.yarn package could be the top-level of a source tree of packages that miorrors core) > The o.a.g.yarn package could be the top-level of a source tree of packages > that mirrors core > > > Key: GIRAPH-572 > URL: https://issues.apache.org/jira/browse/GIRAPH-572 > Project: Giraph > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Eli Reisman > > This might be a bad idea. But here goes: > There are possibilities to move all sorts of functionality out of the > Giraph/BSP parts of the code and into the YARN AppMaster, or into > separately-managed containers launched from the AppMaster. > For each functionality we decide to re-implement in YARN, it will need to > live in the yarn package tree to be selectively compiled and to use YARN-only > imports. > One possibility to begin doing this is to use GIRAPH-13's > Configuration#isPureYarnJob. We will use the isPureYarnJob in Giraph to > selectively "no-op" each functionality we replace. Then, we re-implement the > YARN way in our yarn package tree. > If we do this, we should begin early by mirroring the core source tree in > subpackages of yarn. So if we moved a functionality out of o.a.g.graph > package we would reimplement it in o.a.g.yarn.graph package. > I don't suggest doing it all at once, but as we add files to o.a.g.yarn, just > to get the idea out there before the files start to pile up. Anything that > uses YARN imports will have to choose between munge flags and being in the > o.a.g.yarn package, one way or another. > If we don't like this idea, mark it won't fix. I'm not attached to it, just > an idea. -- This message is automatically generated by JIRA. 
[jira] [Created] (GIRAPH-572) The o.a.g.yarn package could be the top-level of a source tree of packages that miorrors core
Eli Reisman created GIRAPH-572: -- Summary: The o.a.g.yarn package could be the top-level of a source tree of packages that miorrors core Key: GIRAPH-572 URL: https://issues.apache.org/jira/browse/GIRAPH-572 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Eli Reisman This might be a bad idea. But here goes: There are possibilities to move all sorts of functionality out of the Giraph/BSP parts of the code and into the YARN AppMaster, or into separately-managed containers launched from the AppMaster. For each functionality we decide to re-implement in YARN, it will need to live in the yarn package tree to be selectively compiled and to use YARN-only imports. One possibility to begin doing this is to use GIRAPH-13's Configuration#isPureYarnJob. We will use the isPureYarnJob in Giraph to selectively "no-op" each functionality we replace. Then, we re-implement the YARN way in our yarn package tree. If we do this, we should begin early by mirroring the core source tree in subpackages of yarn. So if we moved a functionality out of o.a.g.graph package we would reimplement it in o.a.g.yarn.graph package. I don't suggest doing it all at once, but as we add files to o.a.g.yarn, just to get the idea out there before the files start to pile up. Anything that uses YARN imports will have to choose between munge flags and being in the o.a.g.yarn package, one way or another. If we don't like this idea, mark it won't fix. I'm not attached to it, just an idea. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
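The "selectively no-op" idea in the ticket above might look something like the sketch below. `JobFlags` is a hypothetical stand-in for the flag the ticket calls Configuration#isPureYarnJob, and the returned strings exist only so the two paths are visible; nothing here is taken from the GIRAPH-13 patch.

```java
/** Hypothetical sketch of gating a core feature; not Giraph code. */
final class CoreFeatureGate {
  private CoreFeatureGate() { }

  /** Stand-in for the isPureYarnJob flag named in the ticket. */
  interface JobFlags {
    boolean isPureYarnJob();
  }

  /**
   * Core-side entry point. On a pure YARN job it does nothing, on the
   * assumption that a class under o.a.g.yarn provides the replacement.
   */
  static String runCoreFeature(JobFlags flags) {
    if (flags.isPureYarnJob()) {
      return "no-op: replaced under o.a.g.yarn";
    }
    return "ran core implementation";
  }
}
```

Keeping the check inside core means core never imports YARN classes, which is what lets the YARN-only replacement live in the separately compiled yarn package tree.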
[jira] [Created] (GIRAPH-571) Giraph on YARN could launch a job-local ZK instance from the AppMaster
Eli Reisman created GIRAPH-571: -- Summary: Giraph on YARN could launch a job-local ZK instance from the AppMaster Key: GIRAPH-571 URL: https://issues.apache.org/jira/browse/GIRAPH-571 Project: Giraph Issue Type: Improvement Components: zookeeper Affects Versions: 0.2.0 Reporter: Eli Reisman Once GIRAPH-13 is in, we can think differently about a lot of things if we choose to. For one thing, we have had problems launching job-local ZK instances. We could (for YARN) move that functionality to the App Master, having it launch a container just for ZK and populate the Configuration's giraph.zkList setting so that when the MRv1 ZK manager code sees the Conf, it will think we already have a non-job-local ZK at zkList's host and port, and will just connect instead of starting another local instance, making the whole affair transparent to existing Giraph code. Not important, but the YARN patch is currently defaulting to only execute jobs with a non-local ZK instance already running, and giraph.zkList populated with its host:port. It's quite possible that when we get our MRv1 job-local ZK working again, we can remove this and it will work right out of the box; there's no reason it won't. But managing extraneous services (especially those that hold up the job setup like launching a ZK) is what the YARN AppMaster is all about anyway. I haven't been able to get our local ZK instance to launch outside of test cases for a while now.
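The hand-off described in the ticket above can be sketched with a plain map standing in for the job Configuration. Only the `giraph.zkList` key comes from the ticket; `ZkBootstrap`, its method names, and the returned strings are hypothetical and do not reflect the actual ZK manager code.

```java
import java.util.Map;

/** Hypothetical sketch of the zkList hand-off; not Giraph code. */
final class ZkBootstrap {
  static final String ZK_LIST = "giraph.zkList";

  private ZkBootstrap() { }

  /** What the app master would do once its ZK container is up. */
  static void publishZk(Map<String, String> conf, String host, int port) {
    conf.put(ZK_LIST, host + ":" + port);
  }

  /**
   * What existing code would decide when it reads the configuration:
   * connect to the listed ZK if one is set, otherwise start a local one.
   */
  static String decide(Map<String, String> conf) {
    String zkList = conf.get(ZK_LIST);
    if (zkList != null && !zkList.trim().isEmpty()) {
      return "connect:" + zkList.trim();
    }
    return "start-local";
  }
}
```

Because the decision depends only on whether the key is populated, the code that reads it does not need to know whether an operator or the app master filled it in.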
[jira] [Created] (GIRAPH-570) Create YARN RPC Records using BuilderUtils instead of populating them by hand
Eli Reisman created GIRAPH-570: -- Summary: Create YARN RPC Records using BuilderUtils instead of populating them by hand Key: GIRAPH-570 URL: https://issues.apache.org/jira/browse/GIRAPH-570 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Eli Reisman Priority: Minor Fix For: 0.2.0 Good newbie JIRA. Might need to check and see how far back in the Hadoop-2.0.x line the BuilderUtils exist so we know if we are cutting ourselves off from a future backport, but if we don't care, this can happen: Instead of creating and hand-populating each RPC record Giraph uses to request resources from YARN like:
{code}
Record x = Records.newRecord( className );
x.setField(blah);
x.setOtherField(blahblah);
// ...and so on
{code}
we can use BuilderUtils:
{code}
Record readyToSend = BuilderUtils.MakeMyNewRecord( blah, blah );
{code}
Anyway, you get the drill.
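The difference between the two styles in the ticket above can be shown without any YARN classes. In the sketch below, `Request` and `newRequest` are hypothetical stand-ins invented for illustration; they are not YARN's record types or its BuilderUtils methods, whose exact names and signatures should be checked against the Hadoop version in use.

```java
/** Hypothetical sketch; Request is NOT a YARN class. */
final class RequestRecords {
  private RequestRecords() { }

  /** Stand-in for an RPC record that is populated through setters. */
  static final class Request {
    private int memoryMb;
    private int priority;
    private String host;

    void setMemoryMb(int memoryMb) { this.memoryMb = memoryMb; }
    void setPriority(int priority) { this.priority = priority; }
    void setHost(String host) { this.host = host; }

    int getMemoryMb() { return memoryMb; }
    int getPriority() { return priority; }
    String getHost() { return host; }
  }

  /**
   * Factory in the spirit of a builder utility: the one place that
   * knows which fields a complete request must have set.
   */
  static Request newRequest(int memoryMb, int priority, String host) {
    Request request = new Request();
    request.setMemoryMb(memoryMb);
    request.setPriority(priority);
    request.setHost(host);
    return request;
  }
}
```

Call sites shrink to a single line such as `RequestRecords.newRequest(1024, 10, "*")`, and a forgotten setter becomes a compile error at the factory signature rather than a half-populated record at runtime.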
[jira] [Created] (GIRAPH-569) Decide what the versioning story should be for Giraph on YARN
Eli Reisman created GIRAPH-569: -- Summary: Decide what the versioning story should be for Giraph on YARN Key: GIRAPH-569 URL: https://issues.apache.org/jira/browse/GIRAPH-569 Project: Giraph Issue Type: Task Affects Versions: 0.2.0 Reporter: Eli Reisman Priority: Minor Right now, Giraph straddles the fence between a new and old YARN API. The place we're starting is a good compromise but we will need to make some decisions if we want to backport. Pros: Serve as many versions of YARN as possible, going back potentially to 2.0.1 or 2.0.0. Cons: I would like to provide the slickest, most up-to-date example of how to run a framework like Giraph with a YARN cluster so that others can take an example from us. I have been told by folks who know that these newer APIs are more concise and more robust. But this is currently looking like supporting 2.0.3-alpha at the very oldest, and newer versions up to trunk, and that's it. This sort of sucks because we have legitimate, working profiles for the whole 2.0.x line and there may be some expectations there. On the other hand, by not backporting, we could go the other direction and adopt some of the newest 2.0.4-alpha API and just assume YARN is maturing and folks using it now would update with each alpha release right away anyhow. Adding the new APIs to the whole YARN impl (especially the GiraphApplicationMaster) would make the implementation a real nice example of how to use the new APIs and would make the profile more robust in job runs. Opinions?
[jira] [Created] (GIRAPH-568) Giraph on YARN will need a WebUI to display job stats, Yammer metrics, whatever
Eli Reisman created GIRAPH-568: -- Summary: Giraph on YARN will need a WebUI to display job stats, Yammer metrics, whatever Key: GIRAPH-568 URL: https://issues.apache.org/jira/browse/GIRAPH-568 Project: Giraph Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Eli Reisman Fix For: 0.2.0 In YARN, the Client is the driver program when you run a job at the command line. This launches the application master, which is like an uber-master that manages the job lifecycle for all the Giraph worker/master tasks that actually run the BSP job. The Application Master can register an RPC Port and a Tracking URL with the YARN system (ResourceManager) which will be published on the YARN cluster WebUI in case folks running a Giraph job want to see detailed formatted web info such as Hadoop has. Previously we have hijacked Hadoop's counters and web UI. Now, we can start to think fresh about how to read logs, view job and node status, memory use, disk spills, Yammer metrics, whatever. Someone could get very creative with this. If someone is feeling up to it, I can show you where the YARN bits you will want to interface with are. The rest can really go any way you want it to.
[jira] [Commented] (GIRAPH-560) Input filtering
[ https://issues.apache.org/jira/browse/GIRAPH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604357#comment-13604357 ] Eli Reisman commented on GIRAPH-560: Yes. Great idea. The current ways of forcing this are unfortunate. > Input filtering > --- > > Key: GIRAPH-560 > URL: https://issues.apache.org/jira/browse/GIRAPH-560 > Project: Giraph > Issue Type: Bug >Reporter: Nitay Joffe >Assignee: Nitay Joffe > > Add some simple filtering for user to be able to drop edges / vertices at > input time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-13) Port Giraph to YARN
[ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-13: -- Attachment: GIRAPH-13-9-r4.patch Thanks Eugene! This will be a bear to review so take your time. But make sure to use this copy; the integration tests would occasionally fail on the last one because tests that run InternalVertexRunner were occasionally stealing each other's test dirs and ports. All fixed here. I have run a bunch of jobs on this today and it's running well now (I hope!) I'll put this on RB too. > Port Giraph to YARN > --- > > Key: GIRAPH-13 > URL: https://issues.apache.org/jira/browse/GIRAPH-13 > Project: Giraph > Issue Type: New Feature >Reporter: Jakob Homan >Assignee: Eli Reisman > Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, > GIRAPH-13-4.patch, GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, > GIRAPH-13-8.patch, GIRAPH-13-9.patch, GIRAPH-13-9-r1.patch, > GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch, GIRAPH-13-9-r4.patch > > > Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop > trunk, we should think about what it would take to separate out the graph > processing bits of Giraph from the MR1-specific code so as to take advantage > of the less-MR centric aspects of YARN, while still supporting both over the > medium term. > Review Board link (ready for review now): https://reviews.apache.org/r/9811/
[jira] [Updated] (GIRAPH-567) Tests on trunk are failing for giraph-examples at RandomWalk
[ https://issues.apache.org/jira/browse/GIRAPH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-567: --- Summary: Tests on trunk are failing for giraph-examples at RandomWalk (was: Tests on trunk are failing for giraph-examples at RandomWalks) > Tests on trunk are failing for giraph-examples at RandomWalk > > > Key: GIRAPH-567 > URL: https://issues.apache.org/jira/browse/GIRAPH-567 > Project: Giraph > Issue Type: Bug >Reporter: Eli Reisman > > It seems something has upset the tests in examples here; this is the > surefire report from "mvn verify" on trunk tonight: > {code} > --- > Test set: org.apache.giraph.examples.RandomWalkWithRestartVertexTest > --- > Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.085 sec <<< > FAILURE! > testWeightedGraph(org.apache.giraph.examples.RandomWalkWithRestartVertexTest) > Time elapsed: 1.038 sec <<< ERROR! > java.io.FileNotFoundException: > /var/folders/wq/rrrp5_8s3wgby3ybwn87z5lcgn/T/giraph-RandomWalkWithRestartVertex-1996932221558672384/output/part-m-0 > (No such file or directory) > at java.io.FileInputStream.open(Native Method) > at java.io.FileInputStream.<init>(FileInputStream.java:120) > at com.google.common.io.Files$1.getInput(Files.java:110) > at com.google.common.io.Files$1.getInput(Files.java:107) > at com.google.common.io.CharStreams$2.getInput(CharStreams.java:93) > at com.google.common.io.CharStreams$2.getInput(CharStreams.java:90) > at com.google.common.io.CharStreams.readLines(CharStreams.java:310) > at com.google.common.io.Files.readLines(Files.java:544) > at > org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:208) > at > org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:77) > at > org.apache.giraph.examples.RandomWalkWithRestartVertexTest.testWeightedGraph(RandomWalkWithRestartVertexTest.java:108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59) > at > org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:120) > at > org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:103) > at org.apache.maven.surefire.Surefire.run(Surefire.java:169) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350) > at > 
org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-554) Set PartitionContext in InternalVertexRunner
[ https://issues.apache.org/jira/browse/GIRAPH-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604105#comment-13604105 ] Eli Reisman commented on GIRAPH-554: Did we ever rerun this? Where's the SUCCESS log? > Set PartitionContext in InternalVertexRunner > > > Key: GIRAPH-554 > URL: https://issues.apache.org/jira/browse/GIRAPH-554 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > Attachments: GIRAPH-554.patch
[jira] [Created] (GIRAPH-567) Tests on trunk are failing for giraph-examples at RandomWalks
Eli Reisman created GIRAPH-567: -- Summary: Tests on trunk are failing for giraph-examples at RandomWalks Key: GIRAPH-567 URL: https://issues.apache.org/jira/browse/GIRAPH-567 Project: Giraph Issue Type: Bug Reporter: Eli Reisman Something seems to have upset the tests in examples here; this is the surefire report from "mvn verify" on trunk tonight:
{code}
---
Test set: org.apache.giraph.examples.RandomWalkWithRestartVertexTest
---
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.085 sec <<< FAILURE!
testWeightedGraph(org.apache.giraph.examples.RandomWalkWithRestartVertexTest) Time elapsed: 1.038 sec <<< ERROR!
java.io.FileNotFoundException: /var/folders/wq/rrrp5_8s3wgby3ybwn87z5lcgn/T/giraph-RandomWalkWithRestartVertex-1996932221558672384/output/part-m-0 (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at com.google.common.io.Files$1.getInput(Files.java:110)
    at com.google.common.io.Files$1.getInput(Files.java:107)
    at com.google.common.io.CharStreams$2.getInput(CharStreams.java:93)
    at com.google.common.io.CharStreams$2.getInput(CharStreams.java:90)
    at com.google.common.io.CharStreams.readLines(CharStreams.java:310)
    at com.google.common.io.Files.readLines(Files.java:544)
    at org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:208)
    at org.apache.giraph.utils.InternalVertexRunner.run(InternalVertexRunner.java:77)
    at org.apache.giraph.examples.RandomWalkWithRestartVertexTest.testWeightedGraph(RandomWalkWithRestartVertexTest.java:108)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
    at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:120)
    at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:103)
    at org.apache.maven.surefire.Surefire.run(Surefire.java:169)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
    at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
{code}
[jira] [Commented] (GIRAPH-565) Make an easy way to gather some logs from workers on master
[ https://issues.apache.org/jira/browse/GIRAPH-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604101#comment-13604101 ] Eli Reisman commented on GIRAPH-565: This is great. Cool idea! +1 > Make an easy way to gather some logs from workers on master > --- > > Key: GIRAPH-565 > URL: https://issues.apache.org/jira/browse/GIRAPH-565 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > Attachments: GIRAPH-565.patch > > > When debugging jobs with a lot of workers, it's really useful to be able to > have some information from any of the workers at a single place, and not to > have to go through each worker's logs to find what you are looking for. > Every time I do this I find myself implementing some aggregator to gather > those logs from all the workers on the master, so might as well make this > aggregator an easy option for everyone.
[jira] [Commented] (GIRAPH-564) Input formats should provide GiraphContext
[ https://issues.apache.org/jira/browse/GIRAPH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603558#comment-13603558 ] Eli Reisman commented on GIRAPH-564: @Avery: yes, this will be a big help. On the other hand, I have gotten the YARN impl to work without it, and in a stable way, so we're free to let this take a while if we want. Replacing the Mapper#Context connection that is now exposed in GraphMapper/GraphTaskManager will get rid of the umbilical cord to Hadoop MRv1. However, this also means a big refactor of the IO formats, since they depend on various Task-related objects handed off to us by the Mapper#Context; our Configuration is the "easy" one to deal with. @Alessandro: I like this idea, and I like the simplification. I think there were some places outside IO where the Immutable version is non-negotiable to get the generics plumbing to work on that reference down the road, so just placing the wrapper in the IO might not work. I think there were at least 2 places just internal to the YARN setup code and ConfigurationUtils where I had to wrap the class to keep the generics working. Other times it doesn't seem to matter. A cleaner solution here is inevitable soon. The Mapper#Context is the key to the whole thing. I was actually going to put up this JIRA this week myself ;) In the GiraphYarnTask in GIRAPH-13 you can see what the Mapper#Context replacement will need to carry in for the engine to turn over on the Giraph side. So it could also be simpler than Mapper#Context in a bunch of ways. > Input formats should provide GiraphContext > -- > > Key: GIRAPH-564 > URL: https://issues.apache.org/jira/browse/GIRAPH-564 > Project: Giraph > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Avery Ching > > Context is a MapReduce Context that input classes have to explicitly create an > ImmutableGiraphClassesConfiguration from (which is not intuitive).
It would > be better to provide a GiraphContext that would provide an > ImmutableGiraphClassesConfiguration directly for the user, while still > providing the user access to the MapReduce Context if really necessary. This > might also help with the YARN port? Not sure.
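To make the GIRAPH-564 proposal concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is an assumption for illustration: `MapReduceContext`, `ImmutableConf` and `GiraphContext` are hypothetical stand-ins, not the real Hadoop `Mapper.Context`, Giraph's `ImmutableGiraphClassesConfiguration`, or any class from the eventual patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins only; these are not the real Hadoop or Giraph types. */
public class GiraphContextSketch {

  /** Stand-in for a MapReduce task context that exposes raw key/value settings. */
  public interface MapReduceContext {
    Map<String, String> getRawConfiguration();
  }

  /** Stand-in for an immutable Giraph configuration: a read-only snapshot. */
  public static final class ImmutableConf {
    private final Map<String, String> values;

    ImmutableConf(Map<String, String> raw) {
      // Copy first so later changes to the raw map do not leak in.
      this.values = Collections.unmodifiableMap(new HashMap<>(raw));
    }

    public String get(String key, String defaultValue) {
      String value = values.get(key);
      return value != null ? value : defaultValue;
    }
  }

  /** The wrapper: hands out the configuration directly, keeps the raw context reachable. */
  public static final class GiraphContext {
    private final MapReduceContext delegate;
    private final ImmutableConf conf;

    public GiraphContext(MapReduceContext delegate) {
      this.delegate = delegate;
      // Built once here, so input classes never have to construct it themselves.
      this.conf = new ImmutableConf(delegate.getRawConfiguration());
    }

    public ImmutableConf getConfiguration() {
      return conf;
    }

    /** Escape hatch for callers that really need the underlying context. */
    public MapReduceContext getMapReduceContext() {
      return delegate;
    }
  }

  public static void main(String[] args) {
    Map<String, String> raw = new HashMap<>();
    raw.put("example.key", "value"); // hypothetical key, not a real Giraph option
    GiraphContext context = new GiraphContext(() -> raw);
    System.out.println(context.getConfiguration().get("example.key", "none"));
  }
}
```

The point of the shape is that an input format would receive one object and call `getConfiguration()` on it, while the escape hatch keeps existing MapReduce-dependent code working during a migration.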
[jira] [Commented] (GIRAPH-547) Allow in-place modification of edges
[ https://issues.apache.org/jira/browse/GIRAPH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603536#comment-13603536 ] Eli Reisman commented on GIRAPH-547: Thanks! On Thu, Mar 14, 2013 at 4:52 PM, Alessandro Presta (JIRA) > Allow in-place modification of edges > > > Key: GIRAPH-547 > URL: https://issues.apache.org/jira/browse/GIRAPH-547 > Project: Giraph > Issue Type: New Feature >Reporter: Alessandro Presta >Assignee: Alessandro Presta > Attachments: GIRAPH-547.patch > > > This is a somewhat long term item. > Because of some optimized edge storage implementations (byte array, primitive > array), we have a contract with the user that Edge objects returned by > getEdges() are read-only. > One concrete example where in-place modification would be useful: in the > weighted version of PageRank, you can store the weight sum and normalize each > message sent, or you could more efficiently normalize the out-edges once in > superstep 0. > The Pregel paper describes an OutEdgeIterator that allows for in-place > modification of edges. I can see how that would be easy to implement in C++, > where there is no need to reuse objects. > Giraph "unofficially" supports this if one is using generic collections to > represent edges (e.g. ArrayList or HashMap). > It may be trickier in some optimized implementations, but in principle it > should be doable. 
> One way would be to have some special MutableEdge implementation which calls back to the edge data structure in order to save modifications:
> {code}
> for (Edge edge : getEdges()) {
>   edge.setValue(newValue);
> }
> {code}
> Another option would be to add a special set() method to our edge iterator, where one can replace the current edge:
> {code}
> for (EdgeIterator it = getEdges().iterator(); it.hasNext();) {
>   Edge edge = it.next();
>   edge.setValue(newValue);
>   it.set(edge);
> }
> {code}
> We could actually implement the first version as syntactic sugar on top of the second version (the special MutableEdge would need a reference to the iterator in order to call set(this)).
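The "syntactic sugar" idea from the GIRAPH-547 description can be sketched as follows, under loudly stated assumptions: `PrimitiveEdges`, `MutableEdge` and `normalize` are hypothetical names invented for this illustration and are not Giraph's actual edge classes. The sketch also simplifies the ticket's `it.set(edge)` to a `set(newValue)` that writes only the value of the current slot. Edges live in parallel primitive arrays, the iterator reuses a single edge object, and `setValue()` on that object calls back into the iterator's `set()`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch; none of these types are Giraph's actual classes. */
public class InPlaceEdges {

  /** Edge view whose setValue() writes through to the backing store. */
  public interface MutableEdge {
    long getTargetVertexId();
    double getValue();
    void setValue(double newValue);
  }

  /** Out-edges kept in parallel primitive arrays (an "optimized" layout). */
  public static final class PrimitiveEdges implements Iterable<MutableEdge> {
    private final long[] targets;
    private final double[] weights;

    public PrimitiveEdges(long[] targets, double[] weights) {
      if (targets.length != weights.length) {
        throw new IllegalArgumentException("targets and weights differ in length");
      }
      this.targets = targets.clone();
      this.weights = weights.clone();
    }

    public double weightAt(int index) {
      return weights[index];
    }

    @Override
    public Iterator<MutableEdge> iterator() {
      return new EdgeIterator();
    }

    /** Reuses one edge object and offers set() for the slot last returned by next(). */
    private final class EdgeIterator implements Iterator<MutableEdge> {
      private int cursor = 0;

      // The reusable edge reads from, and writes to, the slot at cursor - 1.
      private final MutableEdge reusable = new MutableEdge() {
        @Override public long getTargetVertexId() { return targets[cursor - 1]; }
        @Override public double getValue() { return weights[cursor - 1]; }
        @Override public void setValue(double newValue) {
          EdgeIterator.this.set(newValue); // sugar on top of the iterator's set()
        }
      };

      @Override public boolean hasNext() {
        return cursor < targets.length;
      }

      @Override public MutableEdge next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        cursor++;
        return reusable;
      }

      /** Simplified form of the ticket's second option: replace the current value. */
      void set(double newValue) {
        if (cursor == 0) {
          throw new IllegalStateException("next() has not been called yet");
        }
        weights[cursor - 1] = newValue;
      }
    }
  }

  /** Normalizes out-edge weights in place so they sum to 1 (no-op for a zero sum). */
  public static void normalize(PrimitiveEdges edges) {
    double sum = 0.0;
    for (MutableEdge edge : edges) {
      sum += edge.getValue();
    }
    if (sum == 0.0) {
      return;
    }
    for (MutableEdge edge : edges) {
      edge.setValue(edge.getValue() / sum);
    }
  }

  public static void main(String[] args) {
    PrimitiveEdges edges =
        new PrimitiveEdges(new long[] {1L, 2L, 3L}, new double[] {1.0, 1.0, 2.0});
    normalize(edges);
    for (MutableEdge edge : edges) {
      System.out.println(edge.getTargetVertexId() + " -> " + edge.getValue());
    }
  }
}
```

`normalize` mirrors the weighted PageRank case from the description: the out-edge weights are rescaled once, in place, instead of dividing every outgoing message by the weight sum. Because the edge object is reused, callers must not hold on to it across calls to `next()`, which is the same caveat that motivates the current read-only contract.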