[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper
[ https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426261#comment-13426261 ]

Avery Ching commented on GIRAPH-273:

I believe not, since in the past the workers have always communicated with the master through ZooKeeper. See BspServiceMaster#becomeMaster(), which describes the protocol a master candidate follows to see if it is the master. BspService#masterElectionPath contains the location of the actual master; you just need to factor the common code for getting the master address out of BspServiceMaster#becomeMaster() and perhaps put it in BspService. Note that you'll also have to add a few new methods for communicating with the master, as none exist currently, and start up a Netty server there as well.

> Aggregators shouldn't use Zookeeper
> -----------------------------------
>
> Key: GIRAPH-273
> URL: https://issues.apache.org/jira/browse/GIRAPH-273
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
>
> We use Zookeeper znodes to transfer aggregated values from workers to master and back. Zookeeper is supposed to be used for coordination, and it also has a memory limit which prevents users from having aggregators with large value objects. These are the reasons why we should implement aggregators gathering and distribution in a different way.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper
[ https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426089#comment-13426089 ]

Eli Reisman commented on GIRAPH-275:

Incidentally, if using tons of workers or tons of data to see a performance gain is not your cup of tea, I understand. I'm excited about it because this sort of scale out (lots of data + many workers) is exactly the model I'm working towards for my internship. As explained in the last post, if that's not your goal (or a realistic option on your cluster), this may be of less value unless we force the issue of where Hadoop puts the GraphMappers. I would be excited to attempt to incorporate this into a larger locality fix if there is interest, as data locality for Giraph has been on my mind for some time now. Either way, I will run lots more tests on both cases (workers saturating the cluster to exploit all possible locality, and workers on few nodes that are less likely to encounter the data where they happen to get placed) and post the results ASAP.

> Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper
> ---------------------------------------------------------------------------------------------------------
>
> Key: GIRAPH-275
> URL: https://issues.apache.org/jira/browse/GIRAPH-275
> Project: Giraph
> Issue Type: Improvement
> Components: bsp, graph
> Affects Versions: 0.2.0
> Reporter: Eli Reisman
> Assignee: Eli Reisman
> Fix For: 0.2.0
> Attachments: GIRAPH-275-1.patch, GIRAPH-275-2.patch
>
> During INPUT_SUPERSTEP, workers wait on a barrier until the master has created a complete list of available input splits. Once the barrier is passed, each worker iterates through this list of input splits, creating a znode to lay claim to the next unprocessed split the worker encounters.
>
> For a brief moment while the master is creating the input split znodes each worker iterates through, it has access to InputSplit objects that also contain a list of hostnames on which the blocks of the file are hosted. By including that list of locations in each znode pathname, we can allow each worker reading the list of available splits to sort it so that the splits the worker attempts to claim first are the ones that contain a block that is local to that worker's host.
>
> This makes it possible for many workers to end up reading at least one split that is local to their own host. If the input split selected holds a local block, the RecordReader Hadoop supplies us with will automatically read from that block anyway. By supplying this locality data as part of the znode name rather than as info inside the znode, we avoid reading the data from each znode while sorting, which is currently done only when a split is claimed and which is IO intensive. Sorting the string path data is cheaper and faster, and making the final split znode's name longer doesn't seem to matter too much.
>
> By using the BspMaster's InputSplit data to include locality information in the znode path directly, we also avoid having to access the FileSystem/BlockLocations directly from either master or workers, which could also flood the name node with queries. This is the only place I've found where some locality information is already available to Giraph free of additional cost.
>
> Finally, by sorting each worker's split list this way, we get the contention reduction of GIRAPH-250 for free, since only workers on the same host will be likely to contend for a split, instead of the current situation in which all workers contend for the same input splits from the same list, iterating from the same index. GIRAPH-250 has already been logged as reducing pages of contention on the first pass (when using many hundreds of workers) down to 0-3 contentions before claiming a split to read.
>
> This passes 'mvn verify' etc. I will post results of cluster testing ASAP. If anyone else could try this on an HDFS cluster where locality info is supplied to InputSplit objects, I would be really interested to see other folks' results.
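A minimal sketch of the sorting idea in the GIRAPH-275 description above, not the code from the attached patches. It assumes a hypothetical znode naming scheme in which the hostnames are appended to the split path after an `@` and separated by commas; the class name, method names, and path format are all illustrative assumptions. The sort is stable, so splits without a local block keep their original relative order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: orders split znode paths so host-local splits come first. */
final class LocalitySplitSorter {
    private LocalitySplitSorter() { }

    /**
     * Returns true if the path lists localHost after the '@' marker.
     * The "path@host1,host2" encoding is an assumption made for this sketch.
     */
    static boolean isLocal(String splitPath, String localHost) {
        int marker = splitPath.indexOf('@');
        if (marker < 0) {
            return false; // no locality info, e.g. when not running on HDFS
        }
        for (String host : splitPath.substring(marker + 1).split(",")) {
            if (host.equals(localHost)) {
                return true;
            }
        }
        return false;
    }

    /** Returns a copy of the list with splits local to localHost moved to the front. */
    static List<String> prioritizeLocal(List<String> splitPaths, String localHost) {
        List<String> sorted = new ArrayList<>(splitPaths);
        // false sorts before true, so local splits (key == false) end up first.
        sorted.sort((a, b) ->
            Boolean.compare(!isLocal(a, localHost), !isLocal(b, localHost)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList(
            "/splits/0@hostB", "/splits/1@hostA,hostC", "/splits/2", "/splits/3@hostA");
        System.out.println(prioritizeLocal(splits, "hostA"));
    }
}
```

Only the path strings are inspected, which mirrors the point made in the description: no znode data has to be read in order to sort.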
[jira] [Commented] (GIRAPH-272) cannot extend Vertex in user code
[ https://issues.apache.org/jira/browse/GIRAPH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426008#comment-13426008 ]

Eli Reisman commented on GIRAPH-272:

Hey, I don't mean to bring up a dead issue, but Alessandro, since you are now intimately familiar with this part of the code, would you put something up in this thread or on the site wiki explaining a bit about what to subclass, what not to, and why, now that the vertex naming scheme etc. has been changed and cleaned up? I have been asked this same question a lot, have given an explanation similar to Avery's here, and have left the questioners generally unsatisfied. Some more detail might be in order just for clarity's sake. Not everyone on the user end (around here at least) wants to dig around in the source to find the answers, but they all seem to want to subclass Vertex :)

> cannot extend Vertex in user code
> ---------------------------------
>
> Key: GIRAPH-272
> URL: https://issues.apache.org/jira/browse/GIRAPH-272
> Project: Giraph
> Issue Type: Bug
> Components: graph
> Affects Versions: 0.2.0
> Reporter: Joseph Adler
> Priority: Critical
>
> As currently written, it's not possible to create a class that extends Vertex in user code. I think this is because the methods putMessages and releaseResources are abstract and neither public nor protected.
[jira] [Updated] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper
[ https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-275:
Attachment: GIRAPH-275-2.patch

OK, this patch actually works just as explained in the original Description: no extra reads to ZK or FileSystem are needed, and no harm is done if the location info doesn't exist (as when not using HDFS in your run) other than a List sort per worker.

One observation to note as I have been testing this: this is "poor man's locality", since we have no control over where Hadoop puts our workers. In Hadoop, tasks can go where the data is, given the location info we have to work with. In Giraph, the workers go wherever Hadoop puts them and we are just hoping some data blocks are also local. So increasing the # of workers or the amount of data you run helps a lot. I saw performance improvements even when my # of workers per job was much lower than the total # of cluster nodes where data could be located, but I expect the best performance will be when this ratio is closer to 1.0.

The ultimate locality fix will involve major overhauls to how Giraph gets InputFormat/RecordReader and does task submission, but that might be too much (or not worth it) until we're on YARN. This might be as good as it gets as far as minimal changes to the code and low cost to get some locality. So it might be a nice hold-over while we still ride directly on Hadoop.

I will run more tests to measure performance/memory gains in more detail and see what we really have here. If you run tests too, please post the results; small clusters should benefit as much as large ones (to the extent that there is a benefit), so I'd love to see your results.
[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper
[ https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425936#comment-13425936 ]

Maja Kabiljo commented on GIRAPH-273:

Ok, I'll implement it using just messaging then. Do we have the address of master stored somewhere on the worker?
[jira] [Commented] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper
[ https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425922#comment-13425922 ]

Eli Reisman commented on GIRAPH-275:

The locality information is in the InputSplit objects the master works with. I'm taking another run at this; it should be possible to do this without querying the file system from the Giraph side after all.
[jira] [Commented] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper
[ https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425864#comment-13425864 ]

Eli Reisman commented on GIRAPH-275:

Since our IO formats/record readers fake the Hadoop side out when building splits right now, Hadoop is not going to hand us the ready-made locations and offsets we need. I will need to hand-check the block offsets etc. for our input files to ensure that the blocks we supply when populating the locations info are local not just to the file the split comes from but to the split itself. That code already exists in Hadoop and can be adapted easily enough.

I am also very interested in getting the data into the znode path for now rather than having the master (or worse, every worker) read the znode data to order its split list. If that turns out to be unworkable, I will do what is needed to make this work and refine it once it does. I have a couple of other ideas for minimizing the impact of these reads if this turns out to be the case.

My main goal right now is to get this to work and run many tests to see if it moves the needle on speeding up the INPUT_SUPERSTEP and does in fact lower network throughput at that stage for data load-in (if it's done right, I suspect it will). If this helps, I can do various things to make this a more palatable solution at that point. Data locality seems like a real win for Giraph scale out in general where it can be taken advantage of. Will post more soon...
Extending Giraph classes from test folder
When I try to extend some of the Giraph classes (Vertex, WorkerContext, etc.) from the test folder and run the test in pseudo-distributed mode, I get a ClassNotFoundException when conf.getClass is called in the corresponding BspUtils.createX method. Is there a way around this? If not, is there something we can do? It would be really nice not to have to put things for tests in src (and also to remove all those test-only classes from src).

Thanks,
Maja
[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper
[ https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425681#comment-13425681 ]

Gianmarco De Francisci Morales commented on GIRAPH-273:

IMHO, even as an option it does not make much sense. What is the advantage of persisting (and replicating) aggregators on disk? Especially if they are many and small, HDFS is the worst place.
[jira] [Commented] (GIRAPH-259) TestBspBasic.testBspPageRank is broken
[ https://issues.apache.org/jira/browse/GIRAPH-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425632#comment-13425632 ]

Maja Kabiljo commented on GIRAPH-259:

Those errors were introduced with GIRAPH-244 (see https://issues.apache.org/jira/browse/GIRAPH-269; I didn't want to create a task for the other one, since it only happens with RPC).

Another thing: I see in master we have:
{code}
collectAndProcessAggregatorValues(getSuperstep());
runMasterCompute(getSuperstep());
saveAggregatorValues(getSuperstep());
{code}
So I guess I got it wrong, and the idea is that vertices should actually see what the master did with aggregators in the current superstep? If so, I'll fix it, and I think this should be documented somewhere.

> TestBspBasic.testBspPageRank is broken
> --------------------------------------
>
> Key: GIRAPH-259
> URL: https://issues.apache.org/jira/browse/GIRAPH-259
> Project: Giraph
> Issue Type: Bug
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Attachments: GIRAPH-259-1.patch, GIRAPH-259-2.patch, GIRAPH-259-3.patch
>
> Test crashes on line 152 in class SimplePageRankVertex in distributed mode.
[jira] [Commented] (GIRAPH-259) TestBspBasic.testBspPageRank is broken
[ https://issues.apache.org/jira/browse/GIRAPH-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425626#comment-13425626 ]

Avery Ching commented on GIRAPH-259:

Hi Maja, this looks pretty good, passed my local regressions. But for distributed tests, I got a few failures, can you please take a look?

Failed tests:
  testPartitioners(org.apache.giraph.TestGraphPartitioner)
  testMutateGraph(org.apache.giraph.TestMutateGraphVertex)

Here is the error I saw for testMutateGraph

java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2012-07-31 01:47:03,730 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalStateException: run: Caught an unrecoverable exception Call to achingmbp15.local/192.168.1.109:30001 failed on local exception: java.io.IOException: Can't write: (TargetVertexId = 1, value = 0.0) as class org.apache.giraph.graph.Edge
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:663)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.IOException: Call to achingmbp15.local/192.168.1.109:30001 failed on local exception: java.io.IOException: Can't write: (TargetVertexId = 1, value = 0.0) as class org.apache.giraph.graph.Edge
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
    at org.apache.hadoop.ipc.Client.call(Client.java:1033)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy3.addEdge(Unknown Source)
    at org.apache.giraph.comm.BasicRPCCommunications.addEdgeRequest(BasicRPCCommunications.java:1028)
    at org.apache.giraph.graph.MutableVertex.addEdgeRequest(MutableVertex.java:110)
    at org.apache.giraph.examples.SimpleMutateGraphVertex.compute(SimpleMutateGraphVertex.java:93)
    at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:599)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:654)
    ... 7 more
Caused by: java.io.IOException: Can't write: (TargetVertexId = 1, value = 0.0) as class org.apache.giraph.graph.Edge
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:162)
    at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:111)
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:741)
    at org.apache.hadoop.ipc.Client.call(Client.java:1011)
    ... 14 more
2012-07-31 01:47:03,733 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
[jira] [Commented] (GIRAPH-274) Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
[ https://issues.apache.org/jira/browse/GIRAPH-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425619#comment-13425619 ]

Avery Ching commented on GIRAPH-274:

I'm not a big fan of the infinite thread. An equivalent approach would be to simply set the timeout (mapred.task.timeout) essentially to infinity. I do think we should call progress when it's expected. The danger in setting the timeout very high is that it will take a while for users to realize their jobs have failed.

> Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
> --------------------------------------------------------------
>
> Key: GIRAPH-274
> URL: https://issues.apache.org/jira/browse/GIRAPH-274
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Jaeho Shin
> Assignee: Jaeho Shin
> Fix For: 0.2.0
> Attachments: GIRAPH-274.patch
>
> Even after GIRAPH-267, jobs were failing during INPUT_SUPERSTEP when some workers don't get to reserve an input split, while others were loading vertices for a long time. (related to GIRAPH-246 and GIRAPH-267)
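One way to read "call progress when it's expected" in the comment above is a bounded wait loop that reports progress between polls, rather than a background thread that reports forever. The following is only a sketch under that assumption, not the approach taken in GIRAPH-274.patch. `ProgressCallback` is a local stand-in for Hadoop's `Progressable` interface, and the class name, method name, and poll interval are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Illustrative only: reports progress while waiting for a known, expected condition. */
final class ProgressWhileWaiting {
    /** Stand-in for Hadoop's Progressable; defined locally so the sketch is self-contained. */
    interface ProgressCallback {
        void progress();
    }

    private ProgressWhileWaiting() { }

    /**
     * Polls the condition and calls reporter.progress() once per unsuccessful poll.
     * Returns the number of progress reports made. Stops early if interrupted.
     */
    static int waitFor(BooleanSupplier done, ProgressCallback reporter, long pollMillis) {
        int reports = 0;
        while (!done.getAsBoolean()) {
            reporter.progress(); // tell the framework this task is alive, not hung
            reports++;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                break;
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        long deadline = System.currentTimeMillis() + 50;
        int reports = waitFor(() -> System.currentTimeMillis() >= deadline,
            () -> System.out.println("progress"), 10L);
        System.out.println("reported " + reports + " times");
    }
}
```

Because progress is reported only while a specific condition is being waited on, a task that hangs anywhere else would still be caught by the normal timeout.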
[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper
[ https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425610#comment-13425610 ]

Avery Ching commented on GIRAPH-273:

I think that as an option, writing to HDFS should be fine, but the default should be in-memory, as writing to HDFS is likely to be a bit slow. Again, moving this out of Zookeeper should improve our scalability a lot; even with, say, 100k aggregators, this shouldn't be an issue (assuming they are small objects). The master doesn't require a lot of memory for other things, so keeping it in memory should be fine.
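To make the in-memory default discussed above concrete, here is a small sketch of a master-side store that folds partial aggregated values received from workers into a map. It is not Giraph's aggregator API: the class and method names are hypothetical, values are simplified to `Long`, and the transport that delivers each worker's partial values (messaging, per the thread) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Illustrative only: keeps aggregated values in the master's memory. */
final class InMemoryAggregatorStore {
    private final Map<String, Long> values = new HashMap<>();
    private final BinaryOperator<Long> combiner;

    /** The combiner decides how two partial values merge, e.g. Long::sum or Math::max. */
    InMemoryAggregatorStore(BinaryOperator<Long> combiner) {
        this.combiner = combiner;
    }

    /** Folds one worker's partial values into the running totals. */
    synchronized void acceptPartial(Map<String, Long> partial) {
        for (Map.Entry<String, Long> entry : partial.entrySet()) {
            values.merge(entry.getKey(), entry.getValue(), combiner);
        }
    }

    /** Returns a copy of the current values, e.g. to send back to the workers. */
    synchronized Map<String, Long> snapshot() {
        return new HashMap<>(values);
    }

    public static void main(String[] args) {
        InMemoryAggregatorStore store = new InMemoryAggregatorStore(Long::sum);
        Map<String, Long> fromWorker = new HashMap<>();
        fromWorker.put("edges", 5L);
        store.acceptPartial(fromWorker);
        System.out.println(store.snapshot());
    }
}
```

With small value objects, the map holds one entry per aggregator, which is why the comment expects even a large number of aggregators to fit comfortably in the master's memory.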