[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426261#comment-13426261
 ] 

Avery Ching commented on GIRAPH-273:


I believe not, since in the past the workers have always communicated with the 
master through ZooKeeper.  See BspServiceMaster#becomeMaster(), which implements 
the election protocol a process uses to decide whether it is the master.  
BspService#masterElectionPath contains the location of the actual master; the 
common code that extracts the master address just needs to be factored out of 
BspServiceMaster#becomeMaster(), perhaps into BspService.  Note that you'll also 
have to add a few new methods for worker-to-master communication, since none 
exist currently, and start up a Netty server on the master as well.
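
For what it's worth, here is a rough sketch of the kind of helper that could be 
factored into BspService. The class name, the assumption that each election 
znode's payload is a "host:port" string, and the lexicographic ordering of the 
election children are illustrative guesses, not the actual Giraph code:

{code}
// Hypothetical sketch only -- not the real BspService/BspServiceMaster code.
// Reads the current master's address out of the master election path, assuming
// each candidate wrote a "host:port" payload into its sequential znode.
import java.net.InetSocketAddress;
import java.nio.charset.Charset;
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class MasterAddressReader {
  private final ZooKeeper zk;
  private final String masterElectionPath;

  public MasterAddressReader(ZooKeeper zk, String masterElectionPath) {
    this.zk = zk;
    this.masterElectionPath = masterElectionPath;
  }

  /** The lowest-sequenced child of the election path is the current master. */
  public InetSocketAddress getMasterAddress()
      throws KeeperException, InterruptedException {
    List<String> candidates = zk.getChildren(masterElectionPath, false);
    if (candidates.isEmpty()) {
      return null;  // no master elected yet
    }
    Collections.sort(candidates);
    byte[] data =
        zk.getData(masterElectionPath + "/" + candidates.get(0), false, null);
    String[] hostPort = new String(data, Charset.forName("UTF-8")).split(":");
    return new InetSocketAddress(hostPort[0], Integer.parseInt(hostPort[1]));
  }
}
{code}

With something along these lines in BspService, the worker-to-master Netty 
channel mentioned above could be opened against the returned address.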

> Aggregators shouldn't use Zookeeper
> ---
>
> Key: GIRAPH-273
> URL: https://issues.apache.org/jira/browse/GIRAPH-273
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> We use Zookeeper znodes to transfer aggregated values from workers to master 
> and back. Zookeeper is supposed to be used for coordination, and it also has 
> a memory limit which prevents users from having aggregators with large value 
> objects. These are the reasons why we should implement aggregator gathering 
> and distribution in a different way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper

2012-07-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426089#comment-13426089
 ] 

Eli Reisman commented on GIRAPH-275:


Incidentally, if using tons of workers or tons of data to see a performance 
gain is not your cup of tea, I understand. I'm excited about it because this 
sort of scale-out (lots of data + many workers) is exactly the model I'm 
working towards for my internship. As explained in the last post, if that's not 
your goal (or a realistic option on your cluster), this may be of less value 
unless we force the issue of where Hadoop puts the GraphMappers. I would be 
excited to try to incorporate this into a larger locality fix if there is 
interest, as data locality for Giraph has been on my mind for some time now.

Either way, I will run lots more tests on both cases (workers saturating the 
cluster to exploit all possible locality, and workers spread across only a few 
nodes and therefore less likely to land where their data is) and post the 
results ASAP.



> Restore data locality to workers reading InputSplits where possible without 
> querying NameNode, ZooKeeper
> 
>
> Key: GIRAPH-275
> URL: https://issues.apache.org/jira/browse/GIRAPH-275
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp, graph
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
> Fix For: 0.2.0
>
> Attachments: GIRAPH-275-1.patch, GIRAPH-275-2.patch
>
>
> During INPUT_SUPERSTEP, workers wait on a barrier until the master has 
> created a complete list of available input splits. Once the barrier is past, 
> each worker iterates through this list of input splits, creating a znode to 
> lay claim to the next unprocessed split the worker encounters.
> For a brief moment while the master is creating the input split znodes each 
> worker iterates through, it has access to InputSplit objects that also 
> contain a list of hostnames on which the blocks of the file are hosted. By 
> including that list of locations in each znode pathname we can allow each 
> worker reading the list of available splits to sort it so that splits the 
> worker attempts to claim first are the ones that contain a block that is 
> local to that worker's host.
> This makes it possible for many workers to end up reading at least one 
> split that is local to their own host. If the input split selected holds a 
> local block, the RecordReader Hadoop supplies us with will automatically read 
> from that block anyway. By supplying this locality data as part of the znode 
> name rather than info inside the znode, we avoid reading the data from each 
> znode while sorting, which is only currently done when a split is claimed and 
> which is I/O intensive. Sorting the string path data is cheap and fast, and 
> making the final split znode's name longer doesn't seem to matter too much.
> By using the BspMaster's InputSplit data to include locality information in 
> the znode path directly, we also avoid having to access the 
> FileSystem/BlockLocations directly from either master or workers, which could 
> also flood the name node with queries. This is the only place I've found 
> where some locality information is already available to Giraph free of 
> additional cost.
> Finally, by sorting each worker's split list this way, we get the 
> contention-reduction of GIRAPH-250 for free, since only workers on the same 
> host will be likely to contend for a split instead of the current situation 
> in which all workers contend for the same input splits from the same list, 
> iterating from the same index. GIRAPH-250 has already been logged as reducing 
> pages of contention on the first pass (when using many hundreds of workers) down 
> to 0-3 contentions before claiming a split to read.
> This passes 'mvn verify' etc. I will post results of cluster testing ASAP. If 
> anyone else could try this on an HDFS cluster where locality info is supplied 
> to InputSplit objects, I would be really interested to see other folks' 
> results.
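
To make the sorting step described above concrete, here is a rough worker-side 
sketch. The class and method names are invented for illustration, and the real 
patch may order the list differently:

{code}
// Rough sketch, worker side (assumed names, not the actual GIRAPH-275 patch):
// given the list of split znode names with hostnames embedded in them, sort so
// that znodes mentioning this worker's own hostname are tried first.
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LocalityFirstSorter {
  /** Order split znode paths so paths containing localHostname come first. */
  public static void sortByLocality(List<String> splitPaths,
                                    final String localHostname) {
    Collections.sort(splitPaths, new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        boolean aLocal = a.contains(localHostname);
        boolean bLocal = b.contains(localHostname);
        if (aLocal == bLocal) {
          return a.compareTo(b);  // stable, deterministic tie-break
        }
        return aLocal ? -1 : 1;
      }
    });
  }
}
{code}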





[jira] [Commented] (GIRAPH-272) cannot extend Vertex in user code

2012-07-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426008#comment-13426008
 ] 

Eli Reisman commented on GIRAPH-272:


Hey, I don't mean to bring up a dead issue, but Alessandro, since you are now 
intimately familiar with this part of the code, would you put something up in 
this thread or on the site wiki explaining a bit about what to subclass, what 
not to, and why, now that the vertex naming scheme etc. has been changed and 
cleaned up? I have been asked this same question a lot, have given an 
explanation similar to Avery's here, and have left the questioners generally 
unsatisfied. Some more detail might be in order just for clarity's sake. Not 
everyone on the user end (around here at least) wants to dig around in the 
source to find the answers, but they all seem to want to subclass Vertex :)


> cannot extend Vertex in user code
> -
>
> Key: GIRAPH-272
> URL: https://issues.apache.org/jira/browse/GIRAPH-272
> Project: Giraph
>  Issue Type: Bug
>  Components: graph
>Affects Versions: 0.2.0
>Reporter: Joseph Adler
>Priority: Critical
>
> As currently written, it's not possible to create a class that extends Vertex 
> in user code. I think this is because the abstract methods putMessages and 
> releaseResources are neither public nor protected.





[jira] [Updated] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper

2012-07-31 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated GIRAPH-275:
---

Attachment: GIRAPH-275-2.patch

OK, this patch actually works just as explained in the original description: no 
extra reads to ZooKeeper or the FileSystem are needed, and no harm is done if 
the location info doesn't exist (as when you're not running on HDFS) other than 
a list sort per worker. One observation to note as I have been testing this:

This is "poor man's locality," since we have no control over where Hadoop puts 
our workers. In plain Hadoop, tasks can be scheduled where the data is, using 
the location info we have to work with. In Giraph, the workers go wherever 
Hadoop puts them and we are just hoping some data blocks are also local.

So increasing the number of workers or the amount of data you run helps a lot. 
I saw performance improvements even when my number of workers per job was much 
lower than the total number of cluster nodes where data could be located, but I 
expect the best performance when this ratio is closer to 1.0.

The ultimate locality fix will involve major overhauls to how Giraph gets its 
InputFormat/RecordReader and does task submission, but that might be too much 
(or not worth it) until we're on YARN. This might be as good as it gets for 
minimal, low-cost changes to the code that still buy some locality, so it could 
be a nice hold-over while we still ride directly on Hadoop.

I will run more tests to measure performance/memory gains in more detail and 
see what we really have here. If you try it, please post results; small 
clusters should benefit as much as large ones (to the extent that there is a 
benefit), so I'd love to see your results.


> Restore data locality to workers reading InputSplits where possible without 
> querying NameNode, ZooKeeper
> 
>
> Key: GIRAPH-275
> URL: https://issues.apache.org/jira/browse/GIRAPH-275
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp, graph
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
> Fix For: 0.2.0
>
> Attachments: GIRAPH-275-1.patch, GIRAPH-275-2.patch
>
>


[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425936#comment-13425936
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

OK, I'll implement it using just messaging then.

Do we have the master's address stored somewhere on the worker?

> Aggregators shouldn't use Zookeeper
> ---
>
> Key: GIRAPH-273
> URL: https://issues.apache.org/jira/browse/GIRAPH-273
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>





[jira] [Commented] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper

2012-07-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425922#comment-13425922
 ] 

Eli Reisman commented on GIRAPH-275:


The locality information is in the InputSplit objects the master works with. 
Taking another run at this, it should be possible to do it without querying the 
file system from the Giraph side after all.
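
As a rough illustration of that idea (the class name and the path constant are 
placeholders, not the actual BspServiceMaster code), the master could embed 
each split's hosts in the znode name it creates:

{code}
// Rough sketch, master side: when the master creates one znode per InputSplit,
// it could append the split's block hosts to the znode name so workers can
// sort by locality without reading znode data.  INPUT_SPLIT_DIR is a
// placeholder, not the real Giraph path constant.
import java.io.IOException;
import org.apache.hadoop.mapreduce.InputSplit;

public class SplitPathBuilder {
  private static final String INPUT_SPLIT_DIR = "/_inputSplitDir";  // placeholder

  public static String buildSplitPath(int splitIndex, InputSplit split)
      throws IOException, InterruptedException {
    StringBuilder path = new StringBuilder(INPUT_SPLIT_DIR)
        .append("/").append(splitIndex);
    // getLocations() lists the hostnames holding blocks of this split.
    for (String host : split.getLocations()) {
      path.append(",").append(host);
    }
    return path.toString();
  }
}
{code}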


> Restore data locality to workers reading InputSplits where possible without 
> querying NameNode, ZooKeeper
> 
>
> Key: GIRAPH-275
> URL: https://issues.apache.org/jira/browse/GIRAPH-275
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp, graph
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
> Fix For: 0.2.0
>
> Attachments: GIRAPH-275-1.patch
>
>





[jira] [Commented] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper

2012-07-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425864#comment-13425864
 ] 

Eli Reisman commented on GIRAPH-275:


Since our I/O formats/record readers fake out the Hadoop side when building 
splits right now, Hadoop is not going to hand us the ready-made locations and 
offsets we need. I will need to hand-check the block offsets etc. for our input 
files to ensure that the blocks we supply when populating the location info are 
local not just to the file the split comes from but to the split itself; that 
code already exists in Hadoop, though, and can be adapted easily enough. I am 
also very interested in getting the data into the znode path for now rather 
than having the master (or worse, every worker) read the znode data to order 
its split list, but if that turns out to be unworkable I will do what is needed 
to make this work and then refine it. I have a couple of other ideas for 
minimizing the impact of these reads if that turns out to be the case.

My main goal right now is to get this working and run many tests to see if it 
moves the needle on speeding up the INPUT_SUPERSTEP and does in fact lower 
network throughput at that stage during data load-in (if it's done right, I 
suspect it will). If this helps, I can do various things to make this a more 
palatable solution at that point. Data locality seems like a real win for 
Giraph scale-out in general, wherever it can be taken advantage of. Will post 
more soon...
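
For reference, the stock Hadoop call that "already exists and can be adapted" 
for this kind of per-split block check looks roughly like the following. The 
wrapper class is illustrative, but FileSystem#getFileBlockLocations and 
BlockLocation#getHosts are the real APIs:

{code}
// Illustrative wrapper: ask HDFS which hosts hold blocks overlapping the
// split's own byte range, rather than blocks anywhere in the file.
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitHostLookup {
  /** Hosts holding blocks that overlap [start, start + length) of the file. */
  public static Set<String> hostsForRange(FileSystem fs, Path file,
                                          long start, long length)
      throws IOException {
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, start, length);
    Set<String> hosts = new LinkedHashSet<String>();
    for (BlockLocation block : blocks) {
      for (String host : block.getHosts()) {
        hosts.add(host);
      }
    }
    return hosts;
  }
}
{code}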



> Restore data locality to workers reading InputSplits where possible without 
> querying NameNode, ZooKeeper
> 
>
> Key: GIRAPH-275
> URL: https://issues.apache.org/jira/browse/GIRAPH-275
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp, graph
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
> Fix For: 0.2.0
>
> Attachments: GIRAPH-275-1.patch
>
>


Extending Giraph classes from test folder

2012-07-31 Thread Maja Kabiljo
When I try to extend some of the Giraph classes (Vertex, WorkerContext, etc.) 
from the test folder and run the test in pseudo-distributed mode, I get a 
ClassNotFoundException on the conf.getClass call in the corresponding 
BspUtils.createX method.

Is there a way around this? If not, is there something we can do? It would be 
really nice not to have to put test-only things in src (and also to be able to 
remove all the existing test-only classes from src).

Thanks,
Maja
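
For context, the lookup that blows up is essentially Hadoop's 
Configuration#getClass followed by reflective instantiation. This is an 
illustrative reduction, not the actual BspUtils code; the "giraph.vertexClass" 
key is assumed from GiraphJob, and the ClassNotFoundException presumably means 
the test-compiled class never makes it onto the task JVM's classpath:

{code}
// Illustrative reduction of the failing lookup (not the actual BspUtils code).
// If the class named by the property was compiled under the test folder and is
// not packaged onto the task's classpath, conf.getClass fails with a wrapped
// ClassNotFoundException.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class VertexLookupSketch {
  public static Object createVertex(Configuration conf) {
    Class<?> vertexClass = conf.getClass("giraph.vertexClass", null);
    return ReflectionUtils.newInstance(vertexClass, conf);
  }
}
{code}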


[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425681#comment-13425681
 ] 

Gianmarco De Francisci Morales commented on GIRAPH-273:
---

IMHO, even as an option it does not make much sense.
What is the advantage of persisting (and replicating) aggregators on disk?
Especially if they are numerous and small, HDFS is the worst place for them.

> Aggregators shouldn't use Zookeeper
> ---
>
> Key: GIRAPH-273
> URL: https://issues.apache.org/jira/browse/GIRAPH-273
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>





[jira] [Commented] (GIRAPH-259) TestBspBasic.testBspPageRank is broken

2012-07-31 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425632#comment-13425632
 ] 

Maja Kabiljo commented on GIRAPH-259:
-

Those errors were introduced with GIRAPH-244 (see 
https://issues.apache.org/jira/browse/GIRAPH-269, and the other one, which I 
didn't want to create a task for since it only happens with RPC).

Another thing: I see that in master we have:
{code}
collectAndProcessAggregatorValues(getSuperstep());
runMasterCompute(getSuperstep());
saveAggregatorValues(getSuperstep());
{code}
So I guess I got it wrong, and the idea is that vertices should actually see 
what the master did with aggregators in the current superstep? If so, I'll fix 
it, and I think this should be documented somewhere.

> TestBspBasic.testBspPageRank is broken
> --
>
> Key: GIRAPH-259
> URL: https://issues.apache.org/jira/browse/GIRAPH-259
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-259-1.patch, GIRAPH-259-2.patch, 
> GIRAPH-259-3.patch
>
>
> Test crashes on line 152 in class SimplePageRankVertex in distributed mode.





[jira] [Commented] (GIRAPH-259) TestBspBasic.testBspPageRank is broken

2012-07-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425626#comment-13425626
 ] 

Avery Ching commented on GIRAPH-259:


Hi Maja, this looks pretty good and passed my local regressions.  But for the 
distributed tests I got a few failures; can you please take a look?

Failed tests: 
  testPartitioners(org.apache.giraph.TestGraphPartitioner)
  testMutateGraph(org.apache.giraph.TestMutateGraphVertex)

Here is the error I saw for testMutateGraph

java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2012-07-31 01:47:03,730 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalStateException: run: Caught an unrecoverable exception Call to achingmbp15.local/192.168.1.109:30001 failed on local exception: java.io.IOException: Can't write: (TargetVertexId = 1, value = 0.0) as class org.apache.giraph.graph.Edge
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:663)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.IOException: Call to achingmbp15.local/192.168.1.109:30001 failed on local exception: java.io.IOException: Can't write: (TargetVertexId = 1, value = 0.0) as class org.apache.giraph.graph.Edge
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
    at org.apache.hadoop.ipc.Client.call(Client.java:1033)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy3.addEdge(Unknown Source)
    at org.apache.giraph.comm.BasicRPCCommunications.addEdgeRequest(BasicRPCCommunications.java:1028)
    at org.apache.giraph.graph.MutableVertex.addEdgeRequest(MutableVertex.java:110)
    at org.apache.giraph.examples.SimpleMutateGraphVertex.compute(SimpleMutateGraphVertex.java:93)
    at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:599)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:654)
    ... 7 more
Caused by: java.io.IOException: Can't write: (TargetVertexId = 1, value = 0.0) as class org.apache.giraph.graph.Edge
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:162)
    at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:111)
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:741)
    at org.apache.hadoop.ipc.Client.call(Client.java:1011)
    ... 14 more
2012-07-31 01:47:03,733 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

> TestBspBasic.testBspPageRank is broken
> --
>
> Key: GIRAPH-259
> URL: https://issues.apache.org/jira/browse/GIRAPH-259
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-259-1.patch, GIRAPH-259-2.patch, 
> GIRAPH-259-3.patch
>
>
> Test crashes on line 152 in class SimplePageRankVertex in distributed mode.





[jira] [Commented] (GIRAPH-274) Jobs still failing due to tasks timeout during INPUT_SUPERSTEP

2012-07-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425619#comment-13425619
 ] 

Avery Ching commented on GIRAPH-274:


I'm not a big fan of the infinite thread.  An equivalent approach would be to 
simply set the timeout (mapred.task.timeout) essentially to infinity.  I do 
think we should call progress when it's expected.  The danger in setting the 
timeout very high is that it will take a while for users to realize their jobs 
have failed.
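
A minimal sketch of the "call progress when it's expected" idea: a long-running 
loading loop pings the framework through the Progressable it is handed (the 
Mapper context is one), so the mapred.task.timeout clock keeps getting reset. 
The loader class and its placeholder methods are invented for illustration, not 
the actual Giraph input-loading code:

{code}
// Illustrative only: report progress while doing real work instead of raising
// mapred.task.timeout to a huge value.  haveMoreVertices()/readNextVertex()
// are placeholders, not real Giraph methods.
import org.apache.hadoop.util.Progressable;

public class ProgressReportingLoader {
  private static final int PROGRESS_INTERVAL = 1000;

  /** Load vertices, pinging the framework so the task is not timed out. */
  public void loadVertices(Progressable progressable) {
    long loaded = 0;
    while (haveMoreVertices()) {
      readNextVertex();
      if (++loaded % PROGRESS_INTERVAL == 0) {
        progressable.progress();  // tells the TaskTracker we are still alive
      }
    }
  }

  private boolean haveMoreVertices() { return false; }  // placeholder
  private void readNextVertex() { }                     // placeholder
}
{code}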

> Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
> --
>
> Key: GIRAPH-274
> URL: https://issues.apache.org/jira/browse/GIRAPH-274
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Jaeho Shin
>Assignee: Jaeho Shin
> Fix For: 0.2.0
>
> Attachments: GIRAPH-274.patch
>
>
> Even after GIRAPH-267, jobs were still failing during INPUT_SUPERSTEP when some 
> workers did not get to reserve an input split while others were loading 
> vertices for a long time. (Related to GIRAPH-246 and GIRAPH-267.)





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425610#comment-13425610
 ] 

Avery Ching commented on GIRAPH-273:


I think that as an option, writing to HDFS should be fine, but the default 
should be in-memory, as writing to HDFS is likely to be a bit slow.  Again, 
moving this out of ZooKeeper should improve our scalability a lot; even with, 
say, 100k aggregators this shouldn't be an issue (assuming they are small 
objects).  The master doesn't require a lot of memory for other things, so 
keeping the aggregators in memory should be fine.
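
To make the in-memory option concrete, here is a rough sketch of the kind of 
master-side structure this implies. The class and the simplified aggregator 
interface are illustrative assumptions, not Giraph's actual aggregator API:

{code}
// Hypothetical sketch: keep aggregator values in a plain in-memory map on the
// master instead of writing them into ZooKeeper znodes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.io.Writable;

public class MasterAggregatorMap {
  /** Minimal stand-in for an aggregator of Writable values. */
  public interface SimpleAggregator {
    void aggregate(Writable value);
    Writable getAggregatedValue();
  }

  private final Map<String, SimpleAggregator> aggregators =
      new ConcurrentHashMap<String, SimpleAggregator>();

  public void register(String name, SimpleAggregator aggregator) {
    aggregators.put(name, aggregator);
  }

  /** Fold a value reported by a worker into the named aggregator. */
  public void aggregate(String name, Writable value) {
    SimpleAggregator aggregator = aggregators.get(name);
    if (aggregator != null) {
      aggregator.aggregate(value);
    }
  }

  /** Value handed back to workers for the next superstep. */
  public Writable getAggregatedValue(String name) {
    SimpleAggregator aggregator = aggregators.get(name);
    return aggregator == null ? null : aggregator.getAggregatedValue();
  }
}
{code}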

> Aggregators shouldn't use Zookeeper
> ---
>
> Key: GIRAPH-273
> URL: https://issues.apache.org/jira/browse/GIRAPH-273
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
