[jira] [Commented] (GIRAPH-191) Random Walks on Graphs

2012-05-18 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279092#comment-13279092
 ] 

Sebastian Schelter commented on GIRAPH-191:
---

Sorry, I meant the code in the patch.



> Random Walks on Graphs
> --
>
> Key: GIRAPH-191
> URL: https://issues.apache.org/jira/browse/GIRAPH-191
> Project: Giraph
> Issue Type: New Feature
> Components: examples
> Affects Versions: 0.2.0
> Reporter: Gianmarco De Francisci Morales
> Attachments: GIRAPH-191-1.patch, GIRAPH-191.patch
>
>
> Implementing RWR on Giraph should be a very simple modification of the 
> SimplePageRankVertex code.
> {code}
> DoubleWritable vertexValue;
> if ( myID == sourceID )
>   vertexValue = new DoubleWritable(0.15f + 0.85f * sum);
> else
>   vertexValue = new DoubleWritable(0.85f * sum);
> {code}
> It would be nice to make it as configurable as possible by using parametric 
> damping factors, preference vectors, strongly preferential, etc...
> More or less along these lines:
> http://law.dsi.unimi.it/software/docs/it/unimi/dsi/law/rank/PageRank.html
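
The update rule in the description can be sketched with the damping factor as a parameter instead of a hard-coded 0.85. This is only an illustration in plain Java: the class RwrUpdate, its method and its parameter names are assumptions made for this example and are not part of Giraph or of the attached patches.

```java
/** Illustrative sketch only; RwrUpdate and its parameters are hypothetical names. */
final class RwrUpdate {
  private RwrUpdate() { }

  /**
   * One random-walk-with-restart update for a single vertex.
   *
   * @param vertexId id of the vertex being updated
   * @param sourceId id of the restart (source) vertex
   * @param sum      probability mass received from in-neighbours
   * @param damping  probability of following an edge (0.85 in the description)
   */
  static double update(long vertexId, long sourceId, double sum, double damping) {
    if (damping < 0.0 || damping > 1.0) {
      throw new IllegalArgumentException("damping must be in [0, 1]");
    }
    double walked = damping * sum;
    // Only the source vertex receives the restart mass (1 - damping).
    return vertexId == sourceId ? (1.0 - damping) + walked : walked;
  }
}
```

With damping set to 0.85 this reproduces the two branches of the snippet above; a preference vector would replace the single sourceId comparison with a per-vertex weight.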

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-191) Random Walks on Graphs

2012-05-18 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279089#comment-13279089
 ] 

Sebastian Schelter commented on GIRAPH-191:
---

I tested PageRank in the test with 200M vertices on 6 machines without
problems.
On 18.05.2012 19:59, "Gianmarco De Francisci Morales (JIRA)" wrote:








[jira] [Updated] (GIRAPH-191) Random Walks on Graphs

2012-05-18 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-191:
--

Attachment: GIRAPH-191-1.patch






[jira] [Updated] (GIRAPH-191) Random Walks on Graphs

2012-05-18 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-191:
--

Component/s: examples
Affects Version/s: 0.2.0
Summary: Random Walks on Graphs (was: Random Walk with Restart)

> Random Walks on Graphs
> --
>
> Key: GIRAPH-191
> URL: https://issues.apache.org/jira/browse/GIRAPH-191
> Project: Giraph
> Issue Type: New Feature
> Components: examples
> Affects Versions: 0.2.0
> Reporter: Gianmarco De Francisci Morales
> Attachments: GIRAPH-191.patch
>
>
> Implementing RWR on Giraph should be a very simple modification of the 
> SimplePageRankVertex code.
> {code}
> DoubleWritable vertexValue;
> if ( myID == sourceID )
>   vertexValue = new DoubleWritable(0.15f + 0.85f * sum);
> else
>   vertexValue = new DoubleWritable(0.85f * sum);
> {code}
> It would be nice to make it as configurable as possible by using parametric 
> damping factors, preference vectors, strongly preferential, etc...
> More or less along these lines:
> http://law.dsi.unimi.it/software/docs/it/unimi/dsi/law/rank/PageRank.html





[jira] [Updated] (GIRAPH-191) Random Walk with Restart

2012-05-17 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-191:
--

Attachment: GIRAPH-191.patch

A first draft for the code. It contains an abstract RandomWalkVertex which 
PageRank, RWR and others can extend.
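
The design described here, one abstract base that fixes the walk step and lets subclasses decide where the restart mass goes, might look roughly like the following. It is a plain-Java sketch that does not use the Giraph API; RandomWalkSketch, PageRankSketch, RwrSketch and all method names are hypothetical and are not taken from the patch.

```java
/** Hypothetical base class: subclasses only decide how restart mass is distributed. */
abstract class RandomWalkSketch {
  private final double damping;

  RandomWalkSketch(double damping) {
    this.damping = damping;
  }

  /** Share of the restart mass given to a vertex; should sum to 1 over all vertices. */
  abstract double restartShare(long vertexId);

  /** New value of a vertex, given the mass it received from its in-neighbours. */
  final double newValue(long vertexId, double receivedSum) {
    return (1.0 - damping) * restartShare(vertexId) + damping * receivedSum;
  }
}

/** PageRank: the restart mass is spread uniformly over all vertices. */
final class PageRankSketch extends RandomWalkSketch {
  private final long numVertices;

  PageRankSketch(double damping, long numVertices) {
    super(damping);
    this.numVertices = numVertices;
  }

  @Override
  double restartShare(long vertexId) {
    return 1.0 / numVertices;
  }
}

/** Random walk with restart: all restart mass goes to one source vertex. */
final class RwrSketch extends RandomWalkSketch {
  private final long sourceId;

  RwrSketch(double damping, long sourceId) {
    super(damping);
    this.sourceId = sourceId;
  }

  @Override
  double restartShare(long vertexId) {
    return vertexId == sourceId ? 1.0 : 0.0;
  }
}
```

Keeping newValue() final in the base class means the variants differ only in their restart distribution, which is where a preference vector would also plug in.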

> Random Walk with Restart
> 
>
> Key: GIRAPH-191
> URL: https://issues.apache.org/jira/browse/GIRAPH-191
> Project: Giraph
> Issue Type: New Feature
> Reporter: Gianmarco De Francisci Morales
> Attachments: GIRAPH-191.patch
>
>
> Implementing RWR on Giraph should be a very simple modification of the 
> SimplePageRankVertex code.
> {code}
> DoubleWritable vertexValue;
> if ( myID == sourceID )
>   vertexValue = new DoubleWritable(0.15f + 0.85f * sum);
> else
>   vertexValue = new DoubleWritable(0.85f * sum);
> {code}
> It would be nice to make it as configurable as possible by using parametric 
> damping factors, preference vectors, strongly preferential, etc...
> More or less along these lines:
> http://law.dsi.unimi.it/software/docs/it/unimi/dsi/law/rank/PageRank.html





Re: Giraph is now an Apache top level project

2012-05-16 Thread Sebastian Schelter
Great news!
On 16.05.2012 22:41, "Avery Ching" wrote:

> Thanks Owen for helping us through this process!
>
> Avery
>
> On 5/16/12 1:33 PM, Owen O'Malley wrote:
>
>> Today the Apache board voted to graduate Giraph to a top level
>> project. Congratulations, all!
>>
>> -- Owen
>>
>
>


Re: Review Request: GIRAPH-20 Move temporary test files from the project directory

2012-05-10 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5077/
---

(Updated 2012-05-10 09:32:10.140325)


Review request for giraph.


Changes
---

Updated the patch to reflect Avery's comments.

The line FileUtils:87 has to be kept: we initially delete the new file (if it 
exists), as it will be recreated later.


Summary
---

All temporary files that the tests generate are now written to 
/tmp/_giraphTests, including ZooKeeper files, checkpoints, etc.

This behavior will be automatically configured whenever 
InternalVertexRunner.run() or BspCase.prepareJob() is used.

Usually I can't stop myself once I have my refactoring hat on, therefore I also 
tidied up a lot of minor stuff, removed code duplications etc.


This addresses bug GIRAPH-20.
https://issues.apache.org/jira/browse/GIRAPH-20


Diffs (updated)
-

  trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java 
1336504 
  trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1336504 
  trunk/src/main/java/org/apache/giraph/graph/TextAggregatorWriter.java 1336504 
  trunk/src/main/java/org/apache/giraph/utils/FileUtils.java PRE-CREATION 
  trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1336504 
  trunk/src/test/java/org/apache/giraph/BspCase.java 1336504 
  trunk/src/test/java/org/apache/giraph/TestAutoCheckpoint.java 1336506 
  trunk/src/test/java/org/apache/giraph/TestBspBasic.java 1336504 
  trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java 1336504 
  trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1336504 
  trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1336506 
  trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1336504 
  trunk/src/test/java/org/apache/giraph/TestNotEnoughMapTasks.java 1336504 
  trunk/src/test/java/org/apache/giraph/TestZooKeeperExt.java 1336504 
  trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java 1336504 

Diff: https://reviews.apache.org/r/5077/diff


Testing
---

successfully passed local and pseudo-distributed tests with Hadoop 0.20.203


Thanks,

Sebastian



[jira] [Commented] (GIRAPH-141) mulitgraph support in giraph

2012-05-10 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272209#comment-13272209
 ] 

Sebastian Schelter commented on GIRAPH-141:
---

No need to apologize.

Maybe we simply misunderstood each other. My point was that implementing real 
multigraph support deep in the system might take a lot of effort, because many 
special cases have to be kept in mind, e.g. when mutating or partitioning the 
graph. We should first look for simple ways to add multigraph support 
transparently on top of what we already have.

> mulitgraph support in giraph
> 
>
> Key: GIRAPH-141
> URL: https://issues.apache.org/jira/browse/GIRAPH-141
> Project: Giraph
> Issue Type: Improvement
> Components: graph
> Reporter: André Kelpe
>
> The current vertex API only supports simple graphs, meaning that there can 
> only ever be one edge between two vertices. Many graphs like the road network 
> are in fact multigraphs, where many edges can connect two vertices at the 
> same time.
> Support for this could be added by introducing an Iterator 
> getEdgeValue() or a similar construct. Maybe introducing a slim object like a 
> Connector between the edge and the vertex is also a good idea, so that you 
> could do something like:
> {code} 
> for (final Connector conn : getEdgeValues()) {
>  final EdgeWritable edge = conn.getEdge();
>  final VertexWritable otherVertex = conn.getOther();
>  doInterestingStuff(otherVertex);
>  doMoreInterestingStuff(edge);
> }
> {code} 





[jira] [Commented] (GIRAPH-141) mulitgraph support in giraph

2012-05-10 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272189#comment-13272189
 ] 

Sebastian Schelter commented on GIRAPH-141:
---

I don't think it would be a good idea to force all users to use multigraphs. 
Furthermore, a good principle of software engineering is to always look for 
the simplest solution. I'm not sure whether it would really be that easy to 
implement real multigraph support throughout the whole system.

Why not have a base vertex class that transparently (!) wraps multigraphs into 
simple graphs?






Re: Review Request: GIRAPH-20 Move temporary test files from the project directory

2012-05-10 Thread Sebastian Schelter


> On 2012-05-10 06:57:01, Avery Ching wrote:
> > Overall, looks great.  Can you address the questions/comments and then I'll 
> > re-review?

Thanks for the quick review!

I'll address your comments, merge this with the current trunk and post a new 
patch.


> On 2012-05-10 06:57:01, Avery Ching wrote:
> > trunk/src/main/java/org/apache/giraph/graph/TextAggregatorWriter.java, 
> > lines 85-87
> > <https://reviews.apache.org/r/5077/diff/1/?file=108155#file108155line85>
> >
> > Just out of curiosity, why this change?

If one uses writeUTF() and then reads the resulting file with a buffered 
reader, each line starts with a broken char. Directly writing the bytes out 
solved this.
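
The effect described here follows from how DataOutputStream.writeUTF() works: it writes a two-byte length prefix before the encoded string, and a text reader decodes that prefix as stray characters at the start of each record. The snippet below is a small standalone illustration; the class and method names are made up for this example and are not taken from TextAggregatorWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Standalone illustration; class and method names are hypothetical. */
final class WriteUtfDemo {
  private WriteUtfDemo() { }

  /** Bytes produced by writeUTF(): a 2-byte length prefix followed by the encoded text. */
  static byte[] viaWriteUtf(String line) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeUTF(line);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  /** Bytes produced by writing the UTF-8 encoding directly: the text and nothing else. */
  static byte[] viaRawBytes(String line) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.write(line.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }
}
```

For a six-character ASCII line, viaWriteUtf() returns eight bytes (the prefix 0x00 0x06 plus the text), while viaRawBytes() returns only the six text bytes, so the file stays readable with a plain text reader.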


> On 2012-05-10 06:57:01, Avery Ching wrote:
> > trunk/src/main/java/org/apache/giraph/utils/FileUtils.java, line 87
> > <https://reviews.apache.org/r/5077/diff/1/?file=108156#file108156line87>
> >
> > Why delete it?

I'll remove this.


> On 2012-05-10 06:57:01, Avery Ching wrote:
> > trunk/src/test/java/org/apache/giraph/TestBspBasic.java, line 242
> > <https://reviews.apache.org/r/5077/diff/1/?file=108160#file108160line242>
> >
> > shouldn't it be 49 not 491?

It's a lowercase L, not a 1 :)


- Sebastian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5077/#review7756
---


On 2012-05-09 11:37:47, Sebastian Schelter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5077/
> ---
> 
> (Updated 2012-05-09 11:37:47)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> All temporary files that the tests generate are now written to 
> /tmp/_giraphTests, including ZooKeeper files, checkpoints, etc.
> 
> This behavior will be automatically configured whenever 
> InternalVertexRunner.run() or BspCase.prepareJob() is used.
> 
> Usually I can't stop myself once I have my refactoring hat on, therefore I 
> also tidied up a lot of minor stuff, removed code duplications etc.
> 
> 
> This addresses bug GIRAPH-20.
> https://issues.apache.org/jira/browse/GIRAPH-20
> 
> 
> Diffs
> -
> 
>   trunk/src/test/java/org/apache/giraph/TestZooKeeperExt.java 1332106 
>   trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestNotEnoughMapTasks.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestAutoCheckpoint.java 1332106 
>   trunk/src/test/java/org/apache/giraph/TestBspBasic.java 1332106 
>   trunk/src/test/java/org/apache/giraph/BspCase.java 1332106 
>   trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 
> 1332106 
>   trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java 
> 1332106 
>   trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1332106 
>   trunk/src/main/java/org/apache/giraph/graph/TextAggregatorWriter.java 
> 1332106 
>   trunk/src/main/java/org/apache/giraph/utils/FileUtils.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5077/diff
> 
> 
> Testing
> ---
> 
> successfully passed local and pseudo-distributed tests with Hadoop 0.20.203
> 
> 
> Thanks,
> 
> Sebastian
> 
>



Re: Review Request: Implemented a netty client/server protocol as a faster alternative to Hadoop RPC (3x improvement)

2012-05-09 Thread Sebastian Schelter
I agree that we should try not to hand out the map. There should be a
method to get a single entry, a method to get an Iterable over all
entries, and a method to clear it.
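
One way to read this suggestion is a small owner class that keeps the map private and exposes only the three operations named above. The sketch below is an assumption about what that could look like; MessageStore and its method names are invented for illustration and do not come from the patch under review.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical owner class: the map never leaves it, so all mutations live here. */
final class MessageStore<K, V> {
  private final Map<K, V> entries = new ConcurrentHashMap<>();

  /** Adds or replaces the entry for the key. */
  void put(K key, V value) {
    entries.put(key, value);
  }

  /** Returns the single entry for the key, or null if there is none. */
  V get(K key) {
    return entries.get(key);
  }

  /** Returns a snapshot of all entries; modifying it does not affect the store. */
  Iterable<Map.Entry<K, V>> all() {
    List<Map.Entry<K, V>> snapshot = new ArrayList<>();
    for (Map.Entry<K, V> e : entries.entrySet()) {
      snapshot.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
    }
    return snapshot;
  }

  /** Removes all entries. */
  void clear() {
    entries.clear();
  }
}
```

Returning a snapshot rather than a live view is a design choice of this sketch: it costs a copy, but callers cannot mutate or observe the map behind the owner's back.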

--sebastian

On 09.05.2012 18:28, Avery Ching wrote:
> 
> 
>> On 2012-05-09 10:10:46, Sebastian Schelter wrote:
>>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java,
>>>  line 1465
>>> <https://reviews.apache.org/r/5074/diff/2/?file=108120#file108120line1465>
>>>
>>> I don't like it that a collection is changed outside of the class that 
>>> owns it. 
>>> 
>>> This makes code hard to read and debug. We should rather introduce a 
>>> method for this in the class that owns this map to have all mutations in 
>>> one place.
> 
> Good point, it's a little hard to understand.  Since this is a Map, we can 
> do as you suggested: keep it in a class and then add a method to do the 
> clear().  We can even add methods that iterate over the map as well, so 
> that no synchronization has to happen outside of the map.  I'll do 
> this for all our synchronized objects in the next patch if that's okay with 
> you (the current code does this as well).  It will be a medium-sized 
> change.
> 
> 
> - Avery
> 
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5074/#review7728
> ---
> 
> 
> On 2012-05-09 09:22:36, Avery Ching wrote:
>>
>> ---
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/5074/
>> ---
>>
>> (Updated 2012-05-09 09:22:36)
>>
>>
>> Review request for giraph.
>>
>>
>> Summary
>> ---
>>
>> * Implemented a request/response protocol with netty as a NettyClient and 
>> NettyServer.  There is a NettyClientWorker and NettyClientServer that 
>> implements WorkerClient and WorkerServer, respectively.  Netty is a lot 
>> faster since it's non-blocking and we can interleave computation and 
>> communication as opposed to Hadoop RPC (blocking).
>> * The netty server implementation uses concurrent hash maps to improve 
>> concurrency instead of synchronized blocks around maps.
>> * By default netty is used, but Hadoop RPC can be used with 
>> -Dgiraph.useNetty=false
>> * Changed the class hierarchy of ServerInterface to WorkerClientServer 
>> (WorkerClient and WorkerServer) to support a request/response protocol 
>> instead of just RPC
>> * In netty, the messages/mutations are gathered by partition and sent out as 
>> a partition's worth of messages/mutations
>> * Added two new test classes (RequestTest.java and ConnectionTest.java) to 
>> test all requests and check netty connections.
>> * PageRankBenchmark uses EdgeListVertex as a default
>>
>>
>> This addresses bug GIRAPH-37.
>> https://issues.apache.org/jira/browse/GIRAPH-37
>>
>>
>> Diffs
>> -
>>
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java
>>  1332888 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java
>>  1332888 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
>>  PRE-CREATION 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java
>>  1332888 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
>>  1332888 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java
>>  1332888 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
>>  PRE-CREATION 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
>>  1332888 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
>>  1332888 
>>   
>> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src

Review Request: GIRAPH-20 Move temporary test files from the project directory

2012-05-09 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5077/
---

Review request for giraph.


Summary
---

All temporary files that the tests generate are now written to 
/tmp/_giraphTests, including ZooKeeper files, checkpoints, etc.

This behavior will be automatically configured whenever 
InternalVertexRunner.run() or BspCase.prepareJob() is used.

Usually I can't stop myself once I have my refactoring hat on, therefore I also 
tidied up a lot of minor stuff, removed code duplications etc.


This addresses bug GIRAPH-20.
https://issues.apache.org/jira/browse/GIRAPH-20


Diffs
-

  trunk/src/test/java/org/apache/giraph/TestZooKeeperExt.java 1332106 
  trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java 1332106 
  trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1332106 
  trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1332106 
  trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1332106 
  trunk/src/test/java/org/apache/giraph/TestNotEnoughMapTasks.java 1332106 
  trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java 1332106 
  trunk/src/test/java/org/apache/giraph/TestAutoCheckpoint.java 1332106 
  trunk/src/test/java/org/apache/giraph/TestBspBasic.java 1332106 
  trunk/src/test/java/org/apache/giraph/BspCase.java 1332106 
  trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1332106 
  trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java 
1332106 
  trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1332106 
  trunk/src/main/java/org/apache/giraph/graph/TextAggregatorWriter.java 1332106 
  trunk/src/main/java/org/apache/giraph/utils/FileUtils.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5077/diff


Testing
---

successfully passed local and pseudo-distributed tests with Hadoop 0.20.203


Thanks,

Sebastian



Re: Review Request: Implemented a netty client/server protocol as a faster alternative to Hadoop RPC (3x improvement)

2012-05-09 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5074/#review7728
---

Ship it!


I went through the code (although I don't have much experience with networking 
code), and everything looks very good.

I tested this patch by computing the connected components of the undirected 
Wikipedia pagelink graph (6M vertices, 250M edges) on a 6-machine cluster. 
Everything went fine, and I even saw a small improvement in runtime, although 
the job only takes 4 minutes.




http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java


I don't like it that a collection is changed outside of the class that owns 
it. 

This makes code hard to read and debug. We should rather introduce a method 
for this in the class that owns this map to have all mutations in one place.


- Sebastian


On 2012-05-09 09:22:36, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5074/
> ---
> 
> (Updated 2012-05-09 09:22:36)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> * Implemented a request/response protocol with netty as a NettyClient and 
> NettyServer.  There is a NettyClientWorker and NettyClientServer that 
> implements WorkerClient and WorkerServer, respectively.  Netty is a lot 
> faster since it's non-blocking and we can interleave computation and 
> communication as opposed to Hadoop RPC (blocking).
> * The netty server implementation uses concurrent hash maps to improve 
> concurrency instead of synchronized blocks around maps.
> * By default netty is used, but Hadoop RPC can be used with 
> -Dgiraph.useNetty=false
> * Changed the class hierarchy of ServerInterface to WorkerClientServer 
> (WorkerClient and WorkerServer) to support a request/response protocol 
> instead of just RPC
> * In netty, the messages/mutations are gathered by partition and sent out as 
> a partition's worth of messages/mutations
> * Added two new test classes (RequestTest.java and ConnectionTest.java) to 
> test all requests and check netty connections.
> * PageRankBenchmark uses EdgeListVertex as a default
> 
> 
> This addresses bug GIRAPH-37.
> https://issues.apache.org/jira/browse/GIRAPH-37
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexMutations.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClientServer.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
>  1332888 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClient.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java
>  1332888 
>   
> http://s

[jira] [Commented] (GIRAPH-141) mulitgraph support in giraph

2012-05-08 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270336#comment-13270336
 ] 

Sebastian Schelter commented on GIRAPH-141:
---

We could also think about a way to make a single Edge between two vertices 
represent the whole set of edges between these. This would be more like a 
"multivalued" edge, but would allow easier integration with the current API.

I rarely work with multigraphs, so I don't know whether this approach is 
feasible or would be a quirky hack :)
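
As a rough illustration of the "multivalued" edge idea: the edge between two vertices stays unique, but its value is a collection with one element per parallel edge. MultiEdgeValue below is a hypothetical plain-Java type, not part of the Giraph API; a real Giraph edge value would presumably also have to be serializable as a Writable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical edge value that bundles all parallel edges between two vertices. */
final class MultiEdgeValue<E> {
  private final List<E> parallelEdges = new ArrayList<>();

  /** Records one more parallel edge between the same pair of vertices. */
  void add(E edgeValue) {
    parallelEdges.add(edgeValue);
  }

  /** Number of parallel edges represented by this single edge. */
  int multiplicity() {
    return parallelEdges.size();
  }

  /** Read-only view of the individual edge values, in insertion order. */
  List<E> values() {
    return Collections.unmodifiableList(parallelEdges);
  }
}
```

In a road network, for example, two junctions joined by both a highway and a side street would share one edge whose value holds two entries.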






Re: [DISCUSS] Giraph graduation resolution

2012-05-07 Thread Sebastian Schelter
+1 from me too.

On 07.05.2012 12:12, Claudio Martella wrote:
> +1 from me as well
> 
> On Mon, May 7, 2012 at 12:11 AM, Eugene Koontz  wrote:
>> +1 for me on the resolution text.
>> -Eugene
> 
> 
> 



[jira] [Commented] (GIRAPH-20) Move temporary test files from the project directory

2012-05-04 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268364#comment-13268364
 ] 

Sebastian Schelter commented on GIRAPH-20:
--

the patch still misses a single file, I'll provide an updated one soon.

> Move temporary test files from the project directory
> 
>
> Key: GIRAPH-20
> URL: https://issues.apache.org/jira/browse/GIRAPH-20
> Project: Giraph
> Issue Type: Improvement
> Components: test
> Affects Versions: 0.2.0
> Reporter: Owen O'Malley
> Assignee: Sebastian Schelter
> Attachments: GIRAPH-20.patch
>
>
> We shouldn't use the project directory as the location for temporary files 
> generated by the tests.





[jira] [Updated] (GIRAPH-20) Move temporary test files from the project directory

2012-05-04 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-20:
-

Attachment: GIRAPH-20.patch

> Move temporary test files from the project directory
> 
>
> Key: GIRAPH-20
> URL: https://issues.apache.org/jira/browse/GIRAPH-20
> Project: Giraph
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.2.0
>Reporter: Owen O'Malley
>    Assignee: Sebastian Schelter
> Attachments: GIRAPH-20.patch
>
>
> We shouldn't use the project directory as the location for temporary files 
> generated by the tests.





[jira] [Assigned] (GIRAPH-20) Move temporary test files from the project directory

2012-05-04 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter reassigned GIRAPH-20:


Assignee: Sebastian Schelter

> Move temporary test files from the project directory
> 
>
> Key: GIRAPH-20
> URL: https://issues.apache.org/jira/browse/GIRAPH-20
> Project: Giraph
>  Issue Type: Improvement
>  Components: test
>Reporter: Owen O'Malley
>    Assignee: Sebastian Schelter
>
> We shouldn't use the project directory as the location for temporary files 
> generated by the tests.





Re: Out-of-core messaging

2012-05-03 Thread Sebastian Schelter
Hi Claudio,

Great to hear that!

Please send me the 4-liner, maybe I (or my colleagues) can be helpful!

'Out-of-core messaging' would be a great topic for the BB workshop, I'll
keep that in mind :)

Best,
Sebastian



On 03.05.2012 16:03, Claudio Martella wrote:
> Hi Sebastian,
> 
> I definitely agree with you on this one.
> 
> I'm currently working on it, but I'm kind of stuck on a small bug that
> seems to come from a concurrency issue we can't understand (I have a
> 4-liner that reproduces it, if you want to help out). Avery and I
> are currently discussing the possibility of writing a paper on the
> solution, so hopefully I should be able to tell you more in a
> couple of weeks.
> 
> 
> On Thu, May 3, 2012 at 3:44 PM, Sebastian Schelter  wrote:
>> Hi,
>>
>> I'd like to ask whether someone is currently working on out-of-core
>> messaging for Giraph (e.g. by spilling messages to disk in case of
>> memory pressure).
>>
>> I ran some experiments with Giraph on a small 6-machine cluster and got
>> really nice results for smaller datasets such as the wikipedia pagelink
>> graph (6M vertices, ~250M edges in its undirected version).
>>
>> For larger graphs with an even more skewed degree distribution such as
>> the twitter follower graph from [1], Giraph crashes in the first
>> superstep unfortunately. My colleagues observed the same, when they ran
>> benchmarks of Giraph against the Stratosphere system [2], where Giraph
>> did kind of well for small datasets, but again crashed for larger ones...
>>
>> I think the lack of out-of-core messages is currently the biggest
>> obstacle to recommending people to test Giraph in production use.
>>
>> Best,
>> Sebastian
>>
>>
>> [1] http://konect.uni-koblenz.de/networks/twitter
>> [2] http://www.stratosphere.eu/
> 
> 
> 



Out-of-core messaging

2012-05-03 Thread Sebastian Schelter
Hi,

I'd like to ask whether someone is currently working on out-of-core
messaging for Giraph (e.g. by spilling messages to disk in case of
memory pressure).

I ran some experiments with Giraph on a small 6-machine cluster and got
really nice results for smaller datasets such as the wikipedia pagelink
graph (6M vertices, ~250M edges in its undirected version).

For larger graphs with an even more skewed degree distribution such as
the twitter follower graph from [1], Giraph crashes in the first
superstep unfortunately. My colleagues observed the same, when they ran
benchmarks of Giraph against the Stratosphere system [2], where Giraph
did kind of well for small datasets, but again crashed for larger ones...

I think the lack of out-of-core messages is currently the biggest
obstacle to recommending people to test Giraph in production use.

Best,
Sebastian


[1] http://konect.uni-koblenz.de/networks/twitter
[2] http://www.stratosphere.eu/
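
The idea of spilling messages to disk under memory pressure, as asked about
above, could look roughly like the following. This is only a minimal sketch
and not Giraph code: the class name SpillingMessageBuffer, the fixed
in-memory limit and the use of plain double messages are all assumptions
made for illustration. A real implementation would have to react to actual
heap pressure and handle arbitrary Writable messages per vertex.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleConsumer;

/** Hypothetical buffer: keeps up to maxInMemory messages on the heap, spills the rest to a temp file. */
final class SpillingMessageBuffer {
  private final int maxInMemory;
  private final List<Double> inMemory = new ArrayList<>();
  private final Path spillFile;
  private DataOutputStream spillOut; // opened lazily on the first spill
  private long spilledCount;

  SpillingMessageBuffer(int maxInMemory) {
    this.maxInMemory = maxInMemory;
    try {
      this.spillFile = Files.createTempFile("messages", ".spill");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Adds a message, writing it to disk once the in-memory limit is reached. */
  void add(double message) {
    if (inMemory.size() < maxInMemory) {
      inMemory.add(message);
      return;
    }
    try {
      if (spillOut == null) {
        spillOut = new DataOutputStream(
            new BufferedOutputStream(Files.newOutputStream(spillFile)));
      }
      spillOut.writeDouble(message);
      spilledCount++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  long spilledCount() {
    return spilledCount;
  }

  /** Streams all messages to the consumer: first the in-memory ones, then the spilled ones. */
  void forEach(DoubleConsumer consumer) {
    for (double m : inMemory) {
      consumer.accept(m);
    }
    if (spilledCount == 0) {
      return;
    }
    try {
      spillOut.flush(); // make sure buffered writes are visible to the reader
      try (DataInputStream in = new DataInputStream(
          new BufferedInputStream(Files.newInputStream(spillFile)))) {
        for (long i = 0; i < spilledCount; i++) {
          consumer.accept(in.readDouble());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Closes the spill stream and removes the temp file. */
  void close() {
    try {
      if (spillOut != null) {
        spillOut.close();
      }
      Files.deleteIfExists(spillFile);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Reading the spilled messages back as a stream, rather than loading them into
a list, is the point of the design: the vertex can iterate over more
messages than fit into memory.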


Re: Please welcome our newest committer and PMC member, Eugene!

2012-05-01 Thread Sebastian Schelter
Welcome Eugene!

On 02.05.2012 03:28, Eugene Koontz wrote:
> Thank you Hyunsik and Jakob! I'm really looking forward to working with
> you all!
> 
> -Eugene
> 
> On 5/1/12 5:18 PM, Hyunsik Choi wrote:
>> Congrats and welcome Eugene!
>> I'm looking forward to your contribution.
>>
>> --
>> Hyunsik Choi
>>
>> On Wed, May 2, 2012 at 5:39 AM, Jakob Homan  wrote:
>>
>>> I'm happy to announce that the Giraph PMC has voted Eugene Koontz in
>>> as a committer and PMC member.  Eugene has been pitching in with great
>>> patches that have been very useful, such as helping us sort out our
>>> terrifying munging situation (GIRAPH-168).
>>>
>>> Welcome aboard, Eugene!
>>>
>>> -Jakob
>>>
>>
> 



[jira] [Commented] (GIRAPH-170) Workflow for loading RDF graph data into Giraph

2012-04-19 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257702#comment-13257702
 ] 

Sebastian Schelter commented on GIRAPH-170:
---

??Independent of the number of workers, my Giraph job only uses about 30% of a 
24 node machine. And I would like to utilise all available processing 
resources.??

It surprises me that you don't get a higher load. If you configure your 
cluster to use one worker/map instance per core, you should get a much higher 
CPU load. Could it be that either the cluster is too powerful for your graph or 
that your algorithm doesn't work on the whole graph all the time?

> Workflow for loading RDF graph data into Giraph
> ---
>
> Key: GIRAPH-170
> URL: https://issues.apache.org/jira/browse/GIRAPH-170
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Dan Brickley
>Priority: Minor
>
> W3C RDF provides a family of Web standards for exchanging graph-based data. 
> RDF uses sets of simple binary relationships, labeling nodes and links with 
> Web identifiers (URIs). Many public datasets are available as RDF, including 
> the "Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many 
> such datasets are listed at http://thedatahub.org/
> RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple 
> line-oriented format is N-Triples. A format aligned with RDF's SPARQL query 
> language is Turtle. Apache Jena and Any23 provide software to handle all 
> these; http://incubator.apache.org/jena/ http://incubator.apache.org/any23/
> This JIRA leaves open the strategy for loading RDF data into Giraph. There 
> are various possibilities, including exploitation of intermediate 
> Hadoop-friendly stores, or pre-processing with e.g. Pig-based tools into a 
> more Giraph-friendly form, or writing custom loaders. Even a HOWTO document 
> or implementor notes here would be an advance on the current state of the 
> art. The BluePrints Graph API (Gremlin etc.) has also been aligned with 
> various RDF datasources.
> Related topics: multigraphs https://issues.apache.org/jira/browse/GIRAPH-141 
> touches on the issue (since we can't currently easily represent fully general 
> RDF graphs since two nodes might be connected by more than one typed edge). 
> Even without multigraphs it ought to be possible to bring RDF-sourced data
> into Giraph, e.g. perhaps some app is only interested in say the Movies + 
> People subset of a big RDF collection.
> From Avery in email: "a helper VertexInputFormat (and maybe 
> VertexOutputFormat) would certainly [despite GIRAPH-141] still help"





Re: [DISCUSS] Giraph Graduation (was Re: Giraph status (Was: [Incubator Wiki] Update of "April2012" by OwenOmalley))

2012-04-13 Thread Sebastian Schelter
+1 from me too.


On 13.04.2012 18:49, Jake Mannix wrote:
> +1 from me, wish I'd been able to contribute more in the past 6mo or so...
> 
> On Fri, Apr 13, 2012 at 4:45 AM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
> 
>> +1.
>>
>> --
>>Claudio Martella
>>claudio.marte...@gmail.com
>>
> 
> 
> 



[jira] [Commented] (GIRAPH-157) Vertex to perform graph coloring on simple, connected, undirected graphs and related test.

2012-03-18 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232211#comment-13232211
 ] 

Sebastian Schelter commented on GIRAPH-157:
---

Hello Eli,

I'm not sure how your algorithm works, but finding a minimal coloring for a 
graph is an NP-hard problem, so I doubt your implementation really achieves 
this (at least in polynomial time).

Could you sketch the idea of the algorithm in pseudocode for us?
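
To illustrate the concern: a simple heuristic such as greedy coloring always
produces a valid coloring, but the number of colors depends on the order in
which the vertices are visited, so it is not guaranteed to be minimal. The
sketch below is not the algorithm from the patch; the class GreedyColoring
and its methods are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sequential greedy coloring on an adjacency-list graph. */
final class GreedyColoring {

  /** Visits vertices in the given order and assigns each the smallest color unused by its neighbors. */
  static int[] color(List<List<Integer>> adjacency, int[] order) {
    int[] colors = new int[adjacency.size()];
    Arrays.fill(colors, -1); // -1 marks an uncolored vertex
    for (int v : order) {
      BitSet used = new BitSet();
      for (int u : adjacency.get(v)) {
        if (colors[u] >= 0) {
          used.set(colors[u]);
        }
      }
      colors[v] = used.nextClearBit(0);
    }
    return colors;
  }

  /** Checks that no edge connects two vertices of the same color. */
  static boolean isValid(List<List<Integer>> adjacency, int[] colors) {
    for (int v = 0; v < adjacency.size(); v++) {
      for (int u : adjacency.get(v)) {
        if (colors[u] == colors[v]) {
          return false;
        }
      }
    }
    return true;
  }

  static int countColors(int[] colors) {
    return (int) Arrays.stream(colors).distinct().count();
  }
}
```

On the path 0-1-2-3, which can be colored with two colors, visiting the
vertices in the order 0, 1, 2, 3 yields two colors, while the order
0, 3, 1, 2 forces a third color for vertex 2. Both colorings are valid, only
one is minimal.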



> Vertex to perform graph coloring on simple, connected, undirected graphs and 
> related test.
> --
>
> Key: GIRAPH-157
> URL: https://issues.apache.org/jira/browse/GIRAPH-157
> Project: Giraph
>  Issue Type: Test
>  Components: examples, test
>Affects Versions: 0.2.0
>Reporter: Eli Reisman
>Assignee: Eli Reisman
>Priority: Trivial
>  Labels: newbie
> Attachments: GIRAPH-157.patch
>
>
> Hi. I am attempting to learn the Hadoop and Giraph codebases and wanted to 
> write a simple client application for Giraph to help me learn the ins and 
> outs of it. This is a simple unit test and vertex modeled after the 
> ConnectedComponentsVertex and related test. The vertex test runs whenever you 
> run the "mvn test" or "mvn verify" suite of tests. When finished processing, 
> each vertex will have an integer value that is its color.
> This is a pretty simple implementation, and although I have tested it on a 
> number of small graphs of varied trickiness and it seems to rapidly arrive at 
> a minimal coloring, it's hard (for me at least) to guess which possible 
> coloring it will arrive at and I have no idea how it will do on really big 
> graphs yet without finding some more pre-colored larger test graphs to try it 
> on. Ideas anyone?
> Anyway, it was fun to put this together, and I'd be happy to improve it or 
> receive some help or advice to further the cause. Thanks again, I am hoping 
> this will be the first of many (hopefully more useful) contributions!
> Eli





[jira] [Updated] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-18 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-156:
--

Attachment: GIRAPH-156-2.patch

> Users should be able to set simple 'custom arguments' via 
> org.apache.giraph.GiraphRunner
> 
>
> Key: GIRAPH-156
> URL: https://issues.apache.org/jira/browse/GIRAPH-156
> Project: Giraph
>  Issue Type: Improvement
>  Components: conf and scripts
>Affects Versions: 0.1.0
>Reporter: Sebastian Schelter
>Assignee: Sebastian Schelter
> Fix For: 0.2.0
>
> Attachments: GIRAPH-156-1.patch, GIRAPH-156-2.patch, GIRAPH-156.patch
>
>
> Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
> example needs to know the source vertex for the computation which is saved in 
> the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
> be able to apply such simple custom arguments via GiraphRunner. 
> I propose to add a new option _--customArguments_ where users can supply 
> arguments in the form _=,=_ for this.





[jira] [Updated] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-16 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-156:
--

Attachment: GIRAPH-156-1.patch

had a missing whitespace in the last patch :)

> Users should be able to set simple 'custom arguments' via 
> org.apache.giraph.GiraphRunner
> 
>
> Key: GIRAPH-156
> URL: https://issues.apache.org/jira/browse/GIRAPH-156
> Project: Giraph
>  Issue Type: Improvement
>  Components: conf and scripts
>Affects Versions: 0.1.0
>Reporter: Sebastian Schelter
>Assignee: Sebastian Schelter
> Attachments: GIRAPH-156-1.patch, GIRAPH-156.patch
>
>
> Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
> example needs to know the source vertex for the computation which is saved in 
> the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
> be able to apply such simple custom arguments via GiraphRunner. 
> I propose to add a new option _--customArguments_ where users can supply 
> arguments in the form _=,=_ for this.





[jira] [Updated] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-16 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-156:
--

Attachment: GIRAPH-156.patch

> Users should be able to set simple 'custom arguments' via 
> org.apache.giraph.GiraphRunner
> 
>
> Key: GIRAPH-156
> URL: https://issues.apache.org/jira/browse/GIRAPH-156
> Project: Giraph
>  Issue Type: Improvement
>  Components: conf and scripts
>Affects Versions: 0.1.0
>Reporter: Sebastian Schelter
>Assignee: Sebastian Schelter
> Attachments: GIRAPH-156.patch
>
>
> Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
> example needs to know the source vertex for the computation which is saved in 
> the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
> be able to apply such simple custom arguments via GiraphRunner. 
> I propose to add a new option _--customArguments_ where users can supply 
> arguments in the form _=,=_ for this.





[jira] [Created] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-15 Thread Sebastian Schelter (Created) (JIRA)
Users should be able to set simple 'custom arguments' via 
org.apache.giraph.GiraphRunner


 Key: GIRAPH-156
 URL: https://issues.apache.org/jira/browse/GIRAPH-156
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter


Some vertices need custom arguments to run. The SimpleShortestPathsVertex for 
example needs to know the source vertex for the computation which is saved in 
the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
be able to apply such simple custom arguments via GiraphRunner. 

I propose to add a new option _--customArguments_ where users can supply 
arguments in the form _=,=_ for this.
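
A rough sketch of how such an option value might be parsed is shown below.
The exact argument form is garbled in the text above, so this assumes
comma-separated key=value pairs; the class CustomArguments is a hypothetical
helper and not the code from the attached patch. In GiraphRunner the
resulting pairs would presumably be copied into the job's Configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for an option value such as "key1=value1,key2=value2". */
final class CustomArguments {

  /** Splits the spec on commas and each pair on its first '=', preserving the input order. */
  static Map<String, String> parse(String spec) {
    Map<String, String> result = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return result;
    }
    for (String pair : spec.split(",")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        // Reject entries without a key or without '=' instead of silently ignoring them.
        throw new IllegalArgumentException("Expected key=value but got: " + pair);
      }
      result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }
    return result;
  }
}
```

For example, parsing "SimpleShortestPathsVertex.sourceId=2" yields a single
entry that maps the configuration key SimpleShortestPathsVertex.sourceId to
the string "2".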





Re: How to contribute page

2012-03-14 Thread Sebastian Schelter
I added the 'Be involved' part from Mahout's [1] 'How to contribute'
page. Maybe we could even copy a little more from there :)

Best,
Sebastian

[1] https://cwiki.apache.org/MAHOUT/how-to-contribute.html

On 14.03.2012 17:39, Avery Ching wrote:
> Yes, that is thanks to Sebastian.  We should probably make that another
> confluence page though based on his notes.  Anyone want to do it? =)
> 
> Avery
> 
> On 3/14/12 7:43 AM, Benjamin Heitmann wrote:
>> On 14 Mar 2012, at 07:08, Avery Ching wrote:
>>
>>> I just added a "How to contribute" page.
>>>
>>> https://cwiki.apache.org/confluence/display/GIRAPH/How+to+Contribute
>> Thanks for setting up this page!
>>
>> Also, the link about "running giraph's unit test in pseudo distributed
>> mode" [1] is very interesting.
>>
>>
>>
>> [1] http://ssc.io/running-giraphs-unit-tests-in-pseudo-distributed-mode/
> 



Re: Graph clustering via LinLog force directed layout

2012-03-06 Thread Sebastian Schelter
Hi Timmy,

Using Giraph for laying out graphs sounds like a really cool idea. What
is the complexity of the algorithm you plan to implement?

--sebastian

On 06.03.2012 22:29, Avery Ching wrote:
> Hi Timmy,
> 
> I don't know much about force directed layout, but it certainly sounds
> like a very interesting application for Giraph.  Keep us posted on your
> progress and let us know how we can help.
> 
> Avery
> 
> On 3/6/12 8:34 AM, Claudio Martella wrote:
>> Hi,
>>
>> I'm definitely not familiar with the algorithm or implementation of
>> LinLog, I've been just a user. It should be doable with Giraph if you
>> can express it in terms of message-passing between vertices and
>> without a dependency on a global view of the graph (except for the
>> convergence criteria, such as total energy).
>>
>> Please consider that Giraph's data model is based on a directed graph,
>> this should be a quite "interesting" constraint for you, if your
>> implementation is going to modify energy associated with edges (you'd
>> have two views over the undirected edge, one in each endpoint).
>>
>> In general, a good way of doing community analysis would be to look at
>> algorithms that belong to the family of label-propagation clustering
>> algorithms.
>>
>>
>> Hope this helps,
>> Claudio
>>
>> On Tue, Mar 6, 2012 at 3:28 PM, Timmy Wilson 
>> wrote:
>>> Hi giraph community,
>>>
>>> I'm interested in using giraph for distributed n-body simulation.
>>>
>>> Initially, i'm interested in force directed layouts -- ie, graph
>>> drawing:
>>>
>>> http://en.wikipedia.org/wiki/Force-based_algorithms_(graph_drawing)
>>>
>>> I'm interested specifically in Dr. Andreas Noack's LinLog energy model
>>> -- which performs well w/ community detection:
>>>
>>> http://www.informatik.tu-cottbus.de/~an/GD/linlog.html
>>>
>>> I have a few examples of a serial implementation here:
>>>
>>> http://www.smarttypes.org/
>>>
>>> The model maximizes the distance between all nodes while minimizing
>>> the distance between connected nodes.
>>>
>>> Without getting into too much detail, i'm curious if anyone has
>>> considered using giraph for force directed graph embedding (yet
>>> another name for it)?
>>>
>>> I'm also considering something like http://www.mcs.anl.gov/petsc/ or
>>> http://www.cs.cmu.edu/~scandal/alg/nbody.html -- which have fast
>>> n-body simulation implementations (Barnes-Hut + Fast Multipole).
>>>
>>> That said, i think giraph may be a good fit -- curious what the
>>> community thinks?
>>>
>>>
>>> Thanks,
>>> Timmy Wilson
>>> Cleveland, OH
>>
>>
> 



[jira] [Commented] (GIRAPH-150) PageRankBenchmark accesses wrong conf after GiraphJob is created

2012-02-20 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211991#comment-13211991
 ] 

Sebastian Schelter commented on GIRAPH-150:
---

haven't tested it, but looks reasonable, +1 from me

> PageRankBenchmark accesses wrong conf after GiraphJob is created
> 
>
> Key: GIRAPH-150
> URL: https://issues.apache.org/jira/browse/GIRAPH-150
> Project: Giraph
>  Issue Type: Bug
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-150.patch
>
>






[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-16 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209741#comment-13209741
 ] 

Sebastian Schelter commented on GIRAPH-40:
--

I had a quick look. As long as it is ensured that one can still build the 
project with violations of the coding style (for development), I'm +1 on this too.

> Adding checkstyle enforcement of Giraph code conventions
> 
>
> Key: GIRAPH-40
> URL: https://issues.apache.org/jira/browse/GIRAPH-40
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, 
> GIRAPH-40.patch
>
>
> Now that we have some code conventions (see GIRAPH-21), we should enforce 
> them with a maven checkstyle plugin.





[jira] [Updated] (GIRAPH-120) Add Sebastian Schelter to site

2012-02-01 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-120:
--

Attachment: GIRAPH-120.patch

> Add Sebastian Schelter to site
> --
>
> Key: GIRAPH-120
> URL: https://issues.apache.org/jira/browse/GIRAPH-120
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>    Reporter: Sebastian Schelter
>    Assignee: Sebastian Schelter
> Fix For: 0.1.0
>
> Attachments: GIRAPH-120.patch
>
>






Re: on the semantics of the combiner

2012-01-13 Thread Sebastian Schelter
+1 on Iterable <= messages.size() also from me.
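
The semantics under discussion could be sketched roughly as follows: the
combiner returns an Iterable that may be empty, and the framework throws if
it returns more messages than it received. The interface MessageCombiner and
the class CombinerSketch are hypothetical names for illustration only; they
are not Giraph's actual combiner API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical combiner contract: zero or more messages come back, never more than went in. */
interface MessageCombiner<M> {
  Iterable<M> combine(List<M> messages);
}

final class CombinerSketch {

  /** Example combiner: sums all messages into one, or returns nothing for empty input. */
  static final MessageCombiner<Double> SUM = messages -> {
    if (messages.isEmpty()) {
      return Collections.emptyList();
    }
    double sum = 0;
    for (double m : messages) {
      sum += m;
    }
    return Collections.singletonList(sum);
  };

  /** Applies the combiner and enforces that the output is not larger than the input. */
  static <M> List<M> applyChecked(MessageCombiner<M> combiner, List<M> messages) {
    List<M> combined = new ArrayList<>();
    for (M m : combiner.combine(messages)) {
      combined.add(m);
    }
    if (combined.size() > messages.size()) {
      throw new IllegalStateException("Combiner returned " + combined.size()
          + " messages for " + messages.size() + " input messages");
    }
    return combined;
  }
}
```

Since the vertex only ever sees a plain list of messages, its code does not
need to know whether a combiner ran at all.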


On 13.01.2012 19:51, Avery Ching wrote:
> +1
> 
> I'm fine with this.  If we agree to return an Iterable, then we should
> make sure to either throw if the size of the Iterable > messages.size()
> or at the very least LOG.warn("This combiner is likely to be implemented
> wrong").  I prefer an exception, since we have no use case for expanding
> the set of messages.
> 
> Also, I'd like to have something in the javadoc saying something like
> "While the number of messages returned can be equal to the number of
> messages that were input, the purpose of the combiner is to reduce
> the number of messages from the input."
> 
> Avery
> 
> On 1/13/12 9:34 AM, Claudio Martella wrote:
>> Ok,
>>
>> I guess we can vote then about this, what do you think?
>> Shall we take 72h?
>>
>> I'm +1 for returning an iterable that can be empty.
>> I'm +1 for the returned iterable to be <= messages.size()
>>
>>
>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter 
>> wrote:
>>> I think we should make the combiner return a list/iterable that can
>>> potentially be empty. However we should assume that the number of
>>> elements returned is smaller than or equal to the number of input
>>> elements (what's the use of a combiner if this is not given?). I also
>>> concur that the code should not depend on the combiner being applied
>>> (similar to the way combiners work in hadoop).
>>>
>>> --sebastian
>>>
>>> 2012/1/10 Jakob Homan:
>>>> A composite object would essentially be a wrapper around a list and
>>>> introduce the need for all vertices to be ready to extract that list
>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>> messages.  If we allow zero or one return elements, the combiner would
>>>> have to create a composite object with a list of those four messages,
>>>> whereas if we return a list, it just skips that step and returns the
>>>> four messages.  Additionally, the receiving vertex would have to
>>>> handle the possibility of a composite object every time even though
>>>> the combiner may or may not have been run during the superstep, or
>>>> even included in that job (since combiners are optional to the job
>>>> itself).  It would be better if one could write a Giraph application
>>>> that was completely agnostic of whether or not a combiner was
>>>> included.
>>>>
>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>>   wrote:
>>>>> I believe the argument of not letting users shoot their foot doesn't
>>>>> stand :) Once you give them any API they have the power to do anything
>>>>> wrong, as they already can with Giraph (or anything else for what it
>>>>> matters), by designing an algorithm wrongly (which would be what it
>>>>> would turn out to be a wrong combiner). It's definitely true that a
>>>>> composite object would make the grouping (List) but I thought
>>>>> we were talking about simplifying life to users :). I think it would
>>>>> be more flexible (for the present and for the future) and also more
>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>> for free).
>>>>>
>>>>> Very cool discussion.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan 
>>>>> wrote:
>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>> so they can't send messages to other vertices.
>>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>> vertex
>>>>>> before processing as well).  When would they be forwarded to their
>>>>>> new
>>>>>> destinations at that point?  It would be possible to get into a
>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>> ever
>>>>>> actually be done.
>>>>>>
>>>>>> That being said, our inability to think of a good application doesn't
>

Fwd: Call for Submission Berlin Buzzwords 2012 - http://berlinbuzzwords.de

2012-01-11 Thread Sebastian Schelter
Forwarding Simon's call for Berlin Buzzwords.

Does anybody plan to give a talk about Giraph at Buzzwords? I'll
definitely be at the conference as I'm living in Berlin. We should
also try to organize a Giraph meeting in the evening maybe together
with the Mahout people.

Best,
Sebastian


-- Forwarded message --
From: Simon Willnauer 
Date: 2012/1/11
Subject: Call for Submission Berlin Buzzwords 2012 - http://berlinbuzzwords.de
To: java-user , d...@lucene.apache.org,
solr-u...@lucene.apache.org, mahout-...@lucene.apache.org,
lucy-...@incubator.apache.org, lucy-u...@incubator.apache.org,
mapreduce-u...@hadoop.apache.org, hdfs-u...@hadoop.apache.org,
hdfs-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org,
gene...@lucene.apache.org


Call for Submission Berlin Buzzwords 2012 - Search, Store, Scale  --
June 4 / 5. 2012

The event will comprise presentations on scalable data processing. We
invite you to submit talks on the topics:
 * IR / Search - Lucene, Solr, katta, ElasticSearch or comparable solutions
 * NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
 * Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

Related topics not explicitly listed above are more than welcome. We are
looking for presentations on the implementation of the systems
themselves, technical talks, real world applications and case studies.

Important Dates (all dates in GMT +2)
 * Submission deadline: March 11th 2012, 23:59 CET
 * Notification of accepted speakers: April 6th, 2012, CET
 * Publication of final schedule: April 13th, 2012
 * Conference: June 4/5. 2012

High quality, technical submissions are called for, ranging from
principles to practice. We are looking for real world use cases,
background on the architecture of specific projects and a deep dive
into architectures built on top of e.g. Hadoop clusters.

To submit your proposal please register on our website [1] and log in
[2] once you have received the confirmation email. Once this is done you
can submit your proposal here [3]; please do so no later than March
11th, 2012. Acceptance notifications will be sent out soon after the
submission deadline. Please include your name, bio and email, the
title of the talk, and a brief abstract in English. Please
indicate whether you want to give a lightning (10min), short (20min)
or long (40min) presentation and indicate the level of experience with
the topic your audience should have (e.g. whether your talk will be
suitable for newbies or is targeted at experienced users). If you'd
like to pitch your brand new product in your talk, please let us know
as well; there will be extra space for presenting new ideas, awesome
products and great new projects.

The presentation format is short. We will be enforcing the schedule rigorously.

If you are interested in sponsoring the event (e.g. we would be happy
to provide videos after the event, free drinks for attendees as well
as an after-show party), please contact us.

Follow @berlinbuzzwords on Twitter for updates. Tickets, news on the
conference, and the final schedule will be published at
http://berlinbuzzwords.de.

Program Committee Chairs:

 *  Isabel Drost (Nokia & Apache Mahout)
 *  Jan Lehnardt (CouchBase & Apache CouchDB)
 *  Simon Willnauer (SearchWorkings & Apache Lucene)
 *  Grant Ingersoll (Lucid Imagination & Apache Lucene)
 *  Owen O’Malley (Yahoo Inc. & Apache Hadoop)
 *  Jim Webber (Neo Technology & Neo4j)
 *  Sean Treadway (Soundcloud)


Please re-distribute this CfP to people who might be interested.

Contact us at:

newthinking communications GmbH
Schönhauser Allee 6/7
10119 Berlin, Germany
Julia Gemählich 
Isabel Drost 
Simon Willnauer 
 +49(0)30-9210 596

[1] http://berlinbuzzwords.de/user/register
[2] http://berlinbuzzwords.de/user
[3] http://berlinbuzzwords.de/node/add/session


Re: on the semantics of the combiner

2012-01-10 Thread Sebastian Schelter
>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>>>>> all.
>>>>>>>>>>>
>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>>>>> has
>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>> adapting
>>>>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>     wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, what you say is completely reasonable, you convinced me :)
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>>>>> that
>>>>>>>>>>>>> means
>>>>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>>>>> case
>>>>>>>>>>>>> when
>>>>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>>>>> that
>>>>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>>>>> functionality
>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>>>>> of work
>>>>>>>>>>>>> on
>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning null should
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>> return Iterable, basically letting it define what to combine
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>> awkward, it

Re: on the semantics of the combiner

2012-01-09 Thread Sebastian Schelter
I think we currently implicitly assume that there is at least one
element in the Iterable passed to the combiner. The messaging code
invokes the combiner only if at least one message for the target vertex
has been sent.

However, we should not rely on implicit implementation details but
explicitly specify the semantics of combiners.

--sebastian
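
One way to spell the contract out, as a minimal sketch only: the
MessageCombiner interface and SumCombiner class below are hypothetical
stand-ins, not Giraph's actual combiner API. The assumed contract is that an
empty input produces null ("send nothing") rather than a neutral element
such as 0.

```java
/** Hypothetical combiner contract; not Giraph's real interface. */
interface MessageCombiner<M> {
  /**
   * Combines all messages addressed to one target vertex.
   * Assumed contract for this sketch: returns null when there is
   * nothing to send, which includes the case of an empty input.
   */
  M combine(Iterable<M> messages);
}

/** Sums doubles; addition is commutative and associative. */
class SumCombiner implements MessageCombiner<Double> {
  @Override
  public Double combine(Iterable<Double> messages) {
    double sum = 0d;
    boolean sawMessage = false;
    for (double message : messages) {
      sum += message;
      sawMessage = true;
    }
    if (!sawMessage) {
      // No input means no output, so no spurious 0 is delivered.
      return null;
    }
    return sum;
  }
}
```

With this contract, combining the messages 1.0 and 2.5 gives 3.5, while
combining an empty Iterable gives null instead of 0.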

On 09.01.2012 13:29, Claudio Martella wrote:
> Hello list,
> 
> for GIRAPH-45 I'm touching the incoming messages and hit an
> interesting problem with the combiner semantics.
> currently, my code fails testBspCombiner for the following reason:
> 
> SimpleSumCombiner::compute() returns a value even if there are no
> messages in the iterator (in this case it returns 0) and for this
> reason the vertices get activated at each superstep.
> 
> At each superstep, under-the-hood, I pass the combiner for each vertex
> an Iterable, which can be empty:
> 
> public Iterable<M> getMessages(I vertexId) {
>   Iterable<M> messages = inMessages.getMessages(vertexId);
>   if (combiner != null) {
>     M combinedMsg;
>     try {
>       combinedMsg = combiner.combine(vertexId, messages);
>     } catch (IOException e) {
>       throw new RuntimeException("could not combine", e);
>     }
>     if (combinedMsg != null) {
>       List<M> tmp = new ArrayList<M>(1);
>       tmp.add(combinedMsg);
>       messages = tmp;
>     } else {
>       messages = new ArrayList<M>(0);
>     }
>   }
>   return messages;
> }
> 
> the Iterable returned by this method is passed to
> basicVertex.putMessages() right before the compute().
> Now, the question is: who's wrong? The combiner code that returns a
> sum of 0 over no values, or the framework that calls the combiner with
> 0 messages?
> 
> 
> 
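
If the answer to the question above is "the framework", the receiving side
could skip the combiner whenever nothing arrived. A minimal sketch under that
assumption follows; InboxSketch and messagesFor are hypothetical names, and
the combiner is modelled as a plain java.util.function.Function rather than
any Giraph type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical receiving side; the names are illustrative, not Giraph classes. */
class InboxSketch {
  /**
   * Returns the messages a vertex should see in the next superstep.
   * The combiner is invoked only when at least one message arrived, so it
   * never has to decide what "combining nothing" means.
   */
  static <M> List<M> messagesFor(List<M> incoming, Function<List<M>, M> combiner) {
    if (incoming.isEmpty()) {
      // Nothing arrived: hand over an empty list without calling the combiner.
      return Collections.emptyList();
    }
    M combined = combiner.apply(incoming);
    if (combined == null) {
      // The combiner chose to send nothing.
      return Collections.emptyList();
    }
    List<M> result = new ArrayList<M>(1);
    result.add(combined);
    return result;
  }
}
```

In this variant a sum combiner may keep returning 0 for an empty input,
because it is simply never called with one.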



Re: some giraph code

2012-01-09 Thread Sebastian Schelter
Hi Eugenia,

can you share some details about your experiments on this list? Your
report mentions some design decisions you were unhappy with. It would be
interesting to know what these are, so we can improve them!

Best,
Sebastian



On 08.01.2012 03:40, Eugenia Gabrielova wrote:
> Hello,
> 
> My name is Eugenia and I am a PhD student at UC Irvine. I apologize for my
> delayed reply, I have been traveling and am catching up on email. I'd love
> to chat about how to share our project code with the Giraph development
> team; this quarter, my project partner Inci and I worked on porting a
> variety of algorithms to Giraph.
> 
> Sincerely,
> Eugenia
> 
> On Fri, Jan 6, 2012 at 9:13 AM, Mattmann, Chris A (388J) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
> 
>> Hi Sebastian,
>>
>> Gotcha, thanks for the clarification and for keeping the conversation
>> on-list!
>>
>> Cheers,
>> Chris
>>
>> On Jan 6, 2012, at 9:39 AM, Sebastian Schelter wrote:
>>
>>> @Chris I have to clarify this. I haven't been discussing Giraph-specific
>>> issues outside of the mailinglist, I just know Eugenia and know that she
>>> is working on porting some algorithms onto Pregel-like platforms.
>>>
>>> @Eugenia Sorry for the confusion :) We (the Giraph community) stumbled
>>> upon your current work at
>>>
>> https://grape.ics.uci.edu/wiki/asterix/raw-attachment/wiki/cs295-2011-fall-ProjectTeams/Team_8_Report.pdf
>> .
>>>
>>>
>>> It would be great if you could share your experiences and tell us about
>>> the problems you encountered (especially the design decisions you're
>>> criticizing). If you want to, it would also be great if you could
>>> contribute back your code.
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> On 06.01.2012 15:22, Mattmann, Chris A (388J) wrote:
>>>> Hi Sebastian,
>>>>
>>>> Great. It might be good to either (a) copy giraph-dev@ on the
>>>> discussion to keep the rest of the community outside of the specific
>>>> sub-group you're talking to aware of whats up; or (b) send a summary
>>>> of the discussion here. It's important to demonstrate that discussion
>>>> within Apache Giraph is happening *on the mailing lists* here.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On Jan 6, 2012, at 8:25 AM, Sebastian Schelter wrote:
>>>>
>>>>> I have already been in email contact with Eugenia, because a colleague
>>>>> of mine is currently at UCI. I'll cc this mail to her :)
>>>>>
>>>>> --sebastians
>>>>>
>>>>> On 06.01.2012 14:20, Claudio Martella wrote:
>>>>>> Hello guys,
>>>>>>
>>>>>> today I stumbled upon this:
>>>>>>
>>>>>>
>> https://grape.ics.uci.edu/wiki/asterix/raw-attachment/wiki/cs295-2011-fall-ProjectTeams/Team_8_Report.pdf
>>>>>>
>>>>>> the code can be found here:
>>>>>> http://code.google.com/p/graph-algorithm-ports-giraph-hyracks/
>>>>>>
>>>>>> Was anybody aware of this? Do you think we could take advantage of
>> this code?
>>>>>>
>>>>>
>>>>
>>>>
>>>> ++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattm...@nasa.gov
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++
>>>>
>>>
>>
>>
>> ++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++
>>
>>
> 



[jira] [Created] (GIRAPH-120) Add Sebastian Schelter to site

2012-01-07 Thread Sebastian Schelter (Created) (JIRA)
Add Sebastian Schelter to site
--

 Key: GIRAPH-120
 URL: https://issues.apache.org/jira/browse/GIRAPH-120
 Project: Giraph
  Issue Type: Task
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: some giraph code

2012-01-06 Thread Sebastian Schelter
@Chris I have to clarify this. I haven't been discussing Giraph-specific
issues outside of the mailing list; I just know Eugenia and know that she
is working on porting some algorithms onto Pregel-like platforms.

@Eugenia Sorry for the confusion :) We (the Giraph community) stumbled
upon your current work at
https://grape.ics.uci.edu/wiki/asterix/raw-attachment/wiki/cs295-2011-fall-ProjectTeams/Team_8_Report.pdf.


It would be great if you could share your experiences and tell us about
the problems you encountered (especially the design decisions you're
criticizing). If you want to, it would also be great if you could
contribute back your code.

Best,
Sebastian


On 06.01.2012 15:22, Mattmann, Chris A (388J) wrote:
> Hi Sebastian,
> 
> Great. It might be good to either (a) copy giraph-dev@ on the
> discussion to keep the rest of the community outside of the specific
> sub-group you're talking to aware of whats up; or (b) send a summary
> of the discussion here. It's important to demonstrate that discussion
> within Apache Giraph is happening *on the mailing lists* here.
>  
> Cheers,
> Chris
> 
> On Jan 6, 2012, at 8:25 AM, Sebastian Schelter wrote:
> 
>> I have already been in email contact with Eugenia, because a colleague
>> of mine is currently at UCI. I'll cc this mail to her :)
>>
>> --sebastians
>>
>> On 06.01.2012 14:20, Claudio Martella wrote:
>>> Hello guys,
>>>
>>> today I stumbled upon this:
>>>
>>> https://grape.ics.uci.edu/wiki/asterix/raw-attachment/wiki/cs295-2011-fall-ProjectTeams/Team_8_Report.pdf
>>>
>>> the code can be found here:
>>> http://code.google.com/p/graph-algorithm-ports-giraph-hyracks/
>>>
>>> Was anybody aware of this? Do you think we could take advantage of this 
>>> code?
>>>
>>
> 
> 
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
> 



Re: some giraph code

2012-01-06 Thread Sebastian Schelter
I have already been in email contact with Eugenia, because a colleague
of mine is currently at UCI. I'll cc this mail to her :)

--sebastian

On 06.01.2012 14:20, Claudio Martella wrote:
> Hello guys,
> 
> today I stumbled upon this:
> 
> https://grape.ics.uci.edu/wiki/asterix/raw-attachment/wiki/cs295-2011-fall-ProjectTeams/Team_8_Report.pdf
> 
> the code can be found here:
> http://code.google.com/p/graph-algorithm-ports-giraph-hyracks/
> 
> Was anybody aware of this? Do you think we could take advantage of this code?
> 



Re: Time to roll a release?

2012-01-05 Thread Sebastian Schelter
+1 from me too, as Jake already said: release early, release often.


On 04.01.2012 23:07, Mattmann, Chris A (388J) wrote:
> Super +1, thanks for pushing this Jakob.
> 
> Cheers,
> Chris
> 
> On Jan 4, 2012, at 3:15 PM, Jakob Homan wrote:
> 
>> I think there's been enough work done since Giraph entered incubation
>> that we're ready to do a release.  We've had significant performance
>> and usability improvements, to the point where anyone interested in
>> Giraph/Pregel/BSP should definitely take a look at the code and try it
>> out.  Rolling a release would signal anyone left on the fence that
>> it's worth their time.  This is also a required criterion for
>> advancing through the incubator, as we're doing well on the others
>> currently.
>>
>> Having been peripherally involved in Kafka's recent first release, I
>> can tell you it's quite a lot of paperwork, but I'm happy to volunteer
>> to roll the first one.  Any objections? Ideas?  Hysterical laughter?
>>
>> -Jakob
> 
> 
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
> 



Re: Review Request: Make EdgeListVertex the default vertex implementation, fix bugs related to EdgeListVertex.

2012-01-02 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3349/#review4170
---

Ship it!


Ok, I understood this wrong. I'm fine with the changes then.

- Sebastian


On 2012-01-02 02:35:50, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3349/
> ---
> 
> (Updated 2012-01-02 02:35:50)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> * Changed Vertex.java to HashMapVertex.java.  This makes it less likely folks 
> will use it as a default.  I have included comments that suggest 
> EdgeListVertex for static graphs (most cases).
> 
> * Found and fixed bugs in EdgeListVertex with the way that binarySearch was 
> being used.  Added unittests to check for adding/getting/removing edges.
> 
> * Changed classes that extend Vertex to extending EdgeListVertex instead.
> 
> * Changed MutableVertex to BasicVertex for addVertex, addVertexReq to be a 
> little safer
> 
> * Tried to make sure that when a class that extends MutableVertex is 
> instantiated that it also will call readFields() or initialize().  This fixed 
> several bugs.
> 
> * Changed the interface of BasicVertex#initialize from  public abstract void 
> initialize(I vertexId, V vertexValue, Map edges, List messages) to 
> initialize(I vertexId, V vertexValue, Map edges, Iterable messages) 
> to better fit the recent changes to BasicVertex getting/setting messages with 
> an Iterable.
> 
> * Found and removed duplicated code from several MutableVertex extended 
> classes for addVertexRequest, removeVertexRequest, addEdgeRequest and 
> removeEdgeRequest.
> 
> * Changed Vertex cast to BasicVertex cast in Partition and MockUtils.
> 
> * There are some tabs --> spaces conversions done automatically from my 
> Eclipse settings for the files I touched.
> 
> 
> This addresses bug GIRAPH-116.
> https://issues.apache.org/jira/browse/GIRAPH-116
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCombinerVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleFailVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleShortestPathsVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleVertexWithWorkerContext.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/HashMapVertex.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java
>  1226330 
>   
> http://svn.apache.org/re

[jira] [Updated] (GIRAPH-117) DefaultWorkerContext should preserve the method signatures of WorkerContext

2012-01-02 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-117:
--

Attachment: GIRAPH-117.patch

> DefaultWorkerContext should preserve the method signatures of WorkerContext
> ---
>
> Key: GIRAPH-117
> URL: https://issues.apache.org/jira/browse/GIRAPH-117
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
>    Assignee: Sebastian Schelter
>Priority: Trivial
> Attachments: GIRAPH-117.patch
>
>
> DefaultWorkerContext.preApplication() swallows the InstantiationException and 
> IllegalAccessException of WorkerContext.preApplication(). These should be 
> preserved for applications that want to register an aggregator in this method.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-117) DefaultWorkerContext should preserve the method signatures of WorkerContext

2012-01-02 Thread Sebastian Schelter (Created) (JIRA)
DefaultWorkerContext should preserve the method signatures of WorkerContext
---

 Key: GIRAPH-117
 URL: https://issues.apache.org/jira/browse/GIRAPH-117
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Priority: Trivial


DefaultWorkerContext.preApplication() swallows the InstantiationException and 
IllegalAccessException of WorkerContext.preApplication(). These should be 
preserved for applications that want to register an aggregator in this method.
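
A minimal sketch of the idea, assuming the problem is a default
implementation that drops the throws clause of the base method. All class
names below are hypothetical mirrors of the ones named in the issue, and
registerAggregator is an invented stand-in for a reflective registration
call.

```java
import java.lang.reflect.Modifier;

/** Hypothetical mirror of the base class named in the issue. */
abstract class WorkerContextSketch {
  /** Declares the checked exceptions that implementations may throw. */
  public abstract void preApplication()
      throws InstantiationException, IllegalAccessException;
}

/** No-op default that repeats the throws clause instead of narrowing it. */
class DefaultWorkerContextSketch extends WorkerContextSketch {
  @Override
  public void preApplication()
      throws InstantiationException, IllegalAccessException {
    // Intentionally empty.
  }
}

/** An application context can extend the default and still throw. */
class RegisteringContextSketch extends DefaultWorkerContextSketch {
  boolean registered = false;

  @Override
  public void preApplication()
      throws InstantiationException, IllegalAccessException {
    registerAggregator(StringBuilder.class);
  }

  // Invented stand-in for a registration call that can fail reflectively.
  private void registerAggregator(Class<?> type)
      throws InstantiationException, IllegalAccessException {
    if (Modifier.isAbstract(type.getModifiers())) {
      throw new InstantiationException("cannot instantiate " + type.getName());
    }
    if (!Modifier.isPublic(type.getModifiers())) {
      throw new IllegalAccessException("not accessible: " + type.getName());
    }
    registered = true;
  }
}
```

If DefaultWorkerContextSketch declared preApplication() without the throws
clause, RegisteringContextSketch would no longer compile, because an override
may not add checked exceptions that the overridden method does not declare.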

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Persisting global values

2012-01-02 Thread Sebastian Schelter
Hi,

I'm working on an algorithm that computes a global value of the graph
(its so-called effective diameter) and I have an Aggregator with which
this value can be computed. What would be the correct place to implement
this computation?

I thought about WorkerContext first, but it seems that this is run on
each worker, which doesn't really fit my problem. Another question would
be how to persist that global value after the algorithm is done.

--sebastian
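
To make the kind of global computation concrete, here is a minimal sketch.
It assumes the common definition of the effective diameter as the smallest
hop count at which a given fraction (often 0.9) of all reachable vertex pairs
is covered; the message above does not spell the definition out. The array
reachablePairs is assumed to hold one aggregated count per hop, and all names
are hypothetical.

```java
/** Hypothetical helper; it would run once, on the aggregated per-hop counts. */
class EffectiveDiameterSketch {
  /**
   * @param reachablePairs reachablePairs[h] is the number of vertex pairs
   *                       within h hops; non-decreasing, last entry is the total
   * @param fraction       share of reachable pairs to cover, e.g. 0.9
   * @return the smallest h with reachablePairs[h] >= fraction * total
   */
  static int effectiveDiameter(long[] reachablePairs, double fraction) {
    if (reachablePairs.length == 0) {
      throw new IllegalArgumentException("need at least one hop count");
    }
    long total = reachablePairs[reachablePairs.length - 1];
    double threshold = fraction * total;
    for (int h = 0; h < reachablePairs.length; h++) {
      if (reachablePairs[h] >= threshold) {
        return h;
      }
    }
    // Only reached for fractions above 1; fall back to the last hop.
    return reachablePairs.length - 1;
  }
}
```

The computation itself is cheap and needs only the aggregated values, which
is why running it once rather than on every worker is the natural fit.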


Re: Review Request: Make EdgeListVertex the default vertex implementation, fix bugs related to EdgeListVertex.

2012-01-02 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3349/#review4168
---


Had a quick look over your changes and everything looked good. I think it's 
right to assume that most implementations will use static graphs and to offer 
EdgeListVertex as the default extension point for this. The only thing I don't 
like is the name change from MutableVertex to BasicVertex; I liked the former 
better because it sounds much more expressive to me.
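
The review summary quoted below mentions bugs in how binarySearch was used in
EdgeListVertex. A common pitfall with sorted lists is the encoding of a miss,
so here is a minimal sketch of the usual pattern. SortedEdgeListSketch is a
hypothetical class and not the actual EdgeListVertex code; only
Collections.binarySearch is the real JDK call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sorted edge list; not the actual EdgeListVertex code. */
class SortedEdgeListSketch {
  private final List<Integer> targets = new ArrayList<Integer>();

  /** Adds an edge and keeps the list sorted; returns false if it already exists. */
  boolean addEdge(int target) {
    int pos = Collections.binarySearch(targets, target);
    if (pos >= 0) {
      return false;
    }
    // A miss is encoded as -(insertionPoint) - 1, so decode before inserting.
    targets.add(-pos - 1, target);
    return true;
  }

  /** Any non-negative result is a hit, including index 0. */
  boolean hasEdge(int target) {
    return Collections.binarySearch(targets, target) >= 0;
  }

  /** Removes an edge; returns false if it was not present. */
  boolean removeEdge(int target) {
    int pos = Collections.binarySearch(targets, target);
    if (pos < 0) {
      return false;
    }
    targets.remove(pos); // pos is an int, so this removes by index
    return true;
  }

  List<Integer> targets() {
    return Collections.unmodifiableList(targets);
  }
}
```

The sorted layout keeps lookups logarithmic without a per-edge hash entry,
which suits graphs whose edges rarely change.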


http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java


good thing to have the Iterable<> abstraction here


- Sebastian


On 2012-01-02 02:35:50, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3349/
> ---
> 
> (Updated 2012-01-02 02:35:50)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> * Changed Vertex.java to HashMapVertex.java.  This makes it less likely folks 
> will use it as a default.  I have included comments that suggest 
> EdgeListVertex for static graphs (most cases).
> 
> * Found and fixed bugs in EdgeListVertex with the way that binarySearch was 
> being used.  Added unittests to check for adding/getting/removing edges.
> 
> * Changed classes that extend Vertex to extending EdgeListVertex instead.
> 
> * Changed MutableVertex to BasicVertex for addVertex, addVertexReq to be a 
> little safer
> 
> * Tried to make sure that when a class that extends MutableVertex is 
> instantiated that it also will call readFields() or initialize().  This fixed 
> several bugs.
> 
> * Changed the interface of BasicVertex#initialize from  public abstract void 
> initialize(I vertexId, V vertexValue, Map edges, List messages) to 
> initialize(I vertexId, V vertexValue, Map edges, Iterable messages) 
> to better fit the recent changes to BasicVertex getting/setting messages with 
> an Iterable.
> 
> * Found and removed duplicated code from several MutableVertex extended 
> classes for addVertexRequest, removeVertexRequest, addEdgeRequest and 
> removeEdgeRequest.
> 
> * Changed Vertex cast to BasicVertex cast in Partition and MockUtils.
> 
> * There are some tabs --> spaces conversions done automatically from my 
> Eclipse settings for the files I touched.
> 
> 
> This addresses bug GIRAPH-116.
> https://issues.apache.org/jira/browse/GIRAPH-116
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCheckpointVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleCombinerVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleFailVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleShortestPathsVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleVertexWithWorkerContext.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
>  1226330 
>   
> http://svn.apache.org/repos/asf/incubato

Re: Unable to load vertices

2011-12-27 Thread Sebastian Schelter
You were right, it was an issue with writing/reading the vertex value.
Only took me three days of searching to find out that I simply forgot to
call setVertexValue() ... :)

--sebastian
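
A round-trip check on the whole vertex, not only on its individual
Writables, might catch this class of bug earlier. A minimal sketch follows:
WritableSketch redeclares the two methods of Hadoop's Writable interface so
that the example has no dependencies, while ToyVertex and RoundTrip are
hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Same two methods as Hadoop's Writable, redeclared to stay dependency-free. */
interface WritableSketch {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

/** Hypothetical vertex with an id and a value. */
class ToyVertex implements WritableSketch {
  int id;
  double value;

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(id);
    out.writeDouble(value);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    id = in.readInt();
    // Reading the double without storing it would silently lose the value.
    value = in.readDouble();
  }
}

class RoundTrip {
  /** Serializes a vertex to a byte array and reads it back into a fresh one. */
  static ToyVertex copyViaBytes(ToyVertex original) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    original.write(new DataOutputStream(bytes));
    ToyVertex copy = new ToyVertex();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy;
  }
}
```

Comparing every field of the copy with the original, including the vertex
value, is what makes the check useful.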



On 23.12.2011 18:28, Avery Ching wrote:
> Without looking at your code, maybe your I, V, E, or M types might have
> Writable issues?  In the single worker case, does checkpointing work? 
> That would verify the writing part of Writable is okay, but not the
> reading part...(well you can do a manual checkpoint restart I guess to
> verify that).
> 
> Avery
> 
> On 12/23/11 9:23 AM, Sebastian Schelter wrote:
>> I'm extending org.apache.giraph.graph.Vertex directly. I also created
>> unit tests for the serialization of the Writables (writing them to a
>> byte array and reading them back) without finding something. Thank you
>> for the advice however, I'll continue searching :)
>>
>> --sebastian
>>
>>
>> On 23.12.2011 18:14, Avery Ching wrote:
>>> What MutableVertex implementation are you using?  Sounds like the issue
>>> only happens during the RPC to send the vertex to another worker.  Maybe
>>> a bug in the Writable implementation?
>>>
>>> Avery
>>>
>>> On 12/23/11 3:14 AM, Sebastian Schelter wrote:
>>>> Hmm, the job works if I use a single worker only locally, strange...
>>>>
>>>> On 23.12.2011 11:07, Claudio Martella wrote:
>>>>> With a super quick look, so i might be completely wrong, this looks
>>>>> like you're running a different hadoop locally and on your test. Is
>>>>> there any chance you're not using hadoop non_secure locally but you're
>>>>> in your distributed mode?
>>>>>
>>>>> On Fri, Dec 23, 2011 at 10:49 AM, Sebastian Schelter
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm currently implementing an algorithm for diameter and radius
>>>>>> estimation. It already works when I run it on toy data via
>>>>>> InternalVertexRunner in a unit test.
>>>>>>
>>>>>> Unfortunately, in my tests with a single node hadoop instance and
>>>>>> real
>>>>>> cluster, I always run into the attached exception during startup.
>>>>>> Does
>>>>>> anybody have an idea what might cause this?
>>>>>>
>>>>>> --sebastian
>>>>>>
>>>>>>
>>>>>> 2011-12-23 10:43:09,769 INFO org.apache.hadoop.mapred.TaskInProgress:
>>>>>> Error from attempt_201112230924_0006_m_01_0:
>>>>>> java.lang.IllegalStateException: run: Caught an unrecoverable
>>>>>> exception
>>>>>> setup: Offlining servers due to exception...
>>>>>>  at
>>>>>> org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
>>>>>>  at
>>>>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>>  at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>>  at java.security.AccessController.doPrivileged(Native
>>>>>> Method)
>>>>>>  at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>>  at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>>
>>>>>>
>>>>>>  at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>> Caused by: java.lang.RuntimeException: setup: Offlining servers
>>>>>> due to
>>>>>> exception...
>>>>>>  at
>>>>>> org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
>>>>>>  at
>>>>>> org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>>>>  ... 7 more
>>>>>> Caused by: java.lang.IllegalStateException: setup: loadVertices
>>>>>> failed
>>>>>>  at
>>>>>> org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)
>>>>>>
>>>>>>
>>>>>>  at
>>>>>> org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>>>>>  ... 8 more
>>>>>> Caused by: java.lang.Ru

Re: Review Request: Port of the HCC algorithm for identifying all connected components of a graph

2011-12-25 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/
---

(Updated 2011-12-25 09:36:39.177630)


Review request for giraph.


Changes
---

Reworked code to reflect Avery's comments regarding the code conventions.


Summary
---

Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest 
vertex id in its component.
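
The idea can be sketched as a single-process simulation, assuming an
undirected graph given as adjacency arrays. MinLabelPropagation is a
hypothetical class and not the ConnectedComponentsVertex from the patch; each
pass of the while loop plays the role of one superstep.

```java
import java.util.Arrays;

/** Single-process simulation; graph layout and names are illustrative. */
class MinLabelPropagation {
  /**
   * @param neighbors neighbors[v] lists the vertices adjacent to v; the graph
   *                  is undirected, so each edge appears in both arrays
   * @return component[v] is the smallest vertex id in v's connected component
   */
  static int[] components(int[][] neighbors) {
    int n = neighbors.length;
    int[] component = new int[n];
    for (int v = 0; v < n; v++) {
      component[v] = v; // every vertex starts with its own id
    }
    boolean changed = true;
    while (changed) {
      changed = false;
      int[] next = Arrays.copyOf(component, n);
      for (int v = 0; v < n; v++) {
        for (int u : neighbors[v]) {
          // v "receives" u's label from the previous round and keeps the minimum.
          if (component[u] < next[v]) {
            next[v] = component[u];
            changed = true;
          }
        }
      }
      component = next;
    }
    return component;
  }
}
```

The loop stops once a full round changes no label, which corresponds to all
vertices having voted to halt.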

I created a very memory-efficient abstract vertex in 
org.apache.giraph.graph.IntIntNullIntVertex and had 
org.apache.giraph.examples.ConnectedComponentsVertex extend that. 
org.apache.giraph.examples.ConnectedComponentsVertexTest contains an 
"integration" test on toy data.

I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of 
combiners and to shutdown() the local zookeeper instance in the tests.

Local and pseudo-distributed unit tests passed. I also tested the 
algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph 
(5.7M vertices, 130M edges).


This addresses bug GIRAPH-115.
https://issues.apache.org/jira/browse/GIRAPH-115


Diffs (updated)
-

  
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java 
PRE-CREATION 
  
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
 PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 
PRE-CREATION 
  
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java
 PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java 
PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 
1222837 
  
/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java 
PRE-CREATION 
  
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java
 PRE-CREATION 
  /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 
PRE-CREATION 
  
/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java
 1222837 

Diff: https://reviews.apache.org/r/3313/diff


Testing
---


Thanks,

Sebastian



Review Request: Port of the HCC algorithm for identifying all connected components of a graph

2011-12-24 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/
---

Review request for giraph.


Summary
---

Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest 
vertex id in its component.

I created a very memory-efficient abstract vertex in 
org.apache.giraph.graph.IntIntNullIntVertex and had 
org.apache.giraph.examples.ConnectedComponentsVertex extend that. 
org.apache.giraph.examples.ConnectedComponentsVertexTest contains an 
"integration" test on toy data.

I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of 
combiners and to shutdown() the local ZooKeeper instance in the tests.
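
As a rough illustration of why a combiner pays off for this algorithm: each 
vertex only cares about the smallest id it is offered, so all messages bound 
for the same vertex can be collapsed into their minimum before they are sent. 
The sketch below uses plain Java collections, not Giraph's combiner API; the 
class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a minimum combiner; this is not Giraph's API.
class MinCombinerSketch {

  // Collapses all messages addressed to the same target vertex into their minimum.
  static Map<Integer, Integer> combine(Map<Integer, List<Integer>> outgoing) {
    Map<Integer, Integer> combined = new HashMap<>();
    for (Map.Entry<Integer, List<Integer>> entry : outgoing.entrySet()) {
      if (entry.getValue().isEmpty()) {
        continue; // nothing to send to this vertex
      }
      int min = Integer.MAX_VALUE;
      for (int candidate : entry.getValue()) {
        min = Math.min(min, candidate);
      }
      combined.put(entry.getKey(), min);
    }
    return combined;
  }

  public static void main(String[] args) {
    Map<Integer, List<Integer>> outgoing = new HashMap<>();
    outgoing.put(7, Arrays.asList(5, 3, 9));
    outgoing.put(8, Arrays.asList(4));
    // Vertex 7 receives a single message instead of three.
    System.out.println(combine(outgoing));
  }
}
```

With such a combiner, the number of messages per target vertex drops to one 
per sending worker, which is presumably where most of the savings come from.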

Local and pseudo-distributed unit tests passed. I also tested the 
algorithm on a 6-machine Hadoop cluster using the Wikipedia pagelink graph 
(5.7M vertices, 130M edges).


This addresses bug GIRAPH-115.
https://issues.apache.org/jira/browse/GIRAPH-115


Diffs
-

  
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java 
PRE-CREATION 
  
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
 PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 
PRE-CREATION 
  
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java
 PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java 
PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 
1222837 
  
/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java 
PRE-CREATION 
  
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java
 PRE-CREATION 
  /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/3313/diff


Testing
---


Thanks,

Sebastian



[jira] [Created] (GIRAPH-115) Port of the HCC algorithm for identifying all connected components of a graph

2011-12-24 Thread Sebastian Schelter (Created) (JIRA)
Port of the HCC algorithm for identifying all connected components of a graph
-

 Key: GIRAPH-115
 URL: https://issues.apache.org/jira/browse/GIRAPH-115
 Project: Giraph
  Issue Type: New Feature
Affects Versions: 0.70.0
Reporter: Sebastian Schelter


Port of the HCC algorithm that identifies connected components and assigns a 
component id (the smallest vertex id in the component) to each vertex.

The idea behind the algorithm is very simple: propagate the smallest vertex id 
along the edges to all vertices of a connected component until convergence. The 
number of supersteps necessary is equal to the maximum diameter over all 
components, plus 1.
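
The propagation described above can be sketched as a single-machine simulation 
of the supersteps. This is only an illustration in plain Java, not the code 
from the patch; the class name is made up, and it assumes that every vertex 
appears as a key of the adjacency map and that each undirected edge is listed 
in both directions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-machine simulation of minimum-id propagation; names are illustrative.
class MinIdPropagationSketch {

  // Returns, for every vertex, the smallest vertex id in its component.
  static Map<Integer, Integer> components(Map<Integer, List<Integer>> adjacency) {
    Map<Integer, Integer> component = new HashMap<>();
    Map<Integer, Integer> inbox = new HashMap<>();
    // Superstep 0: every vertex labels itself and offers its id to its neighbours.
    for (Map.Entry<Integer, List<Integer>> e : adjacency.entrySet()) {
      component.put(e.getKey(), e.getKey());
      for (int neighbour : e.getValue()) {
        inbox.merge(neighbour, e.getKey(), Integer::min); // combine to the minimum
      }
    }
    // Later supersteps: adopt a smaller id if one arrives and pass it on.
    while (!inbox.isEmpty()) {
      Map<Integer, Integer> nextInbox = new HashMap<>();
      for (Map.Entry<Integer, Integer> message : inbox.entrySet()) {
        int vertex = message.getKey();
        int candidate = message.getValue();
        if (candidate < component.get(vertex)) {
          component.put(vertex, candidate);
          for (int neighbour : adjacency.get(vertex)) {
            nextInbox.merge(neighbour, candidate, Integer::min);
          }
        }
        // Otherwise the vertex stays silent, i.e. it votes to halt.
      }
      inbox = nextInbox;
    }
    return component;
  }

  public static void main(String[] args) {
    Map<Integer, List<Integer>> adjacency = new HashMap<>();
    adjacency.put(1, Arrays.asList(2));
    adjacency.put(2, Arrays.asList(1, 3));
    adjacency.put(3, Arrays.asList(2));
    adjacency.put(4, Arrays.asList(5));
    adjacency.put(5, Arrays.asList(4));
    System.out.println(components(adjacency));
  }
}
```

The loop ends once a superstep produces no messages, which is the convergence 
condition mentioned above.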

The original Hadoop-based variant of this algorithm was proposed by Kang, 
Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 
2010

http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Unable to load vertices

2011-12-23 Thread Sebastian Schelter
I'm extending org.apache.giraph.graph.Vertex directly. I also created
unit tests for the serialization of the Writables (writing them to a
byte array and reading them back) without finding anything. Thank you
for the advice though, I'll continue searching :)
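
For reference, a round-trip test of the kind described above might look
roughly like this. It is only a sketch: EstimateValue and its fields are
hypothetical and merely follow the write/readFields convention of Hadoop
Writables, so the example needs no Hadoop dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type that mimics the Writable write/readFields convention.
class EstimateValue {
  int radius;
  long[] bitmask = new long[0];

  void write(DataOutput out) throws IOException {
    out.writeInt(radius);
    out.writeInt(bitmask.length); // length prefix so readFields knows how much to read
    for (long word : bitmask) {
      out.writeLong(word);
    }
  }

  void readFields(DataInput in) throws IOException {
    radius = in.readInt();
    bitmask = new long[in.readInt()];
    for (int i = 0; i < bitmask.length; i++) {
      bitmask[i] = in.readLong();
    }
  }
}

class RoundTripSketch {
  // Serializes a value to a byte array and reads it back into a fresh instance.
  static EstimateValue roundTrip(EstimateValue original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      original.write(new DataOutputStream(bytes));
      DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      EstimateValue copy = new EstimateValue();
      copy.readFields(in);
      // Leftover bytes would mean write() and readFields() disagree on the format.
      if (in.available() != 0) {
        throw new IOException("unread bytes after readFields: " + in.available());
      }
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Checking for leftover bytes catches the case where readFields() consumes
fewer bytes than write() produced, which a plain field comparison can miss.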

--sebastian


On 23.12.2011 18:14, Avery Ching wrote:
> What MutableVertex implementation are you using?  Sounds like the issue
> only happens during the RPC to send the vertex to another worker.  Maybe
> a bug in the Writable implementation?
> 
> Avery
> 
> On 12/23/11 3:14 AM, Sebastian Schelter wrote:
>> Hmm, the job works if I use a single worker only locally, strange...
>>
>> On 23.12.2011 11:07, Claudio Martella wrote:
>>> With a super quick look, so i might be completely wrong, this looks
>>> like you're running a different hadoop locally and on your test. Is
>>> there any chance you're not using hadoop non_secure locally but you're
>>> in your distributed mode?
>>>
>>> On Fri, Dec 23, 2011 at 10:49 AM, Sebastian Schelter 
>>> wrote:
>>>> Hi,
>>>>
>>>> I'm currently implementing an algorithm for diameter and radius
>>>> estimation. It already works when I run it on toy data via
>>>> InternalVertexRunner in a unit test.
>>>>
>>>> Unfortunately, in my tests with a single node hadoop instance and real
>>>> cluster, I always run into the attached exception during startup. Does
>>>> anybody have an idea what might cause this?
>>>>
>>>> --sebastian
>>>>
>>>>
>>>> 2011-12-23 10:43:09,769 INFO org.apache.hadoop.mapred.TaskInProgress:
>>>> Error from attempt_201112230924_0006_m_01_0:
>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception
>>>> setup: Offlining servers due to exception...
>>>> at
>>>> org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
>>>> at
>>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>
>>>> at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>> Caused by: java.lang.RuntimeException: setup: Offlining servers due to
>>>> exception...
>>>> at
>>>> org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
>>>> at
>>>> org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>> ... 7 more
>>>> Caused by: java.lang.IllegalStateException: setup: loadVertices failed
>>>> at
>>>> org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)
>>>>
>>>> at
>>>> org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>>> ... 8 more
>>>> Caused by: java.lang.RuntimeException: java.io.IOException: Call to
>>>> poodle-6/127.0.1.1:30002 failed on local exception:
>>>> java.io.EOFException
>>>> at
>>>> org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:768)
>>>>
>>>> at
>>>> org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>>>
>>>> at
>>>> org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575)
>>>>
>>>> ... 9 more
>>>> Caused by: java.io.IOException: Call to poodle-6/127.0.1.1:30002 failed
>>>> on local exception: java.io.EOFException
>>>> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>>> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>>> at $Proxy3.putVertexList(Unknown Source)
>>>> at
>>>> org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:765)
>>>>
>>>> ... 11 more
>>>> Caused by: java.io.EOFException
>>>> at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>> at
>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>>>
>>>> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>>>
>>>
> 



Re: Unable to load vertices

2011-12-23 Thread Sebastian Schelter
Hmm, the job works if I use a single worker only locally, strange...

On 23.12.2011 11:07, Claudio Martella wrote:
> With a super quick look, so i might be completely wrong, this looks
> like you're running a different hadoop locally and on your test. Is
> there any chance you're not using hadoop non_secure locally but you're
> in your distributed mode?
> 
> On Fri, Dec 23, 2011 at 10:49 AM, Sebastian Schelter  wrote:
>> Hi,
>>
>> I'm currently implementing an algorithm for diameter and radius
>> estimation. It already works when I run it on toy data via
>> InternalVertexRunner in a unit test.
>>
>> Unfortunately, in my tests with a single node hadoop instance and real
>> cluster, I always run into the attached exception during startup. Does
>> anybody have an idea what might cause this?
>>
>> --sebastian
>>
>>
>> 2011-12-23 10:43:09,769 INFO org.apache.hadoop.mapred.TaskInProgress:
>> Error from attempt_201112230924_0006_m_01_0:
>> java.lang.IllegalStateException: run: Caught an unrecoverable exception
>> setup: Offlining servers due to exception...
>>at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
>>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>at java.security.AccessController.doPrivileged(Native Method)
>>at javax.security.auth.Subject.doAs(Subject.java:396)
>>at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>at org.apache.hadoop.mapred.Child.main(Child.java:253)
>> Caused by: java.lang.RuntimeException: setup: Offlining servers due to
>> exception...
>>at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
>>at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>... 7 more
>> Caused by: java.lang.IllegalStateException: setup: loadVertices failed
>>at
>> org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)
>>at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>... 8 more
>> Caused by: java.lang.RuntimeException: java.io.IOException: Call to
>> poodle-6/127.0.1.1:30002 failed on local exception: java.io.EOFException
>>at
>> org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:768)
>>at
>> org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>at
>> org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575)
>>... 9 more
>> Caused by: java.io.IOException: Call to poodle-6/127.0.1.1:30002 failed
>> on local exception: java.io.EOFException
>>at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>at $Proxy3.putVertexList(Unknown Source)
>>at
>> org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:765)
>>... 11 more
>> Caused by: java.io.EOFException
>>at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>at 
>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
> 
> 
> 



Re: Unable to load vertices

2011-12-23 Thread Sebastian Schelter
Shouldn't be the case, I always use 0.20.203... I also implemented a
very similar algorithm to find all connected components in a graph and
that one worked fine.

--sebastian

On 23.12.2011 11:07, Claudio Martella wrote:
> With a super quick look, so i might be completely wrong, this looks
> like you're running a different hadoop locally and on your test. Is
> there any chance you're not using hadoop non_secure locally but you're
> in your distributed mode?
> 
> On Fri, Dec 23, 2011 at 10:49 AM, Sebastian Schelter  wrote:
>> Hi,
>>
>> I'm currently implementing an algorithm for diameter and radius
>> estimation. It already works when I run it on toy data via
>> InternalVertexRunner in a unit test.
>>
>> Unfortunately, in my tests with a single node hadoop instance and real
>> cluster, I always run into the attached exception during startup. Does
>> anybody have an idea what might cause this?
>>
>> --sebastian
>>
>>
>> 2011-12-23 10:43:09,769 INFO org.apache.hadoop.mapred.TaskInProgress:
>> Error from attempt_201112230924_0006_m_01_0:
>> java.lang.IllegalStateException: run: Caught an unrecoverable exception
>> setup: Offlining servers due to exception...
>>at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
>>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>at java.security.AccessController.doPrivileged(Native Method)
>>at javax.security.auth.Subject.doAs(Subject.java:396)
>>at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>at org.apache.hadoop.mapred.Child.main(Child.java:253)
>> Caused by: java.lang.RuntimeException: setup: Offlining servers due to
>> exception...
>>at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
>>at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>... 7 more
>> Caused by: java.lang.IllegalStateException: setup: loadVertices failed
>>at
>> org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)
>>at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>... 8 more
>> Caused by: java.lang.RuntimeException: java.io.IOException: Call to
>> poodle-6/127.0.1.1:30002 failed on local exception: java.io.EOFException
>>at
>> org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:768)
>>at
>> org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>at
>> org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575)
>>... 9 more
>> Caused by: java.io.IOException: Call to poodle-6/127.0.1.1:30002 failed
>> on local exception: java.io.EOFException
>>at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>at $Proxy3.putVertexList(Unknown Source)
>>at
>> org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:765)
>>... 11 more
>> Caused by: java.io.EOFException
>>at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>at 
>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
> 
> 
> 



Unable to load vertices

2011-12-23 Thread Sebastian Schelter
Hi,

I'm currently implementing an algorithm for diameter and radius
estimation. It already works when I run it on toy data via
InternalVertexRunner in a unit test.

Unfortunately, in my tests with a single node hadoop instance and real
cluster, I always run into the attached exception during startup. Does
anybody have an idea what might cause this?

--sebastian


2011-12-23 10:43:09,769 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from attempt_201112230924_0006_m_01_0:
java.lang.IllegalStateException: run: Caught an unrecoverable exception
setup: Offlining servers due to exception...
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.RuntimeException: setup: Offlining servers due to
exception...
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
... 7 more
Caused by: java.lang.IllegalStateException: setup: loadVertices failed
at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:582)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
... 8 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to
poodle-6/127.0.1.1:30002 failed on local exception: java.io.EOFException
at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:768)
at
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:575)
... 9 more
Caused by: java.io.IOException: Call to poodle-6/127.0.1.1:30002 failed
on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
at org.apache.hadoop.ipc.Client.call(Client.java:1033)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy3.putVertexList(Unknown Source)
at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:765)
... 11 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)


[jira] [Updated] (GIRAPH-109) GiraphRunner should provide support for combiners

2011-12-21 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-109:
--

Attachment: GIRAPH-109.patch

Patch that allows specifying VertexCombiner, AggregatorWriter and WorkerContext.


> GiraphRunner should provide support for combiners
> -
>
> Key: GIRAPH-109
> URL: https://issues.apache.org/jira/browse/GIRAPH-109
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
> Attachments: GIRAPH-109.patch
>
>
> Currently there's no way to tell GiraphRunner that you want to use a 
> Combiner. A simple option should be added, similar to the way input and 
> output formats are specified.





[jira] [Created] (GIRAPH-114) Inconsistent message map handling in BasicRPCCommunications.LargeMessageFlushExecutor

2011-12-21 Thread Sebastian Schelter (Created) (JIRA)
Inconsistent message map handling in 
BasicRPCCommunications.LargeMessageFlushExecutor
-

 Key: GIRAPH-114
 URL: https://issues.apache.org/jira/browse/GIRAPH-114
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Critical
 Attachments: GIRAPH-114.patch

I'm currently implementing a simple algorithm to identify all the connected 
components of a graph. The algorithm ran well in local IDE unit tests on toy 
data and in a local single-node Hadoop instance using a graph of ~100k edges.

When I tested it on a real cluster with the Wikipedia pagelink graph (5.7M 
vertices, 130M edges), I ran into strange exceptions like this:

{noformat} 
2011-12-21 12:03:57,015 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
from attempt_201112131541_0034_m_27_0: java.lang.IllegalStateException: 
run: Caught an unrecoverable exception flush: Got ExecutionException
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: flush: Got ExecutionException
at 
org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:946)
at 
org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:916)
at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:588)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:632)
... 7 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalStateException: run: Impossible for no messages in 1603276
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:941)
... 10 more
Caused by: java.lang.IllegalStateException: run: Impossible for no messages in 
1603276
at 
org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:245)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat} 

The exception is thrown because the data structure holding the outgoing 
messages contains a destination vertex with no messages to send to it.

I tracked this behavior down:


In *BasicRPCCommunications:541-546* the map holding the outgoing messages for 
vertices of a particular machine is created. It's stored in two places: in 
_BasicRPCCommunications.outMessages_ and as the member variable 
_outMessagesPerPeer_ of its _PeerConnection_:

{noformat} 
outMsgMap = new HashMap<I, MsgList<M>>();
outMessages.put(addrUnresolved, outMsgMap);

PeerConnection peerConnection = new PeerConnection(outMsgMap, peer, isProxy);
{noformat} 

If there are a lot of messages available for a particular vertex, a 
large flush is triggered via _LargeMessageFlushExecutor_ (I guess this only 
happened in the Wikipedia test). During this flush the list of messages for the 
vertex is sent out and replaced with an empty list in 
*BasicRPCCommunications:341*:

{noformat}
outMessageList = peerConnection.outMessagesPerPeer.get(destVertex);
peerConnection.outMessagesPerPeer.put(destVertex, new MsgList());
{noformat}

Now, in the last flush that is triggered at the end of the superstep, we 
encounter an empty message list for the vertex and therefore the exception is 
thrown in *BasicRPCCommunications:228-247*:

{noformat}
for (Entry<I, MsgList<M>> entry : peerConnection.outMessagesPerPeer.entrySet()) {
  ...
  if (entry.getValue().isEmpty()) {
    throw new IllegalStateException(...);
  }
}
{noformat}

Simply removing the list for the vertex when executing the large flush solved 
the issue (patch to come).

I'd like to note that it is generally very dangerous to let different classes 
access a data structure directly; it produces subtle bugs like this one. 
It would be better to handle the data structure in one central place. 
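
One possible shape for such a centralized handling is sketched below. This is 
not the attached patch and not the actual Giraph class; the class and method 
names are hypothetical. The point is that a large flush removes the entry for 
a vertex instead of replacing it with an empty list, so the final flush can 
never see an empty message list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single owner of the per-peer outgoing message map.
class OutgoingMessagesSketch<I, M> {
  private final Map<I, List<M>> messagesPerVertex = new HashMap<>();

  // Queues a message and returns the number now pending for the destination vertex.
  synchronized int add(I destVertex, M message) {
    List<M> pending =
        messagesPerVertex.computeIfAbsent(destVertex, k -> new ArrayList<M>());
    pending.add(message);
    return pending.size();
  }

  // Large flush for one vertex: removes the entry rather than leaving an empty list.
  synchronized List<M> drain(I destVertex) {
    List<M> drained = messagesPerVertex.remove(destVertex);
    return drained == null ? new ArrayList<M>() : drained;
  }

  // Final flush at the end of a superstep: hands over everything still pending.
  synchronized Map<I, List<M>> drainAll() {
    Map<I, List<M>> all = new HashMap<>(messagesPerVertex);
    messagesPerVertex.clear();
    return all;
  }

  public static void main(String[] args) {
    OutgoingMessagesSketch<Integer, String> out = new OutgoingMessagesSketch<>();
    out.add(1, "a");
    out.add(1, "b");
    out.add(2, "c");
    System.out.println(out.drain(1));   // large flush for vertex 1
    System.out.println(out.drainAll()); // only vertex 2 is left
  }
}
```

Because callers only see add(), drain() and drainAll(), no other class can 
put the map into a state that the flush code considers impossible.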




[jira] [Updated] (GIRAPH-114) Inconsistent message map handling in BasicRPCCommunications.LargeMessageFlushExecutor

2011-12-21 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-114:
--

Attachment: GIRAPH-114.patch

> Inconsistent message map handling in 
> BasicRPCCommunications.LargeMessageFlushExecutor
> -
>
> Key: GIRAPH-114
> URL: https://issues.apache.org/jira/browse/GIRAPH-114
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
>Priority: Critical
> Attachments: GIRAPH-114.patch
>
>
> I'm currently implementing a simple algorithm to identify all the connected 
> components of a graph. The algorithm ran well in local IDE unit tests on 
> toy data and in a local single-node Hadoop instance using a graph of ~100k 
> edges.
> When I tested it on a real cluster with the Wikipedia pagelink graph (5.7M 
> vertices, 130M edges), I ran into strange exceptions like this:
> {noformat} 
> 2011-12-21 12:03:57,015 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
> from attempt_201112131541_0034_m_27_0: java.lang.IllegalStateException: 
> run: Caught an unrecoverable exception flush: Got ExecutionException
>   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>   at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: java.lang.IllegalStateException: flush: Got ExecutionException
>   at 
> org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:946)
>   at 
> org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:916)
>   at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:588)
>   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:632)
>   ... 7 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IllegalStateException: run: Impossible for no messages in 1603276
>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>   at 
> org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:941)
>   ... 10 more
> Caused by: java.lang.IllegalStateException: run: Impossible for no messages 
> in 1603276
>   at 
> org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:245)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {noformat} 
> The exception is thrown because the data structure holding the outgoing 
> messages contains a destination vertex with no messages to send to it.
> I tracked this behavior down:
> In *BasicRPCCommunications:541-546* the map holding the outgoing messages for 
> vertices of a particular machine is created. It's stored in two places: in 
> _BasicRPCCommunications.outMessages_ and as the member variable 
> _outMessagesPerPeer_ of its _PeerConnection_:
> {noformat} 
> outMsgMap = new HashMap<I, MsgList<M>>();
> outMessages.put(addrUnresolved, outMsgMap);
> PeerConnection peerConnection = new PeerConnection(outMsgMap, peer, isProxy);
> {noformat} 
>   
> If there are a lot of messages available for a particular vertex, a 
> large flush is triggered via _LargeMessageFlushExecutor_ (I guess this only 
> happened in the Wikipedia test). During this flush the list of messages for 
> the vertex is sent out and replaced with an empty list in 
> *BasicRPCCommunications:341*:
> {noformat}
> outMessageList = peerConnection.outMessagesPerPeer.get(destVertex);
> peerConnection.outMessagesPerPeer.put(destVertex, new MsgList());
> {noformat}
> Now, in the last flush that is triggered at the end of the superstep, we 
> encounter an empty message list for the vertex and therefore 

Re: Review Request: GIRAPH-112: Use elements() properly in LongDoubleFloatDoubleVertex

2011-12-20 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3287/#review4036
---

Ship it!


I ran into the same issue yesterday and the solution presented here is correct. 
For reasons of efficiency, list.elements() returns the internal underlying 
array for the list, which might be bigger than the number of elements stored in 
the list. Therefore you should only iterate up to list.size() or use the 
forEachKey() callback.

- Sebastian


On 2011-12-21 07:50:20, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3287/
> ---
> 
> (Updated 2011-12-21 07:50:20)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> As pointed out by YuanYua, the length of the array returned by elements() 
> cannot be used, since that array is the full backing array of the Mahout 
> collection, including invalid slots between size and capacity.
> 
> Whenever possible I converted elements() into forEach(), forEachKey(), 
> forEachPair().  Used size() in other cases.
> 
> Fixed some formatting violations as well in LongDoubleFloatDoubleVertex.java.
> 
> 
> This addresses bug GIRAPH-112.
> https://issues.apache.org/jira/browse/GIRAPH-112
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
>  1221634 
> 
> Diff: https://reviews.apache.org/r/3287/diff
> 
> 
> Testing
> ---
> 
> Local unittests and MR unittests.
> 
> 
> Thanks,
> 
> Avery
> 
>



[jira] [Commented] (GIRAPH-110) Add guide to setup the environment for running the unit tests in a pseudo-distributed hadoop instance

2011-12-20 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173459#comment-13173459
 ] 

Sebastian Schelter commented on GIRAPH-110:
---

Like it :)

> Add guide to setup the environment for running the unit tests in a 
> pseudo-distributed hadoop instance
> 
>
> Key: GIRAPH-110
> URL: https://issues.apache.org/jira/browse/GIRAPH-110
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
>Priority: Minor
> Fix For: 0.70.0
>
> Attachments: GIRAPH-110.2.patch, GIRAPH-110.patch
>
>
> Giraph should provide a small guide for setting up the local environment to 
> run the unit tests in a pseudo-distributed hadoop instance as there are some 
> non-obvious hurdles to take.





[jira] [Updated] (GIRAPH-110) Add guide to setup the environment for running the unit tests in a pseudo-distributed hadoop instance

2011-12-20 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-110:
--

Attachment: GIRAPH-110.patch

> Add guide to setup the environment for running the unit tests in a 
> pseudo-distributed hadoop instance
> 
>
> Key: GIRAPH-110
> URL: https://issues.apache.org/jira/browse/GIRAPH-110
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
>Priority: Minor
> Fix For: 0.70.0
>
> Attachments: GIRAPH-110.patch
>
>
> Giraph should provide a small guide for setting up the local environment to 
> run the unit tests in a pseudo-distributed hadoop instance as there are some 
> non-obvious hurdles to take.





[jira] [Created] (GIRAPH-110) Add guide to setup the environment for running the unit tests in a pseudo-distributed hadoop instance

2011-12-20 Thread Sebastian Schelter (Created) (JIRA)
Add guide to setup the environment for running the unit tests in a 
pseudo-distributed hadoop instance


 Key: GIRAPH-110
 URL: https://issues.apache.org/jira/browse/GIRAPH-110
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor


Giraph should provide a small guide for setting up the local environment to run 
the unit tests in a pseudo-distributed hadoop instance as there are some 
non-obvious hurdles to take.





[jira] [Created] (GIRAPH-109) GiraphRunner should provide support for combiners

2011-12-20 Thread Sebastian Schelter (Created) (JIRA)
GiraphRunner should provide support for combiners
-

 Key: GIRAPH-109
 URL: https://issues.apache.org/jira/browse/GIRAPH-109
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter


Currently there's no way to tell GiraphRunner that you want to use a Combiner. 
A simple option should be added, similar to the way in- and outputformats are 
specified.





Re: Review Request: GIRAPH-106: Change prepareSuperstep() to make setMessages(Iterable messages) package-private

2011-12-19 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3255/#review3972
---

Ship it!


nice and simple!

- Sebastian


On 2011-12-19 08:25:55, Avery Ching wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3255/
> ---
> 
> (Updated 2011-12-19 08:25:55)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> ---
> 
> Added method assignMessagesToVertex() to bypass the package-private access 
> for setMessages().  Cleaned up some missed formatting for GIRAPH-80 as well.
> 
> 
> This addresses bug GIRAPH-106.
> https://issues.apache.org/jira/browse/GIRAPH-106
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
>  1220642 
>   
> http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java
>  1220642 
> 
> Diff: https://reviews.apache.org/r/3255/diff
> 
> 
> Testing
> ---
> 
> Passed local unittests.
> 
> 
> Thanks,
> 
> Avery
> 
>



[jira] [Commented] (GIRAPH-73) A little refactoring

2011-12-18 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171809#comment-13171809
 ] 

Sebastian Schelter commented on GIRAPH-73:
--

You are right. 

The problem is that if an exception is thrown in the try { ... } block 
and another exception is thrown in the finally { ... } block, then you only get 
to see the second one. That's why people usually choose to swallow and only log 
the exceptions arising from close. 

Should I rework the patch to ensure the job fails in case there are 
exceptions in closeQuietly()? 


Here are some details about stream handling in java (it's a mess...)
http://illegalargumentexception.blogspot.com/2008/10/java-how-not-to-make-mess-of-stream.html
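
A minimal sketch of the masking effect described above, using only the JDK. The names MaskingDemo, FailingWriter and this closeQuietly() are made up for illustration; they are not taken from the GIRAPH-73 patch.

```java
import java.io.Closeable;
import java.io.IOException;

class MaskingDemo {
  /** Hypothetical resource whose close() always fails. */
  static class FailingWriter implements Closeable {
    @Override
    public void close() throws IOException {
      throw new IOException("close failed");
    }
  }

  /** Unprotected close() in finally: its exception replaces the first one. */
  static String unprotected() {
    try {
      FailingWriter writer = new FailingWriter();
      try {
        throw new IllegalStateException("write failed");
      } finally {
        writer.close(); // throws, so "write failed" is lost
      }
    } catch (Exception e) {
      return e.getMessage();
    }
  }

  /** Swallow and log the exception from close() so the original survives. */
  static void closeQuietly(Closeable c) {
    try {
      c.close();
    } catch (IOException e) {
      System.err.println("ignored on close: " + e.getMessage());
    }
  }

  /** Same failure as above, but close() is wrapped in closeQuietly(). */
  static String quiet() {
    try {
      FailingWriter writer = new FailingWriter();
      try {
        throw new IllegalStateException("write failed");
      } finally {
        closeQuietly(writer);
      }
    } catch (Exception e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(unprotected()); // close failed
    System.out.println(quiet());       // write failed
  }
}
```

The trade-off raised in the question above remains: once close() failures are only logged, a failed flush on close no longer fails the job by itself.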

> A little refactoring
> 
>
> Key: GIRAPH-73
> URL: https://issues.apache.org/jira/browse/GIRAPH-73
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>Priority: Minor
> Attachments: GIRAPH-73-2.patch, GIRAPH-73.patch
>
>
> Hi, I'm currently reading Giraph's sources and starting to play with it. I 
> fixed some small things along the way (like making sure writers are closed, 
> exceptions are logged, etc.) and thought that may be helpful.





[jira] [Updated] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-18 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-105:
--

Attachment: GIRAPH-105-2.patch

Updated the patch to reflect aching's suggestions; ran local and 
pseudo-distributed unit tests.

> BspServiceMaster.checkWorkers() should return empty lists instead of null
> -
>
> Key: GIRAPH-105
> URL: https://issues.apache.org/jira/browse/GIRAPH-105
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>Priority: Minor
> Attachments: GIRAPH-105-2.patch, GIRAPH-105.patch
>
>
> BspServiceMaster.checkWorkers() is invoked in 
> BspServiceMaster.coordinateSuperstep() and in 
> BspServiceMaster.createInputSplits(). Both check for an empty list to fail 
> the job in case something has gone wrong. However, checkWorkers() returns 
> null in case of problems, causing an NPE in the calling code.





[jira] [Commented] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-18 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171795#comment-13171795
 ] 

Sebastian Schelter commented on GIRAPH-105:
---

Makes sense. Main issue is to make caller and callee consistent.

> BspServiceMaster.checkWorkers() should return empty lists instead of null
> -
>
> Key: GIRAPH-105
> URL: https://issues.apache.org/jira/browse/GIRAPH-105
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
>Priority: Minor
> Attachments: GIRAPH-105.patch
>
>
> BspServiceMaster.checkWorkers() is invoked in 
> BspServiceMaster.coordinateSuperstep() and in 
> BspServiceMaster.createInputSplits(). Both check for an empty list to fail 
> the job in case something has gone wrong. However, checkWorkers() returns 
> null in case of problems, causing an NPE in the calling code.





Re: Running tests in pseudo-distributed mode

2011-12-17 Thread Sebastian Schelter
Wrote a blogpost about it :) Maybe it can be copied to giraph's wiki?

http://ssc.io/running-giraphs-unit-tests-in-pseudo-distributed-mode/

--sebastian


On 17.12.2011 18:38, Avery Ching wrote:
> We should document this somewhere.  It is not intuitive as you mention.
> 
> Avery
> 
> On 12/17/11 1:41 AM, Sebastian Schelter wrote:
>> A small hint for everyone who wants to run giraph's unit tests on a
>> pseudo-distributed single node hadoop cluster:
>>
>> You have to adjust the configuration to allow 4 concurrent map tasks per
>> node (default in hadoop-0.20.203 is 2), otherwise the tests will fail!
>>
>> You have to adjust mapred.tasktracker.map.tasks.maximum and
>> mapred.map.tasks in mapred-site.xml. Took me a while to figure out :)
>>
>> --sebastian
> 



[jira] [Updated] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-17 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-105:
--

Attachment: GIRAPH-105.patch

> BspServiceMaster.checkWorkers() should return empty lists instead of null
> -
>
> Key: GIRAPH-105
> URL: https://issues.apache.org/jira/browse/GIRAPH-105
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
>Priority: Minor
> Attachments: GIRAPH-105.patch
>
>
> BspServiceMaster.checkWorkers() is invoked in 
> BspServiceMaster.coordinateSuperstep() and in 
> BspServiceMaster.createInputSplits(). Both check for an empty list to fail 
> the job in case something has gone wrong. However, checkWorkers() returns 
> null in case of problems, causing an NPE in the calling code.





[jira] [Created] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-17 Thread Sebastian Schelter (Created) (JIRA)
BspServiceMaster.checkWorkers() should return empty lists instead of null
-

 Key: GIRAPH-105
 URL: https://issues.apache.org/jira/browse/GIRAPH-105
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor


BspServiceMaster.checkWorkers() is invoked in 
BspServiceMaster.coordinateSuperstep() and in 
BspServiceMaster.createInputSplits(). Both check for an empty list to fail the 
job in case something has gone wrong. However, checkWorkers() returns null in 
case of problems, causing an NPE in the calling code.
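
The mismatch can be sketched as follows. This is a simplified illustration, not the code in BspServiceMaster: the method names, the String element type and the healthy flag are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckWorkersDemo {
  /** Callee as described in the issue: signals a problem with null. */
  static List<String> checkWorkersReturningNull(boolean healthy) {
    if (!healthy) {
      return null;
    }
    List<String> workers = new ArrayList<String>();
    workers.add("worker-1");
    return workers;
  }

  /** Proposed contract: signal a problem with an empty list. */
  static List<String> checkWorkersReturningEmpty(boolean healthy) {
    if (!healthy) {
      return Collections.emptyList();
    }
    List<String> workers = new ArrayList<String>();
    workers.add("worker-1");
    return workers;
  }

  /** Caller that, like the ones named in the issue, tests for an empty list. */
  static boolean shouldFailJob(List<String> workers) {
    return workers.isEmpty(); // NullPointerException if workers is null
  }

  public static void main(String[] args) {
    System.out.println(shouldFailJob(checkWorkersReturningEmpty(false)));
    try {
      shouldFailJob(checkWorkersReturningNull(false));
    } catch (NullPointerException e) {
      System.out.println("NPE in caller");
    }
  }
}
```

Either side could be changed; what matters is that the caller and the callee agree on how a failure is signalled.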





[jira] [Updated] (GIRAPH-73) A little refactoring

2011-12-17 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-73:
-

Attachment: GIRAPH-73-2.patch

> A little refactoring
> 
>
> Key: GIRAPH-73
> URL: https://issues.apache.org/jira/browse/GIRAPH-73
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>    Reporter: Sebastian Schelter
>Priority: Minor
> Attachments: GIRAPH-73-2.patch, GIRAPH-73.patch
>
>
> Hi, I'm currently reading Giraph's sources and starting to play with it. I 
> fixed some small things along the way (like making sure writers are closed, 
> exceptions are logged, etc.) and thought that may be helpful.





Running tests in pseudo-distributed mode

2011-12-17 Thread Sebastian Schelter
A small hint for everyone who wants to run giraph's unit tests on a
pseudo-distributed single node hadoop cluster:

You have to adjust the configuration to allow 4 concurrent map tasks per
node (default in hadoop-0.20.203 is 2), otherwise the tests will fail!

You have to adjust mapred.tasktracker.map.tasks.maximum and
mapred.map.tasks in mapred-site.xml. Took me a while to figure out :)

--sebastian


Re: Review Request: GIRAPH-80 Refactor vertices to not expose the internal datastructure for holding messages

2011-12-17 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3203/
---

(Updated 2011-12-17 09:36:06.229668)


Review request for giraph.


Summary (updated)
---

Refactoring that gives BasicVertex these 3 new methods:

public abstract Iterable getMessages()

returns an unmodifiable iterable allowing access to the current messages

public abstract void setMessages(Iterable messages);

replacement for getMsgList().clear() followed by getMsgList().addAll(...);

public abstract void releaseResources();

After a vertex has voted to halt, all references to messages it could still hold 
should be removed to enable earlier GC. Instead of externally calling 
getMsgList().clear(), this method should be used.
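
A rough sketch of what the three methods allow, under stated assumptions: SketchVertex is a simplified stand-in for BasicVertex (the real class has more type parameters and methods), and DoubleArrayVertex is a hypothetical implementation that stores its messages in a primitive array instead of a java.util.List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for the abstract vertex; not Giraph's real class. */
abstract class SketchVertex<M> {
  public abstract Iterable<M> getMessages();
  public abstract void setMessages(Iterable<M> messages);
  public abstract void releaseResources();
}

/** Hypothetical vertex that keeps double messages in a primitive array. */
class DoubleArrayVertex extends SketchVertex<Double> {
  private double[] messages = new double[0];

  @Override
  public Iterable<Double> getMessages() {
    List<Double> copy = new ArrayList<Double>(messages.length);
    for (double m : messages) {
      copy.add(m);
    }
    return Collections.unmodifiableList(copy); // callers cannot mutate state
  }

  @Override
  public void setMessages(Iterable<Double> newMessages) {
    List<Double> buffer = new ArrayList<Double>();
    for (Double m : newMessages) {
      buffer.add(m);
    }
    double[] replaced = new double[buffer.size()];
    for (int i = 0; i < replaced.length; i++) {
      replaced[i] = buffer.get(i);
    }
    messages = replaced; // takes the place of clear() followed by addAll(...)
  }

  @Override
  public void releaseResources() {
    messages = new double[0]; // drop the old array so it can be collected
  }
}

class VertexApiDemo {
  /** Counts the elements of an Iterable. */
  static int count(Iterable<Double> it) {
    int n = 0;
    for (Double ignored : it) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    DoubleArrayVertex v = new DoubleArrayVertex();
    v.setMessages(Arrays.asList(1.0, 2.5));
    System.out.println(count(v.getMessages())); // 2
    v.releaseResources();
    System.out.println(count(v.getMessages())); // 0
  }
}
```

Because callers only see an Iterable, the vertex is free to choose any internal representation.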

Local unit tests pass. Unfortunately, I wasn't yet able to run the tests on my 
Hadoop cluster (I still have problems because I can only access it via a SOCKS 
proxy).


This addresses bug GIRAPH-80.
https://issues.apache.org/jira/browse/GIRAPH-80


Diffs
-

  /trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 
1215442 
  /trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 
1215442 
  /trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java 1215442 
  /trunk/src/main/java/org/apache/giraph/utils/ComparisonUtils.java 
PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java 1215442 
  /trunk/src/test/java/org/apache/giraph/utils/ComparisonUtilsTest.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/3203/diff


Testing
---


Thanks,

Sebastian



Re: Review Request: Refactor vertices to not expose the internal datastructure for holding messages

2011-12-17 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3203/
---

(Updated 2011-12-17 09:35:51.778265)


Review request for giraph.


Changes
---

Updated the patch to include proposed changes. Unfortunately setMessages(...) 
cannot be package-private as it is invoked by BasicRPCCommunications.

Ran tests in local and pseudo-distributed mode.


Summary
---

Refactoring that gives BasicVertex these 3 new methods:

public abstract Iterable getMessages()

returns an unmodifiable iterable allowing access to the current messages

public abstract void setMessages(Iterable messages);

replacement for getMsgList().clear() followed by getMsgList().addAll(...);

public abstract void releaseResources();

After a vertex has voted to halt, all references to messages it could still hold 
should be removed to enable earlier GC. Instead of externally calling 
getMsgList().clear(), this method should be used.

Local unit tests pass. Unfortunately, I wasn't yet able to run the tests on my 
Hadoop cluster (I still have problems because I can only access it via a SOCKS 
proxy).


This addresses bug GIRAPH-80.
https://issues.apache.org/jira/browse/GIRAPH-80


Diffs (updated)
-

  /trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 
1215442 
  /trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 
1215442 
  /trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1215442 
  /trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java 1215442 
  /trunk/src/main/java/org/apache/giraph/utils/ComparisonUtils.java 
PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java 1215442 
  /trunk/src/test/java/org/apache/giraph/utils/ComparisonUtilsTest.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/3203/diff


Testing
---


Thanks,

Sebastian



[jira] [Commented] (GIRAPH-80) Don't expose the list holding the messages in BasicVertex

2011-12-15 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170299#comment-13170299
 ] 

Sebastian Schelter commented on GIRAPH-80:
--

No problem. I can rework them to fit GIRAPH-104 once it's committed.

> Don't expose the list holding the messages in BasicVertex
> -
>
> Key: GIRAPH-80
> URL: https://issues.apache.org/jira/browse/GIRAPH-80
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>
> I'm currently trying to implement my own memory efficient vertex (similar to 
> LongDoubleFloatDoubleVertex) and ran into problems with getMsgList()
> This method returns a list pointing to the messages of the vertex and it is 
> modified externally (BasicRPCCommunications calls clear() and addAll() e.g.). 
> This makes it very hard to use something else than a java.util.List 
> internally (LongDoubleFloatDoubleVertex "hacked" around this) and it is 
> generally dangerous to have the internal state of an object be modified 
> externally. It also makes the code harder to read and understand.
> I'd suggest to change the API to let a vertex handle the modifications itself 
> internally (e.g. add something like pushMessages(...))





[jira] [Commented] (GIRAPH-80) Don't expose the list holding the messages in BasicVertex

2011-12-15 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170192#comment-13170192
 ] 

Sebastian Schelter commented on GIRAPH-80:
--

It should apply to current trunk, do you experience any problems?

> Don't expose the list holding the messages in BasicVertex
> -
>
> Key: GIRAPH-80
> URL: https://issues.apache.org/jira/browse/GIRAPH-80
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>
> I'm currently trying to implement my own memory efficient vertex (similar to 
> LongDoubleFloatDoubleVertex) and ran into problems with getMsgList()
> This method returns a list pointing to the messages of the vertex and it is 
> modified externally (BasicRPCCommunications calls clear() and addAll() e.g.). 
> This makes it very hard to use something else than a java.util.List 
> internally (LongDoubleFloatDoubleVertex "hacked" around this) and it is 
> generally dangerous to have the internal state of an object be modified 
> externally. It also makes the code harder to read and understand.
> I'd suggest to change the API to let a vertex handle the modifications itself 
> internally (e.g. add something like pushMessages(...))





Review Request: Refactor vertices to not expose the internal datastructure for holding messages

2011-12-15 Thread Sebastian Schelter

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3203/
---

Review request for giraph.


Summary
---

Refactoring that gives BasicVertex these 3 new methods:

public abstract Iterable getMessages()

returns an unmodifiable iterable allowing access to the current messages

public abstract void setMessages(Iterable messages);

replacement for getMsgList().clear() followed by getMsgList().addAll(...);

public abstract void releaseResources();

After a vertex has voted to halt, all references to messages it could still hold 
should be removed to enable earlier GC. Instead of externally calling 
getMsgList().clear(), this method should be used.

Local unit tests pass. Unfortunately, I wasn't yet able to run the tests on my 
Hadoop cluster (I still have problems because I can only access it via a SOCKS 
proxy).


This addresses bug GIRAPH-80.
https://issues.apache.org/jira/browse/GIRAPH-80


Diffs
-

  /trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 
1214675 
  /trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1214675 
  /trunk/src/main/java/org/apache/giraph/graph/BasicVertexResolver.java 1214675 
  /trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java 1214675 
  /trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1214675 
  /trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 
1214675 
  /trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1214675 
  /trunk/src/main/java/org/apache/giraph/graph/VertexResolver.java 1214675 
  /trunk/src/main/java/org/apache/giraph/utils/ComparisonUtils.java 
PRE-CREATION 
  /trunk/src/main/java/org/apache/giraph/utils/MemoryUtils.java 1214675 
  /trunk/src/test/java/org/apache/giraph/utils/ComparisonUtilsTest.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/3203/diff


Testing
---


Thanks,

Sebastian



[jira] [Updated] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-11-24 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-51:
-

Attachment: GIRAPH-51-3.patch

fixed the javadoc errors in the patch

> Provide unit testing tool for Giraph algorithms
> ---
>
> Key: GIRAPH-51
> URL: https://issues.apache.org/jira/browse/GIRAPH-51
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>    Assignee: Sebastian Schelter
> Attachments: GIRAPH-51-2.patch, GIRAPH-51-3.patch, GIRAPH-51.patch
>
>
> It would be nice to have a little tool, similar to MRUnit, that would allow 
> Giraph application writers to quickly unit test their algorithms.  The tool 
> could take a Vertex implementation, a set of input and expected output and 
> verify that after the specified number of supersteps, we've gotten what we 
> expect.





[jira] [Updated] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-11-16 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-51:
-

Attachment: GIRAPH-51-2.patch

Hi Jakob,

I moved InternalVertexRunner to the src as you suggested.

I think we have to separate the unit tests into two categories:

The first one would be testing single methods, usually invoking compute() and 
verifying the behavior of the vertex (I suppose that's what you aimed at with 
your last comment). To accomplish that, one need not really run the system; it 
should be sufficient to inject mocked dependencies.  I added two tests for 
SimpleShortestPathVertex as an example of such tests. I created a helper class 
for conveniently mocking dependencies such as the Hadoop configuration and 
for configuring the vertex.

The second category, which InternalVertexRunner aims at, would be something like 
a local "integration" test on toy data. It runs the system in a single JVM and 
executes the whole lifecycle of an algorithm (reading input from disk, running 
the supersteps, writing output, etc.). Although these tests are no substitute 
for real integration testing, they are often very helpful in finding subtle 
bugs that normal unit testing cannot discover.
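
To illustrate the first category, here is a small sketch of testing compute() in isolation with a hand-written fake instead of a mocking library. Everything in it is an assumption made for the example: ToyShortestPathVertex is a toy stand-in rather than Giraph's SimpleShortestPathVertex, and MessageSink and RecordingSink are hypothetical names for the injected dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dependency the vertex uses to send messages. */
interface MessageSink {
  void send(long targetId, double message);
}

/** Toy shortest-path vertex; a stand-in, not Giraph's real class. */
class ToyShortestPathVertex {
  double distance = Double.MAX_VALUE;
  boolean halted = false;
  final Map<Long, Double> outEdges = new LinkedHashMap<Long, Double>();

  void compute(Iterable<Double> messages, MessageSink sink) {
    double min = distance;
    for (double m : messages) {
      min = Math.min(min, m);
    }
    if (min < distance) {
      distance = min;
      // A shorter path was found: tell every neighbor about it.
      for (Map.Entry<Long, Double> edge : outEdges.entrySet()) {
        sink.send(edge.getKey(), distance + edge.getValue());
      }
    }
    halted = true; // always vote to halt
  }
}

/** Hand-written fake that records what was sent instead of doing RPC. */
class RecordingSink implements MessageSink {
  final List<String> sent = new ArrayList<String>();

  @Override
  public void send(long targetId, double message) {
    sent.add(targetId + ":" + message);
  }
}

class ComputeTestDemo {
  /** Builds a vertex with one edge (to vertex 2, weight 1.5), runs compute(). */
  static RecordingSink runOnce(double... incoming) {
    ToyShortestPathVertex vertex = new ToyShortestPathVertex();
    vertex.outEdges.put(2L, 1.5);
    RecordingSink sink = new RecordingSink();
    List<Double> messages = new ArrayList<Double>();
    for (double m : incoming) {
      messages.add(m);
    }
    vertex.compute(messages, sink);
    return sink;
  }

  public static void main(String[] args) {
    System.out.println(runOnce(4.0, 3.0).sent); // [2:4.5]
    System.out.println(runOnce().sent);         // []
  }
}
```

A test of the second category would instead feed a small graph file to the whole system and compare the written output.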



> Provide unit testing tool for Giraph algorithms
> ---
>
> Key: GIRAPH-51
> URL: https://issues.apache.org/jira/browse/GIRAPH-51
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>Assignee: Sebastian Schelter
> Attachments: GIRAPH-51-2.patch, GIRAPH-51.patch
>
>
> It would be nice to have a little tool, similar to MRUnit, that would allow 
> Giraph application writers to quickly unit test their algorithms.  The tool 
> could take a Vertex implementation, a set of input and expected output and 
> verify that after the specified number of supersteps, we've gotten what we 
> expect.





Re: Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Sebastian Schelter
Yes, that fixes it. Thank you!

--sebastian

On 15.11.2011 22:12, Avery Ching wrote:
> This should fix it.  It passed local unittests.  Let me know.
> 
> Avery
> 
> On 11/15/11 1:03 PM, Avery Ching wrote:
>> Yes, I think I broke it.  Sorry.  Let me get you a diff to test quickly.
>>
>> Avery
>>
>> On 11/15/11 12:42 PM, Sebastian Schelter wrote:
>>> Hi,
>>>
>>> I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to
>>> continue to work on GIRAPH-51 where I use a small toy graph to test
>>> SimpleShortestPathVertex.
>>>
>>> Unfortunately my code did not work anymore, and I think I tracked it down
>>> to the fact that vertices that voted to halt are not reactivated anymore when
>>> new messages arrive.
>>>
>>> In SimpleShortestPathVertex every vertex always votes to halt and only
>>> gets reactivated when a shorter path to it has been found. However my
>>> test run always finished after superstep 0.
>>>
>>> I don't know too much about Giraph's internals yet, but my guess is that
>>> the number of sent messages is not tracked correctly anymore. Therefore
>>> giraph finishes the algorithm (as all vertices voted to halt) although
>>> there should still be messages in the pipeline.
>>>
>>> I think I tracked it down to this behavior:
>>>
>>> GraphMapper declares a variable workerSentMessages = 0 and never
>>> increases it. This variable is given to
>>> BspServiceWorker.finishSuperstep() which writes it to zookeeper and uses
>>> it to compute the GlobalStats afterwards, which are used to decide
>>> whether a new superstep has to be scheduled. As it has never been
>>> increased, the algorithm will always stop when all vertices voted to
>>> halt.
>>>
>>> It would be great if someone could confirm/disprove this speculation and
>>> help me to continue work on GIRAPH-51
>>>
>>> --sebastian
>>
> 



Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Sebastian Schelter
Hi,

I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to
continue to work on GIRAPH-51 where I use a small toy graph to test
SimpleShortestPathVertex.

Unfortunately my code did not work anymore, and I think I tracked it down
to the fact that vertices that voted to halt are not reactivated anymore when
new messages arrive.

In SimpleShortestPathVertex every vertex always votes to halt and only
gets reactivated when a shorter path to it has been found. However my
test run always finished after superstep 0.

I don't know too much about Giraph's internals yet, but my guess is that
the number of sent messages is not tracked correctly anymore. Therefore
giraph finishes the algorithm (as all vertices voted to halt) although
there should still be messages in the pipeline.

I think I tracked it down to this behavior:

GraphMapper declares a variable workerSentMessages = 0 and never
increases it. This variable is given to
BspServiceWorker.finishSuperstep() which writes it to zookeeper and uses
it to compute the GlobalStats afterwards, which are used to decide
whether a new superstep has to be scheduled. As it has never been
increased, the algorithm will always stop when all vertices voted to halt.
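
The suspected effect can be reproduced in a toy simulation. This is only a sketch under assumptions: it uses a simplified Pregel-style halting rule (stop when no vertex is active and no messages were sent), a chain graph 0 -> 1 -> 2 with unit edge weights, and none of Giraph's actual classes. The countSentMessages flag stands in for whether the sent-message counter is ever increased.

```java
class HaltCheckDemo {
  /** Simplified halting rule: nobody active and nothing in flight. */
  static boolean shouldStop(long activeVertices, long reportedSentMessages) {
    return activeVertices == 0 && reportedSentMessages == 0;
  }

  /** Runs shortest paths on the chain 0 -> 1 -> 2; returns supersteps run. */
  static int run(boolean countSentMessages) {
    int n = 3;
    double inf = Double.MAX_VALUE;
    double[] dist = {inf, inf, inf};
    double[] inbox = {0.0, inf, inf}; // the source starts with distance 0
    int supersteps = 0;
    while (true) {
      double[] nextInbox = {inf, inf, inf};
      long sent = 0;
      for (int v = 0; v < n; v++) {
        if (inbox[v] < dist[v]) {
          dist[v] = inbox[v];
          if (v + 1 < n) {
            nextInbox[v + 1] = dist[v] + 1.0; // message to the next vertex
            sent++;
          }
        }
        // every vertex votes to halt at the end of compute()
      }
      supersteps++;
      // With the suspected bug, the reported counter is never increased.
      long reported = countSentMessages ? sent : 0;
      if (shouldStop(0, reported)) {
        return supersteps;
      }
      inbox = nextInbox;
    }
  }

  public static void main(String[] args) {
    System.out.println(run(true));  // 3
    System.out.println(run(false)); // 1
  }
}
```

With the counter stuck at zero the run ends after the first superstep, which matches the observation above that the test always finished after superstep 0.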

It would be great if someone could confirm/disprove this speculation and
help me to continue work on GIRAPH-51

--sebastian


[jira] [Commented] (GIRAPH-80) Don't expose the list holding the messages in BasicVertex

2011-11-15 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150367#comment-13150367
 ] 

Sebastian Schelter commented on GIRAPH-80:
--

sendMsg() is used for sending out messages to other vertices. New messages for 
a particular vertex are currently added via getMsgList().clear() and 
getMsgList().add(...) This is bad style as the internal datastructure of the 
vertex is externally modified. It also prevents convenient usage of more memory 
efficient datastructures that don't implement java.util.List like those 
provided by mahout-collections.

> Don't expose the list holding the messages in BasicVertex
> -
>
> Key: GIRAPH-80
> URL: https://issues.apache.org/jira/browse/GIRAPH-80
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>
> I'm currently trying to implement my own memory efficient vertex (similar to 
> LongDoubleFloatDoubleVertex) and ran into problems with getMsgList()
> This method returns a list pointing to the messages of the vertex and it is 
> modified externally (BasicRPCCommunications calls clear() and addAll() e.g.). 
> This makes it very hard to use something else than a java.util.List 
> internally (LongDoubleFloatDoubleVertex "hacked" around this) and it is 
> generally dangerous to have the internal state of an object be modified 
> externally. It also makes the code harder to read and understand.
> I'd suggest to change the API to let a vertex handle the modifications itself 
> internally (e.g. add something like pushMessages(...))





[jira] [Commented] (GIRAPH-80) Don't expose the list holding the messages in BasicVertex

2011-11-14 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150012#comment-13150012
 ] 

Sebastian Schelter commented on GIRAPH-80:
--

I'd also favor using Iterable

> Don't expose the list holding the messages in BasicVertex
> -
>
> Key: GIRAPH-80
> URL: https://issues.apache.org/jira/browse/GIRAPH-80
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>
> I'm currently trying to implement my own memory-efficient vertex (similar to 
> LongDoubleFloatDoubleVertex) and ran into problems with getMsgList().
> This method returns a list pointing to the messages of the vertex, and that 
> list is modified externally (BasicRPCCommunications, for example, calls 
> clear() and addAll()). This makes it very hard to use anything other than a 
> java.util.List internally (LongDoubleFloatDoubleVertex "hacked" around this), 
> and it is generally dangerous to have the internal state of an object 
> modified externally. It also makes the code harder to read and understand.
> I'd suggest changing the API to let a vertex handle the modifications itself 
> internally (e.g. by adding something like pushMessages(...)).





[jira] [Created] (GIRAPH-80) Don't expose the list holding the messages in BasicVertex

2011-11-14 Thread Sebastian Schelter (Created) (JIRA)
Don't expose the list holding the messages in BasicVertex
-

 Key: GIRAPH-80
 URL: https://issues.apache.org/jira/browse/GIRAPH-80
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter


I'm currently trying to implement my own memory-efficient vertex (similar to 
LongDoubleFloatDoubleVertex) and ran into problems with getMsgList().

This method returns a list pointing to the messages of the vertex, and that 
list is modified externally (BasicRPCCommunications, for example, calls 
clear() and addAll()). This makes it very hard to use anything other than a 
java.util.List internally (LongDoubleFloatDoubleVertex "hacked" around this), 
and it is generally dangerous to have the internal state of an object modified 
externally. It also makes the code harder to read and understand.

I'd suggest changing the API to let a vertex handle the modifications itself 
internally (e.g. by adding something like pushMessages(...)).
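
To make the suggestion concrete, here is a minimal sketch of what such an 
encapsulated API might look like. It is not Giraph's BasicVertex and not the 
code of any patch: the class name EncapsulatedMessagesVertex, the Double 
message type and the getMessages()/sumMessages() helpers are assumptions made 
for illustration. Only the pushMessages(...) name comes from the text above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical vertex that owns its message storage (illustration only). */
class EncapsulatedMessagesVertex {
  // Private storage: callers never see this list, so it could later be
  // replaced by a primitive array without touching any calling code.
  private final List<Double> messages = new ArrayList<>();

  /** Replaces the current messages with the newly delivered ones. */
  void pushMessages(Iterable<Double> incoming) {
    // Copy first, so passing a view of our own messages is still safe.
    List<Double> copy = new ArrayList<>();
    for (Double message : incoming) {
      copy.add(message);
    }
    messages.clear();
    messages.addAll(copy);
  }

  /** Read-only view of the messages; mutation attempts throw. */
  Iterable<Double> getMessages() {
    return Collections.unmodifiableList(messages);
  }

  /** Example consumer that only needs to iterate over the messages. */
  double sumMessages() {
    double sum = 0.0;
    for (double message : getMessages()) {
      sum += message;
    }
    return sum;
  }
}
```

With this shape, a communication layer would call pushMessages(...) instead of 
clear() and addAll() on a returned list, and the vertex stays free to choose 
its own internal representation.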





[jira] [Commented] (GIRAPH-73) A little refactoring

2011-11-11 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148620#comment-13148620
 ] 

Sebastian Schelter commented on GIRAPH-73:
--

Oh, I didn't know about the type inference JVM bug. I will rework the patch. 
It passed the local unit tests, but I haven't run them on our cluster yet. I 
will do so next week.

> A little refactoring
> 
>
> Key: GIRAPH-73
> URL: https://issues.apache.org/jira/browse/GIRAPH-73
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>Priority: Minor
> Attachments: GIRAPH-73.patch
>
>
> Hi, I'm currently reading Giraph's sources and starting to play with it. I 
> fixed some small things along the way (like making sure writers are closed, 
> exceptions are logged, etc.) and thought that may be helpful.





[jira] [Updated] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-11-11 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-51:
-

Attachment: GIRAPH-51.patch

Hi, I'm currently trying to find a way to easily unit test Giraph algorithms as 
suggested here. I plan to implement some complicated stuff on top of Giraph and 
definitely need this functionality for debugging and development.

I want to be able to repeatedly run a Vertex class in a unit test from my IDE 
without having to clean up directories or rely on an external ZooKeeper 
instance.

I managed to get a working prototype of a class that can internally run a 
Vertex. I had to add configuration options to disable the starting of a 
ZooKeeper instance via ProcessBuilder in ZooKeeperManager, as this didn't work 
out of my IDE (IntelliJ). My testing class simply starts its own ZooKeeper in 
an extra thread for the duration of the test. This works pretty well and I was 
already able to unit test the SimpleShortestPathVertex.
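
For readers who want a feel for the kind of test this enables, below is a much 
simpler, purely in-memory stand-in. It is not the attached patch and uses no 
Giraph, Hadoop or ZooKeeper classes: MiniBspHarness, Edge and shortestPaths 
are hypothetical names, and the single-source shortest-paths logic is only an 
assumed approximation of what a vertex like SimpleShortestPathVertex computes. 
It runs synchronous supersteps until no messages remain or a superstep limit 
is hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory superstep harness (illustration only). */
class MiniBspHarness {

  /** Directed, weighted edge of the toy graph. */
  static final class Edge {
    final int target;
    final double weight;

    Edge(int target, double weight) {
      this.target = target;
      this.weight = weight;
    }
  }

  /** Runs shortest paths from source for at most maxSupersteps supersteps. */
  static double[] shortestPaths(Map<Integer, List<Edge>> graph,
      int numVertices, int source, int maxSupersteps) {
    double[] dist = new double[numVertices];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);

    // inbox maps a vertex id to the messages it receives in this superstep.
    Map<Integer, List<Double>> inbox = new HashMap<>();
    inbox.put(source, new ArrayList<>(Arrays.asList(0.0)));

    for (int step = 0; step < maxSupersteps && !inbox.isEmpty(); step++) {
      Map<Integer, List<Double>> outbox = new HashMap<>();
      for (Map.Entry<Integer, List<Double>> entry : inbox.entrySet()) {
        int vertex = entry.getKey();
        double best = Double.POSITIVE_INFINITY;
        for (double message : entry.getValue()) {
          best = Math.min(best, message);
        }
        // Only an improved distance is stored and propagated to neighbours.
        if (best < dist[vertex]) {
          dist[vertex] = best;
          List<Edge> edges =
              graph.getOrDefault(vertex, Collections.<Edge>emptyList());
          for (Edge edge : edges) {
            outbox.computeIfAbsent(edge.target, k -> new ArrayList<>())
                .add(best + edge.weight);
          }
        }
      }
      // Messages sent in this superstep are delivered in the next one.
      inbox = outbox;
    }
    return dist;
  }
}
```

A unit test would build a small graph, call shortestPaths(...) with a chosen 
superstep limit and compare the returned distances against expected values, 
without any directories to clean up or external services to start.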

Although everything seems to work, I'm still getting some strange warnings from 
the local ZooKeeper instance. As I'm not that familiar with ZooKeeper yet, I 
wanted to ask here whether these warnings can simply be ignored?

{quote}
11/11/11 09:55:16 INFO server.PrepRequestProcessor: Got user-level 
KeeperException when processing sessionid:0x13391d6887b0001 type:create 
cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a Error 
Path:/_hadoopBsp/job_local_0001 Error:KeeperErrorCode = NoNode for 
/_hadoopBsp/job_local_0001
{quote}

{quote}
WARN graph.BspService: process: Unknown and unprocessed event 
(path=/_hadoopBsp/job_local_0001/_applicationAttemptsDir/0/_superstepDir, 
type=NodeChildrenChanged, state=SyncConnected)
{quote}

I'm already putting up a first patch for clarity.

Best,
Sebastian


> Provide unit testing tool for Giraph algorithms
> ---
>
> Key: GIRAPH-51
> URL: https://issues.apache.org/jira/browse/GIRAPH-51
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
> Attachments: GIRAPH-51.patch
>
>
> It would be nice to have a little tool, similar to MRUnit, that would allow 
> Giraph application writers to quickly unit test their algorithms.  The tool 
> could take a Vertex implementation, a set of input and expected output and 
> verify that after the specified number of supersteps, we've gotten what we 
> expect.





[jira] [Updated] (GIRAPH-73) A little refactoring

2011-11-10 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-73:
-

Attachment: GIRAPH-73.patch

> A little refactoring
> 
>
> Key: GIRAPH-73
> URL: https://issues.apache.org/jira/browse/GIRAPH-73
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Sebastian Schelter
>Priority: Minor
> Attachments: GIRAPH-73.patch
>
>
> Hi, I'm currently reading Giraph's sources and starting to play with it. I 
> fixed some small things along the way (like making sure writers are closed, 
> exceptions are logged, etc.) and thought that may be helpful.





[jira] [Created] (GIRAPH-73) A little refactoring

2011-11-10 Thread Sebastian Schelter (Created) (JIRA)
A little refactoring


 Key: GIRAPH-73
 URL: https://issues.apache.org/jira/browse/GIRAPH-73
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor


Hi, I'm currently reading Giraph's sources and starting to play with it. I 
fixed some small things along the way (like making sure writers are closed, 
exceptions are logged, etc.) and thought that may be helpful.






[jira] [Commented] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-10-10 Thread Sebastian Schelter (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124553#comment-13124553
 ] 

Sebastian Schelter commented on GIRAPH-51:
--

It would be great to have something like this. My students are currently 
looking at Giraph, and one of their first problems was how to write a simple 
unit test for an algorithm they want to implement.

> Provide unit testing tool for Giraph algorithms
> ---
>
> Key: GIRAPH-51
> URL: https://issues.apache.org/jira/browse/GIRAPH-51
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>
> It would be nice to have a little tool, similar to MRUnit, that would allow 
> Giraph application writers to quickly unit test their algorithms.  The tool 
> could take a Vertex implementation, a set of input and expected output and 
> verify that after the specified number of supersteps, we've gotten what we 
> expect.





[jira] [Commented] (GIRAPH-21) Revise CODE_CONVENTIONS

2011-09-14 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104363#comment-13104363
 ] 

Sebastian Schelter commented on GIRAPH-21:
--

I'm currently reading a lot of Giraph code as I'm evaluating it for use in 
research, and I must admit that 80 chars per line really makes the code hard 
to read. 

Although I'm not involved with your project, I'd suggest a 2-space indent and 
120 chars per line. Mahout uses the same.

> Revise CODE_CONVENTIONS
> ---
>
> Key: GIRAPH-21
> URL: https://issues.apache.org/jira/browse/GIRAPH-21
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: GIRAPH-21.diff
>
>
> Currently there is a CODE_CONVENTIONS file in the base path of Giraph.  It's 
> fairly sparse and we have been assuming an 80 char limit per line.  It's good 
> to have common conventions so that the code doesn't get too messy.  Does 
> anyone have any opinions on this now?  Probably best to tackle early and then 
> have something to follow.
