Please welcome our newest committer, Igor Kabiljo!

2015-02-10 Thread Maja Kabiljo
I am pleased to announce that Igor Kabiljo has been invited to become a 
committer by the Project Management Committee (PMC) of Apache Giraph, and he 
accepted.

Igor's most important contributions are implementing the reduce/broadcast API 
that generalizes aggregators, working on primitive message/edge storages that 
make applications more efficient, and making use of partitioners that exploit 
good graph partitioning. He has also been coming up with issues for beginners 
and guiding them along the way. Igor, we are looking forward to your future 
work and deeper involvement in the project.

Thanks,
Maja

List of Igor’s contributions:
GIRAPH-785: Improve GraphPartitionerFactory usage
GIRAPH-786: XSparseVector create a lot of objects in add/write
GIRAPH-848: Allowing plain computation with types being configurable
GIRAPH-934: Allow having state in aggregators
GIRAPH-935: Loosen modifiers when needed
GIRAPH-938: Allow fast working with primitives generically
GIRAPH-939: Reduce/broadcast API
GIRAPH-954: Allow configurable Aggregators/Reducers again
GIRAPH-955: Allow vertex/edge/message value to be configurable
GIRAPH-961: Internals of MasterLoggingAggregator have been incorrectly removed
GIRAPH-965: Improving and adding reducers
GIRAPH-986: Add more stuff to TypeOps
GIRAPH-987: Improve naming for ReduceOperation
Beginner issues he guided:
GIRAPH-891: Make MessageStoreFactory configurable
GIRAPH-895: Trim the edges in Giraph
GIRAPH-921: Create ByteValueVertex to store vertex values as bytes without 
object instance
GIRAPH-988: Allow object to be specified as next Computation in Giraph


Please welcome our newest committer, Sergey Edunov!

2014-12-03 Thread Maja Kabiljo
I am happy to announce that the Project Management Committee (PMC) for Apache 
Giraph has elected Sergey Edunov to become a committer, and he accepted.

Sergey has been an active member of the Giraph community, finding issues, 
submitting patches and reviewing code. We’re looking forward to Sergey’s larger 
involvement and future work.

List of his contributions:
GIRAPH-895: Trim the edges in Giraph
GIRAPH-896: Memory leak in SuperstepMetricsRegistry
GIRAPH-897: Add an option to dump only live objects to JMap
GIRAPH-898: Remove giraph-accumulo from Facebook profile
GIRAPH-903: Detect crashes on Netty threads
GIRAPH-924: Fix checkpointing
GIRAPH-925: Unit tests should pass even if zookeeper port not available
GIRAPH-927: Decouple netty server threads from message processing
GIRAPH-933: Checkpointing improvements
GIRAPH-936: Decouple netty server threads from message processing
GIRAPH-940: Cleanup the list of supported hadoop versions
GIRAPH-950: Auto-restart from checkpoint doesn't pick up latest checkpoint
GIRAPH-963: Aggregators may fail with IllegalArgumentException upon 
deserialization

Best,
Maja


Re: [RESULT] [VOTE] Apache Giraph 1.1.0 RC2

2014-11-24 Thread Maja Kabiljo
Thank you for your work on the release, Roman!

On 11/18/14, 10:55 AM, Avery Ching ach...@apache.org wrote:

Thanks for pushing this through, Roman. Looks great!

On 11/18/14, 4:30 AM, Roman Shaposhnik wrote:
 Hi!

 with 3 binding +1s, one non-binding +1, and
 no 0s or -1s, the vote to publish
 Apache Giraph 1.1.0 RC2 as the 1.1.0 release of
 Apache Giraph passes. Thanks to everybody who
 spent time on validating the bits!

 The vote tally is
+1s:
Claudio Martella (binding)
Maja Kabiljo (binding)
Eli Reisman (binding)
Roman Shaposhnik  (non-binding)

 I'll do the publishing tonight and will send an announcement!

 Thanks,
 Roman (AKA 1.1.0 RM)

 On Thu, Nov 13, 2014 at 5:28 AM, Roman Shaposhnik
ro...@shaposhnik.org wrote:
 This vote is for Apache Giraph, version 1.1.0 release

 It fixes the following issues:
http://s.apache.org/a8X

 *** Please download, test and vote by Mon 11/17 noon PST

 Note that we are voting upon the source (tag):
 release-1.1.0-RC2

 Source and binary files are available at:
 http://people.apache.org/~rvs/giraph-1.1.0-RC2/

 Staged website is available at:
 http://people.apache.org/~rvs/giraph-1.1.0-RC2/site/

 Maven staging repo is available at:
 
https://repository.apache.org/content/repositories/orgapachegiraph-1003

 Please notice that, as per an earlier agreement, two sets
 of artifacts are published, differentiated by the version ID:
* version ID 1.1.0 corresponds to the artifacts built for
   the hadoop_1 profile
* version ID 1.1.0-hadoop2 corresponds to the artifacts
   built for hadoop_2 profile.

 The tag to be voted upon (release-1.1.0-RC2):

https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=log;h=refs/tags/release-1.1.0-RC2

 The KEYS file containing PGP keys we use to sign the release:
 http://svn.apache.org/repos/asf/bigtop/dist/KEYS

 Thanks,
 Roman.




Re: [VOTE] Apache Giraph 1.1.0 RC2

2014-11-13 Thread Maja Kabiljo
+1, thanks Roman!

From: Claudio Martella claudio.marte...@gmail.com
Reply-To: user@giraph.apache.org
Date: Thursday, November 13, 2014 at 5:53 AM
To: user@giraph.apache.org
Cc: d...@giraph.apache.org
Subject: Re: [VOTE] Apache Giraph 1.1.0 RC2

+1.

On Thu, Nov 13, 2014 at 2:28 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:
This vote is for Apache Giraph, version 1.1.0 release

It fixes the following issues:
  
http://s.apache.org/a8X

*** Please download, test and vote by Mon 11/17 noon PST

Note that we are voting upon the source (tag):
   release-1.1.0-RC2

Source and binary files are available at:
   
http://people.apache.org/~rvs/giraph-1.1.0-RC2/

Staged website is available at:
   
http://people.apache.org/~rvs/giraph-1.1.0-RC2/site/

Maven staging repo is available at:
   
https://repository.apache.org/content/repositories/orgapachegiraph-1003

Please notice that, as per an earlier agreement, two sets
of artifacts are published, differentiated by the version ID:
  * version ID 1.1.0 corresponds to the artifacts built for
 the hadoop_1 profile
  * version ID 1.1.0-hadoop2 corresponds to the artifacts
 built for hadoop_2 profile.

The tag to be voted upon (release-1.1.0-RC2):
  
https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=log;h=refs/tags/release-1.1.0-RC2

The KEYS file containing PGP keys we use to sign the release:
   
http://svn.apache.org/repos/asf/bigtop/dist/KEYS

Thanks,
Roman.



--
   Claudio Martella



Re: [VOTE] Apache Giraph 1.1.0 RC1

2014-11-03 Thread Maja Kabiljo
We've been running code which is the same as the release candidate plus the
fix for GIRAPH-961 in production for 5 days now, with no problems. This is the
hadoop_facebook profile, using only hive-io out of all the io modules.

On 11/1/14, 3:49 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:

Ping! Any progress on testing the current RC?

Thanks,
Roman.

On Fri, Oct 31, 2014 at 9:00 AM, Claudio Martella
claudio.marte...@gmail.com wrote:
 Oh, thanks for the info!

 On Fri, Oct 31, 2014 at 3:06 PM, Roman Shaposhnik ro...@shaposhnik.org
 wrote:

 On Fri, Oct 31, 2014 at 3:26 AM, Claudio Martella
 claudio.marte...@gmail.com wrote:
  Hi Roman,
 
  thanks again for this. I have had a look at the staging site so far (our
  cluster has been down the whole week... universities...), and I was
  wondering if you have an insight into why some of the docs are missing,
  e.g. the gora and rexster documentation.

 None of them are missing. The links moved to the User Docs - Modules
 section though:

http://people.apache.org/~rvs/giraph-1.1.0-RC1/site/gora.html

http://people.apache.org/~rvs/giraph-1.1.0-RC1/site/rexster.html
 and so forth.

 Thanks,
 Roman.




 --
Claudio Martella




Re: [VOTE] Apache Giraph 1.1.0 RC1

2014-10-29 Thread Maja Kabiljo
Roman, again thanks for taking care of the release.

We found one issue https://issues.apache.org/jira/browse/GIRAPH-961 - any
application using MasterLoggingAggregator fails without this fix. Can we
backport it to the release?

Thanks,
Maja

On 10/26/14, 12:25 AM, Roman Shaposhnik ro...@shaposhnik.org wrote:

This vote is for Apache Giraph, version 1.1.0 release

It fixes the following issues:
  
http://s.apache.org/a8X

*** Please download, test and vote by Mon 11/3 noon PST

Note that we are voting upon the source (tag):
   release-1.1.0-RC1

Source and binary files are available at:
   
http://people.apache.org/~rvs/giraph-1.1.0-RC1/

Staged website is available at:
   
http://people.apache.org/~rvs/giraph-1.1.0-RC1/site/

Maven staging repo is available at:
   
https://repository.apache.org/content/repositories/orgapachegiraph-1002

Please notice that, as per an earlier agreement, two sets
of artifacts are published, differentiated by the version ID:
  * version ID 1.1.0 corresponds to the artifacts built for
 the hadoop_1 profile
  * version ID 1.1.0-hadoop2 corresponds to the artifacts
 built for hadoop_2 profile.

The tag to be voted upon (release-1.1.0-RC1):
   
https://git-wip-us.apache.org/repos/asf?p=giraph.git;a=commit;h=1f0fc23c26ce3addb746e3e57cc155f82afbab87

The KEYS file containing PGP keys we use to sign the release:
   
http://svn.apache.org/repos/asf/bigtop/dist/KEYS

Thanks,
Roman.



Re: Running one compute function after another..

2014-01-11 Thread Maja Kabiljo
Hi Jyoti,

A cleaner way to do this is to switch the Computation class that is used at the 
moment your condition is satisfied. You can have an aggregator to check 
whether the condition is met, and then in your MasterCompute you call 
setComputation(SecondComputationClass.class) when needed.
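A minimal sketch of the approach described above, assuming a hypothetical
BooleanOrAggregator named "phase.one.done" that vertices set to true once the
condition holds; the aggregator name and SecondComputationClass are
illustrative, not from this thread:

```java
import org.apache.giraph.aggregators.BooleanOrAggregator;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.BooleanWritable;

public class PhaseSwitchingMaster extends DefaultMasterCompute {
  // Hypothetical aggregator name; vertices aggregate 'true' into it
  // once the switching condition is satisfied.
  private static final String CONDITION_AGG = "phase.one.done";

  @Override
  public void initialize() throws InstantiationException,
      IllegalAccessException {
    registerAggregator(CONDITION_AGG, BooleanOrAggregator.class);
  }

  @Override
  public void compute() {
    BooleanWritable done = getAggregatedValue(CONDITION_AGG);
    if (done.get()) {
      // From the next superstep on, vertices run the second computation.
      setComputation(SecondComputationClass.class);
    }
  }
}
```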

Regards,
Maja

From: Jyoti Yadav rao.jyoti26ya...@gmail.com
Reply-To: user@giraph.apache.org
Date: Saturday, January 11, 2014 10:48 AM
To: user@giraph.apache.org
Subject: Re: Running one compute function after another..

Hi Ilias,
I will go with this..
Thanks...


On Sat, Jan 11, 2014 at 10:52 PM, Ilias ikapo...@csd.auth.gr wrote:
Hey,

You can have a boolean variable initially set to true (or false, whatever). Then 
you divide your code based on the value of that variable with an if-else 
statement. In my example, if the value is true then it goes through the first 
'if'. When the condition you want is fulfilled, change the value of the 
variable to false (at all nodes) and then the second part will be executed.
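Inside a BasicComputation subclass, that two-phase split might be sketched like
this; keeping the phase flag in the vertex value (negative = phase one done) is
an assumption for illustration only:

```java
// Sketch: the vertex value doubles as a phase flag, where a negative
// value marks "phase one finished". Types and the flag convention are
// purely illustrative.
@Override
public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
    Iterable<LongWritable> messages) throws IOException {
  boolean phaseOne = vertex.getValue().get() >= 0;
  if (phaseOne) {
    // ... first algorithm phase; when the condition is fulfilled,
    // flip the flag so this vertex takes the else branch next time:
    // vertex.setValue(new LongWritable(-1));
  } else {
    // ... second algorithm phase
  }
}
```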

Ilias

On 11/1/2014 6:18, Jyoti Yadav wrote:

Hi folks..


In my algorithm, all vertices execute one compute function up to a certain 
condition. When that condition is fulfilled, I want all vertices to execute 
another compute function. Is it possible?

Any ideas are highly appreciated..

Thanks
Jyoti




Re: About writing our own aggregator..

2014-01-09 Thread Maja Kabiljo
Hi Jyoti,

You can take a look inside the org.apache.giraph.aggregators package; there are 
many implementations already there. Some are simple, like LongSumAggregator, and 
some are more complex, like the ones inside the matrix package. Please look 
through those and let me know if you need additional help.
When you manage to implement this, you can also contribute it back to Giraph!
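As a rough sketch (not from this thread), an aggregator that collects vertex
ids into one Text value could extend BasicAggregator. Note that aggregation
order across workers is not deterministic, so the resulting list is unordered;
the aggregator name "id.list" below is made up:

```java
import org.apache.giraph.aggregators.BasicAggregator;
import org.apache.hadoop.io.Text;

// Collects whatever is aggregated into a comma-separated Text value.
public class IdListAggregator extends BasicAggregator<Text> {
  @Override
  public void aggregate(Text value) {
    String current = getAggregatedValue().toString();
    setAggregatedValue(new Text(
        current.isEmpty() ? value.toString() : current + "," + value));
  }

  @Override
  public Text createInitialValue() {
    return new Text("");
  }
}
// In compute(): aggregate("id.list", new Text(vertex.getId().toString()));
// after registering the aggregator under "id.list" in your MasterCompute.
```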

Maja

From: Jyoti Yadav rao.jyoti26ya...@gmail.com
Reply-To: user@giraph.apache.org
Date: Thursday, January 9, 2014 12:23 AM
To: user@giraph.apache.org
Subject: About writing our own aggregator..

Hi Folks...
I am trying to implement a graph algorithm on Giraph.
I want all vertices to send their ids to the master. For that I need to 
implement my own aggregator class.
Please suggest how to proceed...

Thanks
Jyoti


Re: Problem with Giraph (please help me)

2014-01-09 Thread Maja Kabiljo
Hi Chadi,

That does seem like a serialization issue. Which OutEdges class are you using? 
Is it something you implemented?

Regards,
Maja

From: chadi jaber chadijaber...@hotmail.com
Reply-To: user@giraph.apache.org
Date: Thursday, January 9, 2014 2:08 AM
To: Lukas Nalezenec lukas.naleze...@firma.seznam.cz, user@giraph.apache.org
Subject: RE: Problem with Giraph (please help me)

Hello Lukas,
I have enclosed the exception in my previous emails. It seems to be a 
serialization issue (this occurs only when workers > 1).

...
2013-12-31 16:27:33,494 INFO org.apache.giraph.comm.netty.NettyClient: 
connectAllAddresses: Successfully added 4 connections, (4 total connected) 0 
failed, 0 failures total.
2013-12-31 16:27:33,501 INFO org.apache.giraph.worker.BspServiceWorker: 
loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 1 total splits.
2013-12-31 16:27:33,508 INFO org.apache.giraph.comm.SendPartitionCache: 
SendPartitionCache: maxVerticesPerTransfer = 1
2013-12-31 16:27:33,508 INFO org.apache.giraph.comm.SendPartitionCache: 
SendPartitionCache: maxEdgesPerTransfer = 8
2013-12-31 16:27:33,524 INFO org.apache.giraph.worker.InputSplitsCallable: 
call: Loaded 0 input splits in 0.020270009 secs, (v=0, e=0) 0.0 vertices/sec, 
0.0 edges/sec
2013-12-31 16:27:33,527 INFO org.apache.giraph.comm.netty.NettyClient: 
waitAllRequests: Finished all requests. MBytes/sec sent = 0, MBytes/sec 
received = 0, MBytesSent = 0, MBytesReceived = 0, ave sent req MBytes = 0, ave 
received req MBytes = 0, secs waited = 0.656
2013-12-31 16:27:33,527 INFO org.apache.giraph.worker.BspServiceWorker: setup: 
Finally loaded a total of (v=0, e=0)
2013-12-31 16:27:33,598 INFO 
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window 
metrics MBytes/sec sent = 0, MBytes/sec received = 0, MBytesSent = 0, 
MBytesReceived = 0, ave sent req MBytes = 0, ave received req MBytes = 0, secs 
waited = 0.816
2013-12-31 16:27:33,605 WARN 
org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: 
Channel failed with remote address /172.16.45.53:59257
java.io.EOFException
at 
org.jboss.netty.buffer.ChannelBufferInputStream.checkAvailable(ChannelBufferInputStream.java:231)
at 
org.jboss.netty.buffer.ChannelBufferInputStream.readInt(ChannelBufferInputStream.java:174)
at org.apache.giraph.edge.ByteArrayEdges.readFields(ByteArrayEdges.java:172)
at 
org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:480)
at 
org.apache.giraph.utils.WritableUtils.readVertexFromDataInput(WritableUtils.java:511)
at 
org.apache.giraph.partition.SimplePartition.readFields(SimplePartition.java:126)
at 
org.apache.giraph.comm.requests.SendVertexRequest.readFieldsRequest(SendVertexRequest.java:66)
at 
org.apache.giraph.comm.requests.WritableRequest.readFields(WritableRequest.java:120)
at 
org.apache.giraph.comm.netty.handler.RequestDecoder.decode(RequestDecoder.java:92)
at 
org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:72)
at 
org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

the code for my vertex compute function :

public class MergeVertex extends
    Vertex<LongWritable, DoubleWritable, DoubleWritable, NodeMessage> {

...

/***
 * Convert a Vertex Id from its LongWritable format to Point format (2 Element 
Array Format)
 * @param lng LongWritable Format of the VertexId
 * @return Alignment point Array
 */
public static int[] cvtLongToPoint(LongWritable lng){
int[] point={0,0};

point[0]=(int) (lng.get()/1000);
point[1]=(int) (lng.get()% 1000);

return point;
}

@Override
public void compute(Iterable<NodeMessage> messages) throws IOException {

int currentId[]= cvtLongToPoint(getId());

if (getSuperstep()==0) {

//NodeValue nv=new NodeValue();
setValue(new DoubleWritable(0d));
}


_signallength = getContext().getConfiguration().getInt("SignalLength", 0);


if ((getSuperstep() < _signallength && getId().get() != 0L) ||
    (getSuperstep() == 0 && getId().get() == 0L)) {

LongWritable dstId=new LongWritable();

//Nodes which are on Graph Spine //Remaining Edges Construction
if(currentId[0]== currentId[1]){

//right Side
for (int i = currentId[1] + 1; i < _signallength; i++) {
    dstId = cvtPointToLong(currentId[0] + 1, i);
    addVertexRequest(dstId, new DoubleWritable(Double.MAX_VALUE));
    addEdgeRequest(getId(), EdgeFactory.create(dstId,
        new DoubleWritable(computeCost(getId(), dstId))));
}

//Left Side
for (int i = currentId[0] + 2; i < _signallength; i++) {
    dstId = cvtPointToLong(i, currentId[1] + 1);
addVertexRequest(dstId,new 

Re: A Vertex Holds Other Than Text

2014-01-08 Thread Maja Kabiljo
Hi Agrta,

Take a look at IntIntTextVertexValueInputFormat for example, where vertex 
values are ints. If your vertex values are complex objects, you need to create 
a class which implements the Writable interface and holds all your data, and 
then extend the input format to read all the data you have. Hope this helps; 
if not, please give us some more details about what you are trying to do.
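For illustration, a record-like vertex value might look like the sketch below.
The field names are made up, and Hadoop's Writable interface is stubbed inline
so the sketch is self-contained; in a real job you would import
org.apache.hadoop.io.Writable instead. The essential part is the symmetric
write/readFields pair:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable, stubbed for self-containment.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical complex vertex value: a count plus a label.
class RecordValue implements Writable {
  long count;
  String label = "";

  public void write(DataOutput out) throws IOException {
    out.writeLong(count);   // serialize fields in a fixed order
    out.writeUTF(label);
  }

  public void readFields(DataInput in) throws IOException {
    count = in.readLong();  // deserialize in exactly the same order
    label = in.readUTF();
  }
}

public class Demo {
  public static void main(String[] args) throws IOException {
    RecordValue v = new RecordValue();
    v.count = 42;
    v.label = "example";
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    v.write(new DataOutputStream(bytes));
    RecordValue back = new RecordValue();
    back.readFields(new DataInputStream(
        new ByteArrayInputStream(bytes.toByteArray())));
    System.out.println(back.count + " " + back.label);  // 42 example
  }
}
```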

Regards,
Maja

From: Agrta Rawat agrta.ra...@gmail.com
Reply-To: user@giraph.apache.org
Date: Wednesday, January 1, 2014 3:53 AM
To: user@giraph.apache.org
Subject: A Vertex Holds Other Than Text

Hi,
 I am implementing an algorithm in which a vertex needs to hold its values in a 
class other than Text (as the value of a vertex is a record).
 I am trying to make use of VertexValueInputFormat but can't reach a solution.
 My Giraph version is 1.0.0.
 Kindly help me in resolving this issue.

regards,

Agrta Rawat



Re: Extending AbstractComputation

2014-01-08 Thread Maja Kabiljo
Hi Pushparaj and Peter,

There is going to be one Computation per partition in each of the supersteps. 
Each partition is processed by a single thread, so accessing any data inside 
your Computation is thread-safe. Multiple threads are going to be executing 
computation on multiple partitions, and therefore they don't interfere with 
each other. The only place where you have to worry about synchronization is if 
you are using pre/postSuperstep and accessing some global data from the 
WorkerContext.

Regards,
Maja

From: Peter Grman peter.gr...@gmail.com
Reply-To: user@giraph.apache.org
Date: Monday, December 23, 2013 3:03 PM
To: user@giraph.apache.org
Subject: Re: Extending AbstractComputation

I don't know the exact logic (maybe somebody who does could elaborate on that), 
but I noticed that it was used multiple times for different nodes. I would 
think that it is used as a pool to minimize the number of objects created; am I 
right here?

A question I would add: can the compute function be called concurrently on 
multiple objects, or is it really a pool where the calls to the function don't 
interfere with each other?

Thank
Peter

---
Imagination is more important than knowledge. For knowledge is limited, whereas 
imagination embraces the entire world, stimulating progress, giving birth to 
evolution. It is, strictly speaking, a real factor in scientific research.
- Albert Einstein


On Mon, Dec 23, 2013 at 8:56 PM, Pushparaj Motamari pushpara...@gmail.com wrote:
Hi,

Is the class we write extending AbstractComputation instantiated once per 
worker?

Thanks

Pushparaj



Re: MultiVertexInputFormat

2013-08-21 Thread Maja Kabiljo
Hi Yasser,

You can do this through the Configuration parameters. You should call:
description1.addParameter("myApplication.vertexInputPath", "file1.txt");
and
description2.addParameter("myApplication.vertexInputPath", "file2.txt");
Then from the code of your InputFormat class you can get this parameter from 
the Configuration. If it doesn't already, make sure your InputFormat implements 
ImmutableClassesGiraphConfigurable, and the configuration is going to be set in 
it automatically.

You can also take a look at HiveGiraphRunner, which uses multiple inputs and 
sets the parameters the user passes from the command line.
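The read-back side inside the input format might look like this fragment; the
key "myApplication.vertexInputPath" is just an arbitrary name chosen in the
driver, not a Giraph constant:

```java
// Inside your VertexInputFormat, once Giraph has injected the configuration
// (the format implements ImmutableClassesGiraphConfigurable), read the
// parameter back; the value is whatever path the driver attached:
String path = getConf().get("myApplication.vertexInputPath");
// ... open 'path' and parse vertices from it
```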

Hope this helps,
Maja

From: Yasser Altowim yasser.alto...@ericsson.com
Reply-To: user@giraph.apache.org
Date: Monday, August 19, 2013 9:16 AM
To: user@giraph.apache.org
Subject: RE: MultiVertexInputFormat

Hi Guys,

 Any help on this will be appreciated. I am repeating my question and my 
code below:


I am implementing an algorithm in Giraph that reads the vertex values from two 
input files, each with its own format. I am not using any EdgeInputFormatClass. 
I am now using VertexInputFormatDescription along with MultiVertexInputFormat, 
but still could not figure out how to set the vertex input path for each input 
format class. Can you please take a look at my code below and show me how to 
set the vertex input path? I have taken a look at HiveGiraphRunner but still no 
luck. Thanks.

if (null == getConf()) {
conf = new Configuration();
}

GiraphConfiguration gconf = new GiraphConfiguration(getConf());
int workers = Integer.parseInt(arg0[2]);
gconf.setWorkerConfiguration(workers, workers, 100.0f);

List<VertexInputFormatDescription> vertexInputDescriptions = 
Lists.newArrayList();

// Input one
VertexInputFormatDescription description1 = new 
VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
// how to set the vertex input path? i.e. how to say that I want to read 
file1.txt using this input format class
vertexInputDescriptions.add(description1);

// Input two
VertexInputFormatDescription description2 = new 
VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
// how to set the vertex input path?
vertexInputDescriptions.add(description2);


GiraphConstants.VERTEX_INPUT_FORMAT_CLASS.set(gconf,

MultiVertexInputFormat.class);

VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,InputFormatDescription.toJsonString(vertexInputDescriptions));

gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
gconf.setComputationClass(UseCase1Vertex.class);
GiraphJob job = new GiraphJob(gconf, "Use Case 1");
FileOutputFormat.setOutputPath(job.getInternalJob(), new Path(arg0[1]));
return job.run(true) ? 0 : -1;


Thanks in advance.

Best,
Yasser

From: Yasser Altowim [mailto:yasser.alto...@ericsson.com]
Sent: Friday, August 16, 2013 11:36 AM
To: user@giraph.apache.org
Subject: RE: MultiVertexInputFormat

Thanks a lot Avery for your response. I am now using 
VertexInputFormatDescription, but still could not figure out how to set the 
vertex input path. I just need to read the vertex values from two different 
files, each with its own format. I am not using any EdgeInputFormatClass.

 Can you please take a look at my code below and show me how to set the 
Vertex Input Path? Thanks


if (null == getConf()) {
conf = new Configuration();
   }

   GiraphConfiguration gconf = new GiraphConfiguration(getConf());
   int workers = Integer.parseInt(arg0[2]);
   gconf.setWorkerConfiguration(workers, workers, 100.0f);



   List<VertexInputFormatDescription> vertexInputDescriptions = 
Lists.newArrayList();

   // Input one
   VertexInputFormatDescription description1 = new 
VertexInputFormatDescription(UseCase1FirstVertexInputFormat.class);
   // how to set the vertex input path?
   vertexInputDescriptions.add(description1);

  // Input two
   VertexInputFormatDescription description2 = new 
VertexInputFormatDescription(UseCase1SecondVertexInputFormat.class);
   // how to set the vertex input path?
   vertexInputDescriptions.add(description2);


  
VertexInputFormatDescription.VERTEX_INPUT_FORMAT_DESCRIPTIONS.set(gconf,InputFormatDescription.toJsonString(vertexInputDescriptions));


   gconf.setVertexOutputFormatClass(UseCase1OutputFormat.class);
   gconf.setComputationClass(UseCase1Vertex.class);
   GiraphJob job = new GiraphJob(gconf, "Use Case 1");
   FileOutputFormat.setOutputPath(job.getInternalJob(), new 
Path(arg0[1]));
 

Re: Multiple Data Sources

2013-07-16 Thread Maja Kabiljo
Hi Tom,

We recently added something like this, please take a look at 
MultiVertexInputFormat. That one can basically wrap any number of vertex input 
formats, coming from any sources. You can also take a look at HiveGiraphRunner 
to see how it's used there. As for multiple vertex types, we don't have that 
directly supported, but you can have some variable describing the vertex type 
inside of your vertex value.

Hope this helps, please let us know if you have any questions!

Maja

From: Tom M thnyanmthn...@gmail.com
Reply-To: user@giraph.apache.org
Date: Monday, July 15, 2013 9:54 AM
To: user@giraph.apache.org
Subject: Multiple Data Sources

Hi,

I am new to Giraph. I am working on implementing a graph algorithm that 
first reads vertex values from multiple sources (HDFS, MySQL). So basically, I 
would have two types of vertices, and the values of each vertex type can be 
read from a different data source. I know that, in MR, we can use DBInputFormat 
to retrieve tuples from an RDBMS for example, and then join them with data read 
from HDFS. My question: can we do that in Giraph? i.e. can the graph be 
constructed from different data sources? Thanks a lot in advance.

Best,
Tom


Re: Regarding multiple values of a vertex

2013-07-09 Thread Maja Kabiljo
Hi Harsh,

The other thing you can do at the moment is make another implementation of 
Partition (similar to SimplePartition) which is going to do a different thing 
when a duplicate vertex is encountered, and then set giraph.partitionClass to 
your Partition.
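A rough sketch of that idea follows; the method names mirror SimplePartition's
putVertex/getVertex as I understand them, so verify them against your Giraph
version's Partition API before relying on this, and the type parameters are
illustrative:

```java
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.partition.SimplePartition;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

// Appends duplicate vertex values instead of replacing them.
public class AppendingPartition
    extends SimplePartition<LongWritable, Text, NullWritable> {
  @Override
  public void putVertex(Vertex<LongWritable, Text, NullWritable> vertex) {
    Vertex<LongWritable, Text, NullWritable> existing =
        getVertex(vertex.getId());
    if (existing == null) {
      super.putVertex(vertex);
    } else {
      // Duplicate id: append the new value to the old one.
      existing.setValue(
          new Text(existing.getValue() + " " + vertex.getValue()));
    }
  }
}
// Then point giraph.partitionClass at AppendingPartition.
```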

Maja

From: Alessandro Presta alessan...@fb.com
Reply-To: user@giraph.apache.org
Date: Tuesday, July 9, 2013 10:57 AM
To: user@giraph.apache.org
Subject: Re: Regarding multiple values of a vertex

Hi Harsh,

It's currently not possible to combine multiple vertex values, but it is on our 
roadmap.
For now, you could try using MapReduce to aggregate those values before you 
feed them to the Giraph job.

Alessandro

From: Harsh Rathi harsh.c...@gmail.com
Reply-To: user@giraph.apache.org
Date: Tuesday, July 9, 2013 12:24 AM
To: user@giraph.apache.org
Subject: Regarding multiple values of a vertex

Hi All,

I am taking the input graph in the form of 2 separate files, an Edge-List and 
a Vertex-List.
In the Vertex-List file, a vertex can have multiple values (the value of a 
vertex is in text format), i.e. there can be multiple vertex-value entries for 
the same vertex.

While taking input of a vertex, Giraph checks whether the vertex is already 
present in the graph, and if so it replaces the old value with the new value. 
I want to append all the vertex values for the same vertex (String format).

I can do it by changing giraph-core's source code. But I am looking for a 
solution in which, while taking input using the vertex-input class, it is 
possible to retrieve the old value of that vertex. Is it possible to do what I 
am proposing? Can I retrieve the value of a vertex using its Vertex Id in the 
vertex-input class?



Thanks


Harsh Rathi
IIT Delhi


Re: Are new vertices active?

2013-07-01 Thread Maja Kabiljo
Hi Christian,

As the javadoc for getTotalNumVertices() says, it returns the number of 
vertices which existed in the previous superstep, so newly created vertices 
are not counted there.

In the code, mutations are applied before the next superstep starts. The way 
it's currently implemented, vertices created during the last superstep won't 
exist during output. That being said, I don't know if we wanted it that way, 
or if it just turned out like that because nobody thought about that case.

Maja

From: Christian Krause <m...@ckrause.org>
Reply-To: user@giraph.apache.org
Date: Thursday, June 27, 2013 4:59 AM
To: user@giraph.apache.org
Subject: Re: Are new vertices active?

Thank you, Claudio.

Regarding the last point: I am mutating the graph in superstep N, and in N+1 I 
am logging the total number of vertices and halting all vertices. When I do it 
like this, I don't get the updated number of vertices. However, if I wait one 
more superstep, I get the correct number. Strange...

Cheers,
Christian


2013/6/26 Claudio Martella <claudio.marte...@gmail.com>
Hi,

inline are my (tentative) answers.


On Wed, Jun 26, 2013 at 6:34 PM, Christian Krause 
<m...@ckrause.org> wrote:
Hi,

if I create new vertices, will they be executed in the next superstep? And does 
it make a difference whether I create them using addVertexRequest() or 
sendMessage()?

The vertex will be active. The case of sendMessage is intuitive, because a 
message wakes up a vertex.


Another question: if I mutate the graph in superstep X and X is the last 
superstep, will the changes be executed? It is not clear to me whether the 
graph changes are executed during or before the next superstep.

I'm actually not sure about our internal implementation (somebody else can 
shed light on this), but I'd expect the changes to be executed, due to the 
above (the presence of active vertices).


And related to the last question, if I mutate the graph in superstep X, and I 
call getTotalNumVertices() in the next step, can I expect the updated number of 
vertices, or the number of vertices before the mutation?

The mutations are applied at the end of a superstep and are visible in the 
following one. Hence in superstep s+1 you'd see the new number of vertices.


Sorry for these very basic questions, but I did not find any documentation on 
these details. If this is documented somewhere, it would be helpful to get a 
link.

Cheers,
Christian



--
   Claudio Martella
   claudio.marte...@gmail.com



Re: What if the resulting graph is larger than the memory?

2013-05-17 Thread Maja Kabiljo
Hi JU,

One thing you can try is to use the out-of-core graph 
(giraph.useOutOfCoreGraph option).

I don't know your exact use case – is the graph itself huge, or is it the data 
your application computes? In the second case, there is a 
'giraph.doOutputDuringComputation' option you might want to try out. When it 
is turned on, during each superstep writeVertex is called immediately after 
compute is called for that vertex. This means you can store the data you want 
to write in the vertex, write it, and clear it before going to the next 
vertex.

Maja
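As a rough illustration of why this write-then-clear pattern keeps memory bounded, here is a plain-Java sketch. This is not the Giraph API; the class and method names are invented, and an in-memory list stands in for the real output to HDFS:

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingOutputSketch {
    // Processes vertices one at a time: data produced for a vertex is
    // "written" (here: appended to output) right after it is computed,
    // then the per-vertex buffer is cleared, so at most one vertex's
    // data lives in the buffer at any moment.
    static List<String> run(int numVertices) {
        List<String> output = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        for (int id = 0; id < numVertices; id++) {
            buffer.add(id * 10);           // compute() stores data in the vertex
            output.add(id + ":" + buffer); // writeVertex() runs immediately after
            buffer.clear();                // clear before the next vertex
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // [0:[0], 1:[10], 2:[20]]
    }
}
```

The point is that the per-vertex buffer never grows with the number of vertices, which is what giraph.doOutputDuringComputation makes possible.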

From: Han JU <ju.han.fe...@gmail.com>
Reply-To: user@giraph.apache.org
Date: Friday, May 17, 2013 8:38 AM
To: user@giraph.apache.org
Subject: What if the resulting graph is larger than the memory?

Hi,

It's me again.
After a day's work I've coded a Giraph solution for my problem at hand. I gave 
it a run on a medium dataset and it's notably faster than other approaches.

However, the goal is to process larger inputs. For example, I have a larger 
dataset whose result graph is about 400GB when represented in edge format as a 
text file. And I think the edges that the algorithm creates all reside in the 
cluster's memory, so it means that for this big dataset I would need a cluster 
with ~400GB of main memory. Is there any possibility to output on the go, 
meaning I wouldn't need to construct the whole graph: an edge would be output 
to HDFS immediately instead of being created in main memory and then output?

Thanks!
--
JU Han

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
 GI06 - Fouille de Données et Décisionnel

+33 061960


Re: Broadcast of large aggregated value is slow.

2013-05-16 Thread Maja Kabiljo
Eric,

Can you please take a look at the logs of one of the listed workers (13, 34, 
38, 50, 48, 52, 58, 56) and see what they are doing? The fact that a worker is 
waiting on aggregators can have different causes; it doesn't necessarily mean 
that sending aggregators is slow. It can, for example, mean that some workers 
finished computing before others and are now waiting for them to finish and 
send their data.
How big are the aggregators you are using?

Thanks,
Maja

From: Eric Kimbrel <eric.kimb...@soteradefense.com>
Reply-To: user@giraph.apache.org
Date: Thursday, May 16, 2013 2:00 PM
To: user@giraph.apache.org
Subject: Re: Broadcast of large aggregated value is slow.

From the attached logs in the original post, you can see that both workers use 
about 4 seconds of compute time on superstep 4, but they complete superstep 4 
about 10 minutes apart.


Eric Kimbrel
Software Engineer I, Data Fusion & Analytics
Sotera Defense Solutions, Inc.
o: 360-516-6621
c: 360-990-1873
e: eric.kimb...@soteradefense.com
w: www.potomacfusion.com | www.soteradefense.com
Agility. Ingenuity. Integrity.


From: Eric Kimbrel <eric.kimb...@soteradefense.com>
Reply-To: user@giraph.apache.org
Date: Thursday, May 16, 2013 1:50 PM
To: user@giraph.apache.org
Subject: Broadcast of large aggregated value is slow.


I have a Giraph job in which the master reads a chunk of a file from HDFS and 
then uses an aggregator to broadcast the data to all vertices.  No other 
messages are sent, and no vertices aggregate values, only the master.

In the attached logs you can see that the time spent broadcasting the data to 
all vertices is long, and it seems to be hanging up somewhere.  It appears 
that the majority of workers receive the data in 10-15 seconds, but then 
nothing happens for around 10 minutes.  A log snippet is shown below.

Is there a known reason why transmitting this data during the synchronization 
takes so long, or anything that can be done to speed it up?


2013-05-16 11:09:03,041 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 30 more tasks to send their aggregator data
2013-05-16 11:09:14,444 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 10 more tasks to send their aggregator 
data, task ids: [13, 20, 22, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:09:25,190 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:09:45,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:05,191 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:15,192 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:35,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:10:55,193 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:05,194 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:25,195 INFO org.apache.giraph.utils.TaskIdsPermitsBarrier: 
waitForRequiredPermits: Waiting for 8 more tasks to send their aggregator data, 
task ids: [13, 34, 38, 50, 48, 52, 58, 56]
2013-05-16 11:11:45,196 INFO 

Re: Broadcast of large aggregated value is slow.

2013-05-16 Thread Maja Kabiljo
Thanks, Eric. I'm not sure what's going on; it's strange that there are a 
couple of machines which wait for aggregators for a very long time and then 
receive them at exactly the same moment. Can you send us the code for the 
Aggregator you are using? Do you know approximately how big the aggregators 
are and how many of them you have?

Maja

From: Eric Kimbrel <eric.kimb...@soteradefense.com>
Reply-To: user@giraph.apache.org
Date: Thursday, May 16, 2013 2:36 PM
To: user@giraph.apache.org
Subject: Re: Broadcast of large aggregated value is slow.

My apologies. You are correct, I attached the wrong log.  The correct one is 
attached here.


Eric Kimbrel
Sotera Defense Solutions, Inc.


From: Maja Kabiljo <majakabi...@fb.com>
Reply-To: user@giraph.apache.org
Date: Thursday, May 16, 2013 2:25 PM
To: user@giraph.apache.org
Subject: Re: Broadcast of large aggregated value is slow.
Resent-From: eric.kimb...@soteradefense.com

Eric,

Can you please check again: in both logs you attached we are waiting on worker 
13 to send data, so neither of them can be worker 13's log.

Maja

From: Eric Kimbrel <eric.kimb...@soteradefense.com>
Reply-To: user@giraph.apache.org
Date: Thursday, May 16, 2013 2:15 PM
To: user@giraph.apache.org
Subject: Re: Broadcast of large aggregated value is slow.

One of the attached logs is worker 13.  During this time period it is waiting 
for an aggregator request so that it can start the superstep.


Eric Kimbrel
Sotera Defense Solutions, Inc.



Re: Custom halt condition

2013-03-29 Thread Maja Kabiljo
Hi Nicolas,

You are right, using aggregators and master compute is the way to go.
Please take a look at
https://cwiki.apache.org/confluence/display/GIRAPH/Aggregators
to learn more about aggregators. From MasterCompute.compute() you will
be calling haltComputation() when you decide it's time to do so.
Please let me know if you have any questions.

Maja
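The master-side check suggested here boils down to comparing an aggregated vote count against a threshold. A minimal plain-Java sketch follows; the class name, method name, and 0.9 threshold are made up, and in real Giraph code the count would come from a registered sum aggregator read inside MasterCompute.compute():

```java
public class HaltThreshold {
    // Halt once this fraction of vertices has voted to halt (hypothetical value).
    static final double HALT_FRACTION = 0.9;

    // In MasterCompute.compute() one would read an aggregator holding the
    // number of vertices that voted to halt so far, and call
    // haltComputation() when this returns true.
    static boolean shouldHalt(long votedToHalt, long totalVertices) {
        return votedToHalt >= HALT_FRACTION * totalVertices;
    }

    public static void main(String[] args) {
        System.out.println(shouldHalt(75, 100));  // false: only 75% voted
        System.out.println(shouldHalt(93, 100));  // true: above the 90% threshold
    }
}
```

Each vertex would aggregate a 1 when it considers itself converged, so the master sees the total without any fake master vertex or extra messages.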

On 3/29/13 3:16 AM, Nicolas Lalevée <nicolas.lale...@hibnet.org> wrote:

Hi,

In my use case (an implementation of affinity propagation) I want to halt
the computation once at least a minimum number of vertices have voted to
halt. As far as I understand, the default is to halt when all vertices have
voted to halt and no messages are sent between vertices. But in my use case,
even if a vertex has voted to halt, it must still send and receive messages
in case there is a next superstep. And with some of my data, some vertices
take a lot of supersteps to converge and vote to halt, which I don't care
much about as long as they are only a small percentage.

My current implementation creates a fake master vertex which gathers the
convergence of all vertices via messages. Once that master decides it is
time to halt the computation, it sends a message to all vertices so they
all halt.

But I have seen some threads here about master compute, and I have seen
some code about aggregators, so I guess there is a smarter way of
implementing this?

Nicolas




Re: Waiting for times required to be 19 (currently 18)

2013-02-21 Thread Maja Kabiljo
Nate,

Are all the workers waiting for a request from the same worker? (In the log, 
'waitSomeRequests: Waiting for request ... destTask' is what you should look 
at.) If so, check whether there is some exception on that worker. You can also 
try decreasing giraph.maxRequestMilliseconds and see what happens after the 
request gets resent. Please let us know what you find out!

Maja

From: Nate <touring_...@msn.com>
Reply-To: user@giraph.apache.org
Date: Thursday, February 21, 2013 11:16 AM
To: user@giraph.apache.org
Subject: RE: Waiting for times required to be 19 (currently 18)

Hello Maja,

Thank you for your reply and the link to the issue.
I last updated the code this week, and do in fact have that issue checked out 
in my local copy of the source.  My compiled jar file of giraph-core is dated 
Feb 18th (three days ago).

I will do another update from Git very soon and build and test again to be sure 
that the fix is in place and report back if the behavior changes.

Thank you,
Nate


From: majakabi...@fb.com
To: user@giraph.apache.org
Subject: Re: Waiting for times required to be 19 (currently 18)
Date: Thu, 21 Feb 2013 17:48:24 +

Hi Nate,

When did you take the new Giraph code? Please check if you have GIRAPH-506 
patch in, if not that's probably the reason for the issue.

Maja

From: Nate <touring_...@msn.com>
Reply-To: user@giraph.apache.org
Date: Thursday, February 21, 2013 8:06 AM
To: user@giraph.apache.org
Subject: Waiting for times required to be 19 (currently 18)

I recently upgraded older Giraph code built against CDH3 to a git checkout from 
a few days ago that builds against CDH4.1.0 (MRv1) libraries.  All of the 
Giraph tests pass.

When running my Giraph job with 20 workers, I usually get the above error in 
19 map processes:

org.apache.giraph.utils.ExpectedBarrier: waitForRequiredPermits: Waiting for 
times required to be 19 (currently 18)

One map worker always shows something like:

org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 
15000 msecs, 1 open requests, waiting for it to be <= 0, and some metrics
org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting for request 
(destTask=17, reqId=5032) - (reqId=5326, destAddr=host1:30017, 
elapsedNanos=..., started=..., writeDone=true, writeSuccess=true)
repeats...

I say this happens "usually" because the same Giraph job does sometimes 
complete, but only rarely.  I have a timeout of 100 minutes set, and the job 
is killed after that much time has elapsed.

Also, the started field in the above output from this past run reads: Wed Jan 
21 14:21:31 EST 1970.  All machines are synchronized to a single time server 
and currently read accurate times.  I don't think it affected the execution, 
but it still seems erroneous.

I also don't see the Hadoop maps having status messages set on them.  I see 
the GraphMapper giving the Context object to the GraphTaskManager instance, 
and I can see it calling context.setStatus(...), but those messages never show 
up in the map status column on the job tracker page.

Is there something I've missed while upgrading the old code?


Re: Waiting for times required to be 19 (currently 18)

2013-02-21 Thread Maja Kabiljo
Nate,

Great, glad to hear it works! We resend open requests after 10 minutes, so 
that's why you were seeing supersteps taking that long.

Have fun with Giraph and let us know if you have any other questions.

Maja

From: Nate <touring_...@msn.com>
Reply-To: user@giraph.apache.org
Date: Thursday, February 21, 2013 1:32 PM
To: user@giraph.apache.org
Subject: RE: Waiting for times required to be 19 (currently 18)

Maja,

Success!

I did check and saw that the giraph jar being used was dated Feb 6th, many 
hours before your fix made it into the source tree.  I probably forgot to put 
the new jar that I built earlier this week into the right place.  How 
frustrating.

I recompiled the very latest code, put the jar into the right place, and have 
been able to execute the giraph job multiple times successfully.  It even 
executes much faster than before, and the time to execute is reliable too.  
Time to execute used to vary between 10 and 20 minutes when Giraph was able to 
complete, but now it takes between 70 and 80 seconds every time without any 
problems.

Many thanks for fixing the original issue, and for replying to my email to the 
list.

Nate



Re: Where can I find a simple Hello World example for Giraph

2013-02-21 Thread Maja Kabiljo
Hi Ryan,

Before running the job, you need to set the Vertex and input/output format
classes on it. Please take a look at one of the benchmarks to see how to
do that. Alternatively, you can try using GiraphRunner, where you pass
these classes as command-line arguments.

Maja

On 2/21/13 2:43 PM, Ryan Compton <compton.r...@gmail.com> wrote:

I'm still struggling with this. I am trying to use 0.2, but I don't have
permissions to edit core-site.xml.

I think this is the most basic boilerplate code for a 0.2 Giraph
project, but I still can't run it.

Exception in thread "main" java.lang.NullPointerException
at org.apache.giraph.utils.ReflectionUtils.getTypeArguments(ReflectionUtils.java:85)
at org.apache.giraph.conf.GiraphClasses.readFromConf(GiraphClasses.java:117)
at org.apache.giraph.conf.GiraphClasses.<init>(GiraphClasses.java:105)
at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:84)
at com.hrl.issl.osi.networks.HelloGiraph0p2.setConf(HelloGiraph0p2.java:34)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:61)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.hrl.issl.osi.networks.HelloGiraph0p2.main(HelloGiraph0p2.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)



package networks;

import java.io.IOException;

import org.apache.giraph.conf.ImmutableClassesGiraphConfiguration;
import org.apache.giraph.graph.GiraphJob;
import org.apache.giraph.vertex.EdgeListVertex;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;

/**
 *
 * Hello world giraph 0.2...
 *
 */
public class HelloGiraph0p2 extends EdgeListVertex<LongWritable, Text,
Text, Text> implements Tool {
/** Configuration */
private ImmutableClassesGiraphConfiguration<LongWritable, Text, Text,
Text> conf;
/** Class logger */
private static final Logger LOG = Logger.getLogger(HelloGiraph0p2.class);

@Override
public void compute(Iterable<Text> arg0) throws IOException {
int four = 2 + 2;
}
@Override
public void setConf(Configuration configurationIn) {
this.conf = new ImmutableClassesGiraphConfiguration<LongWritable,
Text, Text, Text>(configurationIn);
return;
}
@Override
public ImmutableClassesGiraphConfiguration<LongWritable, Text, Text,
Text> getConf() {
return conf;
}

/**
*
* ToolRunner run
*
* @param arg0
* @return
* @throws Exception
*/
@Override
public int run(String[] arg0) throws Exception {
GiraphJob job = new GiraphJob(getConf(), getClass().getName());

return job.run(true) ? 0 : -1;

}
/**
* main...
*
* @param args
* @throws Exception
*/
public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new HelloGiraph0p2(), args));
}

}



On Tue, Feb 5, 2013 at 4:24 AM, Gustavo Enrique Salazar Torres
<gsala...@ime.usp.br> wrote:
 Hi Ryan:

 I got that same error and discovered that I had to start a ZooKeeper
 instance. What I did was download ZooKeeper and write a new zoo.cfg file
 under the conf directory with the following:

 dataDir=/home/user/zookeeper-3.4.5/tmp
 clientPort=2181

 I also added some lines to Hadoop's core-site.xml:
 <property>
   <name>giraph.zkList</name>
   <value>localhost:2181</value>
 </property>

 Then I start ZooKeeper with bin/zkServer.sh start (you will also have to
 restart Hadoop), and then you can launch your Giraph job.
 This setup worked for me (maybe there is an easier way :D), hope it is
 useful.

 Best regards
 Gustavo
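The steps Gustavo describes can be scripted roughly as follows. The paths and the 3.4.5 version mirror his mail and are assumptions; this only writes the zoo.cfg file and prints the core-site.xml snippet to merge in, and it does not actually start ZooKeeper:

```shell
# Standalone ZooKeeper config as described above (version/path from the mail).
ZK_HOME="$HOME/zookeeper-3.4.5"
mkdir -p "$ZK_HOME/tmp" "$ZK_HOME/conf"

# zoo.cfg with the two lines from the mail.
cat > "$ZK_HOME/conf/zoo.cfg" <<EOF
dataDir=$ZK_HOME/tmp
clientPort=2181
EOF

# Property to merge into Hadoop's core-site.xml so Giraph finds ZooKeeper.
cat <<'EOF'
<property>
  <name>giraph.zkList</name>
  <value>localhost:2181</value>
</property>
EOF

# Finally (not run here): "$ZK_HOME/bin/zkServer.sh" start, then restart Hadoop.
```

A production zoo.cfg would normally set more options (e.g. tickTime); the two lines above are just what the original mail used.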


 On Mon, Feb 4, 2013 at 10:06 PM, Ryan Compton <compton.r...@gmail.com>
 wrote:

 Ok great, thanks. I've been working with 0.1; I can get things to
 compile (see code below) but they still are not running, the maps hang
 (also below). I have no idea how to fix it. I may consider updating
 the code I have that compiles to 0.2 and seeing if it works then. The
 only difference I can see is that 0.2 requires everything to have a
 message.

 -bash-3.2$ hadoop jar target/giraph-0.1-jar-with-dependencies.jar
 com.SimpleGiraphSumEdgeWeights /user/rfcompton/giraphTSPInput
 /user/rfcompton/giraphTSPOutput 3 3
 13/02/04 15:48:23 INFO mapred.JobClient: Running job:
 job_201301230932_1199
 13/02/04 15:48:24 INFO mapred.JobClient:  map 0% reduce 0%
 13/02/04 15:48:35 INFO mapred.JobClient:  map 25% reduce 0%
 13/02/04 15:58:40 INFO mapred.JobClient: Task Id :
 attempt_201301230932_1199_m_03_0, Status : FAILED
 java.lang.IllegalStateException: run: Caught an unrecoverable
 exception setup: Offlining servers due to exception...
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
 at 

Re: InputFormat for the example SimpleMasterComputeVertex

2013-02-21 Thread Maja Kabiljo
I wrote up some basic info about aggregators: 
https://cwiki.apache.org/confluence/display/GIRAPH/Aggregators. Please take a 
look, and let me know if anything needs to be changed or improved.

From: Eli Reisman <apache.mail...@gmail.com>
Reply-To: user@giraph.apache.org
Date: Thursday, February 21, 2013 4:21 PM
To: user@giraph.apache.org
Subject: Re: InputFormat for the example SimpleMasterComputeVertex

That sounds great to me. Maybe just a mention in the wiki that the two 
functionalities are tied together will help the idea click for people. Either 
way, this will be a big help, I think.


On Thu, Feb 21, 2013 at 3:24 PM, Maja Kabiljo <majakabi...@fb.com> wrote:
Eli, that's an interesting idea: we could have some class which the user 
extends and which is there only for aggregator registration. Sometimes we want 
to register aggregators later on during the computation, so we need to keep 
allowing registration from masterCompute too.

But I think the biggest problem for users is realizing that they have to 
extend and set MasterCompute (or this new class) in order to use aggregators. 
Currently, if a user tries to aggregate a value to an unregistered aggregator 
he will get an exception, but if he tries to get the value of an unregistered 
aggregator he will just get null. So maybe adding a warning message in that 
case, together with a wiki page, might be enough? What do you think?

From: Eli Reisman <apache.mail...@gmail.com>
Reply-To: user@giraph.apache.org
Date: Thursday, February 21, 2013 10:25 AM
To: user@giraph.apache.org
Subject: Re: InputFormat for the example SimpleMasterComputeVertex

Thanks for the explanation, that makes sense. I would love to see a wiki page 
at some point; you have so much knowledge of this piece of Giraph from all your 
dev work on it, plus the additional bonus of experience running big cluster 
jobs using these features, so you have a lot of insight to share.

Would there be any point to a future JIRA to break out the aggregator 
registration from the master compute stuff, at least from the user's view? Or 
is it not that confusing once you've used them a bit?


On Thu, Feb 14, 2013 at 4:52 PM, Maja Kabiljo majakabi...@fb.com wrote:
The progressable exception can be caused by many different things (it's 
totally unrelated to aggregators), and by looking at the exception which 
caused it, users should get a better sense of what's going on.
What you are suggesting about providing a default master compute is not 
doable, since the part which needs to be done there is aggregator 
registration: we can't know what kinds of aggregators (names and types) an 
application needs.
I remember talking about writing a short tutorial for aggregators a long 
time ago; sorry for not doing that, I will try to get to it soon.

From: Eli Reisman apache.mail...@gmail.com
Reply-To: user@giraph.apache.org
Date: Thursday, February 14, 2013 2:23 PM
To: user@giraph.apache.org
Subject: Re: InputFormat for the example SimpleMasterComputeVertex

Other folks on the list are also having this problem with the progressable 
util exception and job failures. I don't know much about master compute 
usage, but if it is needed to make the aggregators work, maybe we should have 
a default dummy class that just handles aggregators if no other master compute 
is specified? Or a wiki page? The progressable error message does not lead us 
to this conclusion directly.

On Wed, Feb 13, 2013 at 3:04 AM, Maria Stylianou mars...@gmail.com wrote:
Hey,

I am trying to run the example SimpleMasterComputeVertex, but no matter which 
InputFormat and graph I give it, it doesn't work. Each worker gives the error:

Caused by: java.lang.NullPointerException
at org.apache.giraph.examples.SimpleMasterComputeVertex.compute(SimpleMasterComputeVertex.java:42)

Line 42 is the first line of compute():

public void compute(Iterable<DoubleWritable> messages) {

So I guess the initialization is not done correctly, because the input file 
does not have the correct format.

Any help would be appreciated,
Thanks!
Maria
--
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
Master Student of European Master in Distributed Computing 
(http://www.kth.se/en/studies/programmes/master/em/emdc)
Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain
KTH Royal Institute of Technology

Re: InputFormat for the example SimpleMasterComputeVertex

2013-02-14 Thread Maja Kabiljo
The progressable exception can be caused by many different things (it's 
totally unrelated to aggregators), and by looking at the exception which 
caused it, users should get a better sense of what's going on.
What you are suggesting about providing a default master compute is not 
doable, since the part which needs to be done there is aggregator 
registration: we can't know what kinds of aggregators (names and types) an 
application needs.
I remember talking about writing a short tutorial for aggregators a long 
time ago; sorry for not doing that, I will try to get to it soon.





Re: Can Giraph handle graphs with very large number of edges per vertex?

2012-09-13 Thread Maja Kabiljo
Hi Jeyendran,

As Paolo mentioned, there were two patches to deal with out-of-core:
GIRAPH-249 for out-of-core graph
GIRAPH-45 for out-of-core messages

For the graph part, the current assumption is that you have enough memory to
keep at least one whole partition at a time. The options you need to set
here are:
giraph.useOutOfCoreGraph=true
giraph.maxPartitionsInMemory= as many as you can keep

For the messages, it's not necessary that the messages for a whole partition
fit in memory, since they are streamed on a per-vertex basis. There is,
however, the constraint that all vertex ids (from all partitions) need to fit
in memory, but for your application I understand that's not an issue. Options:
giraph.useOutOfCoreMessages=true
giraph.maxMessagesInMemory= as many as you can keep

Also for messages, if you have a really heavy load and still run out of
memory, you can try the options from GIRAPH-287: in practice, messages can be
created much faster than they are transferred and processed on the
destination, and the options there prevent that from happening. But try
running without these options first, since they can really slow down your
application. You set:
giraph.waitForRequestsConfirmation=true
giraph.maxNumberOfOpenRequests= as many as you want
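Put together on the command line, the options above might look something like the sketch below. The jar name, class names, paths, worker count, and limit values are all placeholders for your own setup, and flag spellings have varied across Giraph versions, so treat this as a template rather than a verified invocation.

```shell
# Hypothetical invocation; adjust jar, classes, paths and limits for your job.
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
  my.app.MyVertex \
  -vif my.app.MyVertexInputFormat -vip /input/graph \
  -op /output/result \
  -w 10 \
  -ca giraph.useOutOfCoreGraph=true \
  -ca giraph.maxPartitionsInMemory=10 \
  -ca giraph.useOutOfCoreMessages=true \
  -ca giraph.maxMessagesInMemory=1000000
```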


Hope this helps, let us know if you have any other questions.

Maja

On 9/13/12 8:41 AM, Paolo Castagna castagna.li...@gmail.com wrote:

Hi Jeyendran,
interesting questions, and IMHO it is not always easy to understand how
many Giraph workers are necessary to process a specific (large) graph.
A few more comments inline, but I am interested in the answers to your
questions as well.

On 13 September 2012 07:03, Jeyendran Balakrishnan j...@personaltube.com wrote:
 After reading both of your replies, I have some (final!) questions
 regarding memory usage:

 · For applications with a large number of edges per vertex: are
 there any built-in vertex or helper classes, or at least sample code,
 which feature spilling of edges to disk, or some kind of disk-backed
 map of edges, to support such vertices? Or do we have to roll our own?

You'll probably need to roll your own (let's see what others suggest).
However, if you do that, you should do it in the open so others can have a
look, eventually help you, and perhaps ensure that what you do can in the
future be contributed back to Giraph for others to benefit from and use.

A few months ago I had a look at this and tried to use TDB (i.e. the
storage layer available in Apache Jena) to store (and spill to disk)
vertices with Giraph.
TDB uses B+Trees and memory-mapped files. It's designed and tuned to store
RDF; however, it is not limited to RDF, and someone might reuse its
low-level indexing capabilities to store different kinds of graphs.

Even if you do not use TDB, having a look at its sources might inspire you
or give you some ideas about what you could do:
https://svn.apache.org/repos/asf/jena/trunk/jena-tdb/src/main/java/com/hp/hpl/jena/tdb/index/


 · For graphs with a large number of vertices relative to available
 workers, at least in the development phase, one may not always have
 access to a large number of workers, yet one might wish to process a
 very large graph. In these cases, it may happen that the workers are
 not able to hold all their assigned vertices in memory. So again in
 this case, are there any built-in classes to allow spilling of
 vertices to disk, or a similar kind of disk-backed map?

Here, I am not sure I understand where your need comes from.

I usually develop and test everything locally, but while I do that I use a
small dataset which can be loaded in memory and allows me to iterate faster.

Why do you need to use a large/real dataset in the development phase?

How large is your large number of vertices?

Even if you use indexes and data structures on disk, as your dataset grows,
the indexing and processing might take a long time. So, perhaps, in
development you are better off with small datasets anyway.

 · Assuming some kind of disk backing is implemented to handle a
 large number of vertices/edges (under a situation of an insufficient
 number of workers or memory per worker), is it likely that just the
 volume of IO (message/IPC) could cause OOMEs? Or merely slowdowns?

There was work on spilling messages to disk and I found GIRAPH-249
(marked as resolved):
https://issues.apache.org/jira/browse/GIRAPH-249

 In general, I feel that one of the reasons for the wide and rapid
 adoption of Hadoop is the "download, install and run" feature, where
 even for large data sets the stock code will still run to completion
 on a single laptop (or a single Linux server, etc.), except that it
 will take more time. But this may be perfectly acceptable for people
 who are evaluating and experimenting, since there is no incurred cost
 for hardware. A lot of developers might be OK with giving the thing a
 run overnight on their laptops, or firing up just one spot instance
 on EC2 and letting it chug 

Re: How to register aggregators with the 'new' Giraph?

2012-09-12 Thread Maja Kabiljo
I don't plan to change the API for aggregators anymore; only the way they are 
implemented is going to change (unless someone else has an 
objection/improvement to the current API). So I can already write the tutorial 
on how to use them.

We should probably make some plan for the page structure on 
https://cwiki.apache.org/confluence/display/GIRAPH/Index, otherwise it's going 
to be a mess :-) So, for example, first have a section on writing a simple 
application, with some examples, and then a section with additional stuff, 
with subsections for combiners, aggregators, master compute… What do you 
think?

From: Eli Reisman apache.mail...@gmail.com
Reply-To: user@giraph.apache.org
Date: Tuesday, September 11, 2012 7:38 PM
To: user@giraph.apache.org
Subject: Re: How to register aggregators with the 'new' Giraph?

Hey Maja,

A small tutorial on the wiki would be wonderful, either now or when the final 
changes to aggregators in the upcoming patches are done. We need a wiki entry 
for master compute too. I would also like to go through and update some of the 
website examples regarding best practices with the new Vertex API, using the 
bin/giraph script and command-line options to set up jobs without writing your 
own run() method, implementing Tool, writing your own IO formats, etc.

Thanks again!

On Tue, Sep 11, 2012 at 9:36 AM, Paolo Castagna castagna.li...@gmail.com wrote:
Hi Maja,
yep, your explanation makes sense.

Clear now.

Paolo

On 11 September 2012 16:09, Maja Kabiljo majakabi...@fb.com wrote:
 Hi Paolo,

 Glad to hear it works :-)

 The reason why you don't see the value you set with setAggregatedValue
 right away is that we want to read aggregated values from the previous
 superstep and change them for the next one. It goes the same way with
 vertices, where you call aggregate to give values for the next superstep
 and read the values from the previous one. This is actually the part which
 wasn't working well before - it wasn't possible to get an aggregated value
 without the changes that vertices on the same worker had made in the
 current superstep. Hope this makes it clear for you.

 Maja
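The two-phase semantics described above can be sketched with a small, self-contained toy (illustrative only, not Giraph source): values aggregated during superstep N only become readable in superstep N+1.

```java
// Toy simulation of the aggregator semantics described in the thread:
// aggregate() contributes to the NEXT superstep's value, while
// getAggregatedValue() only ever reflects the PREVIOUS superstep.
class SuperstepAggregator {
    private double previous;  // value readable in the current superstep
    private double current;   // value being accumulated for the next one

    void aggregate(double value) {
        current += value;
    }

    double getAggregatedValue() {
        return previous;
    }

    // Called between supersteps: publish the accumulated value and reset
    // (a plain, non-persistent sum aggregator starts fresh each superstep).
    void finishSuperstep() {
        previous = current;
        current = 0.0;
    }
}
```

This is why a setAggregatedValue or aggregate call is not visible to getAggregatedValue within the same superstep: the read side and the write side are deliberately kept one superstep apart, which is what makes the results deterministic across workers.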


 On 9/11/12 12:45 PM, Paolo Castagna castagna.li...@gmail.com wrote:

Hi,
the green bar is back. :-)

I made multiple mistakes in relation to the new aggregators, but now I
believe I have grasped how they work.

For those interested, the PageRankVertex, PageRankMasterCompute and
PageRankWorkerContext are here:
https://github.com/castagna/jena-grande/blob/9dd50837d6a13c542cce5d77a69cea071a91cee8/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java
https://github.com/castagna/jena-grande/blob/9dd50837d6a13c542cce5d77a69cea071a91cee8/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankMasterCompute.java
https://github.com/castagna/jena-grande/blob/9dd50837d6a13c542cce5d77a69cea071a91cee8/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankWorkerContext.java

There might be some further improvement left, but I'll try that another
time.

For example:

  registerPersistentAggregator("dangling-current", DoubleSumAggregator.class);
  registerPersistentAggregator("error-current", DoubleSumAggregator.class);
Could probably be registerAggregator.
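The choice matters because of how the two kinds behave between supersteps. A toy contrast (assumed semantics, not Giraph's implementation): a regular aggregator starts from its neutral value every superstep, while a persistent one keeps accumulating across supersteps.

```java
// Toy contrast between a regular and a persistent sum aggregator
// (assumed semantics for illustration, not Giraph's implementation).
class PersistenceDemo {
    // Runs a sum aggregator over several supersteps' worth of inputs and
    // returns the value visible after the last superstep.
    static double run(boolean persistent, double[][] superstepInputs) {
        double value = 0.0;
        for (double[] inputs : superstepInputs) {
            if (!persistent) {
                value = 0.0;  // regular aggregators reset each superstep
            }
            for (double v : inputs) {
                value += v;   // DoubleSumAggregator-style accumulation
            }
        }
        return value;
    }
}
```

So for a per-superstep quantity like the current dangling-node mass, which is recomputed from scratch each iteration, the plain registerAggregator variant is sufficient.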

I also noticed that within the compute() method if I call
setAggregatedValue(name, ...) and getAggregatedValue(name) I don't
seem to get the value set back. But the value is sent to the worker.
This is not important, but it confuses me.

I do agree with you, now the situation around aggregators is cleaner
than before.

Thank you for your help.

Paolo

PS:
There is still a known failure in the tests; it is there to show that the
SimplePageRankVertex approach is too simple: it does not give back a
probability distribution (i.e. the sum at the end is not 1.0) and it does
not take dangling nodes into account properly.
On the other hand, PageRankVertex produces the same results as two other
implementations: one serial and all in memory, and another using JUNG.

On 11 September 2012 11:03, Maja Kabiljo majakabi...@fb.com wrote:
 Hi Paolo,

 You get null for the aggregated value because aggregators haven't been
 registered yet at the moment WorkerContext.preApplication() is called. But
 I think that shouldn't be a problem, since you can set initial values for
 aggregators in MasterCompute.initialize().

 Please also note that you are not using the new aggregator API in the
 proper way. getAggregatedValue will return the value of the aggregator,
 not the aggregator object itself. It's not possible to set the value of
 the aggregators on workers (in methods from WorkerContext and Vertex),
 because that would produce nondeterministic results. You aggregate on
 workers and set values