Re: Giraph vs good-old PVM/MPI ?

2013-08-06 Thread Claudio Martella
In principle you could implement Pregel through MPI (and it has been done).
The idea behind Pregel was precisely to factor out the typical patterns of
graph processing that used to be hand-coded with message passing and
barriers. A framework like Pregel/Giraph hides this complexity behind a
well-defined API and programming pattern, leaving the user with only the
application logic. How the rest is implemented under the hood is another
story that the user does not have to worry about.
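
To make the "message passing plus barriers" pattern concrete, here is a
minimal single-process sketch of a Pregel-style superstep loop. It is not
Giraph code: the names run_supersteps and max_value are made up for this
illustration, and a real framework distributes the vertices across workers.
The user only writes the per-vertex function; the loop stands in for what
the framework does.

```python
def run_supersteps(values, edges, compute, max_supersteps=30):
    """Toy BSP loop. `values` maps vertex id -> value, `edges` maps
    vertex id -> list of neighbour ids. Hypothetical helper, not a Giraph API."""
    inbox = {v: [] for v in values}
    active = set(values)          # every vertex is active in superstep 0
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = {v: [] for v in values}
        next_active = set()
        for v in active:
            new_value, out_msg = compute(superstep, values[v], inbox[v])
            values[v] = new_value
            if out_msg is not None:
                for n in edges[v]:
                    outbox[n].append(out_msg)
                    next_active.add(n)   # a message re-activates its target
        # Barrier: messages sent in this superstep become visible only now.
        inbox = outbox
        active = next_active
        superstep += 1
    return values, superstep


def max_value(superstep, value, messages):
    """User logic: propagate the largest value seen so far."""
    best = max([value] + messages)
    if superstep == 0 or best > value:
        return best, best     # value changed (or first step): tell neighbours
    return value, None        # nothing new: stay silent, i.e. vote to halt
```

Calling run_supersteps({1: 3, 2: 6, 3: 1}, {1: [2], 2: [1, 3], 3: [2]},
max_value) on this three-vertex chain leaves every vertex holding the
largest initial value.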


On Tue, Aug 6, 2013 at 7:19 PM, Yang wrote:

> It seems that the paradigm offered by Giraph/Pregel is very similar to the
> programming paradigm of PVM and, to a lesser degree, MPI. Using PVM, we
> often engage in such "iterative cycles", where all the nodes sync on a
> barrier and then enter the next cycle.
>
> So what are the extra features offered by Giraph/Pregel? I can see
> persistence/restarting of tasks, and maybe abstraction of the
> user-code-specific part into the API so that users are not concerned with
> the actual message passing (message passing is done by the framework).
>
> Thanks
> Yang
>



-- 
   Claudio Martella
   claudio.marte...@gmail.com


Re: Giraph vs good-old PVM/MPI ?

2013-08-06 Thread Yang
Thanks!


On Tue, Aug 6, 2013 at 10:48 AM, Avery Ching wrote:

> The Giraph/Pregel model is based on bulk synchronous parallel (BSP)
> computing, where the details of how the parallelization occurs are
> abstracted away from the programmer (the infrastructure handles this for
> you). Additionally, the APIs are built for graph processing. Since the
> computing model is well defined (BSP), the infrastructure can checkpoint
> the state of the application at the appropriate time and also handle
> failures without user interaction.
>
> MPI is a much lower-level, generic API, where messages are sent to
> processes. Users must pack/unpack their own messages and deliver them to
> the appropriate data structures. Users must also partition their own data.
> As of MPI 2, the failure of a process leaves the application in an
> undefined state (usually dead).
>
> Hope that helps,
>
> Avery
>
>
> On 8/6/13 10:19 AM, Yang wrote:
>
>> It seems that the paradigm offered by Giraph/Pregel is very similar to
>> the programming paradigm of PVM and, to a lesser degree, MPI. Using PVM, we
>> often engage in such "iterative cycles", where all the nodes sync on a
>> barrier and then enter the next cycle.
>>
>> So what are the extra features offered by Giraph/Pregel? I can see
>> persistence/restarting of tasks, and maybe abstraction of the
>> user-code-specific part into the API so that users are not concerned with
>> the actual message passing (message passing is done by the framework).
>>
>> Thanks
>> Yang
>>
>
>


Re: Giraph vs good-old PVM/MPI ?

2013-08-06 Thread Avery Ching
The Giraph/Pregel model is based on bulk synchronous parallel (BSP)
computing, where the details of how the parallelization occurs are
abstracted away from the programmer (the infrastructure handles this for
you). Additionally, the APIs are built for graph processing. Since the
computing model is well defined (BSP), the infrastructure can checkpoint
the state of the application at the appropriate time and also handle
failures without user interaction.
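
The reason a well-defined model makes checkpointing tractable is that, at a
superstep barrier, the whole application state is just the superstep number,
the vertex values, and the messages waiting for delivery. The sketch below
illustrates that idea only; checkpoint and restore are hypothetical helper
names, and this says nothing about how Giraph actually stores its
checkpoints.

```python
import pickle


def checkpoint(path, superstep, values, inbox):
    """Save everything needed to resume at the start of `superstep`.
    Only safe to call at a barrier, when no messages are in flight."""
    state = {"superstep": superstep, "values": values, "inbox": inbox}
    with open(path, "wb") as f:
        pickle.dump(state, f)


def restore(path):
    """Load a saved state and return (superstep, values, inbox)."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["superstep"], state["values"], state["inbox"]
```

After a failure, a framework built this way can reload the last saved state
and replay from that superstep, without the user writing any recovery code.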


MPI is a much lower-level, generic API, where messages are sent to
processes. Users must pack/unpack their own messages and deliver them to
the appropriate data structures. Users must also partition their own data.
As of MPI 2, the failure of a process leaves the application in an
undefined state (usually dead).
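
As a rough illustration of the bookkeeping this leaves to the user, the
sketch below partitions vertices across workers and packs and unpacks
messages by hand. It deliberately uses no MPI library: the actual send and
receive calls are omitted, and the message layout, the modulo partitioning
rule, and all function names are assumptions made for the example.

```python
import struct

# Assumed wire format: int64 target vertex id followed by a float64 value.
MESSAGE = struct.Struct("<qd")


def owner(vertex_id, num_workers):
    """Partitioning rule chosen by the user: vertex id modulo worker count."""
    return vertex_id % num_workers


def pack_messages(messages):
    """Serialize a list of (target_id, value) pairs into one byte buffer."""
    return b"".join(MESSAGE.pack(target, value) for target, value in messages)


def unpack_messages(buffer):
    """Inverse of pack_messages: recover the (target_id, value) pairs."""
    return [MESSAGE.unpack_from(buffer, offset)
            for offset in range(0, len(buffer), MESSAGE.size)]


def route(messages, num_workers):
    """Group messages by owning worker; returns worker rank -> packed buffer."""
    buckets = {w: [] for w in range(num_workers)}
    for target, value in messages:
        buckets[owner(target, num_workers)].append((target, value))
    return {w: pack_messages(m) for w, m in buckets.items()}
```

In a Pregel-style framework all of this sits behind the API: the user sends
a message to a vertex id and the framework decides where it goes and how it
is encoded.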


Hope that helps,

Avery

On 8/6/13 10:19 AM, Yang wrote:
> It seems that the paradigm offered by Giraph/Pregel is very similar to
> the programming paradigm of PVM and, to a lesser degree, MPI. Using
> PVM, we often engage in such "iterative cycles", where all the nodes
> sync on a barrier and then enter the next cycle.
>
> So what are the extra features offered by Giraph/Pregel? I can see
> persistence/restarting of tasks, and maybe abstraction of the
> user-code-specific part into the API so that users are not concerned
> with the actual message passing (message passing is done by the
> framework).
>
> Thanks
> Yang




Giraph vs good-old PVM/MPI ?

2013-08-06 Thread Yang
It seems that the paradigm offered by Giraph/Pregel is very similar to the
programming paradigm of PVM and, to a lesser degree, MPI. Using PVM, we
often engage in such "iterative cycles", where all the nodes sync on a
barrier and then enter the next cycle.

So what are the extra features offered by Giraph/Pregel? I can see
persistence/restarting of tasks, and maybe abstraction of the
user-code-specific part into the API so that users are not concerned with
the actual message passing (message passing is done by the framework).

Thanks
Yang


Re: zookeeper not starting

2013-08-06 Thread Kyle Orlando
I am having similar problems and am not really sure what's causing them.
I've tried specifying giraph.zkList, as well as adding various directories
to the classpath, but neither worked. I kept getting a connection
exception for localhost on the worker machines, which doesn't make sense
because they should be connecting to the master. The only way I could get
it to work was by putting the IP address of the master machine next to
localhost in /etc/hosts, which is a horrible solution, but it works.
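
One way to narrow this kind of problem down is to check, from a worker
machine, what a hostname resolves to and whether anything is listening on
the ZooKeeper port. The sketch below uses only the Python standard library;
can_connect and resolve are names invented for this example, and it only
tests TCP reachability, not whether the listener is really ZooKeeper.

```python
import socket


def resolve(host):
    """Return the sorted set of addresses `host` resolves to (empty if none)."""
    try:
        return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})
    except socket.gaierror:
        return []


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and resolution failures.
        return False
```

For example, running resolve("localhost") and can_connect("localhost",
22181) on a worker (22181 being the base port shown in the log quoted
below) and comparing the result with the same call against the master's
hostname would show whether the worker is trying to reach a ZooKeeper
server on itself instead of on the master.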

On Wed, Jul 31, 2013 at 5:33 PM, Claudio Martella wrote:
> As a follow-up to this, I think there might be some problem with the
> classpath and the jar files:
>
> 2013-07-31 19:51:53,270 INFO org.apache.giraph.graph.GraphTaskManager:
> setup: classpath @
> /Users/hammer/Dev/hadoop/data/hadoop/mapred/taskTracker/hammer/distcache/3182842244657980528_-2099658804_903563705/localhost/tmp/hadoop-hammer/mapred/staging/hammer/.staging/job_201307302017_0010/libjars/SOCCPartitioning-1.0-SNAPSHOT.jar
> for job Giraph: org.apache.giraph.benchmark.WeightedPageRankComputation
> 2013-07-31 19:51:53,367 INFO org.apache.giraph.zk.ZooKeeperManager:
> createCandidateStamp: Made the directory /user/martella/zk
> 2013-07-31 19:51:53,369 INFO org.apache.giraph.zk.ZooKeeperManager:
> createCandidateStamp: Creating my filestamp
> /user/martella/zk/_task/localhost. 0
> 2013-07-31 19:51:53,378 INFO org.apache.giraph.zk.ZooKeeperManager:
> getZooKeeperServerList: Got [localhost.] 1 hosts from 1 candidates when 1
> required (polling period is 3000) on attempt 0
> 2013-07-31 19:51:53,379 INFO org.apache.giraph.zk.ZooKeeperManager:
> createZooKeeperServerList: Creating the final ZooKeeper file
> '/user/martella/zk/zkServerList_localhost. 0 '
> 2013-07-31 19:51:53,383 INFO org.apache.giraph.zk.ZooKeeperManager:
> getZooKeeperServerList: For task 0, got file 'zkServerList_localhost. 0 '
> (polling period is 3000)
> 2013-07-31 19:51:53,383 INFO org.apache.giraph.zk.ZooKeeperManager:
> getZooKeeperServerList: Found [localhost., 0] 2 hosts in filename
> 'zkServerList_localhost. 0 '
> 2013-07-31 19:51:53,385 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Trying to delete old directory
> /Users/hammer/Dev/hadoop/data/hadoop/mapred/taskTracker/hammer/jobcache/job_201307302017_0010/work/_bspZooKeeper
> 2013-07-31 19:51:53,390 INFO org.apache.giraph.zk.ZooKeeperManager:
> generateZooKeeperConfigFile: Creating file
> /Users/hammer/Dev/hadoop/data/hadoop/mapred/taskTracker/hammer/jobcache/job_201307302017_0010/work/_bspZooKeeper/zoo.cfg
> in
> /Users/hammer/Dev/hadoop/data/hadoop/mapred/taskTracker/hammer/jobcache/job_201307302017_0010/work/_bspZooKeeper
> with base port 22181
> 2013-07-31 19:51:53,390 INFO org.apache.giraph.zk.ZooKeeperManager:
> generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
> 2013-07-31 19:51:53,390 INFO org.apache.giraph.zk.ZooKeeperManager:
> generateZooKeeperConfigFile: Delete of zoo.cfg = false
> 2013-07-31 19:51:53,392 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Attempting to start ZooKeeper server with command
> [/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java,
> -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
> /Users/hammer/Dev/hadoop/data/hadoop/mapred/taskTracker/hammer/distcache/3182842244657980528_-2099658804_903563705/localhost/tmp/hadoop-hammer/mapred/staging/hammer/.staging/job_201307302017_0010/libjars/SOCCPartitioning-1.0-SNAPSHOT.jar,
> org.apache.zookeeper.server.quorum.QuorumPeerMain,
> /Users/hammer/Dev/hadoop/data/hadoop/mapred/taskTracker/hammer/jobcache/job_201307302017_0010/work/_bspZooKeeper/zoo.cfg]
> in directory
> /Users/hammer/Dev/hadoop/data/hadoop/mapred/taskTracker/hammer/jobcache/job_201307302017_0010/work/_bspZooKeeper
> 2013-07-31 19:51:53,401 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Shutdown hook added.
> 2013-07-31 19:51:53,401 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
> localhost.:22181 with poll msecs = 3000
> 2013-07-31 19:51:53,409 WARN org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
>
> The jar used to run ZK is actually my application jar (passed with
> -libjars and put in HADOOP_CLASSPATH). The question is why this is
> suddenly creating a problem...
>
>
>
> On Tue, Jul 30, 2013 at 8:43 PM, Claudio Martella wrote:
>>
>> Am I the only one who has recently been experiencing problems with
>> ZooKeeper? My workers fail to connect to ZooKeeper; I presume it is not
>> starting at all. I'm using trunk and Hadoop 1.0.3. It used to work smoothly.
>>
>> --
>>Claudio Martella
>>claudio.marte...@gmail.com
>
>
>
>
> --
>Claudio Martella
>claudio.marte...@gmail.com



-- 
Kyle Orlando
Com