I'll try to make things clear.
The end of a superstep is reached when all the vertices have deactivated
themselves. In Pregel the superstep is over when every vertex has called
VoteToHalt() (in Hama it's done by the sync() method). This happens at
the end of each vertex's computation. A vertex is reactivated by the
arrival of a message addressed to it. This means that each superstep's
computation is atomic, and that should be taken into account in the
design of the algorithm. It's a change of paradigm, and it's what the
Pregel authors call programming from the "vertex's perspective".
So no, there's no assumption that all the vertices are active at the
beginning of each superstep.
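To make the activation rule concrete, here's a toy sketch (plain Python, invented for illustration; it is not how Pregel or Hama actually implement it) of how the set of vertices that run in the next superstep is derived:

```python
# Toy illustration of the activation rule: a vertex that voted to halt
# stays inactive until a message addressed to it arrives.

def next_active_set(active, halted, inbox):
    """active: vertices that ran this superstep
    halted: vertices that called VoteToHalt() this superstep
    inbox:  dict mapping vertex id -> list of pending messages"""
    still_running = active - halted
    reactivated = {v for v, msgs in inbox.items() if msgs}
    return still_running | reactivated

# Vertex 3 halted but received a message, so it is woken up again:
print(next_active_set({1, 2, 3}, {2, 3}, {3: ["wake"]}))  # {1, 3}
```

When this set becomes empty and no messages are in transit, the whole computation terminates.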
About Felix's argument: yes, fewer supersteps mean less communication
and synchronization overhead. At the same time, longer supersteps make
it more probable that certain vertices end their computation earlier
than others, leaving them idle for a long time (waiting for the others
to finish) and losing computational time. So ideally you want a good
balance between long computational supersteps (decreasing communication
overhead) and short computational supersteps (decreasing idle time).
This is an intrinsic problem of BSP models because of the barrier.
Dataflow models, on the contrary, have no barriers, and each computation
is more independent, therefore more similar to the model you have in mind.
Hope this helps.
I attach the text from my blog post (roughly obtained with html2text) as
requested.
Cheers,
Claudio
zercal wrote:
> The papers I found about Pregel are not very detailed;
> is it "http://portal.acm.org/citation.cfm?id=1582716.1582723"?
> I guess that in this paper vertices are assumed to all be active at every
> superstep. Simple random access will reduce some communication cost but
> take more supersteps.
> However, is there a vertex-selection method such that, at every superstep,
> each vertex knows whether to activate itself according to the information
> it has kept and received from other vertices?
> I couldn't find more detail in the paper...
> Besides, I cannot access your blog. Would you please send me your article?
>
> Thank you very much!
> from Xiong Chenyan
>
> At 2010-07-02 20:04:40, "Felix Halim" <[email protected]> wrote:
>
>> Exactly how to activate a particular vertex is not clear from the
>> paper (is it random access?), and this feature is probably not as good
>> as it sounds for complex graph algorithms. It might be better to
>> assume all vertices are active (to reduce the overhead of the flag
>> needed and the space to make it randomly accessible, by storing it in
>> blocks or whatever).
>>
>> Here is my argument:
>>
>> The way Pregel (and existing MR) works is iterative, where each
>> iteration is separated by a super-step barrier at which all messages
>> have to arrive. Algorithms that need fewer super-steps are preferable
>> to those that need a large number of them; in fact, we should measure
>> algorithms in terms of the number of super-steps required. To
>> minimize the number of super-steps, we likely need to activate as many
>> vertices as possible to do all the work in the current super-step, rather
>> than spill over to the next one. In this case, the feature to
>> "turn off" vertices is useless, since most of the time all vertices
>> will be active in order to effectively reduce the number of super-steps.
>>
>> Unfortunately, I don't have experiments to back up my argument... I
>> don't have Pregel...
>>
>> Felix Halim
>>
>>
>> On Fri, Jul 2, 2010 at 7:37 PM, Claudio Martella
>> <[email protected]> wrote:
>>
>>> I did too. See:
>>>
>>> http://blog.acaro.org/entry/pregel-is-out-but-what-is-pregel
>>>
>>>
>>> Felix Halim wrote:
>>>
>>>> I have. See my comment in this blog:
>>>>
>>>> http://blog.udanax.org/2010/06/summary-of-google-pregel.html
>>>>
>>>> Felix Halim
>>>>
>>>>
>>>> On Tue, Jun 8, 2010 at 4:00 AM, Mark Kerzner <[email protected]> wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> anybody has read it?
>>>>>
>>>>> Thank you,
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>>
>>> --
>>> Claudio Martella
>>> Digital Technologies
>>> Unit Research & Development - Analyst
>>>
>>> TIS innovation park
>>> Via Siemens 19 | Siemensstr. 19
>>> 39100 Bolzano | 39100 Bozen
>>> Tel. +39 0471 068 123
>>> Fax +39 0471 068 129
>>> [email protected] http://www.tis.bz.it
>>>
>>> Short information regarding use of personal data. According to Section 13
>>> of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that
>>> we process your personal data in order to fulfil contractual and fiscal
>>> obligations and also to send you information regarding our services and
>>> events. Your personal data are processed with and without electronic means
>>> and by respecting data subjects' rights, fundamental freedoms and dignity,
>>> particularly with regard to confidentiality, personal identity and the
>>> right to personal data protection. At any time and without formalities you
>>> can write an e-mail to [email protected] in order to object the processing
>>> of your personal data for the purpose of sending advertising materials and
>>> also to exercise the right to access personal data and other rights
>>> referred to in Section 7 of Decree 196/2003. The data controller is TIS
>>> Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find
>>> the complete information on the web site www.tis.bz.it.
>>>
>>>
>>>
>>>
--
Claudio Martella
Digital Technologies
Unit Research & Development - Analyst
TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax +39 0471 068 129
[email protected] http://www.tis.bz.it
Google Pregel is out. But what is Pregel?
June 21, 2010
Google Pregel's paper is finally out. But what is Pregel? It has been mentioned
in many posts talking about NoSQL, GraphDBs, Big Data, even the Facebook
OpenGraph, so it looks like, apart from the hype buzz, there's a little
confusion about what it is, what it does, what it's good for and what it
certainly is not. Let's start with the latter.
Google Pregel is not a database (neither RDBMS nor NoSQL), not a key-value
store, nor any new means of storing data (big or small as it might be). Putting
it in the same list with GraphDBs like Neo4j, HyperGraphDB or even Twitter's
FlockDB is somewhat like putting MapReduce in the NoSQL group.
GraphDBs are storage systems that use a graph representation for data: each
node represents an entity with a unique id, a type and properties. An arc
represents a relationship between two nodes and can itself have a type and
properties. Think of a GraphDB as an RDBMS where instead of tables you have a
graph. It's the Semantic Web's triple stores brought to general purpose.
Why would you use a GraphDB? Well, as you can describe your data in terms of
entities and relationships, you're able to avoid defining a schema as we know
it. It's a smaller step towards the schema-less representation of data that
actual NoSQL systems provide.
Informally, you can describe your data with ER diagrams without translating
them into tables, keeping the representation dynamic and avoiding the costs of
schema redefinition in RDBMSs. Plus, it's very efficient and easy to write
queries that allow you to get, for example, all the followers of a user, all
the users he follows, the items related to him (tweets he wrote?) and maybe the
users connected to his items (users retweeting or replying to his tweets?) in
one go. That's basically what Twitter wants to achieve with FlockDB, but with a
general GraphDB you can describe your data and the relationships between your
data as you wish.
You store data in a GraphDB and you recall it in an easy and efficient way. So
what's Pregel good for? What if you want to mine the data in the graph (e.g.
Google's PageRank, Facebook's social network analysis, Twitter's retweeting/
authority analysis)? Google reports that 80% of its distributed computation is
based on MapReduce (Google Maps, search indexing, clustering in Google News,
reports for Google Trends, Google Translate, etc.), so we can only guess that
the remaining 20% is based on Pregel, and the authors report that they can
work with graphs of the size of billions of vertices. Plus, implementing
PageRank takes just about 15 lines of code...
That's what Pregel is for. So what is it? Pregel is a system for large-scale
graph processing. It provides a fault-tolerant framework for the execution of
graph algorithms in parallel over many machines. Think of it as MapReduce
rethought for graph operations.
But what's wrong with MapReduce and graph algorithms? Nothing in particular,
though it can lead to suboptimal performance, because the graph state has to be
passed from one phase to the next, generating a lot of I/O. In general it also
has some usability issues, as it doesn't provide a way to do any per-vertex
calculation; it's simply not easy to express graph algorithms in M/R. Pregel
fills a gap, as there are no other frameworks for graph processing that address
both distribution and fault-tolerance.
Pregel's architecture is inspired by the Bulk Synchronous Parallel model
introduced by Valiant. BSP is a computational model for the execution of
parallel algorithms on top of multiple sequential Von Neumann machines. It
gives an abstraction, just like M/R, that allows the programmer to think about
the parallel expression of his solution without the hassle of communication and
memory allocation in a distributed system. Before we get into details I think
two things have to be underlined.
First, again like M/R, although the model is used by Google to distribute
computation among multiple computers, that's not necessary: in principle BSP
also fits parallel programming on SMP or NUMA machines and mainframes.
Second, although the model is used by Google to distribute graph processing,
BSP can be used to distribute other kinds of algorithms, like matrix
manipulation, just like M/R.
Ok, how does BSP work? I'll take the diagram and snippet from the BSP page on
Wikipedia:
“A BSP computer consists of processors connected by a communication network.
Each processor has a fast local memory, and may follow different threads of
computation.
A BSP computation proceeds in a series of global supersteps. A superstep
consists of three ordered stages:
1. Concurrent computation: Several computations take place on every
participating processor. Each process only uses values stored in the
local memory of the processor. The computations are independent in the
sense that they occur asynchronously of all the others.
2. Communication: At this stage, the processes exchange data between
themselves.
3. Barrier synchronisation: When a process reaches this point (the barrier),
it waits until all other processes have finished their communication
actions.
The figure below shows this in a diagrammatic form. The processes are not
regarded as having a particular linear order (from left to right or otherwise),
and may be mapped to processors in any way.”
[bsp architecture]
Ok, basically at every superstep every processor executes the same algorithm on
its data: its state and the incoming messages. At superstep t every processor
works on its state, which is the result of its computation at superstep t-1,
and on the messages sent to it at superstep t-1. As a result of the superstep-t
computation, the processor sends messages to other processors, and those
messages will be the incoming messages at superstep t+1. And the cycle goes on.
The barrier synchronisation is the moment where t becomes t+1.
It is easy to see that each computation should take approximately the same
amount of time, otherwise a long-running computation will force the others to
wait idle.
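The cycle just described can be sketched in a few lines. Everything below is invented for illustration (the compute signature and the two-processor example are mine, not from any BSP library); the swap of outboxes into inboxes plays the role of the barrier:

```python
# Minimal single-process simulation of a BSP computation.
# compute(p, state, msgs) -> (new_state, [(dest, msg), ...])

def bsp_run(states, compute, supersteps):
    inbox = {p: [] for p in states}
    for _ in range(supersteps):
        outbox = {p: [] for p in states}
        for p in states:                  # logically concurrent, sequential here
            states[p], sent = compute(p, states[p], inbox[p])
            for dest, msg in sent:
                outbox[dest].append(msg)
        inbox = outbox                    # barrier: t becomes t+1
    return states

# Two processors send their running total to each other every superstep.
def exchange(p, state, msgs):
    new_state = state + sum(msgs)
    return new_state, [(1 - p, new_state)]

print(bsp_run({0: 1, 1: 2}, exchange, 2))  # {0: 3, 1: 3}
```

Note that inside a superstep each processor only reads its own state and its own inbox, which is exactly the independence property the Wikipedia snippet describes.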
How does Pregel implement BSP? Quoting Pregel's original paper: “The input to
a Pregel computation is a directed graph in which each vertex is uniquely
identified by a string vertex identifier. Each vertex is associated with a
modifiable, user defined value. The directed edges are associated with their
source vertices, and each edge consists of a modifiable, user defined value
and a target vertex identifier. A typical Pregel computation consists of
input, when the graph is initialized, followed by a sequence of supersteps
separated by global synchronization points until the algorithm terminates, and
finishing with output.

Within each superstep the vertices compute in parallel, each executing the same
user-defined function that expresses the logic of a given algorithm. A vertex
can modify its state or that of its outgoing edges, receive messages sent to it
in the previous superstep, send messages to other vertices (to be received in
the next superstep), or even mutate the topology of the graph. Edges are not
first-class citizens in this model, having no associated computation.

Algorithm termination is based on every vertex voting to halt. In superstep 0,
every vertex is in the active state; all active vertices participate in the
computation of any given superstep. A vertex deactivates itself by voting to
halt. This means that the vertex has no further work to do unless triggered
externally, and the Pregel framework will not execute that vertex in subsequent
supersteps unless it receives a message. If reactivated by a message, a vertex
must explicitly deactivate itself again. The algorithm as a whole terminates
when all vertices are simultaneously inactive and there are no messages in
transit.”
The mapping between BSP and Pregel is very simple: each local computation of
BSP maps to the user-defined function in Pregel, and the communication often,
but not necessarily, corresponds to edge connectivity between nodes. The
barrier is reached once all the active nodes have voted to halt.
From the perspective of the API, Pregel requires the implementation of the
virtual Compute() method of the class Vertex. The Vertex class itself provides
VoteToHalt(), SendMessageTo(), GetValue(), GetOutEdgeIterator() and the const
methods superstep() and vertex_id().
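As a rough illustration, here is a loose Python transliteration of that API (the real Pregel is C++ and not public; Compute(), VoteToHalt() and SendMessageTo() mirror the paper, while the driver loop and the max-value algorithm are my own sketch):

```python
class Vertex:
    def __init__(self, vid, value, out_edges):
        self.vid, self.value, self.out_edges = vid, value, out_edges
        self.active = True
        self.superstep = 0
        self.outbox = []              # (target, message) for the next superstep

    def send_message_to(self, target, msg):
        self.outbox.append((target, msg))

    def vote_to_halt(self):
        self.active = False


class MaxVertex(Vertex):
    """Propagate the largest value seen anywhere in the graph."""
    def compute(self, messages):
        new_max = max([self.value] + messages)
        if new_max > self.value or self.superstep == 0:
            self.value = new_max
            for target in self.out_edges:
                self.send_message_to(target, self.value)
        self.vote_to_halt()


def run(vertices):
    """Drive supersteps until all vertices are inactive and no messages remain."""
    inbox = {vid: [] for vid in vertices}
    step = 0
    while any(v.active for v in vertices.values()) or any(inbox.values()):
        for vid, v in vertices.items():
            if inbox[vid]:
                v.active = True       # an incoming message reactivates a vertex
            if v.active:
                v.superstep = step
                v.compute(inbox[vid])
        inbox = {vid: [] for vid in vertices}
        for v in vertices.values():   # deliver messages for the next superstep
            for dest, msg in v.outbox:
                inbox[dest].append(msg)
            v.outbox = []
        step += 1
    return {vid: v.value for vid, v in vertices.items()}


# A 3-vertex ring: the maximum value (6) spreads to every vertex.
ring = {
    "v1": MaxVertex("v1", 3, ["v2"]),
    "v2": MaxVertex("v2", 6, ["v3"]),
    "v3": MaxVertex("v3", 1, ["v1"]),
}
print(run(ring))  # {'v1': 6, 'v2': 6, 'v3': 6}
```

The termination test in run() is exactly the paper's condition: all vertices simultaneously inactive and no messages in transit.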
Like M/R, Pregel provides the possibility to define Combiners in order to
reduce message-passing overhead by combining messages together where
semantically possible. Like Sawzall, Pregel provides Aggregators, which allow
global communication: an aggregator receives messages from multiple vertices,
combines them and sends the result back to the vertices. They are useful for
statistics (think of a histogram of vertex degrees) or for global control (for
example, an aggregator can collect all the vertices' PageRank deltas to
evaluate the convergence condition).
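The combiner idea can be sketched like this (a hypothetical helper, assuming the algorithm only needs an associative, commutative fold of the incoming messages, such as a sum):

```python
from collections import defaultdict

def apply_combiner(outgoing, combiner):
    """Fold all messages addressed to the same vertex into one.

    outgoing: list of (dest, value) pairs produced during a superstep
    combiner: an associative, commutative fold such as sum, min or max
    """
    buckets = defaultdict(list)
    for dest, value in outgoing:
        buckets[dest].append(value)
    return {dest: combiner(values) for dest, values in buckets.items()}

# Three messages collapse into two, one per destination vertex:
print(apply_combiner([("v1", 2), ("v1", 3), ("v2", 5)], sum))  # {'v1': 5, 'v2': 5}
```

A worker can apply such a fold before messages cross the network, which is where the overhead reduction comes from.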
From the perspective of architecture, Pregel follows a master/worker design,
like most of the other Google frameworks. The master is responsible for
partitioning the graph with a hash function on the vertex ID (like hash(ID) mod
#partitions, although a topology-aware partitioner might be able to minimize
communication between workers by keeping messages intra-machine), but doesn't
compute any partition itself. At the beginning of the computation the workers
register with the master.
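The default partitioning rule can be written down directly. The paper only specifies hash(ID) mod #partitions; the md5 digest below is just one stable hash choice (Python's built-in hash() is randomized across runs for strings, so it wouldn't give a repeatable assignment):

```python
import hashlib

def partition(vertex_id, num_partitions):
    """Map a vertex ID to a worker partition: hash(ID) mod #partitions."""
    digest = hashlib.md5(str(vertex_id).encode()).hexdigest()
    return int(digest, 16) % num_partitions

# The same vertex ID always lands on the same partition:
assert partition("v42", 4) == partition("v42", 4)
```

Because the mapping depends only on the ID, any worker can compute the destination of a message locally, without asking the master.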
Once the graph is partitioned and the partitions are assigned to workers, the
master signals the start of the superstep. Each worker loops through all of its
active vertices, calling the Compute() method and delivering the messages
collected in the previous superstep. The new messages are delivered before the
end of the superstep, right before the worker reports the list of active
vertices for the next superstep to the master.
After the computation halts the master might ask the workers to dump their
graph partition to disk.
At the moment there are no other projects that handle this kind of
computational power over graphs. Parallel BGL and CGMLib can handle parallel
processing on graphs, but they don't scale to this size and are not
fault-tolerant. Hama, an Apache incubator project, aims at developing a model
similar to Pregel, but it's not complete yet.
So, where do I go now?
Google's Research Blog announcement
BSP World-wide
NoSQL GraphDB