Hi Jake, yep, you basically understand it correctly, it is indeed something like a SSSP, but I'm starting from many vertices at the same time (so i'd call it a grammar-based ASSP).
To make it more clear, the current code looks like: @Override public void compute(Iterator<Message> messages) throws IOException { if (getSuperstep() == 0 && isSource()) { processMessage(new Message(this.query, new ResultSet())); } else { while (messages.hasNext()) { processMessage(messages.next()); } } voteToHalt(); } as you can see I have used a hack and saved the query as VertexValue (so I at least saved parsing it multiple times from String). With what I have in mind, I could remove the first if-condition completely. Also, as not all the vertices are starting vertices, I could obtain the complete list of starting vertices by running a filter based on the first element of my stack/query. Suppose my query asks for all the actors who have acted in a movie which took place in USA. (actor - acted_in -> movie - takes_place -> USA), I could set as a starting vertices only those vertices that have an outgoing edge with label "acted_in". That would save quite a lot. As we didn't have an API to do this yet, I still haven't figured out how to do this precisely, but i guess this could be done in the VertexReader. On Wed, Nov 2, 2011 at 4:51 PM, Jake Mannix <jake.man...@gmail.com> wrote: > On Wed, Nov 2, 2011 at 5:07 AM, Claudio Martella <claudio.marte...@gmail.com >> wrote: >> >> Not each vertex is a starting vertex for the path traversals, so it >> could be nice if at the first superstep the starting vertices could >> already have the messages in their inbox. This way the message >> processing (and path traversal processing with it) could be fully >> transparent (vertices just have to iterate throw messages without >> caring about parsing, start vertices etc etc). >> >> Currently, at first superstep, each vertex has to get che query from >> the conf, parse it into a stack and discard it afterwards, as soon as >> it discovers it is not a starting vertex for a path traversal. >> Outside, I know who's a starting vertex and could just set the inbox >> for those vertices accordingly. >> > > Let me understand what the situation looks like, by basically repeating > this back to you: in your case, the set of queries which are to be run > over your graph need to be read/parsed at startup and "given" to the > starting node who is going to be responsible for initiating their > propagation > across the graph, right? > > For one thing, storing this batch set of queries in the Configuration > doesn't seem very scalable - having that live on-disk and be read > by the VertexReader does seem to make sense, although I wonder how > easy it would be to coerce your serialized on-disk format to be nice > and re-use the unchanging graph for multiple queries? Maybe a > MultipleInputFormats kind of thing would be needed? > > So how is this different from SimpleShortestPathsVertex, in the way > that one vertex is "special"? The compute() method of that vertex > uses the knowledge of whether it is the source or not to set its "distance" > from the origin, and only when that distance is > the current min does > it send messages out. > > I could see how algorithms like this could possibly be sped up if we > didn't need to iterate over *all* of the graph on each superstep, as the > first step only needs to send messages to the vertices directly > connected to the origin, then only it's 2nd degree, etc. If we were > only to process the vertices which had messages to send, for example, > and iterate over those in a sparse fashion, this algorithm could be > sped up considerably. > > I guess you're saying that similarly, not only if the messages were > directly associated with their starting nodes, but if during the superstep > we could only iterate over the nodes with messages, we could be much > faster. > > It seems there might be a bunch of algorithms which would work this > way, and I wonder if it would be a good idea to modify the system to > have GraphMapper check to see if the vertex class "isSparse()" in > some sense, which would mean that in the superstep walk: > > for (Map.Entry<I, VertexRange<I, V, E, M>> entry : > serviceWorker.getVertexRangeMap().entrySet()) { > // ... > for (BasicVertex<I, V, E, M> vertex : > entry.getValue().getVertexMap().values()) { > // ... > if (!vertex.isHalted()) { > Iterator<M> vertexMsgIt = > vertex.getMsgList().iterator(); > context.progress(); > vertex.compute(vertexMsgIt); > } > > we instead would iterate over a sparse data structure which > only contains the vertices to be processed (this data structure > would of course change from superstep to superstep as the > messages propagate across the graph). This could be a pretty > significant speedup a lot of the time in practice, as it would > mean that iterations per superstep would scale in complexity > as O( |currently visible part of the graph| ) as opposed to > O( | entire graph| ), even if most of the iteration is just a bunch > of in-memory boolean and size checks. > > Not sure if this is exactly what you're talking about, however. > > -jake > -- Claudio Martella claudio.marte...@gmail.com