On Mon, Jul 5, 2010 at 7:06 PM, Edward J. Yoon <[email protected]> wrote:
> I agree with you that we should measure about the number of
> iterations. And, as you said, there is still I/O overhead involved in
> reading and writing materialized data every time, even if avoiding the
> situation shuffle and sort of reduce phase.

I'm particularly interested in how BSP handles the I/O overhead.
Suppose only a few vertices are active among millions of vertices.
How does BSP activate those vertices? Are the vertices directly
accessible? Usually the vertices are stored within a block of 64 MB.
Does BSP read all 64 MB just to activate one vertex, or does BSP have
some kind of indexing?

> IMO, BSP will communication only some vertices which can't be solved
> locally, and I'm sure that the number of iterations will be less or
> equal to (M/R based) Schimmy approach.

As far as I know, the Schimmy approach doesn't reduce the number of
iterations. It is only used to avoid shuffling the "master" graph. So
the number of iterations for the MR-based approach and the number of
supersteps for BSP should be the same. Here, an MR iteration (or
round) corresponds to a BSP "superstep".

> More I hope we can compare them using Hama BSP soon.

I'm sure the BSP version will be more efficient, since BSP is like
MR + Schimmy built in.

Felix Halim
