This benchmark shows that 2 GB is enough for each task if the total number of vertices * edges assigned to a task is about 10 million (a partition file of roughly 60 MB to 100 MB). The total input size doesn't matter as long as there are enough task slots.
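The sizing rule above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch, not part of Hama: `TaskSizing`, `tasksNeeded`, and `totalHeapMb` are hypothetical names. It assumes the rule of thumb (about 10 million vertices * edges per 2 GB task) scales linearly, which the benchmark itself does not establish.

```java
/**
 * Back-of-the-envelope task sizing based on the rule of thumb from the
 * benchmark: about 10 million (vertices * edges) per task fits in 2 GB.
 * ASSUMPTION: the rule scales linearly. Hypothetical helper, not a Hama API.
 */
public class TaskSizing {
    // Rule-of-thumb figures taken from the benchmark discussion.
    static final long UNITS_PER_TASK = 10_000_000L;
    static final long MEMORY_PER_TASK_MB = 2048L; // 2 GB

    // Number of tasks needed so that no task exceeds UNITS_PER_TASK
    // (ceiling division).
    static long tasksNeeded(long totalUnits) {
        if (totalUnits <= 0) {
            throw new IllegalArgumentException("totalUnits must be positive");
        }
        return (totalUnits + UNITS_PER_TASK - 1) / UNITS_PER_TASK;
    }

    // Total memory across all tasks, in MB.
    static long totalHeapMb(long totalUnits) {
        return tasksNeeded(totalUnits) * MEMORY_PER_TASK_MB;
    }

    public static void main(String[] args) {
        long total = 250_000_000L; // example workload, not a benchmark figure
        System.out.println("tasks=" + tasksNeeded(total)
                + " heapMB=" + totalHeapMb(total));
    }
}
```

For the example workload of 250 million, this prints `tasks=25 heapMB=51200`, i.e. the job needs 25 task slots rather than bigger machines.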
The slight performance degradation comes from serializing and deserializing messages to reduce memory usage, but I am really happy with this result. Since today's machines can be equipped with up to 128 GB of memory, we should stick with the in-memory approach. Disk-based issues (a step backward) should therefore have the lowest priority. If there are no objections, I'll rearrange the JIRA roadmap.

> Looks pretty good. Can we try even larger scale?

Sure.

To Min and Lee: I think running this benchmark could be a good first step toward getting involved in the Apache Hama project. :-)

1. Build the latest version from source: http://wiki.apache.org/hama/GettingStarted#Build_latest_version_from_source
2. Run the PageRank example: https://wiki.apache.org/hama/PageRank

On Thu, Feb 13, 2014 at 7:43 AM, Yexi Jiang wrote:
> Looks pretty good. Can we try even larger scale?
>
> 2014-02-11 3:18 GMT-05:00 Edward J. Yoon:
>
>> See
>> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
>>
>> On Mon, Jan 27, 2014 at 5:09 PM, Edward J. Yoon wrote:
>> > FYI, this result shows the improvement obtained by removing message
>> > bundling overheads.
>> >
>> > ----
>> > PageRank with 3 tasks on a single machine
>> > bsp.child.java.opts: -Xmx1524m
>> > hama.graph.vertices.info: ListVerticesInfo
>> >
>> > * Input vertices: 20000, max edges per vertex: 100
>> > Hama-0.6.3: 87.895 seconds
>> > Hama-TRUNK: 99.689 seconds
>> >
>> > * Input vertices: 40000, max edges per vertex: 200
>> > Hama-0.6.3: 340.094 seconds
>> > Hama-TRUNK: 420.992 seconds
>> >
>> > * Input vertices: 40000, max edges per vertex: 300
>> > Hama-0.6.3: 556.07 seconds
>> > Hama-TRUNK: 583.098 seconds
>> >
>> > * Input vertices: 40000, max edges per vertex: 400
>> > Hama-0.6.3: 733.408 seconds
>> > Hama-TRUNK: 739.116 seconds
>> >
>> > * Input vertices: 40000, max edges per vertex: 500
>> > Hama-0.6.3: java.lang.OutOfMemoryError: Java heap space
>> > Hama-TRUNK: 1207.854 seconds
>> >
>> > * Input vertices: 40000, max edges per vertex: 600
>> > Hama-0.6.3: java.lang.OutOfMemoryError: Java heap space
>> > Hama-TRUNK: java.lang.OutOfMemoryError: Java heap space
>> > Hama-TRUNK (DiskVerticesInfo): java.lang.OutOfMemoryError: Java heap space
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>
> --
> ------
> Yexi Jiang,
> ECS 251
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/

--
Best Regards, Edward J. Yoon
@eddieyoon
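To put a number on the "slight performance degradation", the relative slowdown of Hama-TRUNK against Hama-0.6.3 can be computed directly from the single-machine PageRank timings in the quoted results. This is a minimal sketch; `PageRankSlowdown` and `slowdownPercent` are hypothetical names, and only the runs where both versions finished are included.

```java
/**
 * Relative slowdown of Hama-TRUNK vs Hama-0.6.3 for the single-machine
 * PageRank runs quoted in the thread (3 tasks, -Xmx1524m).
 * Only runs where both versions completed are listed.
 */
public class PageRankSlowdown {
    // {max edges per vertex, Hama-0.6.3 seconds, Hama-TRUNK seconds}
    static final double[][] RUNS = {
        {100, 87.895, 99.689},   // 20000 input vertices
        {200, 340.094, 420.992}, // 40000 input vertices
        {300, 556.07, 583.098},  // 40000 input vertices
        {400, 733.408, 739.116}, // 40000 input vertices
    };

    // Percentage by which the TRUNK run is slower than the 0.6.3 baseline.
    static double slowdownPercent(double baselineSec, double trunkSec) {
        return (trunkSec - baselineSec) / baselineSec * 100.0;
    }

    public static void main(String[] args) {
        for (double[] run : RUNS) {
            System.out.printf("max edges %3.0f: TRUNK slower by %.1f%%%n",
                    run[0], slowdownPercent(run[1], run[2]));
        }
    }
}
```

The 500-edge run is left out because only Hama-TRUNK completed it; Hama-0.6.3 failed with `java.lang.OutOfMemoryError`.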
