Hey there everyone!

No one on the user list was able to help me, so I thought I'd just start bugging the devs.

I am currently writing my bachelor thesis on Giraph and GraphX, comparing their scalability and features and relating both to different graph types. To compare the two on a fair basis, I want to tune the frameworks to get the most out of them :-) I was hoping to get some tips and tricks from you all on which configuration settings actually affect my computations.

My setup:
10 machines, each with one single-core 3.3 GHz CPU, 4 GB RAM and a 100 GB HDD; one machine is the designated master
Giraph 1.10
Hadoop 1.2.1

So far I haven't done any special configuration for Hadoop or Giraph besides the basic settings during setup.
These settings might be performance-critical:
In *mapred-site.xml*:
    mapred.tasktracker.map.tasks.maximum = 4
    mapred.map.tasks=4
In *hdfs-site.xml*:
    dfs.replication=3

If I am correctly informed, the default heap size is 1000 MB, which I haven't changed. I am also not sure where I can actually raise the memory limit. Any advice? Also, I read somewhere that it is smarter to increase the number of threads per worker rather than the number of workers per machine. Either way, I am somewhat handicapped with only one core per machine.
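
To make the memory question more concrete, here is a rough back-of-the-envelope sketch in plain Python (not Hadoop or Giraph code). It assumes that each of the 4 map slots runs in its own JVM with the 1000 MB heap mentioned above; the 512 MB reserved for the OS and the Hadoop daemons is a made-up figure, and the function names are only illustrative.

```python
def heap_budget(ram_mb, slots, heap_per_slot_mb, reserved_mb):
    """Return (requested_mb, available_mb, fits) for one machine."""
    requested = slots * heap_per_slot_mb   # total heap if every slot is busy
    available = ram_mb - reserved_mb       # RAM left after OS and daemons
    return requested, available, requested <= available


def max_heap_per_slot(ram_mb, slots, reserved_mb):
    """Largest per-slot heap (MB) that still fits into the available RAM."""
    return (ram_mb - reserved_mb) // slots


RAM_MB = 4 * 1024        # 4 GB per machine (from my setup)
RESERVED_MB = 512        # assumption: OS + DataNode + TaskTracker

print(heap_budget(RAM_MB, 4, 1000, RESERVED_MB))   # (4000, 3584, False)
print(max_heap_per_slot(RAM_MB, 4, RESERVED_MB))   # 896
print(max_heap_per_slot(RAM_MB, 1, RESERVED_MB))   # 3584
```

If these assumptions hold, four slots at 1000 MB each would not fit into 4 GB, while a single slot per machine could get a much larger heap. I am not sure this is how the slots and heap settings really interact, so please correct me if the model is wrong.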

Lastly, has anyone noticed any performance changes when using checkpointing, combiners, aggregators and so on? Is the use of combiners and aggregators decided in the application code or in my execution command?
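
For context, this is what I understand a combiner to do: it merges messages addressed to the same vertex before they are delivered, which should reduce message volume. Below is a toy illustration in plain Python, not the Giraph API; all names are hypothetical, and I use a minimum combiner as one might for shortest paths.

```python
from collections import defaultdict


def deliver_uncombined(messages):
    """Every message is kept: each target vertex receives a list of values."""
    inbox = defaultdict(list)
    for target, value in messages:
        inbox[target].append(value)
    return dict(inbox)


def deliver_combined(messages, combine=min):
    """Messages to the same target are merged into one value as they arrive."""
    inbox = {}
    for target, value in messages:
        inbox[target] = combine(inbox[target], value) if target in inbox else value
    return inbox


# (target vertex, candidate distance) pairs sent during one superstep
messages = [("a", 5.0), ("b", 2.0), ("a", 3.0), ("a", 7.0), ("b", 4.0)]

plain = deliver_uncombined(messages)
combined = deliver_combined(messages)

print(sum(len(v) for v in plain.values()))  # 5 values stored without a combiner
print(len(combined))                        # 2 values stored with a min combiner
print(combined)                             # {'a': 3.0, 'b': 2.0}
```

What I would like to know is whether savings like this show up noticeably in real Giraph runs.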

I would appreciate any advice and comments greatly! :-)

Greetings from Ulm,
Sonja


