We are testing the mass creation of a very large number of entities in the datastore (several billion). We use CSV files (approx. 100 MB each), uploaded to the blobstore, and run mapper jobs on them. Our goal: minimize the overall execution time, whatever the cost.
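For illustration, the kind of mapper involved looks roughly like this. This is a minimal sketch based on the experimental appengine-mapreduce Java library: AppEngineMapper, BlobstoreRecordKey, and the mutation pool follow that library's published examples, while CsvEntityMapper, the "Record" kind, and the field names are made up here.

    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.tools.mapreduce.AppEngineMapper;
    import com.google.appengine.tools.mapreduce.BlobstoreRecordKey;
    import com.google.appengine.tools.mapreduce.DatastoreMutationPool;
    import org.apache.hadoop.io.NullWritable;

    // Turns one CSV line read from the blobstore into one datastore entity.
    public class CsvEntityMapper
        extends AppEngineMapper<BlobstoreRecordKey, byte[], NullWritable, NullWritable> {

      @Override
      public void map(BlobstoreRecordKey key, byte[] line, Context context) {
        String[] fields = new String(line).split(",");

        Entity entity = new Entity("Record");  // placeholder kind name
        entity.setProperty("field0", fields[0]);
        entity.setProperty("field1", fields[1]);

        // The mutation pool batches datastore puts instead of issuing
        // one write per map() call.
        DatastoreMutationPool pool = this.getAppEngineContext(context).getMutationPool();
        pool.put(entity);
      }
    }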
There seems to be an overall performance ceiling we cannot overcome, even when playing with different parameters (see the configuration sketch at the end of this post):

- setting a high value for "mapreduce.mapper.inputprocessingrate" (for instance 1,000,000)
- setting a high value for "mapreduce.mapper.shardcount" (for instance 20, or 50)
- launching concurrent mapper jobs in parallel (for instance 20 jobs, 1 job per file)

The overall throughput stays around 500 entities/second. Is there a specific limitation related to blobstore reads that we should be aware of? Or does anyone have tips for improving this performance?
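For reference, a mapreduce.xml entry for one of these jobs would set the parameters roughly as follows. This is a sketch: the two mapreduce.mapper.* values are the parameters mentioned above, while the class names and the blob key property name are assumptions about the experimental library's configuration format.

    <configurations>
      <configuration name="CreateEntitiesFromCsv">
        <property>
          <name>mapreduce.map.class</name>
          <value>com.example.CsvEntityMapper</value>
        </property>
        <property>
          <name>mapreduce.inputformat.class</name>
          <value>com.google.appengine.tools.mapreduce.BlobstoreInputFormat</value>
        </property>
        <!-- One job per uploaded CSV blob; the blob key is filled in per job. -->
        <property human="text">
          <name>mapreduce.mapper.inputformat.blobstoreinputformat.blobkeys</name>
          <value>(blob key of the CSV file)</value>
        </property>
        <!-- The two knobs we raised, with no effect past ~500 entities/second. -->
        <property>
          <name>mapreduce.mapper.shardcount</name>
          <value>20</value>
        </property>
        <property>
          <name>mapreduce.mapper.inputprocessingrate</name>
          <value>1000000</value>
        </property>
      </configuration>
    </configurations>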