We are testing the mass creation of a very large number of entities in the datastore (several billion). We use CSV files (approx. 100 MB each), uploaded to the blobstore, and run mapper jobs on them. Our goal: minimize the overall execution time, whatever the cost.
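For illustration, the kind of mapper involved looks roughly like this. This is a minimal sketch based on the experimental appengine-mapreduce Java library: AppEngineMapper, BlobstoreRecordKey, and the mutation pool follow that library's published examples, while CsvEntityMapper, the "Record" kind, and the field names are made up here.

    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.tools.mapreduce.AppEngineMapper;
    import com.google.appengine.tools.mapreduce.BlobstoreRecordKey;
    import com.google.appengine.tools.mapreduce.DatastoreMutationPool;
    import org.apache.hadoop.io.NullWritable;

    // Turns one CSV line read from the blobstore into one datastore entity.
    public class CsvEntityMapper
        extends AppEngineMapper<BlobstoreRecordKey, byte[], NullWritable, NullWritable> {

      @Override
      public void map(BlobstoreRecordKey key, byte[] line, Context context) {
        String[] fields = new String(line).split(",");

        Entity entity = new Entity("Record");  // placeholder kind name
        entity.setProperty("field0", fields[0]);
        entity.setProperty("field1", fields[1]);

        // The mutation pool batches datastore puts instead of issuing
        // one write per map() call.
        DatastoreMutationPool pool = this.getAppEngineContext(context).getMutationPool();
        pool.put(entity);
      }
    }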
There seems to be an overall performance ceiling we cannot overcome, even when playing with different parameters (see the configuration sketch at the end of this post):

- setting a high value for "mapreduce.mapper.inputprocessingrate" (for instance 1,000,000)
- setting a high value for "mapreduce.mapper.shardcount" (for instance 20, or 50)
- launching concurrent mapper jobs in parallel (for instance 20 jobs, 1 job per file)

The overall throughput stays around 500 entities/second. Is there a specific limitation related to blobstore reads that we should be aware of? Or does anyone have tips for improving this performance?
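For reference, a mapreduce.xml entry for one of these jobs would set the parameters roughly as follows. This is a sketch: the two mapreduce.mapper.* values are the parameters mentioned above, while the class names and the blob key property name are assumptions about the experimental library's configuration format.

    <configurations>
      <configuration name="CreateEntitiesFromCsv">
        <property>
          <name>mapreduce.map.class</name>
          <value>com.example.CsvEntityMapper</value>
        </property>
        <property>
          <name>mapreduce.inputformat.class</name>
          <value>com.google.appengine.tools.mapreduce.BlobstoreInputFormat</value>
        </property>
        <!-- One job per uploaded CSV blob; the blob key is filled in per job. -->
        <property human="text">
          <name>mapreduce.mapper.inputformat.blobstoreinputformat.blobkeys</name>
          <value>(blob key of the CSV file)</value>
        </property>
        <!-- The two knobs we raised, with no effect past ~500 entities/second. -->
        <property>
          <name>mapreduce.mapper.shardcount</name>
          <value>20</value>
        </property>
        <property>
          <name>mapreduce.mapper.inputprocessingrate</name>
          <value>1000000</value>
        </property>
      </configuration>
    </configurations>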