Re: [google-appengine] MapReduce as Cron jobs: How to specify the number of shards?
In your cron_mapreduce.py, add these two lines (the shard_count assignment and the shard_count=shard_count argument):

    shard_count = int(self.request.get("shard_count",
                                       mr_control._DEFAULT_SHARD_COUNT))

    mr_control.start_map(
        self.request.get("name"),
        self.request.get("reader_spec", "your_mapreduce.map"),
        self.request.get("reader_parameters",
                         "mapreduce.input_readers.DatastoreInputReader"),
        {"entity_kind": self.request.get("entity_kind", "models.YourModel"),
         "processing_rate": int(self.request.get("processing_rate", 100))},
        shard_count=shard_count,
        mapreduce_parameters={"done_callback":
                              self.request.get("done_callback", None)})

2011/2/10 Andrin von Rechenberg:
> Hey there
> Today I created a library to run MapReduces as cron jobs in python.
> See here: http://devblog.miumeet.com/2011/02/schedule-mapreduce-daily-on-appengine.html
> However, I couldn't figure out how to set the shard_count programmatically.
> In mapreduce/control.py there is a function I call:
>
> def start_map(name,
>               handler_spec,
>               reader_spec,
>               reader_parameters,
>               shard_count=_DEFAULT_SHARD_COUNT,
>               [...])
>
> However, no matter what I pass as the shard_count argument, it is ignored.
>
> Any ideas?
>
> Cheers,
> -Andrin

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
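A side note on reading shard_count from the cron URL: a bare int() call raises ValueError when the parameter is present but not numeric. Below is a minimal sketch of a more defensive parse, assuming the request parameters are available as a plain dict of strings. The names parse_shard_count and DEFAULT_SHARD_COUNT, and the value 8, are hypothetical stand-ins for illustration and are not taken from the mapreduce library.

```python
# Hypothetical stand-in for mr_control._DEFAULT_SHARD_COUNT; the real value
# lives in mapreduce/control.py and may differ from 8.
DEFAULT_SHARD_COUNT = 8


def parse_shard_count(params, default=DEFAULT_SHARD_COUNT):
    """Return a positive shard count from a dict of request parameters.

    Falls back to `default` when the parameter is missing, empty,
    non-numeric, or not positive.
    """
    raw = params.get("shard_count")
    if raw is None or raw == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


if __name__ == "__main__":
    print(parse_shard_count({"shard_count": "16"}))
    print(parse_shard_count({}))
```

Silently falling back to the default keeps a mistyped cron URL from failing the whole job. Raising an error instead would be equally reasonable if you prefer to notice the typo.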
Re: [google-appengine] MapReduce as Cron jobs: How to specify the number of shards?
Today I updated the mapreduce library.
I see the same "only 1 shard" behaviour when I use the dev_server.
The dev_server does not set the __scatter__ property on objects, so the mapreduce library falls back to a single shard.

In production it depends on how many objects have a __scatter__ property.
If fewer than shard_count objects have __scatter__, you get fewer shards.

GAE Team: What determines whether an object gets a __scatter__ property?

2011/2/11 djidjadji:
> In your cron_mapreduce.py add these two lines
> [...]
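To illustrate the behaviour described in this message, here is a simplified model of how split points taken from __scatter__ entities limit the number of shards. It is only a sketch of the idea and not the mapreduce library's actual code: the function name split_key_ranges and the even-spacing rule are assumptions made for illustration. In this model k usable split keys give k + 1 ranges, so the exact count the library produces may differ.

```python
def split_key_ranges(scatter_keys, shard_count):
    """Split the key space into at most shard_count ranges.

    scatter_keys: keys of the entities that carry the scatter property.
    Returns a list of (start, end) pairs; None marks an open-ended bound.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    keys = sorted(set(scatter_keys))
    wanted_splits = shard_count - 1

    # No scatter keys (as on the dev_server) or a single requested shard:
    # everything ends up in one range.
    if wanted_splits == 0 or not keys:
        return [(None, None)]

    if len(keys) <= wanted_splits:
        # Too few scatter keys: use all of them, yielding fewer shards.
        splits = keys
    else:
        # Enough scatter keys: pick evenly spaced ones as split points.
        splits = [keys[(i * len(keys)) // shard_count]
                  for i in range(1, shard_count)]

    bounds = [None] + splits + [None]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    print(len(split_key_ranges([], 8)))
    print(len(split_key_ranges(["b", "d"], 8)))
    print(len(split_key_ranges(range(100), 4)))
```

With an empty list the model returns one range regardless of shard_count, which matches the single-shard fallback seen on the dev_server.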
Re: [google-appengine] MapReduce as Cron jobs: How to specify the number of shards?
This document explains the strategy:
http://code.google.com/p/appengine-mapreduce/wiki/ScatterPropertyImplementation

It says that there is a 0.8% chance of an entity getting this property. That seems really low. I wonder if they meant 8%, not 0.8%?

Stephen

On Mon, Feb 14, 2011 at 7:12 AM, djidjadji wrote:
> [...]
> GAE Team: What determines if an object gets a __scatter__ property?
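To get a feel for what a 0.8% chance means in practice, here is a small back-of-the-envelope sketch. It takes the 0.8% figure quoted above at face value and assumes each entity receives the property independently. Both function names are made up for this example.

```python
SCATTER_PROBABILITY = 0.008  # the 0.8% figure quoted from the wiki page


def expected_scatter_entities(entity_count, p=SCATTER_PROBABILITY):
    """Expected number of entities that carry the scatter property."""
    return entity_count * p


def prob_no_scatter_entities(entity_count, p=SCATTER_PROBABILITY):
    """Probability that not a single entity carries the scatter property."""
    return (1.0 - p) ** entity_count


if __name__ == "__main__":
    for n in (100, 1000, 100000):
        print(n, expected_scatter_entities(n), prob_no_scatter_entities(n))
```

Under these assumptions, a kind with 1,000 entities has about 8 scatter entities on average, and a kind with 100 entities has roughly a 45% chance of having none at all. A small kind will therefore often end up with fewer shards than requested.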