[google-appengine] MapReduce as Cron jobs: How to specify the number of shards?

2011-02-10 Thread Andrin von Rechenberg
Hey there

Today I created a library to run MapReduces as cron jobs in python.

See here:
http://devblog.miumeet.com/2011/02/schedule-mapreduce-daily-on-appengine.html

However, I haven't figured out how to set the
shard_count programmatically.

In mapreduce/control.py there is a function I call:

    def start_map(name,
                  handler_spec,
                  reader_spec,
                  reader_parameters,
                  shard_count=_DEFAULT_SHARD_COUNT,
                  [...])


However, no matter what I pass as the shard_count argument, it is
ignored.

Any ideas?


Cheers,

-Andrin

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



Re: [google-appengine] MapReduce as Cron jobs: How to specify the number of shards?

2011-02-11 Thread djidjadji
In your cron_mapreduce.py, add the two shard_count lines shown here
(the assignment and the keyword argument):

shard_count = int(self.request.get("shard_count",
                                   mr_control._DEFAULT_SHARD_COUNT))

mr_control.start_map(
    self.request.get("name"),
    self.request.get("reader_spec", "your_mapreduce.map"),
    self.request.get("reader_parameters",
                     "mapreduce.input_readers.DatastoreInputReader"),
    {"entity_kind": self.request.get("entity_kind", "models.YourModel"),
     "processing_rate": int(self.request.get("processing_rate", 100))},
    shard_count=shard_count,
    mapreduce_parameters={"done_callback": self.request.get("done_callback",
                                                            None)})
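
Since shard_count arrives as a request parameter, a bare int() call raises an error on an empty or non-numeric value. Below is a minimal sketch of a more forgiving parser. The helper name parse_shard_count and the constant DEFAULT_SHARD_COUNT are hypothetical and not part of the mapreduce library; the value 8 is only a stand-in for mr_control._DEFAULT_SHARD_COUNT.

```python
# Hypothetical stand-in for mr_control._DEFAULT_SHARD_COUNT (value assumed).
DEFAULT_SHARD_COUNT = 8


def parse_shard_count(raw_value, default=DEFAULT_SHARD_COUNT):
    """Return a positive integer shard count, or the default.

    The default is used when the value is missing, empty,
    non-numeric, or not positive.
    """
    if raw_value is None or str(raw_value).strip() == "":
        return default
    try:
        count = int(raw_value)
    except (TypeError, ValueError):
        return default
    return count if count > 0 else default
```

In the handler this could be used as parse_shard_count(self.request.get("shard_count")), with the result passed on as the shard_count keyword argument.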






Re: [google-appengine] MapReduce as Cron jobs: How to specify the number of shards?

2011-02-14 Thread djidjadji
Today I updated the mapreduce library.
I see the same "only 1 shard" behaviour when I use the dev_server.
On the dev_server, objects do not have the __scatter__ property,
so the mapreduce library falls back to a single shard.

In production it depends on how many objects have a
__scatter__ property.
If fewer than shard_count objects have __scatter__, you get fewer shards.
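
To make that behaviour concrete, here is a rough model of it in Python. This is only a sketch of what I described above, not the library's actual splitting code; the function name effective_shard_count is made up for illustration, and the exact cap is an assumption.

```python
def effective_shard_count(requested_shards, scatter_entity_count):
    """Simplified model of the observed behaviour, not the library's code.

    Assumptions: with no __scatter__ entities the job runs as a single
    shard; otherwise the shard count is capped by the number of entities
    that carry the __scatter__ property.
    """
    if requested_shards < 1:
        raise ValueError("requested_shards must be at least 1")
    if scatter_entity_count <= 0:
        # dev_server case: no entity has __scatter__, so one shard.
        return 1
    return min(requested_shards, scatter_entity_count)
```

Under this model, asking for 8 shards with only 3 __scatter__ entities gives 3 shards, and asking for 8 with none gives 1.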

GAE Team: What determines if an object gets a __scatter__ property?





Re: [google-appengine] MapReduce as Cron jobs: How to specify the number of shards?

2011-02-14 Thread Stephen Johnson
This document explains the strategy:
http://code.google.com/p/appengine-mapreduce/wiki/ScatterPropertyImplementation

It says that there is a 0.8% chance of an entity getting this property. That
seems really low. I wonder if they meant 8%, not 0.8%?
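
For a sense of scale, here is a quick back-of-the-envelope calculation in Python. It takes the 0.8% figure at face value (an assumption, given my doubt above) and treats each entity as independently receiving the property with that probability. The function names are just for illustration.

```python
import math
from fractions import Fraction

# The 0.8% figure from the wiki page, kept exact to avoid float rounding.
SCATTER_PROBABILITY = Fraction(8, 1000)


def expected_scatter_entities(entity_count, probability=SCATTER_PROBABILITY):
    """Expected number of entities that carry __scatter__."""
    return entity_count * probability


def entities_needed(shard_count, probability=SCATTER_PROBABILITY):
    """Smallest entity count whose expected number of __scatter__
    entities reaches shard_count."""
    return math.ceil(shard_count / probability)
```

With 0.8%, entities_needed(8) comes out at 1000 entities before you would expect 8 of them to have __scatter__; with 8% it would be 100. These are expectations only, so a small kind can easily end up with fewer.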

Stephen

