We had written our own input readers that applied filtering, and it was 
pretty simple to do by extending the DatastoreInputReader code.  I didn't 
have the actual code in front of me, so I went back to the library code 
to see what I needed to override.

Then I noticed that filtering functionality is included now!  It's just not 
documented online.  Take a look at 
mapreduce.input_readers.DatastoreInputReader.validate_filters.

It looks like you need to include a 'filters' entry in your mapper_spec params.
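
A minimal sketch of what that might look like. The exact shape is an assumption: as far as I can tell from validate_filters, 'filters' is a list of (property, operator, value) tuples passed alongside 'entity_kind' in the mapper parameters, but verify that against your copy of the library. The helper build_mapper_params, the model path 'models.MyModel', the property 'x', and the handler name are all hypothetical, and nothing below imports the mapreduce library itself.

```python
# Sketch only: builds the mapper parameters as plain data, so it runs
# without the mapreduce library installed.

def build_mapper_params(entity_kind, filters):
    """Return mapper parameters in the shape the reader is ASSUMED to accept.

    Assumption: 'filters' is a list of (property, operator, value) tuples.
    """
    checked = []
    for f in filters:
        if len(f) != 3:
            raise ValueError("expected (property, operator, value), got %r" % (f,))
        checked.append(tuple(f))
    return {"entity_kind": entity_kind, "filters": checked}


# 'models.MyModel' and the property 'x' are hypothetical names.
params = build_mapper_params("models.MyModel", [("x", "=", "vvv")])

# With the library available, the params would be handed to the job, e.g.:
#   from mapreduce import control
#   control.start_map(
#       "Update filtered entities",                      # job name
#       "main.update_entity",                            # hypothetical handler
#       "mapreduce.input_readers.DatastoreInputReader",  # reader
#       params)
```

If the reader accepts the filter, only matching entities should reach the handler, which avoids reading every entity of the kind just to discard most of them in your own code.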

This supports my point about the poor documentation...

On Friday, December 6, 2013 2:11:56 PM UTC-5, Amit S wrote:
>
> I agree with D X that packaged version would really help.
>
> One feature we badly need is the ability to supply a query filter in 
> MapReduce (e.g. where x="vvv"). Currently, it just iterates through all 
> entities of a given kind in the datastore. So, if I want to update 10,000 
> entities of a kind that has 5M records, it iterates through all 5M 
> entities and we have to put the filter logic in our own code. This really 
> defeats the purpose and also has huge implications for read costs.
>
> On Friday, December 6, 2013 10:51:48 AM UTC-8, D X wrote:
>>
>> I've been using the mapreduce library for the last 18mo or so.
>> In addition to what's already been mentioned, some additional comments:
>>
>> - The docs are kind of confusing because there are different sets of docs. 
>> Just the fact that the docs are disorganized gives the impression that it's 
>> a low-priority project that's not well maintained.
>> e.g. prettier: 
>> https://developers.google.com/appengine/docs/python/dataprocessing/helloworld
>> but more detailed: 
>> https://code.google.com/p/appengine-mapreduce/wiki/MapReduceDemoApp
>> Keeping one set of well-maintained docs would help give the sense that 
>> MapReduce is a higher-class citizen.
>>
>> - Using MapReduce for schema changes is probably a very common yet simple 
>> use case.  I've heard more than one comment that the MapReduce pipeline 
>> seems too complicated to pick up for the simple task of updating a bunch of 
>> entities.
>> mapper_spec?  reducer_spec?  input_reader?  output_writer?  Do I have to 
>> learn all these things just to add an extra field to my entities?  While 
>> the wordcount demo shows more of the pipeline, it would probably be easier 
>> for users to pick up if there was a simple demo of how to update your 
>> 'schema' with 5 lines of python.  (a DatastoreInputReader that supports 
>> filtering would be great too)
>>
>> If this sounds too negative, you can interpret these comments as saying 
>> that the rest of the GAE docs are great and easy to follow.
>>
>> - Packaged versions would be great.  I mean, it was great.  I'm not sure 
>> why you guys got rid of it.  Maybe I'm not hardcore enough to just sync 
>> with the repo (actually, I did).  However, a packaged version suggests that 
>> it's tested and stable.  If I see bugs, I can check online to see if 
>> anyone's seeing the same issue.  When syncing with the repo, I have no 
>> idea how stable the latest checkins are.  Maybe something just broke and 
>> I'm the one person who sync'd after the broken change, and I'm obviously 
>> not going to be constantly syncing the MR library, because I actually have 
>> other things to work on.
>> What about the version that's included in the SDK?  I toyed with using 
>> that, but the docs indicate I should be downloading from the repo.  So is 
>> the repo more recent, and the SDK version outdated?  Would the SDK version 
>> be more stable?  Again, confusion.
>>
>>
>>
>>
>> On Friday, December 6, 2013 1:10:37 AM UTC-5, Chris Ramsdale wrote:
>>>
>>> thanks, PK.  we'll follow-up on the issue you cited.
>>>
>>> -- Chris
>>>
>>>
>>> On Thu, Dec 5, 2013 at 9:59 PM, PK <p...@gae123.com> wrote:
>>>
>>>> Thanks Chris.
>>>>
>>>> Great to hear that you plan to move map/reduce to GA. The reasons I 
>>>> asked are:
>>>>
>>>> 1. it has been experimental for about 3 years if not longer so I had 
>>>> started wondering… I think it is fair.
>>>> 2. More importantly, some bugs still in the NEW state make me wonder 
>>>> whether it is keeping up with other changes in the platform. In particular, 
>>>> seeing issues stay in the NEW state for months concerns me the most about 
>>>> how active the effort is.
>>>>
>>>> (examples of such bugs:
>>>> Issue 182 <https://code.google.com/p/appengine-mapreduce/issues/detail?id=182>:
>>>> mapreduce/include.yaml doesn't work with python2.7 and thread safe
>>>> Issue 203 <https://code.google.com/p/appengine-mapreduce/issues/detail?id=203>:
>>>> map reduce is broken since r534)
>>>>
>>>> Thanks,
>>>> PK
>>>>
>>>> On December 5, 2013 at 8:27:33 PM, Chris Ramsdale (cram...@google.com) 
>>>> wrote:
>>>>
>>>> PK,
>>>>
>>>> We're definitely planning on moving mapreduce to GA. Plan is to get the 
>>>> API finalized and then move it through the std Preview => GA channel.
>>>>
>>>> Questions for you:
>>>>
>>>> - how would you prefer to access the library itself?
>>>>
>>>> - is there something about having it outside of the SDK that causes 
>>>> substantial friction?
>>>>
>>>> - or, is it the fact that it's sat in experimental for way too long 
>>>> that is the larger concern?
>>>>
>>>> -- Chris
>>>>
>>>> Product Manager, Google App Engine
>>>>
>>>> On Dec 5, 2013 6:12 PM, "PK" <p...@gae123.com> wrote:
>>>>
>>>>>  I am resending this in case the right Product Manager(s) at Google 
>>>>> missed my question.
>>>>>  
>>>>>  Thanks,
>>>>> PK
>>>>> http://www.gae123.com
>>>>>
>>>>> On November 25, 2013 at 12:53:44 AM, PK (p...@gae123.com) wrote:
>>>>>
>>>>>   For a very very long time MapReduce has not been integrated with 
>>>>> the GAE SDK and remains experimental (
>>>>> https://developers.google.com/appengine/docs/python/dataprocessing/).
>>>>>  
>>>>>  Could somebody shed some light on the roadmap plan?
>>>>>  
>>>>> Thanks,
>>>>> PK
>>>>> http://www.gae123.com
>>>>>  
>>>>>  --
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "Google App Engine" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to google-appengi...@googlegroups.com.
>>>>> To post to this group, send email to google-a...@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/google-appengine.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>>>
>>>>
>>>

