I've used MapReduce myself for a while, and I can tell you: 100+ MB of 
keys means a LOT of keys at the shuffle stage. And the real limitation of 
MapReduce is:

"The total size of all the instances of Mapper, InputReader, OutputWriter 
and Counters must be less than 1MB between slices. This is because these 
instances are serialized and saved to the datastore between slices."

Source 
<https://github.com/GoogleCloudPlatform/appengine-mapreduce/wiki/2.6-The-Java-MapReduce-Library>

The real problem with MapReduce, in my opinion, is the latency of its 
operations and the huge number of reads/writes to the datastore needed to 
keep things working between slices (which considerably increases costs). 
You can't rely on MapReduce for real-time or near-real-time work the way 
you can with plain task queues. And it really only shines when you can 
afford a large number of machines to run your logic - running MapReduce on 
a few machines is sometimes worse than plain sequential brute force.
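To illustrate the task-queue alternative, here's a minimal sketch of the 
chaining pattern: each "task" processes a bounded batch of rows and, if rows 
remain, re-enqueues itself with a cursor. This is plain Java with a Deque 
standing in for the queue - no App Engine APIs, and the names are mine - but 
the shape is the same as chaining push tasks with a datastore cursor:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ChainedTasks {
    static final int BATCH = 3; // rows per task slice (tiny, for the demo)

    /** Processes all rows in bounded batches, simulating task re-enqueueing. */
    public static List<String> processAll(List<String> rows) {
        List<String> out = new ArrayList<>();
        // The deque stands in for the task queue; each entry is a cursor
        // (here just a start index) carried from one task to the next.
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0); // enqueue the first task at cursor 0
        while (!queue.isEmpty()) {
            int cursor = queue.poll();
            int end = Math.min(cursor + BATCH, rows.size());
            for (int i = cursor; i < end; i++) {
                out.add(rows.get(i).toUpperCase()); // the per-row work
            }
            if (end < rows.size()) {
                queue.add(end); // re-enqueue a follow-up task with the new cursor
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("a", "b", "c", "d", "e")));
    }
}
```

Because each slice only ever holds BATCH rows, memory stays bounded no matter 
how many rows there are - which is exactly why chained tasks stay within small 
instance classes where a big in-memory shuffle can't.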

Fitting your problem into a MapReduce process is actually good for your code 
- even if you don't use the library itself. It forces you to think about how 
you can split your huge task into smaller, more manageable and more 
scalable pieces. It's a good exercise - sometimes you think you can't 
parallelize your problem, but when you're forced into the MapReduce workflow 
you might find you were actually wrong, and at the end of the day you have 
better code.
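To make that exercise concrete, here's a toy map/shuffle/reduce over 
in-memory data (word count) - just plain Java streams, nothing from the App 
Engine library, and the class/method names are mine. The three stages map 
directly onto the phases the library runs for you across machines:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniMapReduce {
    /** Counts words: map emits (word, 1), shuffle groups by word, reduce sums. */
    public static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // Map phase: split each line into words
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // Shuffle + reduce: group identical words and sum their counts
                .collect(Collectors.groupingBy(w -> w,
                        Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("a b a", "b c")));
    }
}
```

Once your logic is factored into a pure map step and an associative reduce 
step like this, whether it runs on one machine or a hundred becomes a 
deployment detail rather than a code change.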

On Wednesday, December 10, 2014 6:22:17 PM UTC-2, Emanuele Ziglioli wrote:
>
> It all comes at a cost: increased complexity.
> You can't beat the simplicity of task queues and the 10m limit seems 
> artificially imposed to me. I mean, we pay for CPU time, as we would pay 
> for 20m, 30m, 1h tasks.
> I've got a simple task that takes a long time, looping through hundreds of 
> thousands of rows to produce ordered files as output.
> The current code is simple and elegant but I have to keep increasing the 
> CPU size in order to finish the task within 10m.
> A solution could be using MapReduce, but I haven't yet figured out how 
> MapReduce would solve my problem without hitting the memory limit: with my 
> simple task there are only 1000 rows in memory at any given time (of 
> course, minus the GC). A MapReduce shuffle stage would require all 
> entities, or at least their keys, to be kept in memory, and that's 
> impossible with F1s or F2s.
>
> Emanuele
>
> On Wednesday, 10 December 2014 19:24:30 UTC+13, Vinny P wrote:
>>
>> On Sat, Dec 6, 2014 at 5:58 AM, Maneesh Tripathi <
>> maneesh....@razaonline.info> wrote:
>>
>>> I have created a task queue which stops working after 10 minutes.
>>> I want to increase the time limit. 
>>> Please help me with this. 
>>>
>>
>>
>> Task queue requests are limited to 10 minutes of execution time: 
>> https://cloud.google.com/appengine/docs/java/taskqueue/overview-push#task_deadlines
>>
>> If you need to go past the 10 minute deadline, you're better off using a 
>> manual or basic scaled module: 
>> https://cloud.google.com/appengine/docs/java/modules/#scaling_types
>>
>>  
>> -----------------
>> -Vinny P
>> Technology & Media Consultant
>> Chicago, IL
>>
>> App Engine Code Samples: http://www.learntogoogleit.com
>>
>>
