Re: [google-appengine] How to increase Task Queue Execution timing

2014-12-11 Thread Gilberto Torrezan Filho
I've used MapReduce myself for a while, and I can tell you: 100+ MB of keys 
means A LOT of keys at the shuffle stage. And the real limitation of 
MapReduce is:

The total size of all the instances of Mapper, InputReader, OutputWriter 
and Counters must be less than 1MB between slices. This is because these 
instances are serialized and saved to the datastore between slices.

Source: 
https://github.com/GoogleCloudPlatform/appengine-mapreduce/wiki/2.6-The-Java-MapReduce-Library
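
To make that concrete: every instance field on your Mapper is serialized at the 
end of each slice, so keep per-slice state tiny and push data out through 
emit() instead of accumulating it. A rough sketch following the examples on 
that wiki (treat the exact class and method signatures as my recollection of 
the 2.x API, not gospel; the "category" property is made up):

import com.google.appengine.api.datastore.Entity;
import com.google.appengine.tools.mapreduce.Mapper;

// Maps datastore entities to (category, 1) pairs for a later reduce/count.
public class CategoryCountMapper extends Mapper<Entity, String, Long> {

    private static final long serialVersionUID = 1L;

    // OK: a few bytes of state, serialized and restored between slices.
    private long processedInThisShard = 0;

    // NOT OK: accumulating entities or keys in a field would quickly blow
    // the ~1MB limit on serialized worker state between slices.
    // private List<Entity> seenSoFar = new ArrayList<>();

    @Override
    public void map(Entity entity) {
        processedInThisShard++;
        String category = (String) entity.getProperty("category");
        if (category != null) {
            emit(category, 1L);  // hand the pair to the shuffle stage, don't keep it in memory
        }
    }
}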

The real problem of MapReduce, in my opinion, is the latency of the 
operations and the huge amount of datastore reads and writes needed to keep 
things working between slices (which considerably increases costs). You 
can't rely on MapReduce for real-time or near-real-time work the way you can 
with plain task queues. And it only really shines when you can afford a 
large number of machines to run your logic - running MapReduce on a few 
machines is sometimes worse than pure sequential brute force.

Fitting your problem into a MapReduce process is actually good for your code 
- even if you don't use the library itself. It forces you to think about how 
to split your huge tasks into smaller, more manageable and more scalable 
pieces. It's a good exercise - sometimes you think you can't parallelize 
your problem, but when you're forced into the MapReduce workflow you might 
find you were wrong, and at the end of the day you have better code.
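
If the library itself doesn't fit your case, you can get the same slicing 
effect with plain push queues: process a bounded batch per task and 
re-enqueue the remainder with a datastore cursor. A minimal sketch, assuming 
an "Item" kind, a "/process" worker URL and a batch size of 1000 (all 
placeholders):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.QueryResultList;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Each task handles one batch well within the 10-minute deadline, then
// chains the next batch instead of looping over the whole dataset itself.
public class ProcessItemsServlet extends HttpServlet {
    private static final int BATCH_SIZE = 1000;

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        FetchOptions fetch = FetchOptions.Builder.withLimit(BATCH_SIZE);
        String cursorParam = req.getParameter("cursor");
        if (cursorParam != null) {
            fetch.startCursor(Cursor.fromWebSafeString(cursorParam));
        }

        QueryResultList<Entity> batch =
                ds.prepare(new Query("Item")).asQueryResultList(fetch);
        for (Entity item : batch) {
            // ... do the real work for this row ...
        }

        if (batch.size() == BATCH_SIZE) {  // probably more left: chain the next slice
            QueueFactory.getDefaultQueue().add(TaskOptions.Builder
                    .withUrl("/process")
                    .param("cursor", batch.getCursor().toWebSafeString()));
        }
    }
}

Each hop keeps its own state in the task payload (the cursor), so there is 
nothing big to serialize between steps and nothing is lost if an instance 
dies - the task just retries.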

On Wednesday, December 10, 2014 6:22:17 PM UTC-2, Emanuele Ziglioli wrote:

 It all comes at a cost: increased complexity.
 You can't beat the simplicity of task queues, and the 10m limit seems 
 artificially imposed to me. I mean, we pay for CPU time, as we would pay 
 for 20m, 30m or 1h tasks.
 I've got a simple task that takes a long time, looping through hundreds of 
 thousands of rows to produce ordered files as output.
 The current code is simple and elegant, but I have to keep increasing the 
 instance size in order to finish the task within 10m.
 A solution could be using MapReduce, but I haven't figured out yet how 
 MapReduce would solve my problem without hitting the memory limit: with my 
 simple task there are only 1000 rows in memory at any given time (minus 
 whatever the GC hasn't collected yet). A MapReduce shuffle stage would 
 require all entities, or at least their keys, to be kept in memory, and 
 that's impossible with F1s or F2s.

 Emanuele

 On Wednesday, 10 December 2014 19:24:30 UTC+13, Vinny P wrote:

 On Sat, Dec 6, 2014 at 5:58 AM, Maneesh Tripathi 
 maneesh@razaonline.info wrote:

 I have created a task queue which stops working after 10 minutes.
 I want to increase the timing.
 Please help me with this.



 Task queue requests are limited to 10 minutes of execution time: 
 https://cloud.google.com/appengine/docs/java/taskqueue/overview-push#task_deadlines

 If you need to go past the 10 minute deadline, you're better off using a 
 manual or basic scaled module: 
 https://cloud.google.com/appengine/docs/java/modules/#scaling_types

  
 -
 -Vinny P
 Technology & Media Consultant
 Chicago, IL

 App Engine Code Samples: http://www.learntogoogleit.com





[google-appengine] Re: como voltar para a conta gratis ?

2014-12-11 Thread Gilberto Torrezan Filho
Dear Contato Recompa, you are in the wrong forum. This mailing list is about 
Google App Engine, a product of the Google Cloud Platform line, which has 
nothing to do with Google Apps, which seems to be what you are referring to.

Anyway, I believe this link https://support.google.com/a/answer/2855120?hl=en 
can help answer your question.


On Sunday, December 7, 2014 6:31:39 PM UTC-2, Contato Recompa wrote:

 My plan was the free GMAIL APPS; how do I go back to GMAIL APPS (FREE)?

 * Visit our site: http://www.recompa.com.br *



[google-appengine] Re: Trying to query the datastore

2014-12-11 Thread Gilberto Torrezan Filho
Here is the native datastore query for your problem if you want to try it 
out:

// All classes are from com.google.appengine.api.datastore (plus java.util.List).
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query query = new Query(Member.class.getSimpleName());
query.setFilter(new FilterPredicate("name", FilterOperator.EQUAL, nameParam));
List<Entity> list = ds.prepare(query).asList(FetchOptions.Builder.withDefaults());

In my opinion, if you are building a new application right now, you should 
consider other options for accessing the Datastore, such as Objectify 
https://code.google.com/p/objectify-appengine/ or even the raw Datastore 
API. JDO is pretty heavyweight for the App Engine model.
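
For comparison, the same by-name lookup with Objectify looks roughly like this 
(a sketch assuming Objectify 4.x; the entity has to be registered once at 
startup, and the field must be indexed to be filterable):

import java.util.List;

import com.googlecode.objectify.ObjectifyService;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;

@Entity
public class Member {
    @Id Long id;
    @Index String name;  // @Index makes the property filterable
}

// Once, at application startup (e.g. in a ServletContextListener):
// ObjectifyService.register(Member.class);

// The query itself:
List<Member> members = ObjectifyService.ofy()
        .load()
        .type(Member.class)
        .filter("name", nameParam)
        .list();

You get typed entities back instead of raw Entity objects, and none of the JDO 
enhancement machinery.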

On Sunday, December 7, 2014 7:12:47 AM UTC-2, ajg...@gmail.com wrote:


 Hello,

 I am using Eclipse with the Google Plugin to develop a simple API with Cloud 
 Endpoints. I just have a Member entity with JDO annotations, and I 
 generated the Endpoint class.


 My Member entity only has 2 fields: the Long id and String name.
 I would like to use a getMemberByName(String name) method to easily retrieve 
 a member instead of having to know its id.

 So I added this method to the generated Endpoint class:


 @SuppressWarnings({"unchecked"})
 @ApiMethod(name = "getMemberByName", path = "/member/byname/{name}")
 public List<Member> getMemberByName(@Named("name") String name) {
     PersistenceManager mgr = getPersistenceManager();
     List<Member> candidates = null;
     Query q = mgr.newQuery(Member.class);
     q.setFilter("name == nameParam");
     q.declareParameters("String nameParam");
     try {
         candidates = (List<Member>) q.execute(name);
     } finally {
         mgr.close();
     }
     return candidates;
 }


 When I call this API method and give it a name parameter (which does exist 
 in my datastore), it sends back nothing but a 404 error.

 I thought maybe you could tell me if my code has any error; I'm not an 
 expert with JDO.


 Thank you.




Re: [google-appengine] How to increase Task Queue Execution timing

2014-12-11 Thread Emanuele Ziglioli
Thank you very much Gilberto!

It's great to make contact with people out there who are in the same boat.
I've just been watching a series of videos on pipelines, and I'm starting 
to get the pattern for big data processing that Google promotes:

Datastore -> Cloud Storage -> BigQuery

The key point is that BigQuery is append only, something I didn't realize 
before.
Here are the videos:

   1. Google I/O 2012 - Building Data Pipelines at Google Scale: 
   http://youtu.be/lqQ6VFd3Tnw 
   2. BigQuery: Simple example of a data collection and analysis pipeline + 
   Yo...: http://youtu.be/btJE659h5Bg
   3. GCP Cloud Platform Integration Demo: http://youtu.be/JcOEJXopmgo
   
All I need, it seems, is the Pipeline API, iterating over the Datastore (I 
guess in order, with a query) and producing CSV (and other formats) as 
output.
That should let me do what I already do, but on top of multiple (perhaps 
sequential) task queues rather than just one.
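
For the Datastore -> Cloud Storage leg, what I have in mind is roughly this, 
using the appengine-gcs-client (a sketch only - bucket, file, kind and 
property names are placeholders, and the ordering comes from the query's 
sort):

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.channels.Channels;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.SortDirection;
import com.google.appengine.tools.cloudstorage.GcsFileOptions;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;

public class CsvExporter {
    public static void exportToGcs() throws IOException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        GcsService gcs = GcsServiceFactory.createGcsService();

        GcsFilename file = new GcsFilename("my-bucket", "exports/rows.csv");
        GcsOutputChannel channel =
                gcs.createOrReplace(file, GcsFileOptions.getDefaultInstance());

        Query query = new Query("Row").addSort("timestamp", SortDirection.ASCENDING);
        try (PrintWriter out = new PrintWriter(Channels.newWriter(channel, "UTF-8"))) {
            // Stream rows in query order; only one chunk is in memory at a time.
            for (Entity row : ds.prepare(query)
                    .asIterable(FetchOptions.Builder.withChunkSize(1000))) {
                out.println(row.getProperty("timestamp") + "," + row.getProperty("value"));
            }
        }  // closing the writer closes the channel and finalizes the GCS object
    }
}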

From the point of view of costs, I currently rely heavily on, and possibly 
abuse, memcache. With no memcache, I expect costs to go up.
A further improvement would be to update only subsets of the data, rather 
than the whole lot. I've been designing a new datastore 'schema' so that my 
data is hierarchically organized in entity groups; that way I could generate 
a file per entity group (once it has changed) and have a final stage that 
assembles those files together.
I'm pretty happy with my current task because, as I wrote, it is simple and 
elegant.
If I could upgrade the same algorithm to a Datastore input reader for 
pipelines, that should do for us.

Emanuele


[google-appengine] Re: How to Increase Task queue Execution time

2014-12-11 Thread Gilberto Torrezan Filho
That's the default behavior of tasks: they can run for at most 10 minutes. To 
go beyond that you need to set up a backend (a manual or basic scaling 
module) and make your tasks execute on that backend.

I don't know which language you are using, but here is the Java 
documentation about that:

https://cloud.google.com/appengine/docs/java/taskqueue/
https://cloud.google.com/appengine/docs/java/modules/
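
For Java, enqueueing the task so that it runs on such a backend module looks 
roughly like this (a sketch - the "worker" module name and the "/longtask" 
URL are placeholders; the handler itself is an ordinary servlet in that 
module, and with manual or basic scaling it is not cut off at 10 minutes):

import com.google.appengine.api.modules.ModulesService;
import com.google.appengine.api.modules.ModulesServiceFactory;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class LongTaskEnqueuer {
    // Enqueues a task that will run on the "worker" module instead of the default one.
    public static void enqueueLongTask() {
        ModulesService modules = ModulesServiceFactory.getModulesService();
        String host = modules.getVersionHostname("worker", modules.getDefaultVersion("worker"));

        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder
                .withUrl("/longtask")    // handler mapped in the worker module
                .header("Host", host));  // routes the task to that module
    }
}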



Re: [google-appengine] How to increase Task Queue Execution timing

2014-12-11 Thread Gilberto Torrezan Filho
Actually I just migrated my statistics job from MapReduce to BigQuery 
(using the Datastore -> Cloud Storage -> BigQuery pattern) =)

I strongly recommend the book "Google BigQuery Analytics" by Jordan Tigani 
and Siddartha Naidu if you plan to use BigQuery or want to know more about 
it. I got mine at I/O this year (the last book in the box) =)

BigQuery is awesome but has its quirks - append-only tables are just one of 
them. You have to shape your business logic to handle that before you start 
to use it heavily.
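
For reference, a Cloud Storage -> BigQuery load with the 
google-api-services-bigquery client looks roughly like this (a sketch - 
project, dataset, table and bucket names are placeholders); note the write 
disposition: you append rows, you don't update them, so any dedup or 
corrections have to happen in your own logic or in the queries:

import java.util.Collections;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationLoad;
import com.google.api.services.bigquery.model.TableReference;

public class BigQueryAppendLoad {
    public static void loadCsvFromGcs() throws Exception {
        GoogleCredential credential = GoogleCredential.getApplicationDefault()
                .createScoped(BigqueryScopes.all());
        Bigquery bigquery = new Bigquery.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                credential)
                .setApplicationName("stats-loader")
                .build();

        JobConfigurationLoad load = new JobConfigurationLoad()
                .setSourceUris(Collections.singletonList("gs://my-bucket/stats/export.csv"))
                .setSourceFormat("CSV")
                .setWriteDisposition("WRITE_APPEND")  // append-only: new rows are added, existing rows never change
                .setDestinationTable(new TableReference()
                        .setProjectId("my-project")
                        .setDatasetId("stats")
                        .setTableId("daily_stats"));

        Job job = new Job().setConfiguration(new JobConfiguration().setLoad(load));
        bigquery.jobs().insert("my-project", job).execute();  // starts the job; poll it for completion
    }
}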

If you don't need statistics, you probably don't need BigQuery.

The sad part is that I spent more than 2 months tweaking and improving my 
whole pipeline stack trying to get better performance (or 
cost-effectiveness), when I could just have been using BigQuery to solve my 
problems. Anyway, it was a good lesson.



[google-appengine] Datastore NDB migration?

2014-12-11 Thread John Louis Del Rosario
What would be the best way to migrate records? 



Re: [google-appengine] Datastore NDB migration?

2014-12-11 Thread Qian Qiao
On Fri Dec 12 2014 at 3:07:25 PM John Louis Del Rosario joh...@gmail.com
wrote:

 What would be the best way to migrate records?


Records, i.e., the data itself, don't need to be migrated.

-- Joe



Re: [google-appengine] Datastore NDB migration?

2014-12-11 Thread John Louis Del Rosario
 I mean migrating a record's schema, or just setting a field's value for 
all existing records. 
