Re: [google-appengine] How to increase Task Queue Execution timing
I've used MapReduce myself for a while, and I can tell you: 100+MB of keys means A LOT of keys at the shuffle stage. And the real limitation of MapReduce is: "The total size of all the instances of Mapper, InputReader, OutputWriter and Counters must be less than 1MB between slices. This is because these instances are serialized and saved to the datastore between slices." Source: https://github.com/GoogleCloudPlatform/appengine-mapreduce/wiki/2.6-The-Java-MapReduce-Library

The real problem of MapReduce, in my opinion, is the latency of the operations and the huge amount of reads/writes to the datastore needed to keep things working between slices (which considerably increases costs). You can't rely on MapReduce to do real-time or near-real-time work as you can with pure task queues. And it only shines when you can afford a large number of machines to run your logic; running MapReduce on a few machines is sometimes worse than pure sequential brute force.

Fitting your problem into a MapReduce process is actually good for your code, even if you don't use the library itself. It forces you to think about how you can split your huge tasks into smaller, more manageable and more scalable pieces. It's a good exercise: sometimes you think you can't parallelize your problem, but when you're forced into the MapReduce workflow, you might find you were actually wrong, and at the end of the day you have better code.

On Wednesday, December 10, 2014 6:22:17 PM UTC-2, Emanuele Ziglioli wrote:

It all comes at a cost: increased complexity. You can't beat the simplicity of task queues, and the 10m limit seems artificially imposed to me. I mean, we pay for CPU time, as we would pay for 20m, 30m, or 1h tasks. I've got a simple task that takes a long time, looping through hundreds of thousands of rows to produce ordered files as output. The current code is simple and elegant, but I have to keep increasing the CPU size in order to finish the task within 10m.
A solution could be using MapReduce, but I haven't figured out yet how MapReduce would solve my problem without hitting the memory limit: with my simple task there are only 1000 rows in memory at any given time (of course, minus the GC). A MapReduce shuffle stage would require all entities, or at least their keys, to be kept in memory, and that's impossible with F1s or F2s.

Emanuele

On Wednesday, 10 December 2014 19:24:30 UTC+13, Vinny P wrote:

On Sat, Dec 6, 2014 at 5:58 AM, Maneesh Tripathi maneesh@razaonline.info wrote: I have created a task queue which stops working after 10 minutes. I want to increase the timing. Please help me with this.

Task queue requests are limited to 10 minutes of execution time: https://cloud.google.com/appengine/docs/java/taskqueue/overview-push#task_deadlines If you need to go past the 10 minute deadline, you're better off using a manual or basic scaling module: https://cloud.google.com/appengine/docs/java/modules/#scaling_types

-Vinny P
Technology Media Consultant
Chicago, IL
App Engine Code Samples: http://www.learntogoogleit.com

-- You received this message because you are subscribed to the Google Groups Google App Engine group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at http://groups.google.com/group/google-appengine. For more options, visit https://groups.google.com/d/optout.
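The advice above about splitting one huge task into smaller, more manageable pieces can be sketched in plain Java, independently of any App Engine API. This is only an illustration of the splitting idea; the class and numbers are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

public class WorkSplitter {

    // Split a large range of row indices into fixed-size batches so each
    // batch can run as its own short task instead of one 10-minute job.
    // Each int[] holds {startInclusive, endExclusive}.
    public static List<int[]> split(int totalRows, int batchSize) {
        List<int[]> batches = new ArrayList<>();
        for (int start = 0; start < totalRows; start += batchSize) {
            int end = Math.min(start + batchSize, totalRows);
            batches.add(new int[] { start, end });
        }
        return batches;
    }

    public static void main(String[] args) {
        // 250,000 rows in batches of 1,000 -> 250 independent work units,
        // each easily finishing well under any per-request deadline.
        List<int[]> batches = split(250_000, 1_000);
        System.out.println(batches.size());
    }
}
```

Each batch could then be enqueued as its own task, which is exactly the decomposition the MapReduce workflow forces on you.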
[google-appengine] Re: how do I go back to the free account?
Dear Contato Recompa, you are in the wrong forum. This mailing list is about Google App Engine, a product in the Google Cloud Platform line, which has nothing to do with Google Apps, which seems to be what you are referring to. In any case, I believe this link can help answer your question: https://support.google.com/a/answer/2855120?hl=en

On Sunday, December 7, 2014 6:31:39 PM UTC-2, Contato Recompa wrote: My quota was the free GMAIL APPS plan. How do I go back to the free GMAIL APPS? Visit our site: www.recompa.com.br
[google-appengine] Re: Trying to query the datastore
Here is the native datastore query for your problem if you want to try it out:

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query query = new Query(Member.class.getSimpleName());
query.setFilter(new FilterPredicate("name", FilterOperator.EQUAL, nameParam));
List<Entity> list = ds.prepare(query).asList(FetchOptions.Builder.withDefaults());

In my opinion, if you are building a new application right now, you should consider other options for accessing the Datastore, such as Objectify (https://code.google.com/p/objectify-appengine/) or even the raw Datastore API. JDO is pretty heavyweight for the App Engine model.

On Sunday, December 7, 2014 7:12:47 AM UTC-2, ajg...@gmail.com wrote:

Hello, I am using Eclipse + the Google plugin to develop a simple API with Cloud Endpoints. I just have a Member entity with JDO annotations, and I generated the Endpoint class. My Member entity only has 2 fields: the Long id and the String name. I would like to use a getMemberByName(String name) method to retrieve a member easily instead of having to know its id. So I added this method to the generated Endpoint class:

@SuppressWarnings({"unchecked"})
@ApiMethod(name = "getMemberByName", path = "member/byname/{name}")
public List<Member> getMemberByName(@Named("name") String name) {
  PersistenceManager mgr = getPersistenceManager();
  List<Member> candidates = null;
  Query q = mgr.newQuery(Member.class);
  q.setFilter("name == nameParam");
  q.declareParameters("String nameParam");
  try {
    candidates = (List<Member>) q.execute(name); // was q.execute(nom); "nom" is not defined in this method
  } finally {
    mgr.close();
  }
  return candidates;
}

When I call this API method and give it a name parameter (which does exist in my datastore), it sends me nothing back but a 404 error. I thought maybe you could tell me if my code has any error; I'm not an expert with JDO. Thank you. 
Re: [google-appengine] How to increase Task Queue Execution timing
Thank you very much Gilberto! It's great to make contact with people out there who are in the same boat.

I've just been watching a series of videos on pipelines, and I'm starting to get the pattern for big data processing that Google promotes: Datastore - Cloud Storage - BigQuery. The key point is that BigQuery is append-only, something I didn't realize before. Here are the videos:

1. Google I/O 2012 - Building Data Pipelines at Google Scale: http://youtu.be/lqQ6VFd3Tnw
2. BigQuery: Simple example of a data collection and analysis pipeline + Yo...: http://youtu.be/btJE659h5Bg
3. GCP Cloud Platform Integration Demo: http://youtu.be/JcOEJXopmgo

All I need, it seems, is the Pipeline API, iterating over the Datastore (I guess, in order with a query) and producing CSV (and other formats) as output. That should allow me to do what I do already, but on top of multiple (perhaps sequential) task queues rather than just one. From the point of view of costs, I currently rely heavily on, possibly abusing, memcache. With no memcache, I expect costs to go up.

A further improvement would be to update only subsets of data, rather than the whole lot. I've been designing a new datastore 'schema' so that my data is hierarchically organized in entity groups; that way I could generate a file per entity group (once it has changed) and have a final stage that assembles those files together. I'm pretty happy with my current task because, as I wrote, it is simple and elegant. If I could upgrade the same algorithm to a Datastore input reader for pipelines, that should do for us.

Emanuele

On Friday, 12 December 2014 02:29:54 UTC+13, Gilberto Torrezan Filho wrote: …
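Emanuele's constraint above (at most ~1000 rows in memory at a time, producing ordered CSV output) can be sketched in plain Java. The Iterator is a stand-in for a cursor-based Datastore query feeding Cloud Storage, and all names are illustrative:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CsvBatcher {

    // Consume rows in small batches and append them to a CSV, so memory
    // only ever holds one batch, not the full result set.
    public static String toCsv(Iterator<String[]> rows, int batchSize) {
        StringBuilder csv = new StringBuilder("id,name\n");
        while (rows.hasNext()) {
            int inBatch = 0;
            while (rows.hasNext() && inBatch < batchSize) {
                csv.append(String.join(",", rows.next())).append('\n');
                inBatch++;
            }
            // In a real pipeline, each completed batch could be flushed to
            // Cloud Storage here and the query cursor persisted.
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] { "1", "alice" },
                new String[] { "2", "bob" },
                new String[] { "3", "carol" });
        System.out.print(toCsv(rows.iterator(), 2));
    }
}
```

A Datastore input reader for the Pipeline API would play the role of the iterator, so the same batch-at-a-time algorithm carries over.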
[google-appengine] Re: How to Increase Task queue Execution time
That's the default behavior of tasks: they execute for at most 10 minutes. To go beyond that, you need to set up a backend (now a manual or basic scaling module) and make your tasks execute there. I don't know which language you are using, but here is the Java documentation about it: https://cloud.google.com/appengine/docs/java/taskqueue/ https://cloud.google.com/appengine/docs/java/modules/
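For reference, a longer-running module of the kind the linked documentation describes is declared in its own appengine-web.xml. This fragment is a sketch: the application id, module name, and instance counts are placeholders, and the exact values should be checked against the modules documentation:

```xml
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appspot.com/ns/1.0">
  <application>your-app-id</application>
  <module>long-tasks</module>
  <version>1</version>
  <!-- Basic scaling instances are not subject to the 10-minute
       request deadline that automatic scaling imposes on tasks. -->
  <instance-class>B2</instance-class>
  <basic-scaling>
    <max-instances>5</max-instances>
    <idle-timeout>10m</idle-timeout>
  </basic-scaling>
</appengine-web-app>
```

Tasks are then routed to this module via the queue or task target so they run on the basic-scaling instances.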
Re: [google-appengine] How to increase Task Queue Execution timing
Actually, I just migrated my statistics job from MapReduce to BigQuery (using the Datastore - Cloud Storage - BigQuery pattern) =)

I strongly recommend the book Google BigQuery Analytics by Jordan Tigani and Siddartha Naidu if you plan to use BigQuery or want to know more about it. I got mine at I/O this year (the last book in the box) =)

BigQuery is awesome but has its quirks; append-only tables are just one of them. You have to shape your business logic to handle that before starting to use it heavily. If you don't need statistics, you probably don't need BigQuery.

The sad part is that I spent more than 2 months tweaking and improving my whole pipeline stack trying to get better performance (or cost-effectiveness), when I could just have been using BigQuery to solve my problems. Anyway, it was a good lesson.
[google-appengine] Datastore NDB migration?
What would be the best way to migrate records?
Re: [google-appengine] Datastore NDB migration?
On Fri Dec 12 2014 at 3:07:25 PM John Louis Del Rosario joh...@gmail.com wrote: What would be the best way to migrate records? Records, i.e., the data itself, don't need to be migrated. -- Joe
Re: [google-appengine] Datastore NDB migration?
I mean migrating a record's schema, or just setting a field's value for all existing records.
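Setting a field's value for all existing records usually reduces to: read each entity, fill in the missing property, and write it back in batches. Here is a plain-Java sketch of that backfill logic, with maps standing in for datastore entities; the property name and values are made up for the example:

```java
import java.util.List;
import java.util.Map;

public class BackfillMigration {

    // Give every record a value for a newly added property, leaving
    // records that already have one untouched. In a real migration this
    // loop would run over query batches inside a task queue task, with
    // each batch written back in a single bulk put.
    public static int backfill(List<Map<String, Object>> records,
                               String property, Object defaultValue) {
        int updated = 0;
        for (Map<String, Object> record : records) {
            if (!record.containsKey(property)) {
                record.put(property, defaultValue);
                updated++;
            }
        }
        return updated;
    }
}
```

Because the datastore is schemaless, old records simply lack the new property until a pass like this touches them, which is why reads should tolerate a missing field during the migration window.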