I like the concept of MapReduce; however, I think it might be easier to borrow a page from Apple's Grand Central Dispatch, released in Snow Leopard. The hardest part would be implementing a usable tool / framework in Java which many developers could leverage and understand, especially since, in my experience, most developers do not write thread-safe code by default, and thread safety is a fundamental tenet of both GCD and MapReduce.
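To make the "small independent tasks, then reassemble" idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: it uses the standard java.util.concurrent thread pool rather than anything from GCD, MapReduce, or App Engine, it assumes an environment where thread pools are available (which may not hold inside App Engine's sandbox), and the class and method names (ChunkedSum, sum) are hypothetical. The point it shows is the thread-safety discipline: each task reads only its own slice and returns a value, so no shared state is mutated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: these names are not part of GCD, MapReduce, or App Engine.
public class ChunkedSum {

    // Splits the input into fixed-size chunks, sums each chunk as an independent
    // task, then combines the partial sums on the calling thread.
    public static long sum(final int[] values, int chunkSize, int workers) throws Exception {
        if (chunkSize <= 0 || workers <= 0) {
            throw new IllegalArgumentException("chunkSize and workers must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<Future<Long>>();
            for (int start = 0; start < values.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, values.length);
                // "Map" step: each task reads only its own slice and writes no shared state.
                partials.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long subtotal = 0;
                        for (int i = from; i < to; i++) {
                            subtotal += values[i];
                        }
                        return subtotal;
                    }
                }));
            }
            // "Reduce" step: combine the results in a single thread.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values, 100, 4)); // prints 500500
    }
}
```

Because the tasks return values instead of updating a shared counter, the only place results meet is the single-threaded loop at the end, which is the property developers tend to get wrong when they write the code "by default".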
Tim

On Nov 13, 4:55 pm, "Ikai L (Google)" <ika...@google.com> wrote:
> Thanks for the feedback, Tim. It sounds to me like what you are looking for
> is MapReduce support. There's a feature in our issue tracker for this:
>
> http://code.google.com/p/googleappengine/issues/detail?id=112
>
> Map/Reduce would be a great fit for our model since the work could be
> transparently distributed among your application instances. App Engine
> definitely favors the approach you describe of breaking a big job into
> smaller pieces and reassembling the data, but currently this is up to the
> developer to manage and build.
>
> On Thu, Nov 12, 2009 at 8:26 AM, tsp...@tangiblesoftware.com
> <tsp...@tangiblesoftware.com> wrote:
> > Ikai,
> > This is not really a relational data question. It is a summary data
> > question. To give a brief overview, here is the history of my approach
> > to summary information over the past 20 years:
> >
> > 1. Calculate the summary information on the fly per user request.
> >    Very database intensive and potentially slow performance for the user.
> > 2. Create summary data tables which the application can read very
> >    quickly, use database triggers to create/update the summary values.
> >    Improved user experience, but has a penalty at write time and requires
> >    developers to know two tools (database triggers and application
> >    language).
> > 3. Same approach as number 2, but create/update the summary values
> >    in the application code. Reduces maintenance headaches by having a
> >    single tool, makes the write performance a little worse because now
> >    the transaction spans computers/servers. Since servers are cheap and
> >    developers are not, this became the preferred approach.
> > 4. Avoid the possible create/search of step two/three and assume a
> >    summary record exists at time of write. Increases performance by
> >    eliminating the check for a summary record at each write. Downside:
> >    need an asynchronous process to pre-create all possible summary
> >    records and prune ones which were never used after a reasonable time.
> >
> > Depending on the requirements, I prefer the first or fourth choice
> > (mostly read to write ratio is what matters). However, it is hard to
> > create a long running process via the existing toolset and constraints
> > provided by GAE. Because of this, I was falling back to the third
> > option, which was the basis for my original question. (I am looking
> > into trying to break the process into many tasks of 30 seconds or less,
> > but it is not looking like a practical solution yet. This is another
> > reason we need to get support for long running batch processes within
> > GAE.)
> >
> > Tim
> >
> > On Nov 10, 5:44 pm, "Ikai L (Google)" <ika...@google.com> wrote:
> > > Tim,
> > >
> > > It really depends on what you're doing. One of the challenges of
> > > developing on a distributed store like the App Engine data store is
> > > adjusting the way you approach persistence for objects. For instance,
> > > suppose you store favorite colors per application user. The canonical
> > > way of solving this problem in a relational environment is to normalize
> > > the color data and create a lock around inserting each individual new
> > > color. In App Engine's environment, we would likely recommend that you
> > > take advantage of data store list properties as a much more performant
> > > alternative to data normalization: App Engine will handle all the
> > > indexing for you.
> > >
> > > If you are working with objects in parent/child relationships and
> > > require transactional integrity, you should take a look at our
> > > documentation describing Entities and Entity Groups:
> > > http://code.google.com/appengine/docs/java/datastore/transactions.html.
> > >
> > > On Fri, Nov 6, 2009 at 12:12 PM, tsp...@tangiblesoftware.com
> > > <tsp...@tangiblesoftware.com> wrote:
> > > > Guys,
> > > > In a normal relational database, I am used to using a combination
> > > > of singletons (single application server), synchronized objects in a
> > > > dedicated thread (single application server) or table locks (multiple
> > > > application servers) to manage the creation of summary data records
> > > > which could be created by multiple simultaneous requests.
> > > > In GAE, none of the methods seem to be supported; what would be
> > > > the suggested method?
> > > >
> > > > I am using the JPA method of accessing the data store.
> > > >
> > > > Thanks,
> > > >
> > > > Tim
> > >
> > > --
> > > Ikai Lan
> > > Developer Programs Engineer, Google App Engine
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Google App Engine for Java" group.
> > > To post to this group, send email to
> > > google-appengine-j...@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > google-appengine-java+unsubscr...@googlegroups.com.
> > > For more options, visit this group at
> > > http://groups.google.com/group/google-appengine-java?hl=.
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine