I don't use transactions, so I can't help you there.  That said, it seems
like doing big batch puts (say, 50 parent entities and 50 child entities at
the same time) inside a transaction would introduce a lot of overhead.  The
only way to know for sure is to test.

Also, if you check the mapreduce page, it mentions that transactions are on
the roadmap (but not yet supported):

http://code.google.com/p/appengine-mapreduce/

I know that there are technical limits to how much you can put to the
datastore at once.  But I haven't hit that limit except when putting around
330 entities per second for 10 minutes straight.  The average size of each
entity was about 2KB, and the puts were in batches of only 10, since a lot
of up-front processing had to be done first.  This was done using my own
method of fanning out tasks.

You should be able to do 100,000 entities extremely fast (without
transactions).  If you can design a modification/put function that can
update 100 entities in under 1 second, and you use a queue that processes
at 30/s, you could do all the modifications in about 33 seconds (1,000
tasks at 30 per second).
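As a rough sanity check of that arithmetic, here is a small Python sketch.
The function name and its parameters are mine, purely for illustration, and
it assumes every task finishes within its slot, so the queue rate is the
only bottleneck.

```python
import math


def estimate_seconds(total_entities, entities_per_task, tasks_per_second):
    """Rough lower bound on wall-clock time when the queue rate is the limit.

    Hypothetical helper: it ignores retries, instance warm-up and datastore
    latency, so real runs will be slower.
    """
    # Number of tasks needed to cover every entity (last task may be partial).
    tasks = math.ceil(float(total_entities) / entities_per_task)
    # The queue drains tasks_per_second tasks each second.
    return tasks / float(tasks_per_second)


# 100,000 entities, 100 per task, queue rate of 30 tasks/s.
print(round(estimate_seconds(100000, 100, 30)))  # -> 33
```

Anything that makes a single task take much longer than 1 second, or that
causes retries, pushes the real number above this estimate.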

How quickly could you pre-process 10 entities for a transactional put?
Depending on their size, you could do the 100,000 entities in roughly 5 to
6 minutes at a rate of 30/s (10,000 tasks), but I'm only guessing that a
transactional put on 5 parent and 5 child entities would happen in under
1 second.  Either way, to speed up your update, you want to do these puts
in batches of more than 1 parent/child pair.
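To make the batching idea concrete, here is a minimal Python sketch of
splitting parent/child pairs into groups of 5 so that each task handles
several pairs.  The key names and the chunk_pairs helper are hypothetical
stand-ins, not anything from your code, and how each task wraps its pairs
in transactions is deliberately left out.

```python
def chunk_pairs(pairs, pairs_per_task):
    """Yield successive lists of at most pairs_per_task (parent, child) pairs."""
    for start in range(0, len(pairs), pairs_per_task):
        yield pairs[start:start + pairs_per_task]


# Hypothetical key names standing in for real datastore keys:
# 50,000 parents, each with the one child that needs updating.
pairs = [("parent-%d" % i, "child-%d" % i) for i in range(50000)]

# One task per batch of 5 pairs (10 entities) instead of one task per pair.
batches = list(chunk_pairs(pairs, 5))
print(len(batches))  # -> 10000
```

With one task per pair you would enqueue 50,000 tasks; with 5 pairs per task
you enqueue 10,000, which is where the 5 to 6 minute figure at 30/s comes
from.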

Also, what are you using sharded counters for?  How many entities does one
task update right now?  The fact that it can take 1 hour to do 100,000
entities suggests the process is extremely inefficient.

Again, without seeing the code (or at least a pseudo-code outline) for what
you are doing, there is no way to really help you figure out a
straightforward speedup of your process.

On Mon, Nov 29, 2010 at 12:21 PM, dflorey <daniel.flo...@gmail.com> wrote:

> Thanks a lot for your valuable replies!
> I'll have to check out the current state of the map reduce lib as I
> remember from Google IO that it does not support certain filters etc.
> Simple question though: What is the maximum of updated entities/minute
> inside a transaction that you have seen in the real world?
>
>
> On 29 Nov., 18:04, Eli Jones <eli.jo...@gmail.com> wrote:
> > You mention that "tasks get rescheduled for some reason".. what is the
> > reason?  Does this reason occur frequently?
> >
> > Also, there is no way to evaluate how fast you can perform your
> > modifications since you haven't shown the code that you are currently
> > using.
> >
> > There may be several simple tweaks to your existing code that could make
> > it much faster.
> >
> > On Mon, Nov 29, 2010 at 9:29 AM, dflorey <daniel.flo...@gmail.com> wrote:
> > > Thanks for your response. I thought that mapreduce will also sit on top
> > > of the task queue and will most likely give any speed improvements over
> > > my approach?
> > > I am seeing ~1500 tasks per minute getting executed. Will mapreduce
> > > give higher numbers?
> >
> > > Daniel
> >
> > > On 29 Nov., 10:41, Peter Ondruska <peter.ondru...@gmail.com> wrote:
> > > > I would use mapreduce for GAE, see
> > > > http://code.google.com/p/appengine-mapreduce/.
> > > > It has been integrated with latest SDK so no need to download, I use
> > > > it with Python--just make sure to import
> > > > google.appengine.ext.mapreduce.
> >
> > > > On 29 lis, 10:06, dflorey <daniel.flo...@gmail.com> wrote:
> >
> > > > > Hi,
> > > > > I'm looking for the most effective way to update 50000 entities +
> > > > > one of the child entities each.
> > > > > Right now I'm using a task per transaction to be able to modify the
> > > > > entity and the child entities inside a transaction to make the task
> > > > > idempotent.
> > > > > I'm using sharded counters to check when the operation is done.
> > > > > Everything works fine, but it takes very long (=minutes to hours)
> > > > > to perform the modifications.
> > > > > I'm getting no concurrent modification exceptions etc. at all, but
> > > > > tasks get rescheduled for some reason and wait for a long time
> > > > > before getting executed depending on the number of retries.
> >
> > > > > Is there a way to speed things up?
> > > > > I'm looking for a solution that will execute the update almost
> > > > > immediately :-)
> > > > > My tasks take less than 1000ms each and I can see ~30 instances in
> > > > > the dashboard.
> >
> > > > > Thanks for any ideas,
> >
> > > > > Daniel
> >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "Google App Engine" group.
> > > To post to this group, send email to google-appengine@googlegroups.com
> .
> > > To unsubscribe from this group, send email to
> > > google-appengine+unsubscr...@googlegroups.com<google-appengine%2bunsubscr...@googlegroups.com><google-appengine%2Bunsubscrib
> e...@googlegroups.com>
> > > .
> > > For more options, visit this group at
> > >http://groups.google.com/group/google-appengine?hl=en.
>
