Thanks, Eli, for taking the time to help me out.
You can have a look at the app I'm working on:
http://ucm.floreysoft.net
It's a contact manager app that keeps track of all changes to
create a revision history.
If a user has about 50000 contacts and, e.g., adds all contacts to a new
group or changes some fields on all contacts, I have to create a new
entity for the current version and shift the previous version to the
revision history (revisions are children of the master record so that
the operation can be handled as an atomic commit).
So in fact I have to create two entities in a single transaction, and
on top of that I have to make some calls to the Google Contacts servers
in order to sync changes.
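A minimal in-memory sketch of the revision scheme described above, assuming one contact plus its revisions form one entity group. `ContactGroup`, `update` and the `expected_version` idempotency check are hypothetical illustrations, not the app's actual code; a lock stands in for the datastore transaction.

```python
import threading
from typing import Dict, List, Tuple


class ContactGroup:
    """Hypothetical model of one contact's entity group.

    The master record holds the current version; earlier versions are kept
    as child revisions, so one transaction can write both together.
    """

    def __init__(self, fields: Dict[str, str]) -> None:
        self._lock = threading.Lock()  # stands in for the transaction
        self.version = 1
        self.current: Dict[str, str] = dict(fields)
        self.revisions: List[Tuple[int, Dict[str, str]]] = []

    def update(self, changes: Dict[str, str], expected_version: int) -> bool:
        """Apply `changes` once; return False if they were already applied.

        `expected_version` is an assumed idempotency check: a retried task
        carrying an outdated version becomes a no-op instead of creating a
        duplicate revision.
        """
        with self._lock:
            if self.version != expected_version:
                return False
            # Build the new state first, then swap it in, so the revision
            # and the new master record change together.
            new_revisions = self.revisions + [(self.version, dict(self.current))]
            new_current = {**self.current, **changes}
            self.revisions, self.current = new_revisions, new_current
            self.version += 1
            return True
```

Calling `update({"group": "VIP"}, expected_version=1)` twice on a fresh group returns `True` and then `False`, which is the property that lets a retried task run safely.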
So what I did was create a task for each individual command executed
by the user and add the task to the task queue.
In the worst case, if a user modifies all 50000 contacts, I'll add 50000
tasks to the task queue.
I'm using sharded counters to track how many tasks have already been
executed so I can display a progress bar to the user.
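A sketch of the sharded-counter idea behind the progress bar, modelled in memory. `ShardedCounter` and `progress_percent` are hypothetical names; in the datastore each shard would be its own small entity, and the point is that concurrent increments spread across shards instead of contending on one record.

```python
import random
from collections import defaultdict
from typing import DefaultDict


class ShardedCounter:
    """Hypothetical in-memory model of a sharded counter.

    Each increment touches one randomly chosen shard; reading the total
    means summing every shard.
    """

    def __init__(self, num_shards: int = 20) -> None:
        self.num_shards = num_shards
        self.shards: DefaultDict[int, int] = defaultdict(int)  # shard -> count

    def increment(self, delta: int = 1) -> None:
        self.shards[random.randrange(self.num_shards)] += delta

    def total(self) -> int:
        return sum(self.shards.values())


def progress_percent(counter: ShardedCounter, total_tasks: int) -> float:
    """Percentage of finished tasks, for a progress bar."""
    if total_tasks <= 0:
        return 100.0
    return 100.0 * counter.total() / total_tasks
```

One caveat: a task that is retried after incrementing would be counted twice unless the increment is tied to the idempotent part of the work.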
I guess it would speed things up if I batched several operations per
task?
I've read somewhere that it is important to keep tasks under 1000 ms,
so is it recommended to perform as many operations in a single task
as can be done in less than a second?
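One way to batch under a time limit is to keep updating contacts until a budget runs out and then hand the remaining work to a follow-up task. A minimal sketch, assuming each contact and its revisions form one entity group, so each `update_one` call would run its own transaction. `process_with_budget`, `update_one` and the continuation-task step are hypothetical.

```python
import time
from typing import Callable, Optional, Sequence


def process_with_budget(
    keys: Sequence[str],
    start: int,
    update_one: Callable[[str], None],
    budget_s: float = 0.8,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Update keys[start:] until the time budget runs out.

    Returns the index a follow-up task should resume from, or None once
    every key has been handled.
    """
    deadline = clock() + budget_s
    i = start
    while i < len(keys):
        if clock() >= deadline:
            return i  # hypothetical: enqueue a continuation task with start=i
        update_one(keys[i])
        i += 1
    return None
```

The budget sits below 1000 ms because the last update started before the deadline can overshoot it. In the app, each call would be one task: a non-`None` return value becomes the `start` of the next task.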

Thanks again!

Daniel


On 29 Nov., 19:15, Eli Jones <eli.jo...@gmail.com> wrote:
> I don't use transactions so I can't help you there.  But, it seems like
> trying to do big batch puts (say, on 50 parent entities and 50 child
> entities at the same time) in a transaction would introduce a lot of
> overhead.  Only way to know for sure is to test.
>
> Also, if you check the mapreduce page, it mentions that transactions are on
> the roadmap (but not yet supported):
>
> http://code.google.com/p/appengine-mapreduce/
>
> I know that there are
> technical limits to how much datastore putting you can do at once.  But, I
> haven't hit that limit except when putting around 330 entities per second
> for 10 minutes straight.  The average size of each entity was about 2KB, and
> the puts were only batches of 10 since there was a lot of up front
> processing that needed to be done first.  This was done using my own method
> of fanning out tasks.
>
> You should be able to do 100,000 entities extremely fast (without
> transactions).  If you could design a modification/put function that
> updates 100 entities in under 1 second and used a queue that processed
> 30 tasks/s, you could do all the modifications in about 33 seconds.
>
> How quickly could you pre-process 10 entities for a transactional put?
> Depending on their size, you could do the 100,000 entities in about 5
> minutes at a rate of 30 tasks/s, but I'm only guessing that a transactional
> put of 5 parent and 5 child entities can happen in under 1 second.  Either
> way, to speed up your update, you want to do these puts in batches of more
> than 1 parent/child pair.
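The throughput estimates above reduce to one formula: number of tasks divided by queue rate. A rough sketch (`estimated_seconds` is a hypothetical helper) that ignores retries, instance spin-up and contention, and also applies it to the ~1500 tasks per minute Daniel reports with one parent/child pair per task:

```python
import math


def estimated_seconds(entities: int, per_task: int, tasks_per_second: float) -> float:
    """Rough wall-clock estimate: number of tasks divided by queue throughput."""
    tasks = math.ceil(entities / per_task)
    return tasks / tasks_per_second


# 100 entities per task at 30 tasks/s
print(round(estimated_seconds(100_000, 100, 30), 1))       # 33.3 (seconds)
# 10 entities (5 parent/child pairs) per task at 30 tasks/s
print(round(estimated_seconds(100_000, 10, 30) / 60, 1))   # 5.6 (minutes)
# one parent/child pair per task at ~1500 tasks/min
print(round(estimated_seconds(100_000, 2, 1500 / 60) / 60, 1))  # 33.3 (minutes)
```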
>
> Also, what are you using sharded counters for?  How many entities does one
> task update right now?  The fact that it can take 1 hour to do 100,000
> entities suggests the process is extremely inefficient.
>
> Again, without seeing the code (or at least a pseudo-code outline) for what
> you are doing, there is no way to really help you figure out a
> straightforward speedup of your process.
>
> On Mon, Nov 29, 2010 at 12:21 PM, dflorey <daniel.flo...@gmail.com> wrote:
> > Thanks a lot for your valuable replies!
> > I'll have to check out the current state of the map reduce lib as I
> > remember from Google IO that it does not support certain filters etc.
> > Simple question though: what is the maximum number of entities per minute
> > you have seen updated inside transactions in the real world?
>
> > On 29 Nov., 18:04, Eli Jones <eli.jo...@gmail.com> wrote:
> > > You mention that "tasks get rescheduled for some reason".. what is the
> > > reason?  Does this reason occur frequently?
>
> > > Also, there is no way to evaluate how fast you can perform your
> > > modifications since you haven't shown the code that you are currently
> > > using.
>
> > > There may be several simple tweaks to your existing code that could make it
> > > much faster.
>
> > > On Mon, Nov 29, 2010 at 9:29 AM, dflorey <daniel.flo...@gmail.com> wrote:
> > > > Thanks for your response. I thought that mapreduce also sits on top
> > > > of the task queue and will most likely not give any speed improvements
> > > > over my approach?
> > > > I am seeing ~1500 tasks per minute getting executed. Will mapreduce
> > > > give higher numbers?
>
> > > > Daniel
>
> > > > On 29 Nov., 10:41, Peter Ondruska <peter.ondru...@gmail.com> wrote:
> > > > > I would use mapreduce for GAE, see
> > > > > http://code.google.com/p/appengine-mapreduce/.
> > > > > It has been integrated into the latest SDK, so there is no need to
> > > > > download it. I use it with Python; just make sure to import
> > > > > google.appengine.ext.mapreduce.
>
> > > > > On 29 lis, 10:06, dflorey <daniel.flo...@gmail.com> wrote:
>
> > > > > > Hi,
> > > > > > I'm looking for the most effective way to update 50000 entities plus
> > > > > > one of their child entities each.
> > > > > > Right now I'm using a task per transaction to be able to modify the
> > > > > > entity and the child entities inside a transaction to make the task
> > > > > > idempotent.
> > > > > > I'm using sharded counters to check when the operation is done.
> > > > > > Everything works fine, but it takes very long (minutes to hours) to
> > > > > > perform the modifications.
> > > > > > I'm getting no concurrent modification exceptions etc. at all, but
> > > > > > tasks get rescheduled for some reason and wait for a long time before
> > > > > > getting executed depending on the number of retries.
>
> > > > > > Is there a way to speed things up?
> > > > > > I'm looking for a solution that will execute the update almost
> > > > > > immediately :-)
> > > > > > My tasks take less than 1000 ms each and I can see ~30 instances in the
> > > > > > dashboard.
>
> > > > > > Thanks for any ideas,
>
> > > > > > Daniel
>
> > > > --
> > > > You received this message because you are subscribed to the Google Groups
> > > > "Google App Engine" group.
> > > > To post to this group, send email to google-appengine@googlegroups.com.
> > > > To unsubscribe from this group, send email to
> > > > google-appengine+unsubscr...@googlegroups.com.
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/google-appengine?hl=en.
>