Unfortunately I do need to query them based on subscriber_id, so I
can't pack them into a non-indexed property.

Retrieving the updates a particular user has subscribed to is
blazingly fast though... that's the gain in the end: I can query and
fetch 1000 updates for a user, sorted by date, in 20-30ms-cpu. Love
that :p In my earlier hacky approaches, where I tried to write once
and then 'gather' at read time, I had to do lots of in-memory sorting,
and even then the results often weren't totally accurate.
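For illustration, here's a minimal sketch of that read path (the
UpdateGroup model and all names are made up for this sketch, not from
this thread). The equality filter on a list property is a membership
test, and the date sort needs a composite index over the list property
and the date, which is exactly the index that makes the writes costly:

    from google.appengine.ext import db

    class UpdateGroup(db.Model):
        # One entity fans a single update out to a whole group of subscribers.
        update_key = db.StringProperty()
        subscriber_ids = db.StringListProperty()  # indexed list property
        created = db.DateTimeProperty(auto_now_add=True)

    def recent_updates_for(user_id, limit=1000):
        # 'subscriber_ids =' matches any entity whose list contains user_id.
        q = UpdateGroup.all()
        q.filter('subscriber_ids =', user_id)
        q.order('-created')  # needs a composite index (subscriber_ids, -created)
        return q.fetch(limit)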

I'm going to keep toying with the write end of things though, because
in my full app I may need to write to other entities along with
subscribers to achieve certain things. So I'm going to be looking for
every opportunity possible to optimise the cost of an 'update', which
in my case may go beyond notifying subscribers. Any thoughts/ideas on
further optimisation are more than welcome!!

@Paul

If you've more subscribers than will fit in one 'group', you'll need
multiple groups, correct. So you'll have n writes, where
n = ceil(number of subscribers / group size), i.e. rounded up to the
nearest whole number. Even with the costly index creation for each of
these 'group' entities, it should still work out a fair bit cheaper
than writing a separate entity for each subscriber.
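A rough sketch of that write side, reusing the hypothetical
UpdateGroup model from above (the group size is an assumption to tune,
staying under the per-entity index limits):

    GROUP_SIZE = 1000  # assumed batch size; keep below the per-entity index cap

    def fan_out(update_key, subscriber_ids):
        # ceil(len(subscriber_ids) / GROUP_SIZE) entity writes in total.
        groups = [UpdateGroup(update_key=update_key,
                              subscriber_ids=subscriber_ids[i:i + GROUP_SIZE])
                  for i in range(0, len(subscriber_ids), GROUP_SIZE)]
        db.put(groups)  # batch put: one API round trip for all the groups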


On Mar 13, 11:47 am, bFlood <bflood...@gmail.com> wrote:
> @peterk - if you don't need to query by the subscriber, you could
> alternatively pack the list of subscribers for a feed into a
> TextProperty so it is not indexed. I use TextProperty a lot to store
> large lists of geometry data and they work out pretty well
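(A minimal sketch of the packing bFlood describes; names are invented
for illustration. db.Text values are never indexed, so a large packed
list adds no index rows to the write:)

    from google.appengine.ext import db

    class Feed(db.Model):
        # TextProperty is unindexed, so this list costs no index writes.
        packed_subscribers = db.TextProperty()

    def pack(feed, subscriber_ids):
        feed.packed_subscribers = db.Text(u','.join(subscriber_ids))
        feed.put()

    def unpack(feed):
        s = feed.packed_subscribers
        return s.split(',') if s else []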
>
> @brett - async! looking forward to it in future GAE builds. thanks
>
> cheers
> brian
>
> On Mar 13, 5:37 am, peterk <peter.ke...@gmail.com> wrote:
>
> > I was just toying around with this idea yesterday Brett.. :D I did
> > some profiling, and it would reduce the write cost per subscriber to
> > about 24ms-40ms, down from 100-150ms (the more subscribers you have,
> > the lower the average cost per subscriber). These are rough numbers
> > with the entities I was using; I have to do some more accurate profiling.
>
> > When I first thought about doing this, I was thinking ":o I'll reduce
> > write cost by a factor of hundreds!", but as it turns out, the extra
> > index update time for an entity with a large number of list property
> > entries eats into that saving significantly.
>
> > But it still is a saving. Funnily enough the per subscriber saving
> > increases (to a point) the more subscribers you have.
>
> > I'm not sure if there's anything one can do to optimise index creation
> > time with large lists.. I'm going to do some more work as well to see
> > if there's an optimum 'batch size' for grouping subscribers
> > together..at first blush, as mentioned above, it seems the larger the
> > better (up to the per entity property/index cap of course).
>
> > Thanks also for the insight on pubsubhubbub... I eagerly await updates
> > on that front :) Thank you!!
>
> > On Mar 13, 8:05 am, Paul Kinlan <paul.kin...@gmail.com> wrote:
>
> > > Just Curious,
>
> > > For other pub/sub-style systems where you want to write to the
> > > Datastore, the trick is to use list properties to track the
> > > subscribers you've published to. So for instance, instead of writing a
> > > single entity per subscriber, you write one entity with 1000-2000
> > > subscriber IDs in a list. Then all queries for that list with an
> > > equals filter for the subscriber will show the entity. This lets you
> > > pack a lot of information into a single entity write, thus minimizing
> > > Datastore overhead, cost, etc. Does that make sense?
>
> > > So if you have more subscribers than the 5000-entry index limit, would
> > > you write the entity twice, each with different subscriber IDs?
>
> > > Paul
>
> > > 2009/3/13 Brett Slatkin <brett-appeng...@google.com>
>
> > > > Heyo,
>
> > > > Good finds, peterk!
>
> > > > pubsubhubbub uses some of the same techniques that Jaiku uses for
> > > > doing one-to-many fan-out of status message updates. The migration is
> > > > underway as we speak
> > > > (http://www.jaiku.com/blog/2009/03/11/upcoming-service-break/). I
> > > > believe the code should be available very soon.
>
> > > > 2009/3/11 peterk <peter.ke...@gmail.com>:
>
> > > > > The app is actually live here:
>
> > > > >http://pubsubhubbub.appspot.com/
> > > > >http://pubsubhubbub-subscriber.appspot.com/
>
> > > > > (pubsubhubbub-publisher isn't there, but it's trivial to upload your
> > > > > own.)
>
> > > > > This suggests it's working on App Engine as it is now. I've been
> > > > > looking through the source, and I'm not entirely clear on how the
> > > > > 'background workers' actually work... there are two: one for pulling
> > > > > updates to feeds from publishers, and one for propagating updates to
> > > > > subscribers in batches.
>
> > > > > But like I say, I can't see how they're actually started and running
> > > > > constantly.  There is a video here of a live demonstration:
>
> > > > >http://www.veodia.com/player.php?vid=fCNU1qQ1oSs
>
> > > > > The background workers seem to be behaving as desired there, but I'm
> > > > > not sure if they were just constantly polling some URLs to keep the
> > > > > workers live for the purposes of that demo, or if they're actually
> > > > > running constantly on their own. I can't get the live app at the URLs
> > > > > above to work, but I'm not sure if that's because the background
> > > > > workers aren't really working, or because I'm feeding it incorrect
> > > > > URLs/configuration etc.
>
> > > > Ah sorry yeah I still have the old version of the source running on
> > > > pubsubhubbub.appspot.com; I need to update that with a more recent
> > > > build. Sorry for the trouble! It's still not quite ready for
> > > > widespread use, but it should be soon.
>
> > > > The way pubsubhubbub does fan-out, there's no need to write an entity
> > > > for each subscriber of a feed. Instead, each time it consumes a task
> > > > from the work queue it will update the current iterator position in
> > > > the query result of subscribers for a URL. Subsequent work requests
> > > > will offset into the subscribers starting at the iterator position.
> > > > This works well in this case because it's using urlfetch to actually
> > > > notify subscribers, instead of writing to the Datastore.
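(A minimal sketch of that continuation pattern, not pubsubhubbub's
actual code: the Subscription model, the enqueue_next_batch helper,
and the batch size are all assumptions standing in for its real
work-queue machinery. Datastore query cursors don't exist yet, so the
iterator position is tracked as the last key seen:)

    from google.appengine.api import urlfetch
    from google.appengine.ext import db

    BATCH_SIZE = 100  # assumed per-task batch size

    class Subscription(db.Model):
        feed_url = db.StringProperty()
        callback = db.StringProperty()  # subscriber's notification endpoint

    def notify_batch(feed_url, payload, last_key=None):
        # Resume the subscriber scan from the saved iterator position.
        q = Subscription.all().filter('feed_url =', feed_url).order('__key__')
        if last_key is not None:
            q.filter('__key__ >', last_key)
        subs = q.fetch(BATCH_SIZE)
        for sub in subs:
            # Push via urlfetch; no per-subscriber datastore writes needed.
            urlfetch.fetch(sub.callback, payload=payload,
                           method=urlfetch.POST)
        if len(subs) == BATCH_SIZE:
            # More subscribers remain: record the position and continue
            # in the next unit of work (hypothetical helper).
            enqueue_next_batch(feed_url, payload, last_key=subs[-1].key())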
>
> > > > For other pub/sub-style systems where you want to write to the
> > > > Datastore, the trick is to use list properties to track the
> > > > subscribers you've published to. So for instance, instead of writing a
> > > > single entity per subscriber, you write one entity with 1000-2000
> > > > subscriber IDs in a list. Then all queries for that list with an
> > > > equals filter for the subscriber will show the entity. This lets you
> > > > pack a lot of information into a single entity write, thus minimizing
> > > > Datastore overhead, cost, etc. Does that make sense?
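(The core of the trick in two lines, reusing the hypothetical
UpdateGroup model sketched near the top of this thread: one put
becomes visible to every ID packed into the list:)

    # One write covers 1000-2000 subscribers; each of them matches the filter.
    # id_chunk is a placeholder list of up to ~1000 subscriber IDs.
    UpdateGroup(update_key='update-1', subscriber_ids=id_chunk).put()
    hits = UpdateGroup.all().filter('subscriber_ids =', 'some-user-id').fetch(20)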
>
> > > > @bFlood: Indeed, the async_apiproxy.py code is interesting. Not much
> > > > to say about that at this time, besides the fact that it works. =)
>
> > > > -Brett