Re: [google-appengine] What is the most efficient way to compute a rolling median on appengine?

Jeff Schnitzer Tue, 19 Nov 2013 12:33:38 -0800

Yikes, most of these suggestions are crazy complicated. Even redis seems
excessive.


Write a little server that accepts values, stores them in memory, rolls off
old values, and answers requests for the median. Run it on GCE or linode or
whatever. This should be a few hours of programming, and even a simplistic
implementation in Java should be able to handle hundreds of QPS. Even if
you hit capacity, you can always start sampling. Or you can federate.

Don't treat GAE as "all or nothing".

Jeff


On Tue, Nov 19, 2013 at 6:50 PM, John Belmonte <jbelmo...@google.com> wrote:

> Hi Mathieu,
>
> I've been toying with concurrent data structures on memcache.  I haven't
> gotten as far as an (indexed) skiplist yet, but I believe it's possible--
> and that structure is efficient for maintaining a running percentile.  Is
> your app in Python by any chance?
>
>   http://github.com/google/memcache-collections
>
>
>
> On Friday, November 15, 2013 11:29:11 AM UTC-8, Mathieu Simard wrote:
>>
>> I decided to settle with a solution that will do the trick for now.
>>
>> I'm building heaps of data per metrics per system in memcache and just
>> accepting the fact that sometimes I will not be able to compute my results.
>> I've no guarantee of consistency, since now all my writes in memcache can
>> overwrite others.
>>
>> It's far from perfect, but it does the trick quite well at a fraction of
>> the cost.
>> If flush rates get too intense I'll simply upgrade to a dedicated
>> memcache.
>>
>> I will post an update if I hit a wall with that approach.
>>
>> On Friday, November 15, 2013 2:10:39 PM UTC-5, Kaan Soral wrote:
>>>
>>> This seems like a bad idea, you could just add it as a pull-queue task,
>>> instances can die pretty easily
>>>
>>> You also assumed that the data is small enough to process/store easily,
>>> however this doesn't seem to be the case
>>>
>>> On Friday, November 15, 2013 9:04:58 PM UTC+2, Stephen wrote:
>>>>
>>>>
>>>> On Tue, Nov 12, 2013 at 8:07 PM, Mathieu Simard 
>>>> <mathieu....@gmail.com>wrote:
>>>>
>>>>> Since there is no appengine solution available such as the Redis
>>>>> atomic list, I'm left wondering how to implement a cost effective rolling
>>>>> median.
>>>>> Has anyone come up with a solution that would be more convenient than
>>>>> running a redis instance on Google Compute Engine?
>>>>>
>>>>
>>>> - batch data in front-end instance memory
>>>>  - flush data to a pull queue every time window
>>>> - run a task every time window to gather all data batches from pull
>>>> queue
>>>> - merge data, compute moving median, write result to data store
>>>>
>>>>  Instances will be started as more data is submitted. Tune the
>>>> frequency of the calculation so that the size of the pending data batches
>>>> does not overwhelm a single task.
>>>>
>>>  --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at http://groups.google.com/group/google-appengine.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [google-appengine] What is the most efficient way to compute a rolling median on appengine?

Reply via email to