[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-21 Thread Mathieu Simard
I need the median value for multiple entities, but each one computed only 
over its own values.
In the future I will probably compute the median across entities by taking a 
median average.

On Tuesday, November 19, 2013 9:23:59 PM UTC-5, Jim wrote:

 Are you doing a time-series type analysis where you need the rolling 
 median value for a specific entity, or do you need the median value across 
 a range of entities?






-- 
You received this message because you are subscribed to the Google Groups 
Google App Engine group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
For more options, visit https://groups.google.com/groups/opt_out.


[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-21 Thread Jim
Missed your comment... This is what we're doing, except that we avoid the 1 MB 
limit by storing the data sets in blobs and storing a pointer to each 
blob in the entity record.


On Wednesday, November 13, 2013 1:20:21 PM UTC-6, Kaan Soral wrote:

 A single datastore entity can hold up to 1 MB.

 How big will a single dataset be?

 If it's smaller than 1 MB in summarized form, you could build a 
 queue-based solution to handle the 15/s data rate.

 You could also probably develop something like a tree, with each entity 
 representing a node and storing data about its leaves. It could 
 maybe lead to a practical median calculator; it's just an idea. The point is:

 as Vinny P stated, the right solution always depends on exactly what you 
 are doing.






[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-21 Thread Mathieu Simard
I'm already using that approach.
However, the distribution of my metrics requires a more precise estimate of 
the median.

On Thursday, November 21, 2013 12:50:41 AM UTC-5, Luca de Alfaro wrote:

 If you can weight recent data more heavily than older data, you might 
 consider an exponentially decaying weighted average instead of a rolling 
 average.

 You can store total_amount, total_weight, and timestamp in ndb, sharded.
 Then, when you get an update, you compute the decay_factor, which is equal 
 to exp(-(time since last update) / time constant).
 You then do:
 total_amount = total_amount * decay_factor + amount_now
 total_weight = total_weight * decay_factor + weight_now
 timestamp = present time
 avg = total_amount / total_weight







Re: [google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-21 Thread Mathieu Simard
The query volume has doubled since I first started this thread and, at the
current rate, should double again by the end of next week.
I definitely need a solution that can handle thousands of QPS, because
that's where we're heading.
I'm currently running this without consistency guarantees, using a dedicated memcache.


On Thu, Nov 21, 2013 at 2:00 PM, Jim jeb62...@gmail.com wrote:

 If data points for each entity are not coming in too fast, you could use
 Blobstore/GCS to store the time series for each entity in a blob, then
 store a pointer to that blob in the entity in the datastore. Updating is
 expensive but can run off a task queue. Retrieving the blobs is very
 fast, and then you can quickly parse the blob into memory and compute your
 stats for a given entity. Cross-entity stats are trickier and require some
 MapReduce-style processing.

 We use this approach for smart-meter analytics, where data points for a
 given entity (meter) don't arrive any faster than once every 15 min... not
 sure if it would work for you.







[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-21 Thread Jim
If data points for each entity are not coming in too fast, you could use 
Blobstore/GCS to store the time series for each entity in a blob, then 
store a pointer to that blob in the entity in the datastore. Updating is 
expensive but can run off a task queue. Retrieving the blobs is very 
fast, and then you can quickly parse the blob into memory and compute your 
stats for a given entity. Cross-entity stats are trickier and require some 
MapReduce-style processing.

We use this approach for smart-meter analytics, where data points for a 
given entity (meter) don't arrive any faster than once every 15 min... not 
sure if it would work for you.
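A minimal sketch of the per-entity blob idea, assuming each data point is a float stored as packed little-endian doubles. `pack_series`, `unpack_series`, `append_point`, and `entity_median` are hypothetical helpers; the actual Blobstore/GCS write and the datastore pointer are left out. Appending rewrites the whole blob, which is why updates are expensive and belong on a task queue.

```python
import statistics
import struct

# Hypothetical blob layout: a flat sequence of little-endian 64-bit floats.
_POINT = struct.Struct("<d")


def pack_series(values):
    """Serialize a time series of floats into bytes suitable for a blob."""
    return struct.pack("<%dd" % len(values), *values)


def unpack_series(blob):
    """Parse a blob produced by pack_series back into a list of floats."""
    count = len(blob) // _POINT.size
    return list(struct.unpack("<%dd" % count, blob))


def append_point(blob, value, max_points=None):
    """Return a new blob with `value` appended, optionally keeping only the
    most recent `max_points` values (a rolling window)."""
    values = unpack_series(blob)
    values.append(value)
    if max_points is not None:
        values = values[-max_points:]
    return pack_series(values)


def entity_median(blob):
    """Load one entity's series into memory and compute its median."""
    values = unpack_series(blob)
    if not values:
        raise ValueError("empty series")
    return statistics.median(values)
```

For example, appending 5.0, 1.0 and 3.0 with `max_points=2` leaves `[1.0, 3.0]` in the blob, whose median is 2.0.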



On Thursday, November 21, 2013 8:12:15 AM UTC-6, Mathieu Simard wrote:

 I need the median value for multiple entities, but each one computed only 
 over its own values.
 In the future I will probably compute the median across entities by taking a 
 median average.






[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-20 Thread Luca de Alfaro
If you can weight recent data more heavily than older data, you might 
consider an exponentially decaying weighted average instead of a rolling 
average.

You can store total_amount, total_weight, and timestamp in ndb, sharded.
Then, when you get an update, you compute the decay_factor, which is equal 
to exp(-(time since last update) / time constant).
You then do:
total_amount = total_amount * decay_factor + amount_now
total_weight = total_weight * decay_factor + weight_now
timestamp = present time
avg = total_amount / total_weight
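A sketch of that update rule in Python, assuming times are in seconds and each sample has weight_now = 1, so avg is a time-decayed mean of the samples (a mean, not a median). `DecayedAverage` and `merged_average` are hypothetical; on App Engine each shard's three fields would live in an ndb entity. Since shards are updated at different times, they are decayed to a common time before being combined.

```python
import math


class DecayedAverage:
    """One shard of an exponentially decaying weighted average.

    Hypothetical helper: on App Engine, total_amount, total_weight and
    timestamp would be fields of one ndb entity per shard.
    """

    def __init__(self, time_constant):
        self.time_constant = float(time_constant)  # seconds
        self.total_amount = 0.0
        self.total_weight = 0.0
        self.timestamp = None  # time of last update, in seconds

    def decay_factor(self, now):
        # exp(-(time since last update) / time constant); 1.0 if never updated
        if self.timestamp is None:
            return 1.0
        return math.exp(-(now - self.timestamp) / self.time_constant)

    def update(self, amount_now, now, weight_now=1.0):
        decay = self.decay_factor(now)
        self.total_amount = self.total_amount * decay + amount_now
        self.total_weight = self.total_weight * decay + weight_now
        self.timestamp = now

    def average(self):
        return self.total_amount / self.total_weight


def merged_average(shards, now):
    """Combine shards by first decaying each one to the common time `now`."""
    amount = weight = 0.0
    for shard in shards:
        decay = shard.decay_factor(now)
        amount += shard.total_amount * decay
        weight += shard.total_weight * decay
    return amount / weight
```

Decaying numerator and denominator by the same factor leaves a single shard's average unchanged; it only matters when shards with different timestamps are merged.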


On Tuesday, November 12, 2013 12:07:34 PM UTC-8, Mathieu Simard wrote:

 Since App Engine has no equivalent of the Redis atomic list, I'm left 
 wondering how to implement a cost-effective rolling median.
 Has anyone come up with a solution that would be more convenient than 
 running a Redis instance on Google Compute Engine?




[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-19 Thread Jim
Are you doing a time-series type analysis where you need the rolling median 
value for a specific entity, or do you need the median value across a range 
of entities?



On Tuesday, November 12, 2013 2:07:34 PM UTC-6, Mathieu Simard wrote:

 Since App Engine has no equivalent of the Redis atomic list, I'm left 
 wondering how to implement a cost-effective rolling median.
 Has anyone come up with a solution that would be more convenient than 
 running a Redis instance on Google Compute Engine?




[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-13 Thread Gilberto Torrezan Filho
I miss some Redis functionality in App Engine as well. Memcache is just an 
unreliable cache to hold some data for a while... nothing more.

To do calculations that iterate over large sets of data, I use 
backends with in-memory processing: loading part of the data from the datastore 
into memory, spawning multiple threads (if applicable), and iterating over the data. 
Ugly, strange, error-prone and sometimes slow, but it works.
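A rough Python sketch of that in-memory pattern (the setup described here is Java, so this is not that code): chunks of values, assumed to be already loaded from the datastore, are sorted in a thread pool and then lazily k-way merged with `heapq.merge` to read off the exact median. `median_from_chunks` is hypothetical. In CPython the threads mainly pay off when each worker also does I/O (for example fetching its own chunk); pure sorting is limited by the GIL.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def median_from_chunks(chunks, max_workers=4):
    """Exact median of values spread across several in-memory chunks.

    Each chunk is sorted independently in a thread pool, then the sorted
    chunks are lazily k-way merged and we walk to the middle element(s).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    n = sum(len(c) for c in sorted_chunks)
    if n == 0:
        raise ValueError("no data")
    merged = heapq.merge(*sorted_chunks)
    # Skip to index (n - 1) // 2; take one element for odd n, two for even n.
    middle = list(itertools.islice(merged, (n - 1) // 2, n // 2 + 1))
    return middle[0] if n % 2 else (middle[0] + middle[1]) / 2.0
```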

A bomb-to-kill-an-ant solution would be to use Google BigQuery. I don't 
like the idea, but depending on your problem it could solve it for you.

You can try some MapReduce processing as well. But since I'm using 
Java (a not-so-loved language in App Engine; see the Servlet 3.0 
discussion: http://code.google.com/p/googleappengine/issues/detail?id=3091), 
MapReduce (Mapper, actually: http://code.google.com/p/appengine-mapreduce/) 
is too experimental to put in production (after the Conversion and Files 
APIs, I learned my lesson: never, ever, ever use an experimental API in App 
Engine).

Anyway, you have several options to try. I just recommend that you avoid 
storing large datasets in Memcache, since it's just a cache and can evict 
your data at any time, invalidating your calculations.

On Tuesday, November 12, 2013 6:07:34 PM UTC-2, Mathieu Simard wrote:

 Since App Engine has no equivalent of the Redis atomic list, I'm left 
 wondering how to implement a cost-effective rolling median.
 Has anyone come up with a solution that would be more convenient than 
 running a Redis instance on Google Compute Engine?




[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-13 Thread Mathieu Simard
Storing the data points isn't an option.
I'm already receiving far too many data points to start writing them all, 
unless they dramatically lower the write costs...

As for the in-memory approach, can you give a sense of the scale at which 
you're using this technique?
Do you have to ensure that the backend runs on a single thread?

On Wednesday, November 13, 2013 6:59:49 AM UTC-5, Gilberto Torrezan Filho wrote:

 I miss some Redis functionality in App Engine as well. Memcache is just an 
 unreliable cache to hold some data for a while... nothing more.

 To do calculations that iterate over large sets of data, I use 
 backends with in-memory processing: loading part of the data from the datastore 
 into memory, spawning multiple threads (if applicable), and iterating over the data. 
 Ugly, strange, error-prone and sometimes slow, but it works.

 A bomb-to-kill-an-ant solution would be to use Google BigQuery. I don't 
 like the idea, but depending on your problem it could solve it for you.

 You can try some MapReduce processing as well. But since I'm using 
 Java (a not-so-loved language in App Engine; see the Servlet 3.0 
 discussion: http://code.google.com/p/googleappengine/issues/detail?id=3091), 
 MapReduce (Mapper, actually: http://code.google.com/p/appengine-mapreduce/) 
 is too experimental to put in production (after the Conversion and Files 
 APIs, I learned my lesson: never, ever, ever use an experimental API in App 
 Engine).

 Anyway, you have several options to try. I just recommend that you avoid 
 storing large datasets in Memcache, since it's just a cache and can evict 
 your data at any time, invalidating your calculations.






[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-13 Thread Kaan Soral
A single datastore entity can hold up to 1 MB.

How big will a single dataset be?

If it's smaller than 1 MB in summarized form, you could build a 
queue-based solution to handle the 15/s data rate.
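One way to read the queue-based suggestion, sketched under assumptions: incoming values are buffered and folded into a compact bucketed histogram, and only the summary is written, once per batch rather than once per value. `SummaryBatcher`, `write_summary`, `batch_size`, and `bucket_width` are hypothetical; on App Engine the flush would presumably run in a task and write a single entity. The summary grows with the number of distinct buckets, not the number of values, which is what keeps it small relative to the 1 MB entity limit.

```python
from collections import Counter


class SummaryBatcher:
    """Buffer raw values and fold them into a compact histogram summary.

    Hypothetical sketch: `write_summary` stands in for whatever persists the
    summary (e.g. one datastore entity written from a queued task).
    """

    def __init__(self, write_summary, batch_size=100, bucket_width=10):
        self.write_summary = write_summary
        self.batch_size = batch_size
        self.bucket_width = bucket_width
        self.buffer = []
        self.summary = Counter()  # bucket index -> count

    def add(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        for v in self.buffer:
            self.summary[int(v // self.bucket_width)] += 1
        self.buffer = []
        # One write per batch instead of one write per incoming value.
        self.write_summary(dict(self.summary))
```

This summary covers every value seen so far; a rolling window would also need to decrement buckets as old values expire, and any median read from it is only as precise as `bucket_width`.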

You could also probably develop something like a tree, with each entity 
representing a node and storing data about its leaves. It could 
maybe lead to a practical median calculator; it's just an idea. The point is:

as Vinny P stated, the right solution always depends on exactly what you 
are doing.
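One possible reading of the tree idea, as a sketch: a Fenwick (binary indexed) tree over fixed-width value buckets, where each node stores the count for a range of leaf buckets, so the bucket holding the median can be found in O(log n) steps. This is a standard structure swapped in for illustration, not something spelled out in the thread. `BucketTree` is hypothetical, it is kept in memory (spreading the nodes across datastore entities is left out), values are assumed to fall in a known range, and the median it returns is only accurate to one bucket width.

```python
class BucketTree:
    """Fenwick (binary indexed) tree of counts over fixed-width value buckets.

    Each node stores the count for a range of leaf buckets, so adding,
    removing, and finding the k-th smallest value are all O(log buckets).
    Assumes values lie in [0, num_buckets * bucket_width).
    """

    def __init__(self, num_buckets, bucket_width=1.0):
        self.n = num_buckets
        self.width = bucket_width
        self.tree = [0] * (num_buckets + 1)  # 1-indexed Fenwick array
        self.total = 0

    def _bucket(self, value):
        b = int(value // self.width)
        if not 0 <= b < self.n:
            raise ValueError("value out of range: %r" % (value,))
        return b

    def add(self, value, count=1):
        """Record `count` occurrences of value (a negative count removes)."""
        i = self._bucket(value) + 1
        self.total += count
        while i <= self.n:
            self.tree[i] += count
            i += i & -i

    def kth_bucket(self, k):
        """0-based index of the bucket holding the k-th smallest value (k >= 1)."""
        if not 1 <= k <= self.total:
            raise IndexError("k out of range")
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < k:
                pos = nxt
                k -= self.tree[nxt]
            step >>= 1
        # Buckets 1..pos (1-based) hold fewer than k values, so the k-th
        # value is in 1-based bucket pos + 1, i.e. 0-based bucket pos.
        return pos

    def median(self):
        """Lower median, approximated by the lower edge of its bucket."""
        return self.kth_bucket((self.total + 1) // 2) * self.width
```

Calling `add(value, count=-1)` when a value falls out of the window is what would make this usable for a rolling median.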

On Tuesday, November 12, 2013 10:07:34 PM UTC+2, Mathieu Simard wrote:

 Since App Engine has no equivalent of the Redis atomic list, I'm left 
 wondering how to implement a cost-effective rolling median.
 Has anyone come up with a solution that would be more convenient than 
 running a Redis instance on Google Compute Engine?




[google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-13 Thread Mathieu Simard
Here's a better definition of the problem:

I receive a *value* for a tracked metric (e.g. start-up time) for different 
systems at a rate of 15/s.
This rate is expected to grow quickly as clients add systems.
I need to produce a rolling median of that metric.

Inserting all entries in the datastore is not an option, since that would 
already require 15 writes per second, which is extremely expensive.
Using memcache is not a good solution, since there is no atomic push/pop 
on arrays (hence my earlier reference to Redis).
Using a backend instance to hold it all in memory is a quick fix, but it 
won't scale as we add new metrics.

At the same time, I'm trying to keep the cost low, since volume is only 
going to grow.
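A minimal in-memory sketch of an exact rolling median for one metric, the kind of structure a single backend or Compute Engine instance would keep in RAM per metric. It assumes "rolling" means the most recent `window` values; `RollingMedian` and the window size are hypothetical. A deque tracks arrival order, and a sorted list maintained with `bisect` gives the median directly.

```python
import bisect
from collections import deque


class RollingMedian:
    """Exact median of the most recent `window` values for one metric."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        self.arrivals = deque()  # values in arrival order
        self.sorted_values = []  # the same values, kept sorted

    def add(self, value):
        self.arrivals.append(value)
        bisect.insort(self.sorted_values, value)
        if len(self.arrivals) > self.window:
            oldest = self.arrivals.popleft()
            # Remove one copy of the evicted value from the sorted list.
            del self.sorted_values[bisect.bisect_left(self.sorted_values, oldest)]

    def median(self):
        n = len(self.sorted_values)
        if n == 0:
            raise ValueError("no values yet")
        mid = n // 2
        if n % 2:
            return self.sorted_values[mid]
        return (self.sorted_values[mid - 1] + self.sorted_values[mid]) / 2.0
```

With one instance per metric (e.g. a dict keyed by metric name), memory grows with window size times the number of metrics, which is the scaling concern raised above. Each insert into the sorted list costs O(window) element moves; for very large windows, a balanced tree or bucketed counts would keep updates logarithmic.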

On Wednesday, November 13, 2013 2:20:21 PM UTC-5, Kaan Soral wrote:

 A single datastore entity can hold up to 1 MB.

 How big will a single dataset be?

 If it's smaller than 1 MB in summarized form, you could build a 
 queue-based solution to handle the 15/s data rate.

 You could also probably develop something like a tree, with each entity 
 representing a node and storing data about its leaves. It could 
 maybe lead to a practical median calculator; it's just an idea. The point is:

 as Vinny P stated, the right solution always depends on exactly what you 
 are doing.






Re: [google-appengine] Re: What is the most efficient way to compute a rolling median on appengine?

2013-11-13 Thread Vinny P
On Wed, Nov 13, 2013 at 7:58 PM, Mathieu Simard mathieu.simar...@gmail.com wrote:

 Here's a better definition of the problem:
 I receive a *value* for a tracked metric (e.g. start-up time) for
 different systems at a rate of 15/s.
 This rate is expected to grow quickly as clients add systems.
 I need to produce a rolling median of that metric.
 At the same time, I'm trying to keep the cost low, since volume is only
 going to grow.



A few months back someone posted a similar problem to this mailing list: a
mobile game needed a backend to collect scores from thousands of mobile
clients, compute a leaderboard, and then send the leaderboard back to the
clients, all within a 10-second window. After the discussion, the consensus
IIRC was to either (1) run App Engine backends to collect incoming requests
and calculate values within a high-memory backend, or (2) run a Compute
Engine machine to collect and calculate values.

The choice is up to you, since it depends on what you're comfortable with,
but if low cost is an important goal, I'd choose the Compute Engine route.
Since your application and values can be held entirely in RAM, you can
choose a high-memory, diskless instance to optimize your resource usage.

However, if the incoming values will have spiky traffic levels or if your
application requires complex services such as Endpoints, hosting on App
Engine is the better solution.


-
-Vinny P
Technology & Media Advisor
Chicago, IL

App Engine Code Samples: http://www.learntogoogleit.com
