[google-appengine] Re: Number of Entities in DataStore

2009-08-13 Thread ajacks504

"Perhaps the best way to do this would be a set of boolean values
indicating
if this entry was logged in the first second of the minute, the first
minute
of the hour, and so forth. Then you can fetch the first entity of any
interval that you've catered for."

I like it...

My update rate isn't going to be crazy: I'm going to have my router
poll it in a Python script and upload data only once every 30 seconds
or so.  That's...

2,880 / day
20,160 / week
~86,400 / month (30 days)
~1,051,200 / year

I really only want 30-second resolution on the last week or so of
data...  maybe I can use your task queue idea and "roll up" data into
increasingly larger chunks based on how old it is.

thanks again.
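A minimal sketch of that roll-up idea, independent of the datastore: collapse old 30-second samples into coarser averaged buckets (the function name, the `(timestamp, kw)` pair shape, and the bucket sizes are illustrative choices, not anything from the thread; on App Engine this would run inside a task queue task):

```python
def rollup(samples, bucket_seconds):
    """Collapse fine-grained samples into coarser averaged buckets.

    samples: list of (unix_timestamp, kw) pairs, assumed sorted by time.
    Returns one (bucket_start_timestamp, mean_kw) pair per window of
    bucket_seconds that contains at least one sample.
    """
    buckets = {}
    for ts, kw in samples:
        key = ts - (ts % bucket_seconds)  # start of this sample's window
        buckets.setdefault(key, []).append(kw)
    # Average each window; sorting keeps the output in time order.
    return [(key, sum(vals) / len(vals)) for key, vals in sorted(buckets.items())]
```

Old data could be passed through this with progressively larger `bucket_seconds` (5 minutes for last month, an hour for last year), and the raw 30-second entities deleted afterwards.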

On Aug 13, 10:30 am, "Nick Johnson (Google)" 
wrote:
> On Thu, Aug 13, 2009 at 4:26 PM, ajacks504  wrote:
>
> > thanks for the responses guys.  I think i'm getting the hang of things
> > here...
>
> > >> 2- i dont know how to grab the decimated data in a way that wont be
> > >> computationaly expensive
>
> > >You could add a property that you set randomly to a number between 0
> > >and 9. Fetching every record where that property equals (for example)
> > >0 will get you 1/10 the records.
>
> > That might work well, maybe instead of random i can age them 0-9 as i
> > post them, then select only records with age = 4?
>
> Maintaining a sequential, distributed counter is tricky. If you need exact
> numbering, you'd be better kicking off a regular task that numbers existing
> entities that don't yet have numbers.
>
> > what i really want is something that grabs every nth record
> > efficiently so that i can adjust the timebase on the display chart and
> > get only "enough" evenly decimated samples from the DB to display
> > based on the timebase.
>
> Perhaps the best way to do this would be a set of boolean values indicating
> if this entry was logged in the first second of the minute, the first minute
> of the hour, and so forth. Then you can fetch the first entity of any
> interval that you've catered for.
>
> > small skips for today only, larger for days, larger for months...
> > just like you guys do for the google.finance stock displays...
>
> > thanks again for the replys...
>
> > On Aug 12, 7:18 pm, sboire  wrote:
> > > True indeed it will timeout if you try to do them all in the same
> > > request. The trick is to return the last entry info of the bunch of
> > > 1000 records as part of the request response and then query another
> > > time for the next 1000. Still, as you mention., there is a huge
> > > bandwidth and CPU hit to do so. So in practice that may not be a good
> > > solution for counting, more for iterating a whole database.Using
> > > key_only option in your query may significantly increases the
> > > performance of the query, allowing you to do several "1000" chunk in
> > > one request, better, but still inefficient for counting unless the
> > > datasets stays within few thousands.
>
> > > Sebastien
>
> > > On Aug 12, 2:12 pm, Neal Walters  wrote:
>
> > > > Sboire,
> > > >   There is an "offset" parm on the fetch, so yes, you can get 1000
> > > > records at a time in a loop.
> > > > I believe however this is discouraged because it will eat up your CPU
> > > > quota, and potentially you could hit other limits and quotas.  Imagine
> > > > if you had 5 million records.  Reading 1000 at a time would take 5000
> > > > calls.  Even on a MySQL database with PHP for example, you would
> > > > probably hit the various CPU limits per task reading so many records
> > > > in one round-trip from the client to the server.
>
> > > > Neal Walters
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---



[google-appengine] Re: Number of Entities in DataStore

2009-08-13 Thread Nick Johnson (Google)
On Thu, Aug 13, 2009 at 4:26 PM, ajacks504  wrote:

>
> thanks for the responses guys.  I think i'm getting the hang of things
> here...
>
> >> 2- i dont know how to grab the decimated data in a way that wont be
> >> computationaly expensive
>
> >You could add a property that you set randomly to a number between 0
> >and 9. Fetching every record where that property equals (for example)
> >0 will get you 1/10 the records.
>
> That might work well, maybe instead of random i can age them 0-9 as i
> post them, then select only records with age = 4?


Maintaining a sequential, distributed counter is tricky. If you need exact
numbering, you'd be better off kicking off a regular task that numbers
existing entities that don't yet have numbers.


> what i really want is something that grabs every nth record
> efficiently so that i can adjust the timebase on the display chart and
> get only "enough" evenly decimated samples from the DB to display
> based on the timebase.


Perhaps the best way to do this would be a set of boolean values indicating
if this entry was logged in the first second of the minute, the first minute
of the hour, and so forth. Then you can fetch the first entity of any
interval that you've catered for.
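The flag computation behind this idea can be sketched as a plain function (the flag names are invented for illustration; in a real model each would be an indexed `db.BooleanProperty` set at write time, and the "first second" test assumes at most one sample per second):

```python
from datetime import datetime

def interval_flags(dt):
    """Flags marking whether a sample is the first of its minute,
    hour, or day, so a filtered query can cheaply fetch one
    representative entity per interval."""
    return {
        'first_of_minute': dt.second == 0,
        'first_of_hour': dt.minute == 0 and dt.second == 0,
        'first_of_day': dt.hour == 0 and dt.minute == 0 and dt.second == 0,
    }
```

A chart request for a week-long timebase would then filter on something like `first_of_hour == True` instead of scanning every 30-second sample.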


>
>
> small skips for today only, larger for days, larger for months...
> just like you guys do for the google.finance stock displays...
>
> thanks again for the replys...
>
> On Aug 12, 7:18 pm, sboire  wrote:
> > True indeed it will timeout if you try to do them all in the same
> > request. The trick is to return the last entry info of the bunch of
> > 1000 records as part of the request response and then query another
> > time for the next 1000. Still, as you mention., there is a huge
> > bandwidth and CPU hit to do so. So in practice that may not be a good
> > solution for counting, more for iterating a whole database.Using
> > key_only option in your query may significantly increases the
> > performance of the query, allowing you to do several "1000" chunk in
> > one request, better, but still inefficient for counting unless the
> > datasets stays within few thousands.
> >
> > Sebastien
> >
> > On Aug 12, 2:12 pm, Neal Walters  wrote:
> >
> > > Sboire,
> > >   There is an "offset" parm on the fetch, so yes, you can get 1000
> > > records at a time in a loop.
> > > I believe however this is discouraged because it will eat up your CPU
> > > quota, and potentially you could hit other limits and quotas.  Imagine
> > > if you had 5 million records.  Reading 1000 at a time would take 5000
> > > calls.  Even on a MySQL database with PHP for example, you would
> > > probably hit the various CPU limits per task reading so many records
> > > in one round-trip from the client to the server.
> >
> > > Neal Walters
> >
>


-- 
Nick Johnson, Developer Programs Engineer, App Engine




[google-appengine] Re: Number of Entities in DataStore

2009-08-13 Thread ajacks504

Thanks for the responses guys.  I think I'm getting the hang of things
here...

>> 2- i dont know how to grab the decimated data in a way that wont be
>> computationaly expensive

>You could add a property that you set randomly to a number between 0
>and 9. Fetching every record where that property equals (for example)
>0 will get you 1/10 the records.

That might work well; maybe instead of random I can age them 0-9 as I
post them, then select only records with age = 4?

What I really want is something that grabs every nth record
efficiently so that I can adjust the timebase on the display chart and
get only "enough" evenly decimated samples from the DB to display
based on the timebase.

Small skips for today only, larger for days, larger for months...
just like you guys do for the google.finance stock displays...

Thanks again for the replies...

On Aug 12, 7:18 pm, sboire  wrote:
> True indeed it will timeout if you try to do them all in the same
> request. The trick is to return the last entry info of the bunch of
> 1000 records as part of the request response and then query another
> time for the next 1000. Still, as you mention., there is a huge
> bandwidth and CPU hit to do so. So in practice that may not be a good
> solution for counting, more for iterating a whole database.Using
> key_only option in your query may significantly increases the
> performance of the query, allowing you to do several "1000" chunk in
> one request, better, but still inefficient for counting unless the
> datasets stays within few thousands.
>
> Sebastien
>
> On Aug 12, 2:12 pm, Neal Walters  wrote:
>
> > Sboire,
> >   There is an "offset" parm on the fetch, so yes, you can get 1000
> > records at a time in a loop.
> > I believe however this is discouraged because it will eat up your CPU
> > quota, and potentially you could hit other limits and quotas.  Imagine
> > if you had 5 million records.  Reading 1000 at a time would take 5000
> > calls.  Even on a MySQL database with PHP for example, you would
> > probably hit the various CPU limits per task reading so many records
> > in one round-trip from the client to the server.
>
> > Neal Walters



[google-appengine] Re: Number of Entities in DataStore

2009-08-12 Thread sboire

True indeed, it will time out if you try to do them all in the same
request. The trick is to return the last entry's info from the batch of
1000 records as part of the request response, and then query again
for the next 1000. Still, as you mention, there is a huge bandwidth
and CPU hit in doing so, so in practice that may not be a good
solution for counting, more for iterating over a whole database. Using
the keys_only option in your query may significantly increase the
performance of the query, allowing you to do several 1000-record chunks
in one request; better, but still inefficient for counting unless the
dataset stays within a few thousand entities.

Sebastien

On Aug 12, 2:12 pm, Neal Walters  wrote:
> Sboire,
>   There is an "offset" parm on the fetch, so yes, you can get 1000
> records at a time in a loop.
> I believe however this is discouraged because it will eat up your CPU
> quota, and potentially you could hit other limits and quotas.  Imagine
> if you had 5 million records.  Reading 1000 at a time would take 5000
> calls.  Even on a MySQL database with PHP for example, you would
> probably hit the various CPU limits per task reading so many records
> in one round-trip from the client to the server.
>
> Neal Walters



[google-appengine] Re: Number of Entities in DataStore

2009-08-12 Thread Neal Walters

Sboire,
  There is an "offset" parameter on fetch, so yes, you can get 1000
records at a time in a loop.
I believe, however, this is discouraged because it will eat up your CPU
quota, and you could potentially hit other limits and quotas.  Imagine
if you had 5 million records: reading 1000 at a time would take 5000
calls.  Even on a MySQL database with PHP, for example, you would
probably hit the various per-task CPU limits reading that many records
in one round trip from the client to the server.

Neal Walters
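The offset loop can be sketched with the datastore call stubbed out (here `fetch_page(limit, offset)` stands in for `query.fetch(limit, offset)` on App Engine; the stub keeps the sketch runnable anywhere):

```python
def count_by_paging(fetch_page, page_size=1000):
    """Count entities by repeatedly fetching fixed-size pages at an
    increasing offset.  Every entity is still retrieved, which is why
    this is O(n) in time and quota and discouraged for large kinds."""
    total = 0
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        total += len(page)
        if len(page) < page_size:
            # A short (or empty) page means we've run out of entities.
            return total
        offset += page_size
```

Using `keys_only=True` on the underlying query would cut the bandwidth cost, but the fundamental O(n) behavior stays.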




[google-appengine] Re: Number of Entities in DataStore

2009-08-12 Thread sboire

I'm new to Python and GAE too, so bear with me if I'm saying something
stupid. I read about how backups of whole GAE databases were made, and
the technique involves using both ordering and filtering to achieve an
iterator over a large list.

In principle it goes like this: you order the list, say in your
case by datetime, then you read the first thousand records. After that
you "requery" the store, but filtering for dates greater than
(or lower than) the last date you processed. I'm not sure how
deterministic the store's sort is for entries with the same value (i.e.
datetime), or whether duplicates are possible for your application. Maybe
ordering by key value would be more deterministic, since keys are
unique.

Sebastien
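This order-and-filter iteration can be sketched generically (here `fetch_after(last_key, limit)` is a stand-in for a keyed datastore query, roughly `WHERE __key__ > :last ORDER BY __key__` on App Engine; the names are illustrative):

```python
def iterate_by_key(fetch_after, page_size=1000):
    """Iterate a large, key-ordered dataset without offsets: each page
    is fetched by filtering on 'key greater than the last key seen',
    which keys being unique makes fully deterministic."""
    last_key = None
    while True:
        page = fetch_after(last_key, page_size)  # list of (key, entity)
        if not page:
            return
        for key, entity in page:
            yield entity
        # Resume the next query strictly after the last key we saw.
        last_key = page[-1][0]
```

Filtering on a unique key avoids the skipped-or-duplicated rows you can get when paging on a non-unique property like datetime.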

On Aug 12, 5:22 am, "Nick Johnson (Google)" 
wrote:
> Hi Adam,
>
> On Tue, Aug 11, 2009 at 5:12 AM, ajacks504 wrote:
>
> > Hi All,
>
> > I'm really new to python and databases period, so please take it easy
> > on me, im trying to learn.  im trying to roll my own energy monitor
> > and i cant really figure out how to get the number of entities in my
> > data store.
>
> > class PowerData(db.Model):
> >   date = db.DateTimeProperty(auto_now_add=True)  # timestamp
> >   kw = db.FloatProperty()  # current kilowatt data (0.01 scale)
>
> > my database will have more than 1000 entires in it quickly.  i want to
> > grab 1000 data points and send it to the google annotated timeline
> > display, but i know my query will be limited to 1000 results.  I want
> > to write a query that will return the number of entities in my
> > datastore, then write a function that decimates (get every Nth one)
> > the data based on a hop factor that will give me less than 100 data
> > points.
>
> > 1- i have no idea how to get the full size of the datastore if its
> > greater than 1000 entities
>
> There's no efficient way to do this, simply because there's no way to
> count something without spending O(n) time doing it. You need to
> maintain a count of entities in another entity, and update it when you
> add new records.
>
> > 2- i dont know how to grab the decimated data in a way that wont be
> > computationaly expensive
>
> You could add a property that you set randomly to a number between 0
> and 9. Fetching every record where that property equals (for example)
> 0 will get you 1/10 the records.
>
> Alternately, you can have a task that 'rolls up' every 10 entities
> into one aggregate record, using the task queue or otherwise.
>
>
>
> > im not looking for handout code, just throw me a bone for something to
> > read up on?
>
> > is this a decent way to structure the data base?  should i change it
> > before i got further?
>
> > thanks,
> > adam
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine




[google-appengine] Re: Number of Entities in DataStore

2009-08-12 Thread Nick Johnson (Google)

Hi Adam,

On Tue, Aug 11, 2009 at 5:12 AM, ajacks504 wrote:
>
> Hi All,
>
> I'm really new to python and databases period, so please take it easy
> on me, im trying to learn.  im trying to roll my own energy monitor
> and i cant really figure out how to get the number of entities in my
> data store.
>
> class PowerData(db.Model):
>   date = db.DateTimeProperty(auto_now_add=True)  # timestamp
>   kw = db.FloatProperty()  # current kilowatt data (0.01 scale)
>
> my database will have more than 1000 entires in it quickly.  i want to
> grab 1000 data points and send it to the google annotated timeline
> display, but i know my query will be limited to 1000 results.  I want
> to write a query that will return the number of entities in my
> datastore, then write a function that decimates (get every Nth one)
> the data based on a hop factor that will give me less than 100 data
> points.
>
> 1- i have no idea how to get the full size of the datastore if its
> greater than 1000 entities

There's no efficient way to do this, simply because there's no way to
count something without spending O(n) time doing it. You need to
maintain a count of entities in another entity, and update it when you
add new records.

> 2- i dont know how to grab the decimated data in a way that wont be
> computationaly expensive

You could add a property that you set randomly to a number between 0
and 9. Fetching every record where that property equals (for example)
0 will get you 1/10 the records.

Alternately, you can have a task that 'rolls up' every 10 entities
into one aggregate record, using the task queue or otherwise.
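The random-property trick can be sketched without the datastore (the function and field names are invented; in a real model the bucket would be an indexed `db.IntegerProperty` set once at write time, and the filter would be a datastore query rather than a list comprehension):

```python
import random

def assign_buckets(records, buckets=10):
    """Tag each record with a random bucket 0..buckets-1, as would be
    done once when the entity is written."""
    return [(random.randrange(buckets), r) for r in records]

def sample(tagged, bucket=0):
    """The cheap decimated fetch: filter on a single bucket value,
    which an index can serve without touching the other ~90% of rows."""
    return [r for b, r in tagged if b == bucket]
```

Filtering on one bucket returns roughly 1/10 of the records; filtering on a set of buckets (0-4, say) gives other decimation ratios.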

>
> im not looking for handout code, just throw me a bone for something to
> read up on?
>
> is this a decent way to structure the data base?  should i change it
> before i got further?
>
> thanks,
> adam
>



-- 
Nick Johnson, Developer Programs Engineer, App Engine




[google-appengine] Re: Number of Entities in DataStore

2009-08-11 Thread NealWalters

Google's datastore doesn't offer a record count of all rows of a "Kind"
in BigTable; the typical recommendation is to keep your own counter
up to date each time you store a record (assuming you really need it).
If you are worried about concurrency when doing this, there is a
"sharding" technique here:
http://code.google.com/appengine/articles/sharding_counters.html

Neal Walters
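The core of the sharded-counter technique in that article can be sketched in-memory (this is a simplification for illustration; in the real version each shard is its own datastore entity, incremented inside a transaction, and the shard count is configurable per counter):

```python
import random

class ShardedCounter(object):
    """Spread increments across N shards so concurrent writers rarely
    contend on the same row; the total is the sum of all shards."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Each writer picks a random shard, spreading write contention.
        self.shards[random.randrange(len(self.shards))] += amount

    def count(self):
        # Reading is O(num_shards), independent of how many increments
        # have happened.
        return sum(self.shards)
```

The trade-off: writes scale with the number of concurrent writers, while reads cost one small query per shard (and are often memcached in practice).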

