[google-appengine] Re: Efficient way to structure my data model

2009-06-24 Thread ecognium

Thanks very much! I think I understand indexing with Lists much better
now...
 In the case of your index with 4 occurrences, this will be 6C4 instead.
Yup, I was looking at the max situation when I use 3 categ fields.
Need to be careful, as it grows pretty quickly.


On Jun 24, 4:19 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,

 On Tue, Jun 23, 2009 at 6:11 PM, ecognium ecogn...@gmail.com wrote:

  Thanks again - this is very helpful. I will let you know if I run into
  any future index creation errors, as they could have been caused by any
  number of other entries - I mistakenly thought it was all these categ
  list-based entries.

  So if i understand it right even with a 10 element list for keywords,
  there will only be 10 rows when 4 categ fields are used.

 Correct - 4C4 * 10C1 index entries for the custom index you specified
 earlier.

  In the event
  I use  'categ' only once in my query along with keywords field, it
  will have up to 40 rows (10 from keywords and 4C1 from categ list).

 Correct.

  Am
  I adding these up right?

  I do not see myself going beyond 6 elements in the categ list at this
  point (I guess the max will be 6C3 = 20 under such a situation).

 In the case of your index with 4 occurrences, this will be 6C4 instead.



  The keyword list will probably grow into the 20s, but I do not see it
  going beyond that, and it will always be used only once in the query.

  Thanks,
  -e

  On Jun 23, 3:53 am, Nick Johnson (Google) nick.john...@google.com
  wrote:
   Hi ecognium,

   On Tue, Jun 23, 2009 at 1:35 AM, ecognium ecogn...@gmail.com wrote:

Thanks, Nick. Let me make sure I understand your comment correctly.
Suppose I have the following data:

ID    BlobProp1   BlobProp2-N   Keywords         Categ
=====================================================================
123   blah        blah          tag1,tag2,tag3   Circle, Red, Large, Dotted
345   blah        blah          tag3,tag4,tag5   Square, Blue, Small, Solid
678   blah        blah          tag1,tag3,tag4   Circle, Blue, Small, Solid
---------------------------------------------------------------------

The field categ (list) contains four different types - Shape, Color,
Size and Line Type. Suppose the user wants to retrieve all entities
that are Small Dotted Blue Circles then the query will be:

Select * From MyModel where categ = Circle AND categ = Small AND
categ = Blue AND categ = Dotted

When I was reading about exploding indexes the example indicated the
issue was due to Cartesian product of two list elements. I thought the
same will hold true with one list field when used multiple times in a
query.

   That is indeed true, though it's not quite the cartesian product - the
   datastore won't bother indexing (Circle, Circle, Circle, Circle), or
   (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique
  combination,
   which is a substantially smaller number than the cartesian product. It's
   still only tractable for small lists, though, such as the 4 item lists
   you're dealing with.

    Are you saying the above query will not need {Circle, Red,
    Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , }
    number of index entities for entity ID=123?

    Correct - if you're not specifying a sort order, you can execute the
    query without any composite indexes whatsoever. The datastore satisfies
    equality-only queries using a merge join strategy.

I was getting index errors
when I was using the categ list property four times in my index
specification and that's why I was wondering if I should restructure
things.

    How many items did you have in the list you were indexing in that case?
    If your list has 4 items and your index specification lists it 4 times,
    you should only get one index entry.

    so I am guessing the following spec should not cause any index
    issues in the future?

   Again, that depends on the number of entries in the 'categ' list. With 4
   entries, this will only generate a single index entry, but the number of
   entries will expand exponentially as the list increases in size.

   -Nick Johnson

- kind: MyModel
 properties:
 - name: categ
 - name: categ
 - name: categ
 - name: categ
 - name: keywords
 - name: __key__   # used for paging

Thanks,
-e

On Jun 22, 2:10 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,

 If I understand your problem correctly, every entity will have 0-4 entries
 in the 'categ' list, corresponding to the values for each of 4 categories
 (eg, Color, Size, Shape, etc)?

 The sample query you give, with only equality filters, will be satisfiable
 using the merge join query planner, which doesn't require custom indexes

[google-appengine] Re: Efficient way to structure my data model

2009-06-23 Thread ecognium

Thanks again - this is very helpful. I will let you know if I run into
any future index creation errors, as they could have been caused by any
number of other entries - I mistakenly thought it was all these categ
list-based entries.

So if I understand it right, even with a 10-element list for keywords,
there will only be 10 rows when 4 categ fields are used. In the event
I use 'categ' only once in my query along with the keywords field, it
will have up to 40 rows (10 from keywords times 4C1 from the categ
list). Am I adding these up right?

I do not see myself going beyond 6 elements in the categ list at this
point (I guess the max will be 6C3 = 20 in that situation). The
keyword list will probably grow into the 20s, but I do not see it going
beyond that, and it will always be used only once in the query.

Thanks,
-e

On Jun 23, 3:53 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,



 On Tue, Jun 23, 2009 at 1:35 AM, ecognium ecogn...@gmail.com wrote:

  Thanks, Nick. Let me make sure I understand your comment correctly.
  Suppose I have the following data:

  ID    BlobProp1   BlobProp2-N   Keywords         Categ
  =====================================================================
  123   blah        blah          tag1,tag2,tag3   Circle, Red, Large, Dotted
  345   blah        blah          tag3,tag4,tag5   Square, Blue, Small, Solid
  678   blah        blah          tag1,tag3,tag4   Circle, Blue, Small, Solid
  ---------------------------------------------------------------------

  The field categ (list) contains four different types - Shape, Color,
  Size and Line Type. Suppose the user wants to retrieve all entities
  that are Small Dotted Blue Circles then the query will be:

  Select * From MyModel where categ = Circle AND categ = Small AND
  categ = Blue AND categ = Dotted

  When I was reading about exploding indexes the example indicated the
  issue was due to Cartesian product of two list elements. I thought the
  same will hold true with one list field when used multiple times in a
  query.

 That is indeed true, though it's not quite the cartesian product - the
 datastore won't bother indexing (Circle, Circle, Circle, Circle), or
 (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique combination,
 which is a substantially smaller number than the cartesian product. It's
 still only tractable for small lists, though, such as the 4 item lists
 you're dealing with.

  Are you saying the above query will not need {Circle, Red,
  Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , }
  number of index entities for entity ID=123?

 Correct - if you're not specifying a sort order, you can execute the query
 without any composite indexes whatsoever. The datastore satisfies
 equality-only queries using a merge join strategy.

  I was getting index errors
  when I was using the categ list property four times in my index
  specification and that's why I was wondering if I should restructure
  things.

 How many items did you have in the list you were indexing in that case? If
 your list has 4 items and your index specification lists it 4 times, you
 should only get one index entry.

  so I am guessing the following spec should not cause any index
  issues in the future?

 Again, that depends on the number of entries in the 'categ' list. With 4
 entries, this will only generate a single index entry, but the number of
 entries will expand exponentially as the list increases in size.

 -Nick Johnson





  - kind: MyModel
   properties:
   - name: categ
   - name: categ
   - name: categ
   - name: categ
   - name: keywords
   - name: __key__   # used for paging

  Thanks,
  -e

  On Jun 22, 2:10 am, Nick Johnson (Google) nick.john...@google.com
  wrote:
   Hi ecognium,

   If I understand your problem correctly, every entity will have 0-4
   entries in the 'categ' list, corresponding to the values for each of 4
   categories (eg, Color, Size, Shape, etc)?

   The sample query you give, with only equality filters, will be
   satisfiable using the merge join query planner, which doesn't require
   custom indexes, so you won't have high indexing overhead. There will
   simply be one index entry for each item in each list.

   If you do need custom indexes, the number of index entries isn't 4^4, as
   you suggest, but rather smaller. Assuming you want to be able to query
   with any number of categories from 0 to 4, you'll need 3 or 4 custom
   indexes (depending on whether the 0-category case requires its own
   index), and the total number of index entries will be
   4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15. For 6 categories, the number
   of entries would be 6 + 15 + 20 + 15 + 6 + 1 = 63, which is still a
   not-unreasonable number.

   -Nick Johnson

   On Mon, Jun 22, 2009 at 8:51 AM, ecognium ecogn...@gmail.com wrote:

Hi All,

   I would like to get

[google-appengine] Efficient way to structure my data model

2009-06-22 Thread ecognium

Hi All,

I would like to get your opinion on the best way to structure my
data model. My app allows users to filter entities by four category
types (say A, B, C, D). Each category type can have multiple values
(e.g., category type A can have values 1, 2, 3), but the user can
choose only one value per category for filtering. Please note that the
values are unique across the category types as well. I could create
four fields corresponding to the four types, but that would not let me
expand to more categories easily later. Right now, I just use one
list field to store the different values, as it is easy to add more
category types later on.

My model (simplified) looks like this:

from google.appengine.ext import db

class Example(db.Model):
    categ    = db.StringListProperty()
    keywords = db.StringListProperty()

The keywords field will have about 10-20 values for each entity. For
the above example, categ will have up to 4 values. Since I allow
filtering on 4 category types, the index table gets large with
unnecessary values. The filtering logic looks like:

keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND
categ = '99'

Since there are 4 values in the categ list property, there will be
4^4 rows created in the index table (most of them will never be hit,
due to the uniqueness guaranteed by design). Multiply that by the
number of values in the keywords list, and the index table gets large
very quickly.

I would like to avoid creating multiple fields if possible, because
if I want to increase the number of category types to six, I would
have to change the underlying model and all the filtering code. Any
suggestions on how to construct the model so that it allows for easy
expansion of category types, yet still does not create large index
tables? I know there is a db.CategoryProperty, but I am not sure it
provides any specific benefit here.

Thanks!
-e
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Google App Engine group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---



[google-appengine] Re: Efficient way to structure my data model

2009-06-22 Thread ecognium

Thanks, Nick. Let me make sure I understand your comment correctly.
Suppose I have the following data:

ID    BlobProp1   BlobProp2-N   Keywords         Categ
=====================================================================
123   blah        blah          tag1,tag2,tag3   Circle, Red, Large, Dotted
345   blah        blah          tag3,tag4,tag5   Square, Blue, Small, Solid
678   blah        blah          tag1,tag3,tag4   Circle, Blue, Small, Solid
---------------------------------------------------------------------

The field categ (list) contains four different types - Shape, Color,
Size and Line Type. Suppose the user wants to retrieve all entities
that are Small Dotted Blue Circles; then the query will be:

Select * From MyModel where categ = Circle AND categ = Small AND
categ = Blue AND categ = Dotted
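Read as set membership, each of those equality filters requires one value to be present in the entity's categ list. A small in-memory sketch (plain Python, using the hypothetical sample data above; the datastore does this with index scans, not a loop) illustrates the semantics:

```python
# categ lists keyed by entity ID, taken from the sample table above.
entities = {
    '123': {'Circle', 'Red', 'Large', 'Dotted'},
    '345': {'Square', 'Blue', 'Small', 'Solid'},
    '678': {'Circle', 'Blue', 'Small', 'Solid'},
}

def query(*wanted):
    # categ = v1 AND categ = v2 AND ... means every wanted value
    # must appear somewhere in the entity's categ list.
    return sorted(eid for eid, cats in entities.items()
                  if set(wanted) <= cats)

print(query('Circle', 'Small', 'Blue'))            # ['678']
print(query('Circle', 'Small', 'Blue', 'Dotted'))  # [] - no entity has all four
```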

When I was reading about exploding indexes, the example indicated the
issue was due to the Cartesian product of two list elements. I thought
the same would hold true with one list field used multiple times in a
query. Are you saying the above query will not need {Circle, Red,
Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , }
index entities for entity ID=123? I was getting index errors
when I was using the categ list property four times in my index
specification, and that's why I was wondering if I should restructure
things. So I am guessing the following spec should not cause any index
issues in the future?

- kind: MyModel
  properties:
  - name: categ
  - name: categ
  - name: categ
  - name: categ
  - name: keywords
  - name: __key__   # used for paging

Thanks,
-e


On Jun 22, 2:10 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,

 If I understand your problem correctly, every entity will have 0-4 entries
 in the 'categ' list, corresponding to the values for each of 4 categories
 (eg, Color, Size, Shape, etc)?

 The sample query you give, with only equality filters, will be satisfiable
 using the merge join query planner, which doesn't require custom indexes, so
 you won't have high indexing overhead. There will simply be one index entry
 for each item in each list.

 If you do need custom indexes, the number of index entries isn't 4^4, as
 you suggest, but rather smaller. Assuming you want to be able to query with
 any number of categories from 0 to 4, you'll need 3 or 4 custom indexes
 (depending on whether the 0-category case requires its own index), and the
 total number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 +
 1 = 15. For 6 categories, the number of entries would be
 6 + 15 + 20 + 15 + 6 + 1 = 63, which is still a not-unreasonable number.
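For anyone checking the arithmetic, these counts are just sums of binomial coefficients, easy to verify in a few lines of modern Python (math.comb, i.e. nCk, needs Python 3.8+):

```python
from math import comb  # comb(n, k) is the binomial coefficient nCk

def custom_index_entries(list_size, max_filters):
    # One index entry per unique combination of 1..max_filters values
    # drawn from a list of list_size items: sum of C(n, k) for k = 1..max.
    return sum(comb(list_size, k) for k in range(1, max_filters + 1))

print(custom_index_entries(4, 4))  # 4 + 6 + 4 + 1 = 15
print(custom_index_entries(6, 6))  # 6 + 15 + 20 + 15 + 6 + 1 = 63
```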

 -Nick Johnson



 On Mon, Jun 22, 2009 at 8:51 AM, ecognium ecogn...@gmail.com wrote:

  Hi All,

     I would like to get your opinion on the best way to structure my
  data model.
  My app allows the users to filter the entities by four category types
  (say A,B,C,D). Each category can have multiple values (e.g.,
  category type A can have values 1,2,3) but the
  user can choose only one value per category for filtering. Please
  note the values are unique across the category types as well. I could
  create four fields corresponding to the four types but it does not
  allow me to expand to more categories later easily. Right now, I just
  use one list field to store the different values as it is easy to add
  more category types later on.

  My model (simplified) looks like this:

  class Example(db.Model):

     categ        = db.StringListProperty()

     keywords = db.StringListProperty()

  The field keywords will have about 10-20 values for each entity. For
  the above example, categ will have up to 4 values. Since I allow for
  filtering on 4 category types, the index table gets large with
  unnecessary values. The filtering logic looks like:
  keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND
  categ = '99'

   Since there are 4 values in the categ list property, there will be
  4^4 rows created in the index table (most of them will never be hit
  due to the uniqueness guaranteed by design). Multiply it by the number
  of values in the keywords table, the index table gets large very
  quickly.

  I would like to avoid creating multiple fields if possible because
  when I want to make the number of category types to six, I would have
  to change the underlying model and all the filtering code. Any
  suggestions on how to construct the model such that it will allow for
  ease of expansion in category types yet still not create large index
  tables? I know there is a Category Property but not sure if it really
  provides any specific benefit here.

  Thanks!
  -e

 --
 Nick Johnson, App Engine Developer Programs Engineer
 Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
 368047


[google-appengine] Re: Accessing SimpleDB from App Engine - any latency issues?

2009-06-21 Thread ecognium

Thanks Barry/Nick. I was not aware of either project, and it looks like
the geomodel project will probably be a better fit for what I am
trying to do. I will try to integrate it to see if it works for me.

Thanks again!

On Jun 18, 2:51 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,



 On Wed, Jun 17, 2009 at 9:02 PM, ecognium ecogn...@gmail.com wrote:

  Thanks, Nick. Yes, I am already using a similar approach to paging. I
  did not know about this module, which probably can replace what I have
  written, so I will definitely look into that. Btw, the link to the
  pager.py file is down, so here is the Google cache link to the required
  class:

 http://74.125.155.132/search?q=cache:23LPEeO2oHIJ:https://bitbucket.o...

  I also grabbed the source and put it on drop.io:http://drop.io/pagerquery

  Regarding the zip based filtering I was trying to do something like:
  assuming I know the lat/long of the zip code of interest then this
  simple query will get nearby zip codes (square region), which can be
  filtered down by using a great circle algo...

  SELECT * FROM ZipData WHERE latitude >= x AND latitude <= y AND
  longitude >= r AND longitude <= s

  As I understand it, the above query cannot be executed on the Datastore
  due to the use of multiple inequality operators. If you think there is
  another way to get the same result (that will work with the datastore),
  please let me know.

 This query can't be executed efficiently with standard indexes on _any_
 database. A relational database using standard indexes will satisfy this
 query by doing an index scan on one inequality (eg, the latitude), then
 filtering by the other (eg, the longitude). In the worst case, this
 entails, for example, retrieving and filtering a slice of the entire west
 coast of the US, or everything near the equator, just to get a few points
 in one city! You can, of course, do this in App Engine if you want, but
 there's a better solution...

 The solution - both for relational databases and App Engine - is to use
 spatial indexing. There are a number of Python libraries that provide
 spatial indexing for App Engine. The best is undoubtedly Roman Nurik's
 geomodel:http://code.google.com/p/geomodel/

 -Nick Johnson





  #2:
  Yup, I meant just keeping the session.  I did not think of memcache
  and was just thinking about how the application itself is cached. It
  makes more sense just to use memcache so I can control the process.
  Thanks for your suggestion.

  -e

  On Jun 17, 3:09 am, Nick Johnson (Google) nick.john...@google.com
  wrote:
   Hi ecognium,

   On Wed, Jun 17, 2009 at 9:17 AM, ecognium ecogn...@gmail.com wrote:

Hi All,
    My application requires certain types of query features that are
not currently possible through Datastore API and so I am thinking of
moving the query side of things to Amazon's SimpleDB (mainly use it to
return keys). For example, I would like to return entities that are
within a certain zip code range while supporting pagination. Since key
based paging takes the only inequality operator allowed, it is not
possible to do the traditional zip based retrieval.  Even without
paging there is no way to have two inequality operator.  Hence the
switch to SimpleDB -- let me know if there are any nice workarounds
for zip-based data retrieval.

   You may want to check this out:
 http://appengine-cookbook.appspot.com/recipe/efficient-paging-for-any...

I have two questions for App Engine members:

1. Have you noticed any major latency issues in accessing SimpleDB
from App Engine (thinking of using Boto module)? If so, any tips on
how to reduce it?

   I haven't personally used SimpleDB, but the same caveats apply as with
  any
   other service accessed over HTTP - latency is dependent on the service
  and
   its proximity.

2. When I tested SimpleDB from my dev machine, I noticed SimpleDB
takes up to 5 seconds to return results -- most of the time is
actually spent in authorizing the request. So I would like to initiate
the connection once in the app and reuse the object for all subsequent
queries. Where should I do this initialization? I am not familiar with
how App Engine caches the application. Should I create a
amazon_login.py, include the logic for auth (two lines of code) and
import the file in my code? or do I need to put it in a class and
instantiate the class in the same file?

   When you say initiate the connection once, are you referring to an
  actual
   TCP connection, or to a 'session'? I presume the latter, since SimpleDB
  is
   HTTP based.

   urlfetch doesn't let you control the lifetime of the underlying TCP
   connection for HTTP requests. If you're obtaining an authentication
  token,
   though, you can certainly do that on the first request to a given
  runtime,
   and then cache the result in a global or class-level variable, or cache
  it
   in memcache and reuse it across multiple

[google-appengine] Accessing SimpleDB from App Engine - any latency issues?

2009-06-17 Thread ecognium

Hi All,
 My application requires certain types of query features that are
not currently possible through Datastore API and so I am thinking of
moving the query side of things to Amazon's SimpleDB (mainly use it to
return keys). For example, I would like to return entities that are
within a certain zip code range while supporting pagination. Since
key-based paging uses the only inequality operator allowed, it is not
possible to do traditional zip-based retrieval. Even without paging,
there is no way to have two inequality operators. Hence the switch to
SimpleDB -- let me know if there are any nice workarounds for
zip-based data retrieval.

I have two questions for App Engine members:

1. Have you noticed any major latency issues in accessing SimpleDB
from App Engine (thinking of using Boto module)? If so, any tips on
how to reduce it?

2. When I tested SimpleDB from my dev machine, I noticed SimpleDB
takes up to 5 seconds to return results -- most of the time is
actually spent in authorizing the request. So I would like to initiate
the connection once in the app and reuse the object for all subsequent
queries. Where should I do this initialization? I am not familiar with
how App Engine caches the application. Should I create an
amazon_login.py, include the auth logic (two lines of code), and
import the file in my code? Or do I need to put it in a class and
instantiate the class in the same file?

Thanks
-e



[google-appengine] Re: Accessing SimpleDB from App Engine - any latency issues?

2009-06-17 Thread ecognium

Thanks, Nick. Yes, I am already using a similar approach to paging. I
did not know about this module, which probably can replace what I have
written, so I will definitely look into that. Btw, the link to the
pager.py file is down, so here is the Google cache link to the required
class:

http://74.125.155.132/search?q=cache:23LPEeO2oHIJ:https://bitbucket.org/moraes/appengine/src/tip/pager.py+PagerQuery+app+enginecd=5hl=enct=clnkgl=usclient=firefox-a

I also grabbed the source and put it on drop.io: http://drop.io/pagerquery

Regarding the zip-based filtering, I was trying to do something like
this: assuming I know the lat/long of the zip code of interest, then this
simple query will get nearby zip codes (a square region), which can be
filtered down by using a great-circle algo...

SELECT * FROM ZipData WHERE latitude >= x AND latitude <= y AND
longitude >= r AND longitude <= s

As I understand it, the above query cannot be executed on the Datastore
due to the use of multiple inequality operators. If you think there is
another way to get the same result (that will work with the datastore),
please let me know.
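The two-pass idea above - a coarse bounding-box query, then an exact great-circle cut - can be sketched in plain Python. The haversine formula and the ~3956-mile Earth radius are standard; the data layout and function names are illustrative, not any library's API:

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_miles(lat1, lon1, lat2, lon2):
    # Haversine distance between two lat/long points, in miles.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3956 * asin(sqrt(a))

def filter_by_radius(candidates, lat, lon, radius_miles):
    # candidates: {zip_code: (lat, lon)} rows returned by the coarse
    # bounding-box query; keep only those truly within the radius.
    return sorted(z for z, (zlat, zlon) in candidates.items()
                  if great_circle_miles(lat, lon, zlat, zlon) <= radius_miles)

sf, la = (37.7749, -122.4194), (34.0522, -118.2437)
print(great_circle_miles(sf[0], sf[1], la[0], la[1]))  # roughly 347 miles
```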

#2:
Yup, I meant just keeping the session.  I did not think of memcache
and was just thinking about how the application itself is cached. It
makes more sense just to use memcache so I can control the process.
Thanks for your suggestion.

-e

On Jun 17, 3:09 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,

 On Wed, Jun 17, 2009 at 9:17 AM, ecognium ecogn...@gmail.com wrote:

  Hi All,
      My application requires certain types of query features that are
  not currently possible through Datastore API and so I am thinking of
  moving the query side of things to Anazon's SimpleDB (mainly use it to
  return keys). For example, I would like to return entities that are
  within a certain zip code range while supporting pagination. Since key
  based paging takes the only inequality operator allowed, it is not
  possible to do the traditional zip based retrieval.  Even without
  paging there is no way to have two inequality operator.  Hence the
  switch to SimpleDB -- let me know if there are any nice workarounds
  for zip-based data retrieval.

 You may want to check this out:
 http://appengine-cookbook.appspot.com/recipe/efficient-paging-for-any...



  I have two questions for App Engine members:

  1. Have you noticed any major latency issues in accessing SimpleDB
  from App Engine (thinking of using Boto module)? If so, any tips on
  how to reduce it?

 I haven't personally used SimpleDB, but the same caveats apply as with any
 other service accessed over HTTP - latency is dependent on the service and
 its proximity.



  2. When I tested SimpleDB from my dev machine, I noticed SimpleDB
  takes up to 5 seconds to return results -- most of the time is
  actually spent in authorizing the request. So I would like to initiate
  the connection once in the app and reuse the object for all subsequent
  queries. Where should I do this initialization? I am not familiar with
  how App Engine caches the application. Should I create a
  amazon_login.py, include the logic for auth (two lines of code) and
  import the file in my code? or do I need to put it in a class and
  instantiate the class in the same file?

 When you say "initiate the connection once", are you referring to an
 actual TCP connection, or to a 'session'? I presume the latter, since
 SimpleDB is HTTP based.

 urlfetch doesn't let you control the lifetime of the underlying TCP
 connection for HTTP requests. If you're obtaining an authentication token,
 though, you can certainly do that on the first request to a given runtime,
 and then cache the result in a global or class-level variable, or cache it
 in memcache and reuse it across multiple instances.
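The per-runtime caching Nick suggests can be sketched with a module-level global (`_fetch_token` here is a hypothetical stand-in for the real SimpleDB authorization call):

```python
import time

_TOKEN = None        # cached once per runtime instance
_TOKEN_EXPIRY = 0.0

def _fetch_token():
    # Hypothetical stand-in for the slow SimpleDB auth request.
    return 'token-%d' % int(time.time())

def get_token(ttl=3600):
    # Refresh only when the cached token is missing or expired, so
    # most requests skip the expensive authorization round trip.
    global _TOKEN, _TOKEN_EXPIRY
    now = time.time()
    if _TOKEN is None or now >= _TOKEN_EXPIRY:
        _TOKEN = _fetch_token()
        _TOKEN_EXPIRY = now + ttl
    return _TOKEN
```

A memcache-backed variant would let the token survive across instances, at the cost of one memcache round trip per request.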

 -Nick Johnson
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Google App Engine group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---



[google-appengine] Re: Bulk upload specifying a key name

2009-05-18 Thread ecognium

Thanks, Nick. It works!

On May 18, 5:00 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,

 You can do this by implementing the generate_key method in your
 bulkloader.Loader class. See the code for details:
 http://code.google.com/p/googleappengine/source/browse/trunk/python/g...
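A minimal sketch of what Nick describes (the kind name and CSV columns here are hypothetical, and a stand-in base class is defined so the sketch is self-contained outside the SDK):

```python
try:
    from google.appengine.tools import bulkloader
except ImportError:
    # Stand-in so the sketch runs without the App Engine SDK.
    class bulkloader(object):
        class Loader(object):
            def __init__(self, kind, properties):
                self.kind, self.properties = kind, properties

class ProductLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'Product',
                                   [('sku', str), ('name', str)])

    def generate_key(self, i, values):
        # Called once per CSV row; returning a string makes it the
        # entity's key_name (here: the first column, the SKU).
        return values[0]
```

If I read the loader code right, returning None instead falls back to an auto-assigned ID.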

 -Nick Johnson

 On Mon, May 18, 2009 at 4:54 AM, ecognium ecogn...@gmail.com wrote:

  Hi All,
   How do I specify a key name when using the bulk upload tool? I am
  currently using WWW::Mechanize to upload my data one row at a time,
  but it would be nice if I could use the bulk loader, as I don't want
  to reinvent the many options available in the tool.

  Thanks!



[google-appengine] Bulk upload specifying a key name

2009-05-17 Thread ecognium

Hi All,
How do I specify a key name when using the bulk upload tool? I am
currently using WWW::Mechanize to upload my data one row at a time,
but it would be nice if I could use the bulk loader, as I don't want
to reinvent the many options available in the tool.

Thanks!



[google-appengine] Getting Data Using Key vs. Key_Name

2009-05-13 Thread ecognium

Hello Everyone,
It looks like Model.get(keys) raises an exception (BadKeyError)
when the key does not exist even though it is not mentioned here
http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get.

On the other hand, Model.get_by_key_name(names) just returns None for
the keys that do not exist. If I understand correctly, keys and
key names are not the same -- i.e., I cannot use a key value inside the
get_by_key_name() call. So how can I get data using keys without
having to worry about determining which specific key was not valid?

I can create my own key names, but I would like to use the default
uniqueness guarantee of the keys. If there is no easy way to deal with
this exception, is there an easy way to make the key_name the same as
the key value provided by default?

In case you are wondering why the keys would be invalid: I am allowing
users to bookmark certain objects, so I store the key names in the
user prefs data. If a referenced object gets deleted, the key
information stored inside the user prefs becomes outdated, and querying
for bookmarks then causes an issue. I wanted to avoid going through
every single user who may be holding the deleted key and removing it in
real time, so I thought I would handle it as a batch job.
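The batch-job idea can be sketched in plain Python (here `entities` stands in for the list a batch get returns, aligned with the requested keys and containing None for deleted objects):

```python
def split_bookmarks(keys, entities):
    # Pair each stored bookmark key with its fetch result; None means
    # the target entity was deleted since the bookmark was saved.
    live = [e for k, e in zip(keys, entities) if e is not None]
    stale = [k for k, e in zip(keys, entities) if e is None]
    return live, stale
```

The stale list is what the batch job would strip from each user's prefs.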

Thanks!



[google-appengine] Re: Getting Data Using Key vs. Key_Name

2009-05-13 Thread ecognium

Thanks, Nick. You are right. I ran into some issues, which I tracked
down to the get() call, but then I ended up testing the keys using
values like '123'. So the BadKeyError came from my own bad key request,
not from an unavailable key.


On May 13, 3:46 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi ecognium,

 Model.get does not raise BadKeyError when attempting to fetch an
 entity that does not exist - it returns None, as documented.
 BadKeyError is raised when the key itself is invalid, because it lacks
 a name or id, or has an invalid name. You probably want to check that
 you're not generating invalid keys - such as one with a name that
 starts with a digit.

 -Nick Johnson




[google-appengine] Image Resize Without Forcing Aspect Ratio

2009-05-02 Thread ecognium

Hello Everyone,
 It looks like images.resize() always honors the aspect ratio.
Is there a way to force exact dimensions? For example, in ImageMagick
you can append an exclamation mark (!) to override the aspect ratio.
Any suggestions would be appreciated.
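One possible workaround (a sketch; the centering math below is my own, though the SDK's images.crop() does accept fractional coordinates): crop the source to the target aspect ratio first, then resize, which forces exact output dimensions at the cost of trimming edges.

```python
def crop_box(src_w, src_h, dst_w, dst_h):
    # Fractional (left, top, right, bottom) box that trims the source,
    # centered, to the destination aspect ratio; pass the result to
    # images.crop() and then images.resize() for exact dimensions.
    src_ratio = float(src_w) / src_h
    dst_ratio = float(dst_w) / dst_h
    if src_ratio > dst_ratio:
        # Source too wide: trim equal margins left and right.
        keep = dst_ratio / src_ratio
        margin = (1.0 - keep) / 2.0
        return (margin, 0.0, 1.0 - margin, 1.0)
    # Source too tall (or equal): trim equal margins top and bottom.
    keep = src_ratio / dst_ratio
    margin = (1.0 - keep) / 2.0
    return (0.0, margin, 1.0, 1.0 - margin)
```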

Thanks!




[google-appengine] GQL Query with IN operator Issue (bug or am i making a mistake?)

2009-04-20 Thread ecognium

Hello everyone, I noticed an odd behavior with a GQL query when it has
two IN operators and a regular condition. Below is some basic code to
reproduce the problem:


class DummyData(db.Model):
    x = db.StringListProperty()
    y = db.TextProperty()

class Dummy(webapp.RequestHandler):
    def get(self):
        d = DummyData()
        d.x = ['a', 'b', 'c']
        d.y = "test"
        d.put()
        d = DummyData()
        d.x = ['c', 'd', 'e']
        d.y = "test2"
        d.put()

        q = db.GqlQuery("SELECT * FROM DummyData WHERE x IN ('c') "
                        "AND x IN ('a')")
        # 10 instead of 2 - useful if you run the test multiple times
        results = q.fetch(10)
        for r in results:
            self.response.headers['Content-Type'] = "text/plain"
            self.response.out.write("x = " + ",".join(r.x) +
                                    " y = " + r.y + "\n")

When you run the above code you will see the following output:
x = a,b,c y = test

However, when I replace the above query with the one below, I do not
get any results (even though it should return the same result as
above):

# Note the addition of y = 'test'
q = db.GqlQuery("SELECT * FROM DummyData WHERE y = 'test' "
                "AND x IN ('c') AND x IN ('a')")

Although here the IN conditions are effectively the same as '=', my
application actually uses multiple list values; I am just presenting a
simpler example.

If someone can confirm the issue, I can open a bug report for this.

Thanks!




[google-appengine] Re: GQL Query with IN operator Issue (bug or am i making a mistake?)

2009-04-20 Thread ecognium

Thanks, Andy. You are correct -- my mistake. I saw some strange behavior
with the data I was working with and couldn't figure out why the return
values made sense for the query. There I had StringProperty instead of
TextProperty; I constructed a quick example and made the mistake of
using the wrong type in the process. That said, I have made so many
changes since this issue surfaced that I have forgotten exactly which
data values in the query caused the problem. I will try to reproduce it
on the larger data set I have. If I cannot, then it is either a browser
caching problem or it was too late at night and I was not thinking
clearly :)

Thanks again!


On Apr 20, 7:21 am, Andy Freeman ana...@earthlink.net wrote:
 db.TextProperty is not an indexable property.  That means that it's
 not queryable either.

 It would be nice to get an exception or some other indication of
 what's going on.

 However, note that indexing is something that happens in the
 datastore when an instance is stored.  If you change a property from
 StringProperty to TextProperty or the reverse, strange things will
 probably happen.  (If you put some instances with StringProperty, I
 suspect that you can still successfully query for those instances
 using that property after you've switched to TextProperty.)





[google-appengine] Re: GQL Query with IN operator Issue (bug or am i making a mistake?)

2009-04-20 Thread ecognium

I think I found it - didn't expect to get lucky so quickly...

class DummyData(db.Model):
    x = db.StringListProperty()
    y = db.StringProperty()

class Dummy(webapp.RequestHandler):
    def get(self):
        d = DummyData()
        d.x = ['a', 'b', 'c']
        d.y = 'test'
        d.put()
        d = DummyData()
        d.x = ['c', 'd', 'e']
        d.y = 'test'
        d.put()
        d = DummyData()   # new instance for the third record
        d.x = ['r', 's', 't']
        d.y = 'test2'
        d.put()

        q = db.GqlQuery("SELECT * FROM DummyData WHERE x IN ('c') "
                        "AND x IN ('r')")
        results = q.fetch(10)
        self.response.headers['Content-Type'] = 'text/plain'
        for r in results:
            self.response.out.write("x = " + ",".join(r.x) +
                                    " y = " + r.y + "\n")


Now when you run the above code you should not get any results.
However, the query returns the third record. If you switch the order of
the conditions, you get the first two records. My guess is that since
multiple IN operators create a combinatorial expansion, GQL keeps
only the last IN operator. I did not see such behavior documented --
did you see anything to that effect?

For my app I would like to use multiple IN operators, and I can still
stay within the 30-query limit. I am trying to filter the results by
categories and also provide pagination. If I decide to filter
everything on the client side, I will have to send all the data to the
client, which will be inefficient. When I tried to split the query into
multiple chunks, it became really hard to keep track of previous
state. Maybe I should post that as a separate question.
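A client-side fallback for the multiple-IN case can be sketched like this (in-memory stand-ins, not the datastore API): query on one IN clause and apply the remaining clauses in Python, where each clause must match at least one element of the list property:

```python
records = [
    {'key': 1, 'x': ['a', 'b', 'c'], 'y': 'test'},
    {'key': 2, 'x': ['c', 'd', 'e'], 'y': 'test'},
    {'key': 3, 'x': ['r', 's', 't'], 'y': 'test2'},
]

def multi_in(rows, clauses):
    # A row matches only if every IN clause is satisfied by at least
    # one element of the list property -- clause order is irrelevant.
    return [r for r in rows
            if all(any(v in r['x'] for v in vals) for vals in clauses)]

# x IN ('c') AND x IN ('a'): only record 1 contains both values,
# and swapping the clause order cannot change the result.
matches = multi_in(records, [('c',), ('a',)])
```

Filtering only one page's worth of candidates this way keeps the payload small while sidestepping whatever the datastore does with stacked IN operators.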

Please do let me know if you see the same issue. If I have made
another mistake, then I am probably going crazy and should stop
worrying about this issue :)

Thanks!

