[google-appengine] Re: Efficient way to structure my data model
Thanks very much! I think I understand indexing with lists much better now.

In the case of your index with 4 occurrences, this will rather be 6C4.

Yup, I was looking at the max situation when I use 3 categ fields. Need to be careful, as it grows pretty quickly.

On Jun 24, 4:19 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

On Tue, Jun 23, 2009 at 6:11 PM, ecognium ecogn...@gmail.com wrote:

Thanks again - this is very helpful. I will let you know if I run into any future index creation errors, as it could have been caused by any number of other entries; I mistakenly thought it was all these categ list-based entries. So if I understand it right, even with a 10-element list for keywords, there will only be 10 rows when 4 categ fields are used.

Correct - 4C4 * 10C1 index entries for the custom index you specified earlier.

In the event I use 'categ' only once in my query along with the keywords field, it will have up to 40 rows (10 from keywords and 4C1 from the categ list).

Correct.

Am I adding these up right? I do not see myself going beyond 6 elements in the categ list at this point (I guess the max will be 6C3 = 20 under such a situation).

In the case of your index with 4 occurrences, this will rather be 6C4.

The keyword list will probably go into the 20s, but I do not see anything beyond that, and it will always be used only once in the query.

Thanks, -e

On Jun 23, 3:53 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

On Tue, Jun 23, 2009 at 1:35 AM, ecognium ecogn...@gmail.com wrote:

Thanks, Nick. Let me make sure I understand your comment correctly. Suppose I have the following data:

ID   BlobProp1  BlobProp2-N  Keywords        Categ
123  blah       blah         tag1,tag2,tag3  Circle, Red, Large, Dotted
345  blah       blah         tag3,tag4,tag5  Square, Blue, Small, Solid
678  blah       blah         tag1,tag3,tag4  Circle, Blue, Small, Solid

The field categ (list) contains four different types - Shape, Color, Size, and Line Type.
Suppose the user wants to retrieve all entities that are Small Dotted Blue Circles; then the query will be:

SELECT * FROM MyModel WHERE categ = 'Circle' AND categ = 'Small' AND categ = 'Blue' AND categ = 'Dotted'

When I was reading about exploding indexes, the example indicated the issue was due to the Cartesian product of two list elements. I thought the same would hold true with one list field used multiple times in a query.

That is indeed true, though it's not quite the Cartesian product - the datastore won't bother indexing (Circle, Circle, Circle, Circle) or (Dotted, Dotted, Dotted, Dotted); it only indexes every unique combination, which is a substantially smaller number than the Cartesian product. It's still only tractable for small lists, though, such as the 4-item lists you're dealing with.

Are you saying the above query will not need {Circle, Red, Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } number of index entities for entity ID=123?

Correct - if you're not specifying a sort order, you can execute the query without any composite indexes whatsoever. The datastore satisfies equality-only queries using a merge join strategy.

I was getting index errors when I was using the categ list property four times in my index specification, and that's why I was wondering if I should restructure things.

How many items did you have in the list you were indexing in that case? If your list has 4 items and your index specification lists it 4 times, you should only get one index entry.

So I am guessing the following spec should not cause any index issues in the future?

Again, that depends on the number of entries in the 'categ' list. With 4 entries, this will only generate a single index entry, but the number of entries will expand exponentially as the list increases in size.
-Nick Johnson

- kind: MyModel
  properties:
  - name: categ
  - name: categ
  - name: categ
  - name: categ
  - name: keywords
  - name: __key__ # used for paging

Thanks, -e

On Jun 22, 2:10 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

If I understand your problem correctly, every entity will have 0-4 entries in the 'categ' list, corresponding to the values for each of 4 categories (eg, Color, Size, Shape, etc)? The sample query you give, with only equality filters, will be satisfiable using the merge join query planner, which doesn't require custom indexes
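The nCk bookkeeping in this exchange is easy to check mechanically. A small standalone sketch (plain Python, independent of App Engine) that counts index entries the way Nick describes: one entry per unique combination of list values:

```python
from itertools import combinations

def index_entries(list_size, occurrences):
    # One index row per unique combination of `occurrences` values
    # drawn from a list of `list_size` distinct values: C(n, k).
    return len(list(combinations(range(list_size), occurrences)))

# 4-item categ list referenced 4 times in the index: 4C4 = 1 entry
print(index_entries(4, 4))
# 6-item list referenced 4 times (the 6C4 case above): 15 entries
print(index_entries(6, 4))
# Indexes referencing categ 1 through 4 times, on a 4-item list:
# 4C1 + 4C2 + 4C3 + 4C4 = 15 entries in total
print(sum(index_entries(4, k) for k in range(1, 5)))
```

The same sum over a 6-item list referenced 1 through 6 times gives the 63 Nick quotes below.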
[google-appengine] Re: Efficient way to structure my data model
Thanks again - this is very helpful. I will let you know if I run into any future index creation errors, as it could have been caused by any number of other entries; I mistakenly thought it was all these categ list-based entries. So if I understand it right, even with a 10-element list for keywords, there will only be 10 rows when 4 categ fields are used. In the event I use 'categ' only once in my query along with the keywords field, it will have up to 40 rows (10 from keywords and 4C1 from the categ list). Am I adding these up right? I do not see myself going beyond 6 elements in the categ list at this point (I guess the max will be 6C3 = 20 under such a situation). The keyword list will probably go into the 20s, but I do not see anything beyond that, and it will always be used only once in the query.

Thanks, -e

On Jun 23, 3:53 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

On Tue, Jun 23, 2009 at 1:35 AM, ecognium ecogn...@gmail.com wrote:

Thanks, Nick. Let me make sure I understand your comment correctly. Suppose I have the following data:

ID   BlobProp1  BlobProp2-N  Keywords        Categ
123  blah       blah         tag1,tag2,tag3  Circle, Red, Large, Dotted
345  blah       blah         tag3,tag4,tag5  Square, Blue, Small, Solid
678  blah       blah         tag1,tag3,tag4  Circle, Blue, Small, Solid

The field categ (list) contains four different types - Shape, Color, Size, and Line Type. Suppose the user wants to retrieve all entities that are Small Dotted Blue Circles; then the query will be:

SELECT * FROM MyModel WHERE categ = 'Circle' AND categ = 'Small' AND categ = 'Blue' AND categ = 'Dotted'

When I was reading about exploding indexes, the example indicated the issue was due to the Cartesian product of two list elements. I thought the same would hold true with one list field used multiple times in a query.
That is indeed true, though it's not quite the Cartesian product - the datastore won't bother indexing (Circle, Circle, Circle, Circle) or (Dotted, Dotted, Dotted, Dotted); it only indexes every unique combination, which is a substantially smaller number than the Cartesian product. It's still only tractable for small lists, though, such as the 4-item lists you're dealing with.

Are you saying the above query will not need {Circle, Red, Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } number of index entities for entity ID=123?

Correct - if you're not specifying a sort order, you can execute the query without any composite indexes whatsoever. The datastore satisfies equality-only queries using a merge join strategy.

I was getting index errors when I was using the categ list property four times in my index specification, and that's why I was wondering if I should restructure things.

How many items did you have in the list you were indexing in that case? If your list has 4 items and your index specification lists it 4 times, you should only get one index entry.

So I am guessing the following spec should not cause any index issues in the future?

Again, that depends on the number of entries in the 'categ' list. With 4 entries, this will only generate a single index entry, but the number of entries will expand exponentially as the list increases in size.

-Nick Johnson

- kind: MyModel
  properties:
  - name: categ
  - name: categ
  - name: categ
  - name: categ
  - name: keywords
  - name: __key__ # used for paging

Thanks, -e

On Jun 22, 2:10 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

If I understand your problem correctly, every entity will have 0-4 entries in the 'categ' list, corresponding to the values for each of 4 categories (eg, Color, Size, Shape, etc)?
The sample query you give, with only equality filters, will be satisfiable using the merge join query planner, which doesn't require custom indexes, so you won't have high indexing overhead. There will simply be one index entry for each item in each list.

If you do need custom indexes, the number of index entries isn't 4^4, as you suggest, but rather smaller. Assuming you want to be able to query with any number of categories from 0 to 4, you'll need 3 or 4 custom indexes (depending on whether the 0-category case requires its own index), and the total number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15. For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 = 63, which is still a not-unreasonable number.

-Nick Johnson

On Mon, Jun 22, 2009 at 8:51 AM, ecognium ecogn...@gmail.com wrote:

Hi All, I would like to get
[google-appengine] Efficient way to structure my data model
Hi All, I would like to get your opinion on the best way to structure my data model. My app allows the users to filter the entities by four category types (say A, B, C, D). Each category can have multiple values (e.g., category type A can have values 1, 2, 3), but the user can choose only one value per category for filtering. Please note the values are unique across the category types as well. I could create four fields corresponding to the four types, but that does not allow me to expand to more categories later easily. Right now, I just use one list field to store the different values, as it is easy to add more category types later on. My model (simplified) looks like this:

class Example(db.Model):
    categ = db.StringListProperty()
    keywords = db.StringListProperty()

The field keywords will have about 10-20 values for each entity. For the above example, categ will have up to 4 values. Since I allow for filtering on 4 category types, the index table gets large with unnecessary values. The filtering logic looks like:

keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND categ = '99'

Since there are 4 values in the categ list property, there will be 4^4 rows created in the index table (most of them will never be hit due to the uniqueness guaranteed by design). Multiply that by the number of values in the keywords list, and the index table gets large very quickly. I would like to avoid creating multiple fields if possible, because when I want to increase the number of category types to six, I would have to change the underlying model and all the filtering code. Any suggestions on how to construct the model such that it will allow for ease of expansion in category types yet still not create large index tables? I know there is a Category Property but am not sure it really provides any specific benefit here.

Thanks! -e

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Google App Engine group.
To post to this group, send email to google-appengine@googlegroups.com To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~--~~~~--~~--~--~---
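One way to keep the filtering code unchanged as category types grow is to build the equality filters in a loop. A minimal sketch: it only assembles the GQL text, so it runs anywhere; with the db API the equivalent is calling q.filter('categ =', v) once per chosen value. The kind name and the filter values here are illustrative:

```python
def build_filter_query(keyword, categ_values, kind='Example'):
    # One equality clause per selected category value; equality-only
    # queries like this can be served by the datastore's merge join,
    # so no composite index is required.
    clauses = ["keywords = '%s'" % keyword]
    clauses += ["categ = '%s'" % v for v in categ_values]
    return "SELECT * FROM %s WHERE %s" % (kind, ' AND '.join(clauses))

print(build_filter_query('k', ['1', '9', '14', '99']))
```

Going from four category types to six then means passing six values instead of four, with no model or query-code changes.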
[google-appengine] Re: Efficient way to structure my data model
Thanks, Nick. Let me make sure I understand your comment correctly. Suppose I have the following data:

ID   BlobProp1  BlobProp2-N  Keywords        Categ
123  blah       blah         tag1,tag2,tag3  Circle, Red, Large, Dotted
345  blah       blah         tag3,tag4,tag5  Square, Blue, Small, Solid
678  blah       blah         tag1,tag3,tag4  Circle, Blue, Small, Solid

The field categ (list) contains four different types - Shape, Color, Size, and Line Type. Suppose the user wants to retrieve all entities that are Small Dotted Blue Circles; then the query will be:

SELECT * FROM MyModel WHERE categ = 'Circle' AND categ = 'Small' AND categ = 'Blue' AND categ = 'Dotted'

When I was reading about exploding indexes, the example indicated the issue was due to the Cartesian product of two list elements. I thought the same would hold true with one list field used multiple times in a query. Are you saying the above query will not need {Circle, Red, Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } number of index entities for entity ID=123?

I was getting index errors when I was using the categ list property four times in my index specification, and that's why I was wondering if I should restructure things. So I am guessing the following spec should not cause any index issues in the future?

- kind: MyModel
  properties:
  - name: categ
  - name: categ
  - name: categ
  - name: categ
  - name: keywords
  - name: __key__ # used for paging

Thanks, -e

On Jun 22, 2:10 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

If I understand your problem correctly, every entity will have 0-4 entries in the 'categ' list, corresponding to the values for each of 4 categories (eg, Color, Size, Shape, etc)? The sample query you give, with only equality filters, will be satisfiable using the merge join query planner, which doesn't require custom indexes, so you won't have high indexing overhead. There will simply be one index entry for each item in each list.
If you do need custom indexes, the number of index entries isn't 4^4, as you suggest, but rather smaller. Assuming you want to be able to query with any number of categories from 0 to 4, you'll need 3 or 4 custom indexes (depending on whether the 0-category case requires its own index), and the total number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15. For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 = 63, which is still a not-unreasonable number.

-Nick Johnson

On Mon, Jun 22, 2009 at 8:51 AM, ecognium ecogn...@gmail.com wrote:

Hi All, I would like to get your opinion on the best way to structure my data model. My app allows the users to filter the entities by four category types (say A, B, C, D). Each category can have multiple values (e.g., category type A can have values 1, 2, 3), but the user can choose only one value per category for filtering. Please note the values are unique across the category types as well. I could create four fields corresponding to the four types, but that does not allow me to expand to more categories later easily. Right now, I just use one list field to store the different values, as it is easy to add more category types later on. My model (simplified) looks like this:

class Example(db.Model):
    categ = db.StringListProperty()
    keywords = db.StringListProperty()

The field keywords will have about 10-20 values for each entity. For the above example, categ will have up to 4 values. Since I allow for filtering on 4 category types, the index table gets large with unnecessary values. The filtering logic looks like:

keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND categ = '99'

Since there are 4 values in the categ list property, there will be 4^4 rows created in the index table (most of them will never be hit due to the uniqueness guaranteed by design). Multiply that by the number of values in the keywords list, and the index table gets large very quickly.
I would like to avoid creating multiple fields if possible, because when I want to increase the number of category types to six, I would have to change the underlying model and all the filtering code. Any suggestions on how to construct the model such that it will allow for ease of expansion in category types yet still not create large index tables? I know there is a Category Property but am not sure it really provides any specific benefit here.

Thanks! -e

--
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
[google-appengine] Re: Accessing SimpleDB from App Engine - any latency issues?
Thanks Barry/Nick. I was not aware of either project, and it looks like the geomodel project will probably be a better fit for what I am trying to do. I will try to integrate it to see if it works for me. Thanks again!

On Jun 18, 2:51 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

On Wed, Jun 17, 2009 at 9:02 PM, ecognium ecogn...@gmail.com wrote:

Thanks, Nick. Yes, I am already using a similar approach to paging. I did not know about this module, which probably can replace what I have written, so I will definitely look into that. Btw, the link to the pager.py file is down, so here is the Google cache link to the required class: http://74.125.155.132/search?q=cache:23LPEeO2oHIJ:https://bitbucket.o... I also grabbed the source and put it on drop.io: http://drop.io/pagerquery

Regarding the zip-based filtering, I was trying to do something like: assuming I know the lat/long of the zip code of interest, then this simple query will get nearby zip codes (a square region), which can be filtered down by using a great-circle algo...

SELECT * FROM ZipData WHERE latitude >= x AND latitude <= y AND longitude >= r AND longitude <= s

As I understand it, the above query cannot be executed on the Datastore due to the multiple inequality operators. If you think there is another way to get the same result (that will work with the datastore), please let me know.

This query can't be executed efficiently with standard indexes on _any_ database. A relational database using standard indexes will satisfy this query by doing an index scan on one inequality (eg, the latitude), then filtering by the other (eg, the longitude). In the worst case, this entails, for example, retrieving and filtering a slice of the entire west coast of the US, or everything near the equator, just to get a few points in one city! You can, of course, do this in App Engine if you want, but there's a better solution... The solution - both for relational databases and App Engine - is to use spatial indexing.
There are a number of Python libraries that provide spatial indexing for App Engine. The best is undoubtedly Roman Nurik's geomodel: http://code.google.com/p/geomodel/

-Nick Johnson

#2: Yup, I meant just keeping the session. I did not think of memcache and was just thinking about how the application itself is cached. It makes more sense just to use memcache so I can control the process. Thanks for your suggestion. -e

On Jun 17, 3:09 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

On Wed, Jun 17, 2009 at 9:17 AM, ecognium ecogn...@gmail.com wrote:

Hi All, My application requires certain types of query features that are not currently possible through the Datastore API, and so I am thinking of moving the query side of things to Amazon's SimpleDB (mainly using it to return keys). For example, I would like to return entities that are within a certain zip code range while supporting pagination. Since key-based paging takes the only inequality operator allowed, it is not possible to do the traditional zip-based retrieval. Even without paging, there is no way to have two inequality operators. Hence the switch to SimpleDB -- let me know if there are any nice workarounds for zip-based data retrieval.

You may want to check this out: http://appengine-cookbook.appspot.com/recipe/efficient-paging-for-any...

I have two questions for App Engine members: 1. Have you noticed any major latency issues in accessing SimpleDB from App Engine (thinking of using the Boto module)? If so, any tips on how to reduce it?

I haven't personally used SimpleDB, but the same caveats apply as with any other service accessed over HTTP - latency is dependent on the service and its proximity.

2. When I tested SimpleDB from my dev machine, I noticed SimpleDB takes up to 5 seconds to return results -- most of the time is actually spent in authorizing the request. So I would like to initiate the connection once in the app and reuse the object for all subsequent queries.
Where should I do this initialization? I am not familiar with how App Engine caches the application. Should I create an amazon_login.py, include the logic for auth (two lines of code), and import the file in my code? Or do I need to put it in a class and instantiate the class in the same file?

When you say "initiate the connection once", are you referring to an actual TCP connection, or to a 'session'? I presume the latter, since SimpleDB is HTTP-based. urlfetch doesn't let you control the lifetime of the underlying TCP connection for HTTP requests. If you're obtaining an authentication token, though, you can certainly do that on the first request to a given runtime, and then cache the result in a global or class-level variable, or cache it in memcache and reuse it across multiple
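The two-step scheme discussed in this thread (square bounding box first, exact great-circle filter second) can be sketched in plain Python. The bounding box is what the two-inequality SQL above would fetch; on the datastore, geomodel's geocell indexing plays that role. The 69-miles-per-degree figure is an approximation:

```python
import math

def great_circle_miles(lat1, lon1, lat2, lon2):
    # Haversine formula; Earth radius ~3959 miles.
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3959.0 * math.asin(math.sqrt(a))

def nearby(points, center, radius_miles):
    # Step 1: cheap square bounding box (the two-inequality prefilter).
    clat, clon = center
    dlat = radius_miles / 69.0                        # ~69 miles per degree latitude
    dlon = radius_miles / (69.0 * math.cos(math.radians(clat)))
    box = [(lat, lon) for lat, lon in points
           if clat - dlat <= lat <= clat + dlat
           and clon - dlon <= lon <= clon + dlon]
    # Step 2: exact great-circle cut inside the box.
    return [(lat, lon) for lat, lon in box
            if great_circle_miles(clat, clon, lat, lon) <= radius_miles]
```

The prefilter keeps the expensive distance computation off the vast majority of points, which is exactly the work a spatial index does server-side.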
[google-appengine] Accessing SimpleDB from App Engine - any latency issues?
Hi All, My application requires certain types of query features that are not currently possible through the Datastore API, and so I am thinking of moving the query side of things to Amazon's SimpleDB (mainly using it to return keys). For example, I would like to return entities that are within a certain zip code range while supporting pagination. Since key-based paging takes the only inequality operator allowed, it is not possible to do the traditional zip-based retrieval. Even without paging, there is no way to have two inequality operators. Hence the switch to SimpleDB -- let me know if there are any nice workarounds for zip-based data retrieval.

I have two questions for App Engine members:

1. Have you noticed any major latency issues in accessing SimpleDB from App Engine (thinking of using the Boto module)? If so, any tips on how to reduce it?

2. When I tested SimpleDB from my dev machine, I noticed SimpleDB takes up to 5 seconds to return results -- most of the time is actually spent in authorizing the request. So I would like to initiate the connection once in the app and reuse the object for all subsequent queries. Where should I do this initialization? I am not familiar with how App Engine caches the application. Should I create an amazon_login.py, include the logic for auth (two lines of code), and import the file in my code? Or do I need to put it in a class and instantiate the class in the same file?

Thanks -e
[google-appengine] Re: Accessing SimpleDB from App Engine - any latency issues?
Thanks, Nick. Yes, I am already using a similar approach to paging. I did not know about this module, which probably can replace what I have written, so I will definitely look into that. Btw, the link to the pager.py file is down, so here is the Google cache link to the required class: http://74.125.155.132/search?q=cache:23LPEeO2oHIJ:https://bitbucket.org/moraes/appengine/src/tip/pager.py+PagerQuery+app+enginecd=5hl=enct=clnkgl=usclient=firefox-a I also grabbed the source and put it on drop.io: http://drop.io/pagerquery

Regarding the zip-based filtering, I was trying to do something like: assuming I know the lat/long of the zip code of interest, then this simple query will get nearby zip codes (a square region), which can be filtered down by using a great-circle algo...

SELECT * FROM ZipData WHERE latitude >= x AND latitude <= y AND longitude >= r AND longitude <= s

As I understand it, the above query cannot be executed on the Datastore due to the multiple inequality operators. If you think there is another way to get the same result (that will work with the datastore), please let me know.

#2: Yup, I meant just keeping the session. I did not think of memcache and was just thinking about how the application itself is cached. It makes more sense just to use memcache so I can control the process. Thanks for your suggestion. -e

On Jun 17, 3:09 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

On Wed, Jun 17, 2009 at 9:17 AM, ecognium ecogn...@gmail.com wrote:

Hi All, My application requires certain types of query features that are not currently possible through the Datastore API, and so I am thinking of moving the query side of things to Amazon's SimpleDB (mainly using it to return keys). For example, I would like to return entities that are within a certain zip code range while supporting pagination. Since key-based paging takes the only inequality operator allowed, it is not possible to do the traditional zip-based retrieval.
Even without paging, there is no way to have two inequality operators. Hence the switch to SimpleDB -- let me know if there are any nice workarounds for zip-based data retrieval.

You may want to check this out: http://appengine-cookbook.appspot.com/recipe/efficient-paging-for-any...

I have two questions for App Engine members: 1. Have you noticed any major latency issues in accessing SimpleDB from App Engine (thinking of using the Boto module)? If so, any tips on how to reduce it?

I haven't personally used SimpleDB, but the same caveats apply as with any other service accessed over HTTP - latency is dependent on the service and its proximity.

2. When I tested SimpleDB from my dev machine, I noticed SimpleDB takes up to 5 seconds to return results -- most of the time is actually spent in authorizing the request. So I would like to initiate the connection once in the app and reuse the object for all subsequent queries. Where should I do this initialization? I am not familiar with how App Engine caches the application. Should I create an amazon_login.py, include the logic for auth (two lines of code), and import the file in my code? Or do I need to put it in a class and instantiate the class in the same file?

When you say "initiate the connection once", are you referring to an actual TCP connection, or to a 'session'? I presume the latter, since SimpleDB is HTTP-based. urlfetch doesn't let you control the lifetime of the underlying TCP connection for HTTP requests. If you're obtaining an authentication token, though, you can certainly do that on the first request to a given runtime, and then cache the result in a global or class-level variable, or cache it in memcache and reuse it across multiple instances.

-Nick Johnson
[google-appengine] Re: Bulk upload specifying a key name
Thanks, Nick. It works!

On May 18, 5:00 am, Nick Johnson (Google) nick.john...@google.com wrote:

Hi ecognium,

You can do this by implementing the generate_key method in your bulkloader.Loader class. See the code for details: http://code.google.com/p/googleappengine/source/browse/trunk/python/g...

-Nick Johnson

On Mon, May 18, 2009 at 4:54 AM, ecognium ecogn...@gmail.com wrote:

Hi All, How do I specify a key name when using the bulk upload tool? I am currently using WWW::Mechanize to upload my data one row at a time, but it would be nice if I can use the bulk loader, as I don't want to reinvent the many options available in the tool. Thanks!
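A sketch of the generate_key hook Nick points to: the key derivation is kept as a plain function so it can run anywhere, with the bulkloader.Loader wiring shown in comments. The 'Example' kind, the column layout, and the converters are assumptions for illustration, not the poster's actual schema:

```python
def key_name_for_row(i, values):
    # Assumes column 0 of each CSV row holds a unique id; fall back to
    # the row number i if that column is empty.
    return 'item_%s' % (values[0] or i)

# Wired into the bulk loader roughly like so (App Engine SDK):
#
#   from google.appengine.tools import bulkloader
#
#   class ExampleLoader(bulkloader.Loader):
#       def __init__(self):
#           bulkloader.Loader.__init__(self, 'Example',
#               [('categ', lambda v: v.split(',')),
#                ('keywords', lambda v: v.split(','))])
#
#       def generate_key(self, i, values):
#           return key_name_for_row(i, values)

print(key_name_for_row(0, ['42', 'Circle,Red']))
```

The loader then creates each entity with the returned key_name instead of an auto-assigned numeric id.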
[google-appengine] Bulk upload specifying a key name
Hi All, How do I specify a key name when using the bulk upload tool? I am currently using WWW::Mechanize to upload my data one row at a time, but it would be nice if I could use the bulk loader, as I don't want to reinvent the many options available in the tool. Thanks!
[google-appengine] Getting Data Using Key vs. Key_Name
Hello Everyone, It looks like Model.get(keys) raises an exception (BadKeyError) when the key does not exist, even though this is not mentioned here: http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get. On the other hand, Model.get_by_key_name(names) just returns None for the keys that do not exist. If I understand it correctly, keys and key_names are not the same - i.e., I cannot use a key value inside the get_by_key_name() call. So how can I get data using keys without having to worry about determining which specific key was not valid? I can create my own key names, but I would like to use the default uniqueness guarantee of the keys. If there is no easy way to deal with this exception, is there an easy way to make the key_name the same as the key value provided by default? In case you are wondering why the keys would be invalid: I am allowing users to bookmark certain objects, so I store the key names in the user prefs data. If the referenced object gets deleted, the key information stored inside the user prefs becomes outdated, and when users query for their bookmarks, it causes an issue. I wanted to avoid going through every single user who may be using the deleted key and removing it in real time, so I thought I would just handle it as a batch job. Thanks!
[google-appengine] Re: Getting Data Using Key vs. Key_Name
Thanks, Nick. You are right. I ran into some issues, which I tracked down to the key call, but then I ended up testing the keys using values like '123', etc. So the BadKeyError is from my bad key request and not an unavailable key. On May 13, 3:46 am, Nick Johnson (Google) nick.john...@google.com wrote: Hi ecognium, Model.get does not return BadKeyError when attempting to fetch an entity that does not exist - it returns None, as documented. BadKeyError is raised when the key itself is invalid, because it lacks a name or id, or has an invalid name. You probably want to check that you're not generating invalid keys - such as one with a name that starts with a digit. -Nick Johnson
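Nick's point - that a get for an entity that no longer exists returns None rather than raising - suggests a simple way to prune stale bookmarks without per-key exception handling. A sketch, using a plain dict to simulate the batch get (store, batch_get, and the key names are made up for illustration; in the real app batch_get would be a batch db.get(keys) call):

```python
# Simulated datastore: key -> entity. A real batch db.get(keys) likewise
# returns None in the slot of any entity that has been deleted.
store = {'k1': {'title': 'first'}, 'k3': {'title': 'third'}}

def batch_get(keys):
    return [store.get(k) for k in keys]  # None for missing keys

bookmark_keys = ['k1', 'k2', 'k3']       # 'k2' points at a deleted object
results = batch_get(bookmark_keys)

# Keep live bookmarks; collect stale keys so a batch job can prune them
# from the user prefs later.
live = [e for k, e in zip(bookmark_keys, results) if e is not None]
stale = [k for k, e in zip(bookmark_keys, results) if e is None]
```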
[google-appengine] Image Resize Without Forcing Aspect Ratio
Hello Everyone, It looks like images.resize() always honors the aspect ratio. Is there a way to force a certain dimension? For example, in ImageMagick you can use an exclamation mark (!) at the end of the geometry to override the aspect ratio. Any suggestions will be appreciated. Thanks!
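One common workaround (not from this thread) is to crop to the target aspect ratio first and then resize, which yields exact dimensions. The helper below only computes the fractional crop box; center_crop_box is a hypothetical name, and the idea is that its result would be fed to images.crop(data, left, top, right, bottom) - which takes 0.0-1.0 fractions - followed by images.resize(data, target_w, target_h).

```python
def center_crop_box(src_w, src_h, target_w, target_h):
    """Return (left, top, right, bottom) as 0.0-1.0 fractions describing a
    center crop whose aspect ratio matches target_w:target_h."""
    src_ratio = float(src_w) / src_h
    target_ratio = float(target_w) / target_h
    if src_ratio > target_ratio:
        # Source is too wide: trim equal amounts off both sides.
        keep = target_ratio / src_ratio
        left = (1.0 - keep) / 2
        return (left, 0.0, left + keep, 1.0)
    else:
        # Source is too tall: trim equal amounts off top and bottom.
        keep = src_ratio / target_ratio
        top = (1.0 - keep) / 2
        return (0.0, top, 1.0, top + keep)
```

This is a sketch of the crop-then-resize idea under the stated assumptions, not a drop-in App Engine snippet; it also changes the framing of the image (it discards the trimmed regions) rather than distorting it the way ImageMagick's ! does.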
[google-appengine] GQL Query with IN operator Issue (bug or am i making a mistake?)
Hello everyone, I noticed odd behavior with a GQL query when it has two IN operators and a regular condition. Below is some basic code to reproduce the problem:

    class DummyData(db.Model):
        x = db.StringListProperty()
        y = db.TextProperty()

    class Dummy(webapp.RequestHandler):
        def get(self):
            d = DummyData()
            d.x = ['a', 'b', 'c']
            d.y = 'test'
            d.put()
            d = DummyData()
            d.x = ['c', 'd', 'e']
            d.y = 'test2'
            d.put()
            q = db.GqlQuery("SELECT * FROM DummyData WHERE x IN ('c') AND x IN ('a')")
            results = q.fetch(10)  # 10 instead of 2 - useful if you run the test multiple times
            self.response.headers['Content-Type'] = 'text/plain'
            for r in results:
                self.response.out.write('x = ' + ','.join(r.x) + ' y = ' + r.y + '\n')

When you run the above code you will see the following output: x = a,b,c y = test. However, when I replace the query with the one below, I do not get any results (even though it should return the same result as above):

    # Note the addition of y = 'test'
    q = db.GqlQuery("SELECT * FROM DummyData WHERE y = 'test' AND x IN ('c') AND x IN ('a')")

Although here the IN conditions are the same as '=', my application actually uses multiple list values; I am just presenting a simpler example. If someone can confirm the issue, I can open a bug report for this. Thanks!
[google-appengine] Re: GQL Query with IN operator Issue (bug or am i making a mistake?)
Thanks, Andy. You are correct, my mistake. I saw some strange behavior with the data I was working with and couldn't figure out why the return values made sense for the query. There I had a StringProperty instead of a TextProperty. I thought I would construct a quick example and made the mistake of using the wrong type in the process. That said, I have made so many changes since this issue surfaced that I have forgotten exactly which data values in the query caused the problem. I will try to reproduce it on the larger data set that I have. If I cannot, then it is either a browser caching problem or it was too late and I was not thinking clearly :) Thanks again! On Apr 20, 7:21 am, Andy Freeman ana...@earthlink.net wrote: db.TextProperty is not an indexable property. That means that it's not queryable either. It would be nice to get an exception or some other indication of what's going on. However, note that indexing is something that happens in the datastore when an instance is stored. If you change a property from StringProperty to TextProperty or the reverse, strange things will probably happen. (If you put some instances with StringProperty, I suspect that you can still successfully query for those instances using that property after you've switched to TextProperty.)
[google-appengine] Re: GQL Query with IN operator Issue (bug or am i making a mistake?)
I think I found it - didn't expect to get lucky so quickly...

    class DummyData(db.Model):
        x = db.StringListProperty()
        y = db.StringProperty()

    class Dummy(webapp.RequestHandler):
        def get(self):
            d = DummyData()
            d.x = ['a', 'b', 'c']
            d.y = 'test'
            d.put()
            d = DummyData()
            d.x = ['c', 'd', 'e']
            d.y = 'test'
            d.put()
            d = DummyData()
            d.x = ['r', 's', 't']
            d.y = 'test2'
            d.put()
            q = db.GqlQuery("SELECT * FROM DummyData WHERE x IN ('c') AND x IN ('r')")
            results = q.fetch(10)
            self.response.headers['Content-Type'] = 'text/plain'
            for r in results:
                self.response.out.write('x = ' + ','.join(r.x) + ' y = ' + r.y + '\n')

Now when you run the above code you should 'not' get any results. However, the query returns the 3rd record. If you switch the order of the conditions, you get the first two records. My guess is that since multiple IN operators create a combinatorial problem, GQL keeps only the last IN operator. I did not see such a thing being documented - did you see anything to that effect? For my app I would like to use multiple IN operators and can still stay within the 30-subquery limit. I am trying to filter the results by categories and also provide pagination. If I decide to filter everything on the client side, I will have to send all the data to the client, which will be inefficient. When I tried to split the query into multiple chunks, it became really hard to keep track of the previous state. Maybe I should post that as a separate question. Please do let me know if you see the same issue. If I have made another mistake, then I am probably going crazy and should stop worrying about this issue :) Thanks!
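Given the apparent single-IN limitation described above, one workaround is to run the most selective IN filter in the datastore and apply the remaining IN conditions in Python after fetching. A sketch using plain dicts in place of query results (the entity values mirror the example above; query_in and refine_in are made-up helper names):

```python
# Entities as dicts standing in for datastore results. In a real app the
# first IN filter would run as a GQL query; the rest run client-side.
entities = [
    {'x': ['a', 'b', 'c'], 'y': 'test'},
    {'x': ['c', 'd', 'e'], 'y': 'test'},
    {'x': ['r', 's', 't'], 'y': 'test2'},
]

def query_in(items, values):
    # First filter, analogous to: WHERE x IN (...)
    return [e for e in items if any(v in e['x'] for v in values)]

def refine_in(items, values):
    # Additional IN condition applied in Python after fetching.
    return [e for e in items if any(v in e['x'] for v in values)]

candidates = query_in(entities, ['c'])   # datastore does the heavy lifting
matches = refine_in(candidates, ['a'])   # client-side refinement
```

The trade-off is fetching more entities than you return, so it only helps when the first IN filter is already fairly selective.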