Thanks, Nick. Let me make sure I understand your comment correctly. Suppose I have the following data:
ID BlobProp1 BlobProp2-N Keywords Categ ================================================= 123 blah blah tag1,tag2,tag3 Circle, Red, Large, Dotted 345 blah blah tag3,tag4,tag5 Square, Blue, Small, Solid 678 blah blah tag1,tag3,tag4 Circle, Blue, Small, Solid ------------------------------------------------------------------------------------------------------------- The field categ (list) contains four different types - Shape, Color, Size and Line Type. Suppose the user wants to retrieve all entities that are Small Dotted Blue Circles then the query will be: Select * From MyModel where categ = "Circle" AND categ = "Small" AND categ = "Blue" AND categ = "Dotted" When I was reading about exploding indexes the example indicated the issue was due to Cartesian product of two list elements. I thought the same will hold true with one list field when used multiple times in a query. Are you saying the above query will not need {Circle, Red, Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } number of index entities for entity ID=123? I was getting index errors when I was using the categ list property four times in my index specification and that's why I was wondering if I should restructure things. so I am guessing the following spec should not cause any index issues in the future? - kind: MyModel properties: - name: categ - name: categ - name: categ - name: categ - name: keywords - name: __key__ # used for paging Thanks, -e On Jun 22, 2:10 am, "Nick Johnson (Google)" <nick.john...@google.com> wrote: > Hi ecognium, > > If I understand your problem correctly, every entity will have 0-4 entries > in the 'categ' list, corresponding to the values for each of 4 categories > (eg, Color, Size, Shape, etc)? > > The sample query you give, with only equality filters, will be satisfiable > using the merge join query planner, which doesn't require custom indexes, so > you won't have high indexing overhead. There will simply be one index entry > for each item in each list. > > If you do need custom indexes, the number of index entries, isn't 4^4, as > you suggest, but rather smaller. Assuming you want to be able to query with > any number of categories from 0 to 4, you'll need 3 or 4 custom indexes > (depending on if the 0-category case requires its own index), and the total > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15. > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 = > 63, which is still a not-unreasonable number. > > -Nick Johnson > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium <ecogn...@gmail.com> wrote: > > > Hi All, > > > I would like to get your opinion on the best way to structure my > > data model. > > My app allows the users to filter the entities by four category types > > (say A,B,C,D). Each category can have multiple values (for e.g., > > category type A can have values 1,2,3) but the > > user can choose only one value per category for filtering. Please > > note the values are unique across the category types as well. I could > > create four fields corresponding to the four types but it does not > > allow me to expand to more categories later easily. Right now, I just > > use one list field to store the different values as it is easy to add > > more category types later on. > > > My model (simplified) looks like this: > > > class Example(db.Model): > > > categ = db.StringListProperty() > > > keywords = db.StringListProperty() > > > The field keywords will have about 10-20 values for each entity. For > > the above example, categ will have up to 4 values. Since I allow for > > filtering on 4 category types, the index table gets large with > > unnecessary values. The filtering logic looks like: > > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND > > categ = '99' > > > Since there are 4 values in the categ list property, there will be > > 4^4 rows created in the index table (most of them will never be hit > > due to the uniqueness guaranteed by design). Multiply it by the number > > of values in the keywords table, the index table gets large very > > quickly. > > > I would like to avoid creating multiple fields if possible because > > when I want to make the number of category types to six, I would have > > to change the underlying model and all the filtering code. Any > > suggestions on how to construct the model such that it will allow for > > ease of expansion in category types yet still not create large index > > tables? I know there is a Category Property but not sure if it really > > provides any specific benefit here. > > > Thanks! > > -e > > -- > Nick Johnson, App Engine Developer Programs Engineer > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: > 368047 --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appengine@googlegroups.com To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~----------~----~----~----~------~----~------~--~---