Thanks again - this is very helpful. I will let you know if i run into any future index creation errors as it could have been caused by any number of other entries - i mistakenly thought it was all these categ list-based entries.
So if i understand it right even with a 10 element list for keywords, there will only be 10 rows when 4 categ fields are used. In the event I use 'categ' only once in my query along with keywords field, it will have up to 40 rows (10 from keywords and 4C1 from categ list). Am I adding these up right? I do not see myself going beyond 6 elements in the categ list at this point (I guess the max will be 6C3 = 20 under such a situation). The keyword list will be probably go into the 20s but do not see anything beyond that and will always be used only once in the query. Thanks, -e On Jun 23, 3:53 am, "Nick Johnson (Google)" <nick.john...@google.com> wrote: > Hi ecognium, > > > > On Tue, Jun 23, 2009 at 1:35 AM, ecognium <ecogn...@gmail.com> wrote: > > > Thanks, Nick. Let me make sure I understand your comment correctly. > > Suppose I have the following data: > > > ID BlobProp1 BlobProp2-N Keywords > > Categ > > ================================================= > > 123 blah blah tag1,tag2,tag3 > > Circle, > > Red, Large, Dotted > > 345 blah blah tag3,tag4,tag5 > > Square, Blue, Small, Solid > > 678 blah blah tag1,tag3,tag4 > > Circle, Blue, Small, Solid > > > ------------------------------------------------------------------------------------------------------------- > > > The field categ (list) contains four different types - Shape, Color, > > Size and Line Type. Suppose the user wants to retrieve all entities > > that are Small Dotted Blue Circles then the query will be: > > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND > > categ = "Blue" AND categ = "Dotted" > > > When I was reading about exploding indexes the example indicated the > > issue was due to Cartesian product of two list elements. I thought the > > same will hold true with one list field when used multiple times in a > > query. > > That is indeed true, though it's not quite the cartesian product - the > datastore won't bother indexing (Circle, Circle, Circle, Circle), or > (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique combination, > which is a substantially smaller number than the cartesian product. It's > still only tractable for small lists, though, such as the 4 item lists > you're dealing with. > > Are you saying the above query will not need {Circle, Red, > > > Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } > > number of index entities for entity ID=123? > > Correct - if you're not specifying a sort order, you can execute the query > without any composite indexes whatsoever. The datastore satisfies > equality-only queries using a merge join strategy. > > > I was getting index errors > > when I was using the categ list property four times in my index > > specification and that's why I was wondering if I should restructure > > things. > > How many items did you have in the list you were indexing in that case? If > your list has 4 items and your index specification lists it 4 times, you > should only get one index entry. > > so I am guessing the following spec should not cause any index > > > issues in the future? > > Again, that depends on the number of entries in the 'categ' list. With 4 > entries, this will only generate a single index entry, but the number of > entries will expand exponentially as the list increases in size. > > -Nick Johnson > > > > > > > - kind: MyModel > > properties: > > - name: categ > > - name: categ > > - name: categ > > - name: categ > > - name: keywords > > - name: __key__ # used for paging > > > Thanks, > > -e > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" <nick.john...@google.com> > > wrote: > > > Hi ecognium, > > > > If I understand your problem correctly, every entity will have 0-4 > > entries > > > in the 'categ' list, corresponding to the values for each of 4 categories > > > (eg, Color, Size, Shape, etc)? > > > > The sample query you give, with only equality filters, will be > > satisfiable > > > using the merge join query planner, which doesn't require custom indexes, > > so > > > you won't have high indexing overhead. There will simply be one index > > entry > > > for each item in each list. > > > > If you do need custom indexes, the number of index entries, isn't 4^4, as > > > you suggest, but rather smaller. Assuming you want to be able to query > > with > > > any number of categories from 0 to 4, you'll need 3 or 4 custom indexes > > > (depending on if the 0-category case requires its own index), and the > > total > > > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = > > 15. > > > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 > > = > > > 63, which is still a not-unreasonable number. > > > > -Nick Johnson > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium <ecogn...@gmail.com> wrote: > > > > > Hi All, > > > > > I would like to get your opinion on the best way to structure my > > > > data model. > > > > My app allows the users to filter the entities by four category types > > > > (say A,B,C,D). Each category can have multiple values (for e.g., > > > > category type A can have values 1,2,3) but the > > > > user can choose only one value per category for filtering. Please > > > > note the values are unique across the category types as well. I could > > > > create four fields corresponding to the four types but it does not > > > > allow me to expand to more categories later easily. Right now, I just > > > > use one list field to store the different values as it is easy to add > > > > more category types later on. > > > > > My model (simplified) looks like this: > > > > > class Example(db.Model): > > > > > categ = db.StringListProperty() > > > > > keywords = db.StringListProperty() > > > > > The field keywords will have about 10-20 values for each entity. For > > > > the above example, categ will have up to 4 values. Since I allow for > > > > filtering on 4 category types, the index table gets large with > > > > unnecessary values. The filtering logic looks like: > > > > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND > > > > categ = '99' > > > > > Since there are 4 values in the categ list property, there will be > > > > 4^4 rows created in the index table (most of them will never be hit > > > > due to the uniqueness guaranteed by design). Multiply it by the number > > > > of values in the keywords table, the index table gets large very > > > > quickly. > > > > > I would like to avoid creating multiple fields if possible because > > > > when I want to make the number of category types to six, I would have > > > > to change the underlying model and all the filtering code. Any > > > > suggestions on how to construct the model such that it will allow for > > > > ease of expansion in category types yet still not create large index > > > > tables? I know there is a Category Property but not sure if it really > > > > provides any specific benefit here. > > > > > Thanks! > > > > -e > > > > -- > > > Nick Johnson, App Engine Developer Programs Engineer > > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration > > Number: > > > 368047 > > -- > Nick Johnson, App Engine Developer Programs Engineer > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: > 368047 --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appengine@googlegroups.com To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~----------~----~----~----~------~----~------~--~---