Thanks very much! I think I understand indexing with lists much better now.

> In the case of your index with 4 occurrences, this will rather be 6C4.

Yup, I was looking at the max situation, which is when I use 3 categ fields (6C3 = 20). I need to be careful, as it grows pretty quickly.
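For anyone following along, here is a rough Python sketch for sanity-checking the counts quoted in this thread. It assumes, as Nick describes below, that a custom index listing a list property k times produces one entry per unique k-combination of that property's n values (nCk), and that the counts for different properties multiply. The function names index_entries and entries_for_all_subset_sizes are made up for illustration and are not part of the App Engine SDK; math.comb needs Python 3.8 or later.

```python
from math import comb


def index_entries(properties):
    """Estimate index entries for ONE entity under ONE custom index.

    `properties` is a list of (n, k) pairs, where n is the number of
    values in a list property and k is how many times that property
    appears in the index definition. Assumption taken from the thread:
    each property contributes nCk unique combinations.
    """
    total = 1
    for n, k in properties:
        total *= comb(n, k)
    return total


def entries_for_all_subset_sizes(n):
    """Total entries across one index per filter count (1..n categories)."""
    return sum(comb(n, k) for k in range(1, n + 1))


# 4 categ values used 4 times, 10 keywords used once: 4C4 * 10C1
print(index_entries([(4, 4), (10, 1)]))   # 10
# categ used once with 10 keywords: 4C1 * 10C1
print(index_entries([(4, 1), (10, 1)]))   # 40
# 6 categ values: used 4 times (6C4) vs. used 3 times (6C3, the max)
print(index_entries([(6, 4)]), index_entries([(6, 3)]))   # 15 20
# one index per filter count, 4 and 6 categories
print(entries_for_all_subset_sizes(4), entries_for_all_subset_sizes(6))   # 15 63
```

The 6C3 case is the largest because nCk peaks at k = n/2, which is why the middle of the range is the one to watch as the categ list grows.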
On Jun 24, 4:19 am, "Nick Johnson (Google)" <nick.john...@google.com> wrote:
> Hi ecognium,
>
> On Tue, Jun 23, 2009 at 6:11 PM, ecognium <ecogn...@gmail.com> wrote:
> > Thanks again - this is very helpful. I will let you know if I run into
> > any future index creation errors as it could have been caused by any
> > number of other entries - I mistakenly thought it was all these categ
> > list-based entries.
> >
> > So if I understand it right, even with a 10-element list for keywords,
> > there will only be 10 rows when 4 categ fields are used.
>
> Correct - 4C4 * 10C1 index entries for the custom index you specified
> earlier.
>
> > In the event I use 'categ' only once in my query along with the keywords
> > field, it will have up to 40 rows (10 from keywords and 4C1 from the
> > categ list).
>
> Correct.
>
> > Am I adding these up right?
> >
> > I do not see myself going beyond 6 elements in the categ list at this
> > point (I guess the max will be 6C3 = 20 under such a situation).
>
> In the case of your index with 4 occurrences, this will rather be 6C4.
>
> > The keyword list will probably go into the 20s, but I do not see
> > anything beyond that, and it will always be used only once in the query.
> >
> > Thanks,
> > -e
> >
> > On Jun 23, 3:53 am, "Nick Johnson (Google)" <nick.john...@google.com>
> > wrote:
> > > Hi ecognium,
> > >
> > > On Tue, Jun 23, 2009 at 1:35 AM, ecognium <ecogn...@gmail.com> wrote:
> > > > Thanks, Nick. Let me make sure I understand your comment correctly.
> > > > Suppose I have the following data:
> > > >
> > > > ID   BlobProp1  BlobProp2-N  Keywords        Categ
> > > > ===  =========  ===========  ==============  ==========================
> > > > 123  blah       blah         tag1,tag2,tag3  Circle, Red, Large, Dotted
> > > > 345  blah       blah         tag3,tag4,tag5  Square, Blue, Small, Solid
> > > > 678  blah       blah         tag1,tag3,tag4  Circle, Blue, Small, Solid
> > > >
> > > > The field categ (list) contains four different types - Shape, Color,
> > > > Size and Line Type. Suppose the user wants to retrieve all entities
> > > > that are Small Dotted Blue Circles then the query will be:
> > > >
> > > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND
> > > > categ = "Blue" AND categ = "Dotted"
> > > >
> > > > When I was reading about exploding indexes the example indicated the
> > > > issue was due to Cartesian product of two list elements. I thought the
> > > > same will hold true with one list field when used multiple times in a
> > > > query.
> > >
> > > That is indeed true, though it's not quite the cartesian product - the
> > > datastore won't bother indexing (Circle, Circle, Circle, Circle), or
> > > (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique
> > > combination, which is a substantially smaller number than the cartesian
> > > product. It's still only tractable for small lists, though, such as the
> > > 4 item lists you're dealing with.
> > >
> > > > Are you saying the above query will not need {Circle, Red, Large,
> > > > Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } number of
> > > > index entities for entity ID=123?
> > >
> > > Correct - if you're not specifying a sort order, you can execute the
> > > query without any composite indexes whatsoever. The datastore satisfies
> > > equality-only queries using a merge join strategy.
> > > > I was getting index errors when I was using the categ list property
> > > > four times in my index specification and that's why I was wondering
> > > > if I should restructure things.
> > >
> > > How many items did you have in the list you were indexing in that case?
> > > If your list has 4 items and your index specification lists it 4 times,
> > > you should only get one index entry.
> > >
> > > > so I am guessing the following spec should not cause any index issues
> > > > in the future?
> > >
> > > Again, that depends on the number of entries in the 'categ' list. With 4
> > > entries, this will only generate a single index entry, but the number of
> > > entries will expand exponentially as the list increases in size.
> > >
> > > -Nick Johnson
> > >
> > > > - kind: MyModel
> > > >   properties:
> > > >   - name: categ
> > > >   - name: categ
> > > >   - name: categ
> > > >   - name: categ
> > > >   - name: keywords
> > > >   - name: __key__ # used for paging
> > > >
> > > > Thanks,
> > > > -e
> > > >
> > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" <nick.john...@google.com>
> > > > wrote:
> > > > > Hi ecognium,
> > > > >
> > > > > If I understand your problem correctly, every entity will have 0-4
> > > > > entries in the 'categ' list, corresponding to the values for each of
> > > > > 4 categories (eg, Color, Size, Shape, etc)?
> > > > >
> > > > > The sample query you give, with only equality filters, will be
> > > > > satisfiable using the merge join query planner, which doesn't require
> > > > > custom indexes, so you won't have high indexing overhead. There will
> > > > > simply be one index entry for each item in each list.
> > > > >
> > > > > If you do need custom indexes, the number of index entries isn't 4^4,
> > > > > as you suggest, but rather smaller.
> > > > > Assuming you want to be able to query with any number of categories
> > > > > from 0 to 4, you'll need 3 or 4 custom indexes (depending on if the
> > > > > 0-category case requires its own index), and the total number of
> > > > > index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15.
> > > > > For 6 categories, the number of entries would be
> > > > > 6 + 15 + 20 + 15 + 6 + 1 = 63, which is still a not-unreasonable
> > > > > number.
> > > > >
> > > > > -Nick Johnson
> > > > >
> > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium <ecogn...@gmail.com> wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to get your opinion on the best way to structure my
> > > > > > data model.
> > > > > > My app allows the users to filter the entities by four category
> > > > > > types (say A,B,C,D). Each category can have multiple values (for
> > > > > > e.g., category type A can have values 1,2,3) but the user can
> > > > > > choose only one value per category for filtering. Please note the
> > > > > > values are unique across the category types as well. I could
> > > > > > create four fields corresponding to the four types but it does not
> > > > > > allow me to expand to more categories later easily. Right now, I
> > > > > > just use one list field to store the different values as it is
> > > > > > easy to add more category types later on.
> > > > > >
> > > > > > My model (simplified) looks like this:
> > > > > >
> > > > > > class Example(db.Model):
> > > > > >     categ = db.StringListProperty()
> > > > > >     keywords = db.StringListProperty()
> > > > > >
> > > > > > The field keywords will have about 10-20 values for each entity.
> > > > > > For the above example, categ will have up to 4 values. Since I
> > > > > > allow for filtering on 4 category types, the index table gets
> > > > > > large with unnecessary values.
> > > > > > The filtering logic looks like:
> > > > > >
> > > > > > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND
> > > > > > categ = '99'
> > > > > >
> > > > > > Since there are 4 values in the categ list property, there will be
> > > > > > 4^4 rows created in the index table (most of them will never be hit
> > > > > > due to the uniqueness guaranteed by design). Multiply it by the
> > > > > > number of values in the keywords table, the index table gets large
> > > > > > very quickly.
> > > > > >
> > > > > > I would like to avoid creating multiple fields if possible because
> > > > > > when I want to make the number of category types to six, I would
> > > > > > have to change the underlying model and all the filtering code. Any
> > > > > > suggestions on how to construct the model such that it will allow
> > > > > > for ease of expansion in category types yet still not create large
> > > > > > index tables? I know there is a Category Property but not sure if
> > > > > > it really provides any specific benefit here.
> > > > > >
> > > > > > Thanks!
> > > > > > -e
> > > > >
> > > > > --
> > > > > Nick Johnson, App Engine Developer Programs Engineer
> > > > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > > > > Number: 368047
> > >
> > > --
> > > Nick Johnson, App Engine Developer Programs Engineer
> > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > > Number: 368047
>
> --
> Nick Johnson, App Engine Developer Programs Engineer
> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> Number: 368047