[google-appengine] Re: Efficient way to structure my data model
Thanks very much! I think I understand indexing with Lists much better now... >> In the case of your index with 4 occurrences, this will rather be 6C4. Yup, I was looking at the max situation when I use 3 categ fields. Need to be careful as it grows pretty quickly. On Jun 24, 4:19 am, "Nick Johnson (Google)" wrote: > Hi ecognium, > > On Tue, Jun 23, 2009 at 6:11 PM, ecognium wrote: > > > Thanks again - this is very helpful. I will let you know if i run into > > any future index creation errors as it could have been caused by any > > number of other entries - i mistakenly thought it was all these categ > > list-based entries. > > > So if i understand it right even with a 10 element list for keywords, > > there will only be 10 rows when 4 categ fields are used. > > Correct - 4C4 * 10C1 index entries for the custom index you specified > earlier. > > > In the event > > I use 'categ' only once in my query along with keywords field, it > > will have up to 40 rows (10 from keywords and 4C1 from categ list). > > Correct. > > > Am > > I adding these up right? > > > I do not see myself going beyond 6 elements in the categ list at this > > point (I guess the max will be 6C3 = 20 under such a situation). > > In the case of your index with 4 occurrences, this will rather be 6C4. > > > > > The > > keyword list will be probably go into the 20s but do not see anything > > beyond that and will always be used only once in the query. > > > Thanks, > > -e > > > On Jun 23, 3:53 am, "Nick Johnson (Google)" > > wrote: > > > Hi ecognium, > > > > On Tue, Jun 23, 2009 at 1:35 AM, ecognium wrote: > > > > > Thanks, Nick. Let me make sure I understand your comment correctly. > > > > Suppose I have the following data: > > > > > ID BlobProp1 BlobProp2-N Keywords > > > > Categ > > > > = > > > > 123 blah blah tag1,tag2,tag3 > > > > Circle, > > > > Red, Large, Dotted > > > > 345 blah blah tag3,tag4,tag5 > > > > Square, Blue, Small, Solid > > > > 678 blah blah tag1,tag3,tag4 > > > > Circle, Blue, Small, Solid > > > - > > > > > The field categ (list) contains four different types - Shape, Color, > > > > Size and Line Type. Suppose the user wants to retrieve all entities > > > > that are Small Dotted Blue Circles then the query will be: > > > > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND > > > > categ = "Blue" AND categ = "Dotted" > > > > > When I was reading about exploding indexes the example indicated the > > > > issue was due to Cartesian product of two list elements. I thought the > > > > same will hold true with one list field when used multiple times in a > > > > query. > > > > That is indeed true, though it's not quite the cartesian product - the > > > datastore won't bother indexing (Circle, Circle, Circle, Circle), or > > > (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique > > combination, > > > which is a substantially smaller number than the cartesian product. It's > > > still only tractable for small lists, though, such as the 4 item lists > > > you're dealing with. > > > > Are you saying the above query will not need {Circle, Red, > > > > > Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } > > > > number of index entities for entity ID=123? > > > > Correct - if you're not specifying a sort order, you can execute the > > query > > > without any composite indexes whatsoever. The datastore satisfies > > > equality-only queries using a merge join strategy. > > > > > I was getting index errors > > > > when I was using the categ list property four times in my index > > > > specification and that's why I was wondering if I should restructure > > > > things. > > > > How many items did you have in the list you were indexing in that case? > > If > > > your list has 4 items and your index specification lists it 4 times, you > > > should only get one index entry. > > > > so I am guessing the following spec should not cause any index > > > > > issues in the future? > > > > Again, that depends on the number of entries in the 'categ' list. With 4 > > > entries, this will only generate a single index entry, but the number of > > > entries will expand exponentially as the list increases in size. > > > > -Nick Johnson > > > > > - kind: MyModel > > > > properties: > > > > - name: categ > > > > - name: categ > > > > - name: categ > > > > - name: categ > > > > - name: keywords > > > > - name: __key__ # used for paging > > > > > Thanks, > > > > -e > > > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" > > > > wrote: > > > > > Hi ecognium, > > > > > > If I understand your problem correctly, every entity will have 0-4 > > > > entries > > > > > in the 'categ' list, corresponding to the
[google-appengine] Re: Efficient way to structure my data model
Hi ecognium, On Tue, Jun 23, 2009 at 6:11 PM, ecognium wrote: > > Thanks again - this is very helpful. I will let you know if i run into > any future index creation errors as it could have been caused by any > number of other entries - i mistakenly thought it was all these categ > list-based entries. > > So if i understand it right even with a 10 element list for keywords, > there will only be 10 rows when 4 categ fields are used. Correct - 4C4 * 10C1 index entries for the custom index you specified earlier. > In the event > I use 'categ' only once in my query along with keywords field, it > will have up to 40 rows (10 from keywords and 4C1 from categ list). Correct. > Am > I adding these up right? > > I do not see myself going beyond 6 elements in the categ list at this > point (I guess the max will be 6C3 = 20 under such a situation). In the case of your index with 4 occurrences, this will rather be 6C4. > The > keyword list will be probably go into the 20s but do not see anything > beyond that and will always be used only once in the query. > > Thanks, > -e > > On Jun 23, 3:53 am, "Nick Johnson (Google)" > wrote: > > Hi ecognium, > > > > > > > > On Tue, Jun 23, 2009 at 1:35 AM, ecognium wrote: > > > > > Thanks, Nick. Let me make sure I understand your comment correctly. > > > Suppose I have the following data: > > > > > ID BlobProp1 BlobProp2-N Keywords > > > Categ > > > = > > > 123 blahblahtag1,tag2,tag3 > > > Circle, > > > Red, Large, Dotted > > > 345 blahblahtag3,tag4,tag5 > > > Square, Blue, Small, Solid > > > 678 blahblahtag1,tag3,tag4 > > > Circle, Blue, Small, Solid > > > > > > - > > > > > The field categ (list) contains four different types - Shape, Color, > > > Size and Line Type. Suppose the user wants to retrieve all entities > > > that are Small Dotted Blue Circles then the query will be: > > > > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND > > > categ = "Blue" AND categ = "Dotted" > > > > > When I was reading about exploding indexes the example indicated the > > > issue was due to Cartesian product of two list elements. I thought the > > > same will hold true with one list field when used multiple times in a > > > query. > > > > That is indeed true, though it's not quite the cartesian product - the > > datastore won't bother indexing (Circle, Circle, Circle, Circle), or > > (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique > combination, > > which is a substantially smaller number than the cartesian product. It's > > still only tractable for small lists, though, such as the 4 item lists > > you're dealing with. > > > > Are you saying the above query will not need {Circle, Red, > > > > > Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } > > > number of index entities for entity ID=123? > > > > Correct - if you're not specifying a sort order, you can execute the > query > > without any composite indexes whatsoever. The datastore satisfies > > equality-only queries using a merge join strategy. > > > > > I was getting index errors > > > when I was using the categ list property four times in my index > > > specification and that's why I was wondering if I should restructure > > > things. > > > > How many items did you have in the list you were indexing in that case? > If > > your list has 4 items and your index specification lists it 4 times, you > > should only get one index entry. > > > > so I am guessing the following spec should not cause any index > > > > > issues in the future? > > > > Again, that depends on the number of entries in the 'categ' list. With 4 > > entries, this will only generate a single index entry, but the number of > > entries will expand exponentially as the list increases in size. > > > > -Nick Johnson > > > > > > > > > > > > > - kind: MyModel > > > properties: > > > - name: categ > > > - name: categ > > > - name: categ > > > - name: categ > > > - name: keywords > > > - name: __key__ # used for paging > > > > > Thanks, > > > -e > > > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" > > > wrote: > > > > Hi ecognium, > > > > > > If I understand your problem correctly, every entity will have 0-4 > > > entries > > > > in the 'categ' list, corresponding to the values for each of 4 > categories > > > > (eg, Color, Size, Shape, etc)? > > > > > > The sample query you give, with only equality filters, will be > > > satisfiable > > > > using the merge join query planner, which doesn't require custom > indexes, > > > so > > > > you won't have high indexing overhead. There will simply be one index > > > entry > > > > for each item in each list. > > > > > > If you do need custom indexes, the number of
[google-appengine] Re: Efficient way to structure my data model
Thanks again - this is very helpful. I will let you know if i run into any future index creation errors as it could have been caused by any number of other entries - i mistakenly thought it was all these categ list-based entries. So if i understand it right even with a 10 element list for keywords, there will only be 10 rows when 4 categ fields are used. In the event I use 'categ' only once in my query along with keywords field, it will have up to 40 rows (10 from keywords and 4C1 from categ list). Am I adding these up right? I do not see myself going beyond 6 elements in the categ list at this point (I guess the max will be 6C3 = 20 under such a situation). The keyword list will be probably go into the 20s but do not see anything beyond that and will always be used only once in the query. Thanks, -e On Jun 23, 3:53 am, "Nick Johnson (Google)" wrote: > Hi ecognium, > > > > On Tue, Jun 23, 2009 at 1:35 AM, ecognium wrote: > > > Thanks, Nick. Let me make sure I understand your comment correctly. > > Suppose I have the following data: > > > ID BlobProp1 BlobProp2-N Keywords > > Categ > > = > > 123 blah blah tag1,tag2,tag3 > > Circle, > > Red, Large, Dotted > > 345 blah blah tag3,tag4,tag5 > > Square, Blue, Small, Solid > > 678 blah blah tag1,tag3,tag4 > > Circle, Blue, Small, Solid > > > - > > > The field categ (list) contains four different types - Shape, Color, > > Size and Line Type. Suppose the user wants to retrieve all entities > > that are Small Dotted Blue Circles then the query will be: > > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND > > categ = "Blue" AND categ = "Dotted" > > > When I was reading about exploding indexes the example indicated the > > issue was due to Cartesian product of two list elements. I thought the > > same will hold true with one list field when used multiple times in a > > query. > > That is indeed true, though it's not quite the cartesian product - the > datastore won't bother indexing (Circle, Circle, Circle, Circle), or > (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique combination, > which is a substantially smaller number than the cartesian product. It's > still only tractable for small lists, though, such as the 4 item lists > you're dealing with. > > Are you saying the above query will not need {Circle, Red, > > > Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } > > number of index entities for entity ID=123? > > Correct - if you're not specifying a sort order, you can execute the query > without any composite indexes whatsoever. The datastore satisfies > equality-only queries using a merge join strategy. > > > I was getting index errors > > when I was using the categ list property four times in my index > > specification and that's why I was wondering if I should restructure > > things. > > How many items did you have in the list you were indexing in that case? If > your list has 4 items and your index specification lists it 4 times, you > should only get one index entry. > > so I am guessing the following spec should not cause any index > > > issues in the future? > > Again, that depends on the number of entries in the 'categ' list. With 4 > entries, this will only generate a single index entry, but the number of > entries will expand exponentially as the list increases in size. > > -Nick Johnson > > > > > > > - kind: MyModel > > properties: > > - name: categ > > - name: categ > > - name: categ > > - name: categ > > - name: keywords > > - name: __key__ # used for paging > > > Thanks, > > -e > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" > > wrote: > > > Hi ecognium, > > > > If I understand your problem correctly, every entity will have 0-4 > > entries > > > in the 'categ' list, corresponding to the values for each of 4 categories > > > (eg, Color, Size, Shape, etc)? > > > > The sample query you give, with only equality filters, will be > > satisfiable > > > using the merge join query planner, which doesn't require custom indexes, > > so > > > you won't have high indexing overhead. There will simply be one index > > entry > > > for each item in each list. > > > > If you do need custom indexes, the number of index entries, isn't 4^4, as > > > you suggest, but rather smaller. Assuming you want to be able to query > > with > > > any number of categories from 0 to 4, you'll need 3 or 4 custom indexes > > > (depending on if the 0-category case requires its own index), and the > > total > > > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = > > 15. > > > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 > > = > > > 63, which is still a not-unreasonable num
[google-appengine] Re: Efficient way to structure my data model
Hi ecognium, On Tue, Jun 23, 2009 at 1:35 AM, ecognium wrote: > > Thanks, Nick. Let me make sure I understand your comment correctly. > Suppose I have the following data: > > ID BlobProp1 BlobProp2-N Keywords > Categ > = > 123 blahblahtag1,tag2,tag3 > Circle, > Red, Large, Dotted > 345 blahblahtag3,tag4,tag5 > Square, Blue, Small, Solid > 678 blahblahtag1,tag3,tag4 > Circle, Blue, Small, Solid > > - > > The field categ (list) contains four different types - Shape, Color, > Size and Line Type. Suppose the user wants to retrieve all entities > that are Small Dotted Blue Circles then the query will be: > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND > categ = "Blue" AND categ = "Dotted" > > When I was reading about exploding indexes the example indicated the > issue was due to Cartesian product of two list elements. I thought the > same will hold true with one list field when used multiple times in a > query. That is indeed true, though it's not quite the cartesian product - the datastore won't bother indexing (Circle, Circle, Circle, Circle), or (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique combination, which is a substantially smaller number than the cartesian product. It's still only tractable for small lists, though, such as the 4 item lists you're dealing with. Are you saying the above query will not need {Circle, Red, > Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } > number of index entities for entity ID=123? Correct - if you're not specifying a sort order, you can execute the query without any composite indexes whatsoever. The datastore satisfies equality-only queries using a merge join strategy. > I was getting index errors > when I was using the categ list property four times in my index > specification and that's why I was wondering if I should restructure > things. How many items did you have in the list you were indexing in that case? If your list has 4 items and your index specification lists it 4 times, you should only get one index entry. so I am guessing the following spec should not cause any index > issues in the future? Again, that depends on the number of entries in the 'categ' list. With 4 entries, this will only generate a single index entry, but the number of entries will expand exponentially as the list increases in size. -Nick Johnson > > - kind: MyModel > properties: > - name: categ > - name: categ > - name: categ > - name: categ > - name: keywords > - name: __key__ # used for paging > > Thanks, > -e > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" > wrote: > > Hi ecognium, > > > > If I understand your problem correctly, every entity will have 0-4 > entries > > in the 'categ' list, corresponding to the values for each of 4 categories > > (eg, Color, Size, Shape, etc)? > > > > The sample query you give, with only equality filters, will be > satisfiable > > using the merge join query planner, which doesn't require custom indexes, > so > > you won't have high indexing overhead. There will simply be one index > entry > > for each item in each list. > > > > If you do need custom indexes, the number of index entries, isn't 4^4, as > > you suggest, but rather smaller. Assuming you want to be able to query > with > > any number of categories from 0 to 4, you'll need 3 or 4 custom indexes > > (depending on if the 0-category case requires its own index), and the > total > > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = > 15. > > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 > = > > 63, which is still a not-unreasonable number. > > > > -Nick Johnson > > > > > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium wrote: > > > > > Hi All, > > > > >I would like to get your opinion on the best way to structure my > > > data model. > > > My app allows the users to filter the entities by four category types > > > (say A,B,C,D). Each category can have multiple values (for e.g., > > > category type A can have values 1,2,3) but the > > > user can choose only one value per category for filtering. Please > > > note the values are unique across the category types as well. I could > > > create four fields corresponding to the four types but it does not > > > allow me to expand to more categories later easily. Right now, I just > > > use one list field to store the different values as it is easy to add > > > more category types later on. > > > > > My model (simplified) looks like this: > > > > > class Example(db.Model): > > > > >categ= db.StringListProperty() > > > > >keywords = db.StringListProperty() > > > > > The field keywords will have about
[google-appengine] Re: Efficient way to structure my data model
Thanks, Nick. Let me make sure I understand your comment correctly. Suppose I have the following data: ID Blob1 Blob2-N Keywords Categ 123 blah blah tag1,tag2,tag3 Circle,Red, Large, Dotted 345 blah blah tag3,tag4,tag5 Square, Blue, Small, Solid 678 blah blah tag1,tag3,tag4 Circle, Blue, Small, Solid -- The field categ (list) contains four different types - Shape, Color, Size and Line Type. Suppose the user wants to retrieve all entities that are Small Dotted Blue Circles then the query will be: Select * From MyModel where categ = "Circle" AND categ = "Small" AND categ = "Blue" AND categ = "Dotted" When I was reading about exploding indexes the example indicated the issue was due to Cartesian product of two list elements. I thought the same will hold true with one list field when used multiple times in a query. Are you saying the above query will not need {Circle, Red, Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } number of index entities for entity ID=123? I was getting index errors when I was using the categ list property four times in my index specification and that's why I was wondering if I should restructure things. so I am guessing the following spec should not cause any index issues in the future? - kind: MyModel properties: - name: categ - name: categ - name: categ - name: categ - name: keywords - name: __key__ # used for paging Thanks, -e On Jun 22, 2:10 am, "Nick Johnson (Google)" wrote: > Hi ecognium, > > If I understand your problem correctly, every entity will have 0-4 entries > in the 'categ' list, corresponding to the values for each of 4 categories > (eg, Color, Size, Shape, etc)? > > The sample query you give, with only equality filters, will be satisfiable > using the merge join query planner, which doesn't require custom indexes, so > you won't have high indexing overhead. There will simply be one index entry > for each item in each list. > > If you do need custom indexes, the number of index entries, isn't 4^4, as > you suggest, but rather smaller. Assuming you want to be able to query with > any number of categories from 0 to 4, you'll need 3 or 4 custom indexes > (depending on if the 0-category case requires its own index), and the total > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15. > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 = > 63, which is still a not-unreasonable number. > > -Nick Johnson > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium wrote: > > > Hi All, > > > I would like to get your opinion on the best way to structure my > > data model. > > My app allows the users to filter the entities by four category types > > (say A,B,C,D). Each category can have multiple values (for e.g., > > category type A can have values 1,2,3) but the > > user can choose only one value per category for filtering. Please > > note the values are unique across the category types as well. I could > > create four fields corresponding to the four types but it does not > > allow me to expand to more categories later easily. Right now, I just > > use one list field to store the different values as it is easy to add > > more category types later on. > > > My model (simplified) looks like this: > > > class Example(db.Model): > > > categ = db.StringListProperty() > > > keywords = db.StringListProperty() > > > The field keywords will have about 10-20 values for each entity. For > > the above example, categ will have up to 4 values. Since I allow for > > filtering on 4 category types, the index table gets large with > > unnecessary values. The filtering logic looks like: > > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND > > categ = '99' > > > Since there are 4 values in the categ list property, there will be > > 4^4 rows created in the index table (most of them will never be hit > > due to the uniqueness guaranteed by design). Multiply it by the number > > of values in the keywords table, the index table gets large very > > quickly. > > > I would like to avoid creating multiple fields if possible because > > when I want to make the number of category types to six, I would have > > to change the underlying model and all the filtering code. Any > > suggestions on how to construct the model such that it will allow for > > ease of expansion in category types yet still not create large index > > tables? I know there is a Category Property but not sure if it really > > provides any specific benefit here. > > > Thanks! > > -e > > -- > Nick Johnson, App Engine Developer Programs Engineer > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: > 368047 --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post
[google-appengine] Re: Efficient way to structure my data model
Thanks, Nick. Let me make sure I understand your comment correctly. Suppose I have the following data: ID BlobProp1 BlobProp2-N KeywordsCateg = 123 blahblahtag1,tag2,tag3 Circle, Red, Large, Dotted 345 blahblahtag3,tag4,tag5 Square, Blue, Small, Solid 678 blahblahtag1,tag3,tag4 Circle, Blue, Small, Solid - The field categ (list) contains four different types - Shape, Color, Size and Line Type. Suppose the user wants to retrieve all entities that are Small Dotted Blue Circles then the query will be: Select * From MyModel where categ = "Circle" AND categ = "Small" AND categ = "Blue" AND categ = "Dotted" When I was reading about exploding indexes the example indicated the issue was due to Cartesian product of two list elements. I thought the same will hold true with one list field when used multiple times in a query. Are you saying the above query will not need {Circle, Red, Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } number of index entities for entity ID=123? I was getting index errors when I was using the categ list property four times in my index specification and that's why I was wondering if I should restructure things. so I am guessing the following spec should not cause any index issues in the future? - kind: MyModel properties: - name: categ - name: categ - name: categ - name: categ - name: keywords - name: __key__ # used for paging Thanks, -e On Jun 22, 2:10 am, "Nick Johnson (Google)" wrote: > Hi ecognium, > > If I understand your problem correctly, every entity will have 0-4 entries > in the 'categ' list, corresponding to the values for each of 4 categories > (eg, Color, Size, Shape, etc)? > > The sample query you give, with only equality filters, will be satisfiable > using the merge join query planner, which doesn't require custom indexes, so > you won't have high indexing overhead. There will simply be one index entry > for each item in each list. > > If you do need custom indexes, the number of index entries, isn't 4^4, as > you suggest, but rather smaller. Assuming you want to be able to query with > any number of categories from 0 to 4, you'll need 3 or 4 custom indexes > (depending on if the 0-category case requires its own index), and the total > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15. > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 = > 63, which is still a not-unreasonable number. > > -Nick Johnson > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium wrote: > > > Hi All, > > > I would like to get your opinion on the best way to structure my > > data model. > > My app allows the users to filter the entities by four category types > > (say A,B,C,D). Each category can have multiple values (for e.g., > > category type A can have values 1,2,3) but the > > user can choose only one value per category for filtering. Please > > note the values are unique across the category types as well. I could > > create four fields corresponding to the four types but it does not > > allow me to expand to more categories later easily. Right now, I just > > use one list field to store the different values as it is easy to add > > more category types later on. > > > My model (simplified) looks like this: > > > class Example(db.Model): > > > categ = db.StringListProperty() > > > keywords = db.StringListProperty() > > > The field keywords will have about 10-20 values for each entity. For > > the above example, categ will have up to 4 values. Since I allow for > > filtering on 4 category types, the index table gets large with > > unnecessary values. The filtering logic looks like: > > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND > > categ = '99' > > > Since there are 4 values in the categ list property, there will be > > 4^4 rows created in the index table (most of them will never be hit > > due to the uniqueness guaranteed by design). Multiply it by the number > > of values in the keywords table, the index table gets large very > > quickly. > > > I would like to avoid creating multiple fields if possible because > > when I want to make the number of category types to six, I would have > > to change the underlying model and all the filtering code. Any > > suggestions on how to construct the model such that it will allow for > > ease of expansion in category types yet still not create large index > > tables? I know there is a Category Property but not sure if it really > > provides any specific benefit here. > > > Thanks! > > -e > > -- > Nick Johnson, App Engine Developer Programs Engineer > Google Ireland Ltd. :: Registered in Dublin, Ireland, R
[google-appengine] Re: Efficient way to structure my data model
Hi ecognium, If I understand your problem correctly, every entity will have 0-4 entries in the 'categ' list, corresponding to the values for each of 4 categories (eg, Color, Size, Shape, etc)? The sample query you give, with only equality filters, will be satisfiable using the merge join query planner, which doesn't require custom indexes, so you won't have high indexing overhead. There will simply be one index entry for each item in each list. If you do need custom indexes, the number of index entries, isn't 4^4, as you suggest, but rather smaller. Assuming you want to be able to query with any number of categories from 0 to 4, you'll need 3 or 4 custom indexes (depending on if the 0-category case requires its own index), and the total number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15. For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 = 63, which is still a not-unreasonable number. -Nick Johnson On Mon, Jun 22, 2009 at 8:51 AM, ecognium wrote: > > Hi All, > >I would like to get your opinion on the best way to structure my > data model. > My app allows the users to filter the entities by four category types > (say A,B,C,D). Each category can have multiple values (for e.g., > category type A can have values 1,2,3) but the > user can choose only one value per category for filtering. Please > note the values are unique across the category types as well. I could > create four fields corresponding to the four types but it does not > allow me to expand to more categories later easily. Right now, I just > use one list field to store the different values as it is easy to add > more category types later on. > > My model (simplified) looks like this: > > > > class Example(db.Model): > >categ= db.StringListProperty() > >keywords = db.StringListProperty() > > > > The field keywords will have about 10-20 values for each entity. For > the above example, categ will have up to 4 values. Since I allow for > filtering on 4 category types, the index table gets large with > unnecessary values. The filtering logic looks like: > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND > categ = '99' > > Since there are 4 values in the categ list property, there will be > 4^4 rows created in the index table (most of them will never be hit > due to the uniqueness guaranteed by design). Multiply it by the number > of values in the keywords table, the index table gets large very > quickly. > > I would like to avoid creating multiple fields if possible because > when I want to make the number of category types to six, I would have > to change the underlying model and all the filtering code. Any > suggestions on how to construct the model such that it will allow for > ease of expansion in category types yet still not create large index > tables? I know there is a Category Property but not sure if it really > provides any specific benefit here. > > Thanks! > -e > > > -- Nick Johnson, App Engine Developer Programs Engineer Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047 --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appengine@googlegroups.com To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~--~~~~--~~--~--~---