[google-appengine] Re: Efficient way to structure my data model

2009-06-24 Thread ecognium

Thanks very much! I think I understand indexing with Lists much better
now...
>> In the case of your index with 4 occurrences, this will rather be 6C4.
Yup, I was looking at the max situation when I use 3 categ fields.
Need to be careful as it grows pretty quickly.



[google-appengine] Re: Efficient way to structure my data model

2009-06-24 Thread Nick Johnson (Google)
Hi ecognium,

On Tue, Jun 23, 2009 at 6:11 PM, ecognium  wrote:

>
> Thanks again - this is very helpful. I will let you know if I run into
> any future index creation errors, as they could have been caused by any
> number of other entries - I mistakenly thought it was all these categ
> list-based entries.
>
> So if I understand it right, even with a 10-element list for keywords,
> there will only be 10 rows when 4 categ fields are used.


Correct - 4C4 * 10C1 index entries for the custom index you specified
earlier.
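
As a rough check of this arithmetic, the counts can be reproduced in a few lines of Python. This is only a sketch of the counting model discussed in this thread (one index row per combination of list values, multiplied across the properties in the index); the function name `composite_index_entries` and its arguments are made up for illustration and are not part of the App Engine SDK.

```python
from math import comb  # Python 3.8+


def composite_index_entries(list_sizes, occurrences):
    """Rows written for one entity under the combination-counting model.

    list_sizes:  property name -> number of values in that list property
    occurrences: property name -> how many times the index names the property
    (Assumed model from this thread, not measured datastore behaviour.)
    """
    total = 1
    for prop, size in list_sizes.items():
        total *= comb(size, occurrences.get(prop, 0))
    return total


# 4 categ values, index names categ four times, 10 keywords: 4C4 * 10C1
print(composite_index_entries({"categ": 4, "keywords": 10},
                              {"categ": 4, "keywords": 1}))  # 10

# Same entity, index names categ once: 4C1 * 10C1
print(composite_index_entries({"categ": 4, "keywords": 10},
                              {"categ": 1, "keywords": 1}))  # 40

# 6 categ values with the four-occurrence index: 6C4 * 10C1
print(composite_index_entries({"categ": 6, "keywords": 10},
                              {"categ": 4, "keywords": 1}))  # 150
```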


> In the event
> I use  'categ' only once in my query along with keywords field, it
> will have up to 40 rows (10 from keywords and 4C1 from categ list).


Correct.


> Am
> I adding these up right?
>
> I do not see myself going beyond 6 elements in the categ list at this
> point (I guess the max will be 6C3 = 20 under such a situation).


In the case of your index with 4 occurrences, this will rather be 6C4.


> The
> keyword list will probably go into the 20s, but I do not see anything
> beyond that, and it will always be used only once in the query.
>
> Thanks,
> -e

[google-appengine] Re: Efficient way to structure my data model

2009-06-23 Thread ecognium

Thanks again - this is very helpful. I will let you know if I run into
any future index creation errors, as they could have been caused by any
number of other entries - I mistakenly thought it was all these categ
list-based entries.

So if I understand it right, even with a 10-element list for keywords,
there will only be 10 rows when 4 categ fields are used. In the event
I use 'categ' only once in my query along with the keywords field, it
will have up to 40 rows (10 from keywords times 4C1 from the categ list).
Am I adding these up right?

I do not see myself going beyond 6 elements in the categ list at this
point (I guess the max will be 6C3 = 20 under such a situation). The
keyword list will probably go into the 20s, but I do not see anything
beyond that, and it will always be used only once in the query.

Thanks,
-e


[google-appengine] Re: Efficient way to structure my data model

2009-06-23 Thread Nick Johnson (Google)
Hi ecognium,

On Tue, Jun 23, 2009 at 1:35 AM, ecognium  wrote:

>
> Thanks, Nick. Let me make sure I understand your comment correctly.
> Suppose I have the following data:
>
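> ID   BlobProp1  BlobProp2-N  Keywords        Categ
> =======================================================================
> 123  blah       blah         tag1,tag2,tag3  Circle, Red, Large, Dotted
> 345  blah       blah         tag3,tag4,tag5  Square, Blue, Small, Solid
> 678  blah       blah         tag1,tag3,tag4  Circle, Blue, Small, Solid
> -----------------------------------------------------------------------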
>
> The field categ (list) contains four different types - Shape, Color,
> Size and Line Type. Suppose the user wants to retrieve all entities
> that are Small Dotted Blue Circles then the query will be:
>
> Select * From MyModel where categ = "Circle" AND categ = "Small" AND
> categ = "Blue" AND categ = "Dotted"
>
> When I was reading about exploding indexes the example indicated the
> issue was due to Cartesian product of two list elements. I thought the
> same will hold true with one list field when used multiple times in a
> query.


That is indeed true, though it's not quite the cartesian product - the
datastore won't bother indexing (Circle, Circle, Circle, Circle), or
(Dotted, Dotted, Dotted, Dotted) - it only indexes every unique combination,
which is a substantially smaller number than the cartesian product. It's
still only tractable for small lists, though, such as the 4 item lists
you're dealing with.
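
To make the difference concrete, here is a small sketch that enumerates both sets for a 4-item list. It assumes, as described above, that only combinations of distinct values are counted; it models the counting only, not the datastore's actual index writer, and the two extra values in `categ6` are invented purely to show how the numbers grow.

```python
from itertools import combinations, product

categ = ["Circle", "Red", "Large", "Dotted"]

# Naive model: every ordered 4-tuple, repeats included (4^4).
cartesian = list(product(categ, repeat=4))

# Model described above: each set of 4 distinct values counted once (4C4).
unique = list(combinations(categ, 4))

print(len(cartesian))  # 256
print(len(unique))     # 1

# Hypothetical 6-item list ("Striped" and "Matte" are made-up values).
categ6 = categ + ["Striped", "Matte"]
print(len(list(product(categ6, repeat=4))))  # 1296
print(len(list(combinations(categ6, 4))))    # 15
```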

> Are you saying the above query will not need {Circle, Red,
> Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , }
> number of index entities for entity ID=123?


Correct - if you're not specifying a sort order, you can execute the query
without any composite indexes whatsoever. The datastore satisfies
equality-only queries using a merge join strategy.
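
For intuition, a merge join over per-value indexes can be sketched in plain Python. This is an illustrative model only: `merge_join`, `index` and `query` are hypothetical names, the data is the three sample entities from this thread, and the real datastore works on its own index structures rather than Python lists.

```python
from bisect import bisect_left

# Sample entities from this thread: id -> categ list.
entities = {
    123: ["Circle", "Red", "Large", "Dotted"],
    345: ["Square", "Blue", "Small", "Solid"],
    678: ["Circle", "Blue", "Small", "Solid"],
}

# One sorted list of entity ids per categ value, standing in for the
# built-in single-property index (one entry per item in the list).
index = {}
for entity_id in sorted(entities):
    for value in entities[entity_id]:
        index.setdefault(value, []).append(entity_id)


def merge_join(sorted_id_lists):
    """Return the ids that appear in every one of the sorted id lists."""
    if not sorted_id_lists:
        return []
    positions = [0] * len(sorted_id_lists)
    result = []
    while all(p < len(ids) for p, ids in zip(positions, sorted_id_lists)):
        heads = [ids[p] for p, ids in zip(positions, sorted_id_lists)]
        target = max(heads)
        if min(heads) == target:
            # Every list is positioned on the same id: it matches all filters.
            result.append(target)
            positions = [p + 1 for p in positions]
        else:
            # Skip each list forward to the largest id seen so far.
            positions = [bisect_left(ids, target, p)
                         for p, ids in zip(positions, sorted_id_lists)]
    return result


def query(*values):
    """Equality filter on categ for every value given (AND semantics)."""
    return merge_join([index.get(v, []) for v in values])


print(query("Circle", "Small", "Blue", "Solid"))   # [678]
print(query("Circle", "Small", "Blue", "Dotted"))  # []
```

No composite index appears anywhere in this sketch; each filter only reads the per-value list, which is the point being made above.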


> I was getting index errors
> when I was using the categ list property four times in my index
> specification and that's why I was wondering if I should restructure
> things.


How many items did you have in the list you were indexing in that case? If
your list has 4 items and your index specification lists it 4 times, you
should only get one index entry.

> so I am guessing the following spec should not cause any index
> issues in the future?


Again, that depends on the number of entries in the 'categ' list. With 4
entries, this will only generate a single index entry, but the number of
entries will expand exponentially as the list increases in size.

-Nick Johnson


>
> - kind: MyModel
>   properties:
>   - name: categ
>   - name: categ
>   - name: categ
>   - name: categ
>   - name: keywords
>   - name: __key__   # used for paging
>
> Thanks,
> -e

[google-appengine] Re: Efficient way to structure my data model

2009-06-22 Thread ecognium

Thanks, Nick. Let me make sure I understand your comment correctly.
Suppose I have the following data:

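ID   Blob1  Blob2-N  Keywords        Categ
-------------------------------------------------------------
123  blah   blah     tag1,tag2,tag3  Circle, Red, Large, Dotted
345  blah   blah     tag3,tag4,tag5  Square, Blue, Small, Solid
678  blah   blah     tag1,tag3,tag4  Circle, Blue, Small, Solid
-------------------------------------------------------------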

The field categ (list) contains four different types - Shape, Color,
Size and Line Type. Suppose the user wants to retrieve all entities
that are Small Dotted Blue Circles then the query will be:

Select * From MyModel where categ = "Circle" AND categ = "Small" AND
categ = "Blue" AND categ = "Dotted"

When I was reading about exploding indexes, the example indicated the
issue was due to the Cartesian product of the elements of two lists. I
thought the same would hold true with one list field used multiple times
in a query. Are you saying the above query will not need {Circle, Red,
Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , }
index entries for entity ID=123? I was getting index errors
when I used the categ list property four times in my index
specification, and that's why I was wondering if I should restructure
things. So I am guessing the following spec should not cause any index
issues in the future?

- kind: MyModel
  properties:
  - name: categ
  - name: categ
  - name: categ
  - name: categ
  - name: keywords
  - name: __key__   # used for paging

Thanks,
-e


[google-appengine] Re: Efficient way to structure my data model

2009-06-22 Thread Nick Johnson (Google)
Hi ecognium,

If I understand your problem correctly, every entity will have 0-4 entries
in the 'categ' list, corresponding to the values for each of 4 categories
(eg, Color, Size, Shape, etc)?

The sample query you give, with only equality filters, will be satisfiable
using the merge join query planner, which doesn't require custom indexes, so
you won't have high indexing overhead. There will simply be one index entry
for each item in each list.

If you do need custom indexes, the number of index entries isn't 4^4, as
you suggest, but rather smaller. Assuming you want to be able to query with
any number of categories from 0 to 4, you'll need 3 or 4 custom indexes
(depending on if the 0-category case requires its own index), and the total
number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15.
For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 + 1 =
63, which is still a not-unreasonable number.
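
These totals are easy to verify: summing nCk for k = 1..n is the same as 2^n - 1. A quick sketch, assuming one custom index per number of categories queried (the helper name `total_index_entries` is just for illustration):

```python
from math import comb  # Python 3.8+


def total_index_entries(n):
    """Sum of nC1 + nC2 + ... + nCn for a list property holding n values."""
    return sum(comb(n, k) for k in range(1, n + 1))


print(total_index_entries(4))  # 15
print(total_index_entries(6))  # 63

# Closed form: every non-empty subset of the n values is counted once.
print(total_index_entries(6) == 2 ** 6 - 1)  # True
```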

-Nick Johnson

On Mon, Jun 22, 2009 at 8:51 AM, ecognium  wrote:

>
> Hi All,
>
>    I would like to get your opinion on the best way to structure my
> data model.
> My app allows users to filter entities by four category types
> (say A, B, C, D). Each category can have multiple values (e.g.,
> category type A can have values 1, 2, 3), but the
> user can choose only one value per category for filtering. Please
> note the values are unique across the category types as well. I could
> create four fields corresponding to the four types, but that does not
> allow me to expand to more categories easily later. Right now, I just
> use one list field to store the different values, as it is easy to add
> more category types later on.
>
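> My model (simplified) looks like this:
>
> from google.appengine.ext import db
>
> class Example(db.Model):
>     categ = db.StringListProperty()     # up to 4 values, one per category type
>     keywords = db.StringListProperty()  # about 10-20 values per entity
>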
>
> The field keywords will have about 10-20 values for each entity. For
> the above example, categ will have up to 4 values. Since I allow
> filtering on 4 category types, the index table gets large with
> unnecessary values. The filtering logic looks like:
> keywords = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND
> categ = '99'
>
> Since there are 4 values in the categ list property, there will be
> 4^4 rows created in the index table (most of them will never be hit
> due to the uniqueness guaranteed by design). Multiply that by the number
> of values in the keywords list, and the index table gets large very
> quickly.
>
> I would like to avoid creating multiple fields if possible because
> when I want to make the number of category types to six, I would have
> to change the underlying model and all the filtering code. Any
> suggestions on how to construct the model such that it will allow for
> ease of expansion in category types yet still not create large index
> tables? I know there is a Category Property but not sure if it really
> provides any specific benefit here.
>
> Thanks!
> -e
> >
>


-- 
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
368047

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---