Thanks very much! I think I understand indexing with Lists much better
now...
>> In the case of your index with 4 occurrences, this will rather be 6C4.
Yup, I was looking at the max situation when I use 3 categ fields.
Need to be careful as it grows pretty quickly.


On Jun 24, 4:19 am, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:
> Hi ecognium,
>
> On Tue, Jun 23, 2009 at 6:11 PM, ecognium <ecogn...@gmail.com> wrote:
>
> > Thanks again - this is very helpful. I will let you know if i run into
> > any future index creation errors as it could have been caused by any
> > number of other entries - i mistakenly thought it was all these categ
> > list-based entries.
>
> > So if i understand it right even with a 10 element list for keywords,
> > there will only be 10 rows when 4 categ fields are used.
>
> Correct - 4C4 * 10C1 index entries for the custom index you specified
> earlier.
>
> > In the event
> > I use  'categ' only once in my query along with keywords field, it
> > will have up to 40 rows (10 from keywords and 4C1 from categ list).
>
> Correct.
>
> > Am
> > I adding these up right?
>
> > I do not see myself going beyond 6 elements in the categ list at this
> > point (I guess the max will be 6C3 = 20 under such a situation).
>
> In the case of your index with 4 occurrences, this will rather be 6C4.
>
>
>
> > The
> > keyword list will be probably go into the 20s but do not see anything
> > beyond that and will always be used only once in the query.
>
> > Thanks,
> > -e
>
> > On Jun 23, 3:53 am, "Nick Johnson (Google)" <nick.john...@google.com>
> > wrote:
> > > Hi ecognium,
>
> > > On Tue, Jun 23, 2009 at 1:35 AM, ecognium <ecogn...@gmail.com> wrote:
>
> > > > Thanks, Nick. Let me make sure I understand your comment correctly.
> > > > Suppose I have the following data:
>
> > > > ID      BlobProp1       BlobProp2-N     Keywords
> > > >  Categ
> > > > =================================================
> > > > 123     blah                    blah                    tag1,tag2,tag3
> > > >  Circle,
> > > > Red,  Large, Dotted
> > > > 345     blah                    blah                    tag3,tag4,tag5
> > > > Square, Blue, Small, Solid
> > > > 678     blah                    blah                    tag1,tag3,tag4
> > > > Circle, Blue, Small, Solid
>
> > -------------------------------------------------------------------------------------------------------------
>
> > > > The field categ (list) contains four different types - Shape, Color,
> > > > Size and Line Type. Suppose the user wants to retrieve all entities
> > > > that are Small Dotted Blue Circles then the query will be:
>
> > > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND
> > > > categ = "Blue" AND categ = "Dotted"
>
> > > > When I was reading about exploding indexes the example indicated the
> > > > issue was due to Cartesian product of two list elements. I thought the
> > > > same will hold true with one list field when used multiple times in a
> > > > query.
>
> > > That is indeed true, though it's not quite the cartesian product - the
> > > datastore won't bother indexing (Circle, Circle, Circle, Circle), or
> > > (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique
> > combination,
> > > which is a substantially smaller number than the cartesian product. It's
> > > still only tractable for small lists, though, such as the 4 item lists
> > > you're dealing with.
>
> > > Are you saying the above query will not need {Circle, Red,
>
> > > > Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , }
> > > > number of index entities for entity ID=123?
>
> > > Correct - if you're not specifying a sort order, you can execute the
> > query
> > > without any composite indexes whatsoever. The datastore satisfies
> > > equality-only queries using a merge join strategy.
>
> > > > I was getting index errors
> > > > when I was using the categ list property four times in my index
> > > > specification and that's why I was wondering if I should restructure
> > > > things.
>
> > > How many items did you have in the list you were indexing in that case?
> > If
> > > your list has 4 items and your index specification lists it 4 times, you
> > > should only get one index entry.
>
> > > so I am guessing the following spec should not cause any index
>
> > > > issues in the future?
>
> > > Again, that depends on the number of entries in the 'categ' list. With 4
> > > entries, this will only generate a single index entry, but the number of
> > > entries will expand exponentially as the list increases in size.
>
> > > -Nick Johnson
>
> > > > - kind: MyModel
> > > >  properties:
> > > >  - name: categ
> > > >  - name: categ
> > > >  - name: categ
> > > >  - name: categ
> > > >  - name: keywords
> > > >  - name: __key__   # used for paging
>
> > > > Thanks,
> > > > -e
>
> > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" <nick.john...@google.com>
> > > > wrote:
> > > > > Hi ecognium,
>
> > > > > If I understand your problem correctly, every entity will have 0-4
> > > > entries
> > > > > in the 'categ' list, corresponding to the values for each of 4
> > categories
> > > > > (eg, Color, Size, Shape, etc)?
>
> > > > > The sample query you give, with only equality filters, will be
> > > > satisfiable
> > > > > using the merge join query planner, which doesn't require custom
> > indexes,
> > > > so
> > > > > you won't have high indexing overhead. There will simply be one index
> > > > entry
> > > > > for each item in each list.
>
> > > > > If you do need custom indexes, the number of index entries, isn't
> > 4^4, as
> > > > > you suggest, but rather smaller. Assuming you want to be able to
> > query
> > > > with
> > > > > any number of categories from 0 to 4, you'll need 3 or 4 custom
> > indexes
> > > > > (depending on if the 0-category case requires its own index), and the
> > > > total
> > > > > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1
> > =
> > > > 15.
> > > > > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6
> > + 1
> > > > =
> > > > > 63, which is still a not-unreasonable number.
>
> > > > > -Nick Johnson
>
> > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium <ecogn...@gmail.com>
> > wrote:
>
> > > > > > Hi All,
>
> > > > > >    I would like to get your opinion on the best way to structure my
> > > > > > data model.
> > > > > > My app allows the users to filter the entities by four category
> > types
> > > > > > (say A,B,C,D). Each category can have multiple values (for e.g.,
> > > > > > category type A can have values 1,2,3) but the
> > > > > > user can  choose only one value per category for filtering.  Please
> > > > > > note the values are unique across the category types as well. I
> > could
> > > > > > create four fields corresponding to the four types but it does not
> > > > > > allow me to expand to more categories later easily. Right now, I
> > just
> > > > > > use one list field to store the different values as it is easy to
> > add
> > > > > > more category types later on.
>
> > > > > > My model (simplified) looks like this:
>
> > > > > > class Example(db.Model):
>
> > > > > >    categ        = db.StringListProperty()
>
> > > > > >    keywords = db.StringListProperty()
>
> > > > > > The field keywords will have about 10-20 values for each entity.
> > For
> > > > > > the above example, categ will have up to 4 values. Since I allow
> > for
> > > > > > filtering on 4 category types, the index table gets large with
> > > > > > unnecessary values. The filtering logic looks like:
> > > > > > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND
> > > > > > categ = '99'
>
> > > > > >  Since there are 4 values in the categ list property, there will be
> > > > > > 4^4 rows created in the index table (most of them will never be hit
> > > > > > due to the uniqueness guaranteed by design). Multiply it by the
> > number
> > > > > > of values in the keywords table, the index table gets large very
> > > > > > quickly.
>
> > > > > > I would like to avoid creating multiple fields if possible because
> > > > > > when I want to make the number of category types to six, I would
> > have
> > > > > > to change the underlying model and all the filtering code. Any
> > > > > > suggestions on how to construct the model such that it will allow
> > for
> > > > > > ease of expansion in category types yet still not create large
> > index
> > > > > > tables? I know there is a Category Property but not sure if it
> > really
> > > > > > provides any specific benefit here.
>
> > > > > > Thanks!
> > > > > > -e
>
> > > > > --
> > > > > Nick Johnson, App Engine Developer Programs Engineer
> > > > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > > > Number:
> > > > > 368047
>
> > > --
> > > Nick Johnson, App Engine Developer Programs Engineer
> > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > Number:
> > > 368047
>
> --
> Nick Johnson, App Engine Developer Programs Engineer
> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> 368047
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to