Re: [nhibernate-development] implementing FetchMode.SubSelect per query, and improving it

Fabio Maulo Wed, 01 Sep 2010 11:12:47 -0700

Yes you can, but I think you are introducing a breaking change if you are
going for:
where c.FooId in (p1, p2, p3, p4..... pn)
Subselect does not suffer from the 2100-parameter limit; with your patch it
will have this new limitation.
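
To make the limit concrete, here is a minimal Python sketch (not NHibernate code) of what an ID-based loader would have to do once the first query returns more owners than one statement accepts as parameters. The 2100 cap is the MS SQL figure quoted in this thread; the Child table, the helper names and the `?` placeholder style are assumptions made up for illustration.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumed per-statement parameter cap (the figure quoted for MS SQL).
MAX_PARAMS = 2100


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of ids, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def in_clause_queries(
    ids: Sequence[int], max_params: int = MAX_PARAMS
) -> List[Tuple[str, List[int]]]:
    """Build one parameterised statement per chunk of owner ids.

    Hypothetical helper: the table and column names only mirror the
    `where c.FooId in (p1, p2, ...)` shape discussed above.
    """
    queries = []
    for chunk in chunked(ids, max_params):
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"select * from Child c where c.FooId in ({placeholders})"
        queries.append((sql, list(chunk)))
    return queries
```

With 5000 owner ids this produces three statements instead of one, which is the kind of extra round trip the subselect form never needs.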



On Wed, Sep 1, 2010 at 2:56 PM, nadav s <[email protected]> wrote:

> before starting to work on the batch size thing, i want to apply a patch
> that makes subselect more sane, because that work is already done. the
> problem is that, in the old way, if the first query used paging, the sub
> select ignored the paging and fetched the children of all the fathers the
> first query would have returned had it not used paging.
>
> there is a test called SubselectFetchWithLimit which creates 3 parents,
> fetches only 2 parents, initializes their collections, then fetches the third
> parent and expects its collection to already be initialized (although it
> wasn't returned by the paged query).
>
> my improvement breaks this test, because paging is now taken into
> consideration when fetching by ids, which i think is the much more correct
> way to go.
>
> so the assert with the comment
> // The test for True is the test of H3.2
> now breaks. can i change the test intentionally so that subselect will
> consider paging?
>
>
> On Wed, Sep 1, 2010 at 8:38 PM, nadav s <[email protected]> wrote:
>
>> of course i'll be doing the same work as batch size. i'm not setting out to
>> implement batch size all over again, but trying to allow it to be query
>> specific, meaning being able to look for the owners of a specific query and
>> not all the owners that are in the session (owners from a different query
>> might be there), and allowing it to be overridable
>>
>>
>> On Wed, Sep 1, 2010 at 8:09 PM, Fabio Maulo <[email protected]> wrote:
>>
>>> ah...
>>> take care with "using IDs is more efficient" because "subselect" does not
>>> suffer the problem of max-parameter (IIRC 2100 in msSQL)
>>>
>>>
>>> On Wed, Sep 1, 2010 at 2:03 PM, nadav s <[email protected]> wrote:
>>>
>>>> great. thanks
>>>>
>>>>
>>>> On Wed, Sep 1, 2010 at 7:56 PM, nadav s <[email protected]> wrote:
>>>>
>>>>> you know the internals of nhibernate much much much better than me, and
>>>>> i won't get into an implementation argument with you, but it is possible
>>>>> to implement.
>>>>>
>>>>> with subselect (again i'm talking about subselect because i didn't do
>>>>> any research on the batch size, but i guess the idea is similar because
>>>>> it works the same, only batch size issues a good query and subselect
>>>>> issues an evil one), as i've noticed, there is a special one-to-many
>>>>> collection persister that knows, once the collection is accessed, to use
>>>>> a sub select batcher that loads the collections of all the owners that
>>>>> were returned by the initial query.
>>>>>
>>>>> if the persister could be set, or modified, for a specific instance of a
>>>>> collection, it would be possible: you could set the batch size\subselect
>>>>> for a specific query, which in turn would set a different persister for
>>>>> the collections whose persisters need modification, and then when a
>>>>> collection is accessed, the persister would do its thing.
>>>>>
>>>>> of course, i'm not sure that's the proper way of implementing it, but as
>>>>> an idea (tell the specific collections that are created for the entities
>>>>> of a specific query to do something other than the default) it is possible
>>>>>
>>>>>
>>>>> On Wed, Sep 1, 2010 at 7:49 PM, John Davidson <[email protected]>wrote:
>>>>>
>>>>>> I think nadav is saying that subselect from NHibernate is an issue,
>>>>>> but the implementation he is proposing will fix that problem
>>>>>>
>>>>>> John Davidson
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 1, 2010 at 12:46 PM, Fabio Maulo <[email protected]>wrote:
>>>>>>
>>>>>>> LOL!!
>>>>>>> Your first assertion : "btw, i don't really get what is the problem
>>>>>>> with subselect"
>>>>>>> Your second assertion : "the sub select is always inefficient"
>>>>>>>
>>>>>>> On Wed, Sep 1, 2010 at 1:42 PM, nadav s <[email protected]> wrote:
>>>>>>>
>>>>>>>> the sub select is always inefficient, especially when there is an
>>>>>>>> initial complex query (with sub queries in it), and it's a killer when
>>>>>>>> it's a two level tree (when fetching the grandchildren). fixing it was
>>>>>>>> really really easy, and i can't see any downside to it.
>>>>>>>>
>>>>>>>> different use cases in a web app:
>>>>>>>>
>>>>>>>> use case 1: sub select\batch size is NOT desired
>>>>>>>>
>>>>>>>>    the user searches for car companies by some criteria. the user will
>>>>>>>>    then choose (double click on a grid's row or something) one of the
>>>>>>>>    companies to see it in full detail. each company has one-to-many
>>>>>>>>    car types (mazda -> mazda 3, mazda 5, mazda 6...) and each car type
>>>>>>>>    will be displayed in its own tab, where at first the newest car type
>>>>>>>>    or the most expensive one (it doesn't matter which) is selected.
>>>>>>>>    each car type has its models: mazda3 2008 isn't the same as 2010
>>>>>>>>    (i don't know that much about cars and am not sure the years are
>>>>>>>>    correct, but there are differences between the models).
>>>>>>>>
>>>>>>>>    the result: if carType.Models is mapped with some batch size, say
>>>>>>>>    10, the models of 10 of the car types are now fetched, although the
>>>>>>>>    user only looks at the models of one of the car types. if there
>>>>>>>>    could be lots of models for each car type, it slows the first tab
>>>>>>>>    and makes the other tabs faster, because their car types are now
>>>>>>>>    loaded, but that's not what is desired, because the user is expected
>>>>>>>>    to click on only one of the other tabs or something.
>>>>>>>>
>>>>>>>>  use case 2: desired:
>>>>>>>>
>>>>>>>>     the user wants to see some custom developed report (ui that can be
>>>>>>>>     implemented with MRS/Cognus or any other reporting framework, and
>>>>>>>>     we have all kinds of reports that live up to this definition, and
>>>>>>>>     for some good reasons also). for the report the user searches for
>>>>>>>>     car companies by some criteria (some search form) and then expects
>>>>>>>>     to see the returned companies, paged of course, but with all of
>>>>>>>>     their car types, and for each of the car types, all of its models.
>>>>>>>>     here, a sub select or batch fetching is a must or else we'll get a
>>>>>>>>     CP (cartesian product) with join fetching, or N^2 + 1 if we do
>>>>>>>>     regular lazy loading (like we wanted to do in the first situation).
>>>>>>>>
>>>>>>>> of course we can work around that, and that's exactly what we do,
>>>>>>>> using a generic mechanism that, for reports, eager fetches the
>>>>>>>> association it was asked to fetch with sub selects and not joins. for
>>>>>>>> the regular queries, it just uses the default, which is regular lazy.
>>>>>>>>
>>>>>>>> it would have been really really nice, if i could have set, for the
>>>>>>>> report query, query.SetFetchMode("CarTypes", FetchMode.SubSelect)
>>>>>>>> or if you will, query.SetBatchSize("CarTypes", 20)
>>>>>>>> and same for models
>>>>>>>> query.SetFetchMode("CarTypes.Models", FetchMode.SubSelect) or
>>>>>>>> query.SetBatchSize("CarTypes.Models", int.MaxValue).
>>>>>>>>
>>>>>>>> it must be max value because i want all the models, and can't
>>>>>>>> possibly know how many car types are going to be there. of course it
>>>>>>>> won't be a lot, because the "query" is going to use paging, but i don't
>>>>>>>> really know if it's 20, 40, or something else.
>>>>>>>>
>>>>>>>> batch size currently makes me choose between the use cases, slowing
>>>>>>>> down one of them, or makes me query and connect the associations
>>>>>>>> myself. the same goes for sub select, which also issues an inefficient
>>>>>>>> query for CarTypes and a killer query for the Models.
>>>>>>>> before my fix it would have been:
>>>>>>>> select ...
>>>>>>>> from Models m
>>>>>>>> where m.CarTypeId in
>>>>>>>>    (select c.Id
>>>>>>>>     from CarTypes c
>>>>>>>>     where c.CompanyId in
>>>>>>>>             (select company.Id
>>>>>>>>              from Companies company
>>>>>>>>              where <could be some crazy criteria - this is the same
>>>>>>>>                     where clause of the very original query>))
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> (i was able to make itthe inefficiency of the query
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 1, 2010 at 6:58 PM, Fabio Maulo 
>>>>>>>> <[email protected]>wrote:
>>>>>>>>
>>>>>>>>> I don't know which is the problem... you said that there is a
>>>>>>>>> problem and you want to change it using the same tech used by
>>>>>>>>> batch-size (using uploaded ids) because subselect seems inefficient
>>>>>>>>> in some cases.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 1, 2010 at 12:48 PM, nadav s <[email protected]>wrote:
>>>>>>>>>
>>>>>>>>>> btw, i don't really get what is the problem with subselect, as it
>>>>>>>>>> lets you efficiently fetch a whole object graph for the N fathers
>>>>>>>>>> that were fetched in some query, in the most efficient way possible
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 1, 2010 at 6:46 PM, nadav s <[email protected]>wrote:
>>>>>>>>>>
>>>>>>>>>>> i don't think it's that low priority, because it is actually a
>>>>>>>>>>> thing people expect to happen when they set a fetch mode to Eager.
>>>>>>>>>>> at least i've seen a lot of situations where people really thought
>>>>>>>>>>> that that's what was going to happen (later finding out it killed
>>>>>>>>>>> their query with CP)
>>>>>>>>>>>
>>>>>>>>>>> about when it is helpful - exactly in the situations diego
>>>>>>>>>>> described. two use cases,
>>>>>>>>>>> in one of them you query the fathers and are gonna need only one of
>>>>>>>>>>> the fathers' collections, and in the other
>>>>>>>>>>> you're gonna need all of their collections.
>>>>>>>>>>> it gets more complicated when there are grandchildren involved,
>>>>>>>>>>> and in one of the situations you want the grandchildren of one of
>>>>>>>>>>> the children, and in the other situation, because you load an object
>>>>>>>>>>> graph, you're gonna need all of them.
>>>>>>>>>>>
>>>>>>>>>>> now, either you implement (similar to what diego said) the
>>>>>>>>>>> loading of the collections yourself, or you're gonna have to live
>>>>>>>>>>> with the batch size slowing down the first situation, where you
>>>>>>>>>>> would have preferred lazy loading without batching
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 1, 2010 at 5:22 PM, Diego Mijelshon <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have entities where batch loading helps in some use cases but
>>>>>>>>>>>> it loads lots of unneeded entities/collections in other complex
>>>>>>>>>>>> use cases, where I have many proxies but only use a few.
>>>>>>>>>>>> My current workaround is doing "manual batch loading" (i.e.
>>>>>>>>>>>> dummy query) in the cases where I need it.
>>>>>>>>>>>>
>>>>>>>>>>>> It would be definitely a low-priority but nice-to-have feature.
>>>>>>>>>>>>
>>>>>>>>>>>>     Diego
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 1, 2010 at 10:12, Fabio Maulo <[email protected]
>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It is possible for batcher (INSERT, UPDATE,DELETE).
>>>>>>>>>>>>> I don't understand where it is useful for collection/relations
>>>>>>>>>>>>> batch-size.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 1, 2010 at 9:37 AM, Diego Mijelshon <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Being able to override batch-size would be useful.
>>>>>>>>>>>>>> Implementing it requires messing with more than one part of the
>>>>>>>>>>>>>> infrastructure, though.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Diego
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Fabio Maulo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Fabio Maulo
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Fabio Maulo
>>>
>>>
>>
>
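
The paging point in the quoted thread can be reproduced outside NHibernate. Below is a minimal sketch using Python's sqlite3 module; the Parent/Child tables and their rows are invented for illustration, and the SQL only imitates the two query shapes being compared, not the statements NHibernate actually generates. It mirrors the 3 parents / page of 2 setup described for the SubselectFetchWithLimit test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Parent (Id integer primary key, Name text);
    create table Child  (Id integer primary key, ParentId integer);
    insert into Parent values (1, 'a'), (2, 'b'), (3, 'c');
    insert into Child  values (10, 1), (20, 2), (30, 3);
""")

# Paged "first query": only two of the three parents come back.
paged_ids = [row[0] for row in conn.execute(
    "select Id from Parent order by Id limit 2")]

# Subselect shape: the original criteria are repeated inside the IN,
# but the paging is not, so children of the third parent load too.
subselect_children = [row[0] for row in conn.execute(
    "select Id from Child "
    "where ParentId in (select Id from Parent) order by Id")]

# ID-based shape: only the ids actually returned by the page are used.
placeholders = ", ".join("?" for _ in paged_ids)
by_id_children = [row[0] for row in conn.execute(
    f"select Id from Child where ParentId in ({placeholders}) order by Id",
    paged_ids)]

print(paged_ids, subselect_children, by_id_children)
```

The subselect shape returns the children of all three parents while the ID-based shape returns only those of the paged two, which is why a test that expects the third parent's collection to be initialized passes with the former and fails with the latter.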


-- 
Fabio Maulo
