RE: [nhibernate-development] implementing FetchMode.SubSelect per query, and improving it

Frans Bouma Wed, 01 Sep 2010 11:29:27 -0700

> Yes you can, but I think you are introducing a breaking change if you are
> going for:
> where c.FooId in (p1, p2, p3, p4, ..., pn). Subselect does not suffer from the
> 2100-parameter limit; with your patch it will have this new limitation.


        Yep, paging is a problem. We have the same feature, called
'parameterized prefetch paths', which works roughly like this: you have a
threshold T which is, say, 100 by default. If there are fewer than T parent
IDs, it switches to a where x.FkField in (p1, p2, ...) query for children,
and if there are T or more, it uses a subquery.

        Paging, however, doesn't work with a subquery; you have to use the
parameterized variant with eager loading. This isn't that hard to achieve,
though: just keep the page size < T.
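
A minimal sketch of the threshold switch described above, in Python purely for illustration; it is not the code of either framework. The function name `child_fetch_sql`, the `x.FkField` column, the `@pN` placeholder style and the `paged` flag are assumptions. Below T parent IDs, or whenever the parent query was paged (a subquery would lose the page), it emits a parameterized IN list; otherwise it reuses the parent query's filter as a subquery.

```python
from typing import List, Sequence, Tuple

# Hypothetical default for the threshold "T" (100 is the example value above).
DEFAULT_THRESHOLD = 100


def child_fetch_sql(
    parent_ids: Sequence[int],
    parent_id_subquery: str,
    fk_column: str = "x.FkField",
    threshold: int = DEFAULT_THRESHOLD,
    paged: bool = False,
) -> Tuple[str, List[int]]:
    """Return (where_clause, parameter_values) for the child-collection query."""
    if not parent_ids:
        raise ValueError("no parent IDs: skip the child query entirely")
    if paged or len(parent_ids) < threshold:
        # Parameterized variant: one placeholder per parent ID. A paged parent
        # query must take this branch; keeping the page size < T keeps it short.
        placeholders = ", ".join(f"@p{i}" for i in range(len(parent_ids)))
        return f"WHERE {fk_column} IN ({placeholders})", list(parent_ids)
    # Subquery variant: the parent filter is repeated, so the parameter count
    # no longer grows with the number of parents. parent_id_subquery is assumed
    # to be a parameter-free SELECT of the parent IDs in this sketch.
    return f"WHERE {fk_column} IN ({parent_id_subquery})", []


if __name__ == "__main__":
    print(child_fetch_sql([1, 2, 3], "SELECT p.Id FROM Parent p WHERE p.Active = 1"))
    print(child_fetch_sql(list(range(150)), "SELECT p.Id FROM Parent p WHERE p.Active = 1"))
```

With three IDs this yields `WHERE x.FkField IN (@p0, @p1, @p2)` plus the three values; with 150 unpaged IDs it yields the subquery form and no parameters.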

        In NH land, the 'T' is the batch size, but it's not configurable at
runtime, correct? So you can't set it per query. Which variant is faster
depends on the query: speed tests on SQL Server suggest that at 100 or more
parameters a subquery is quicker, but complex subqueries could shift that
number, so it is highly useful to have a configurable batch size.
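
The concern raised in the quoted messages is that an ID-based approach inherits the parameter limit (about 2100 on SQL Server, as cited in the thread) that a subselect avoids. A minimal sketch, assuming a hypothetical per-query batch size as discussed: it splits the loaded parent IDs into chunks, each capped so that one IN (...) statement stays under the limit, even when the requested batch size is int.MaxValue as proposed further down the thread. The names `id_batches`, `MAX_PARAMETERS` and `reserved_params` are illustrative only.

```python
from typing import Iterator, List, Sequence

# Per-statement parameter limit as cited in the thread for SQL Server (IIRC 2100).
MAX_PARAMETERS = 2100


def id_batches(
    ids: Sequence[int], batch_size: int, reserved_params: int = 0
) -> Iterator[List[int]]:
    """Yield chunks of IDs, each small enough for one parameterized IN (...) query.

    batch_size stands in for the per-query setting discussed here (hypothetical);
    reserved_params counts other parameters the same statement already uses.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Cap the chunk so the whole statement stays under the parameter limit.
    chunk = min(batch_size, MAX_PARAMETERS - reserved_params)
    if chunk < 1:
        raise ValueError("no parameter slots left for IDs")
    for start in range(0, len(ids), chunk):
        yield list(ids[start:start + chunk])


if __name__ == "__main__":
    # A batch size of int.MaxValue is still capped at the parameter limit.
    print([len(b) for b in id_batches(range(5000), 2**31 - 1)])
```

Each chunk is one round trip, so a very large batch size trades the single subselect for several parameterized queries; that is the trade-off the threshold T is meant to manage.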

                FB 

> 
> 
> On Wed, Sep 1, 2010 at 2:56 PM, nadav s <[email protected]> wrote:
> 
> 
>       Before starting to work on the batch size thing, I want to apply a
> patch that makes subselect saner, because that work is already done. The
> problem is that, in the old way, if the first query used paging, the
> subselect ignored the paging and fetched the children of all the fathers
> the first query would have returned had it not used paging.
> 
>       There is a test called SubselectFetchWithLimit which creates 3
> parents, fetches only 2 of them, initializes their collections, then fetches
> the third parent and expects its collection to already be initialized
> (although it wasn't returned by the paged query).
> 
>       My improvement breaks this test, because paging is now taken into
> consideration when fetching by IDs, which I think is the much more correct
> way to go.
> 
>       So the assert with the comment
>       // The test for True is the test of H3.2
>       now breaks. Can I intentionally change the test so that subselect will
> consider paging?
> 
> 
>       On Wed, Sep 1, 2010 at 8:38 PM, nadav s <[email protected]> wrote:
> 
> 
>               Of course I'll be reusing the same work as batch size; I'm not
> setting out to implement batch size all over again, but trying to make it
> query-specific, meaning being able to look for the owners of a specific query
> and not all the owners that are in the session (owners from a different query
> might be there), and allowing it to be overridable.
> 
> 
>               On Wed, Sep 1, 2010 at 8:09 PM, Fabio Maulo
> <[email protected]> wrote:
> 
> 
>                       Ah...
>                       Take care with "using IDs is more efficient", because
> "subselect" does not suffer from the max-parameter problem (IIRC 2100 in
> MS SQL).
> 
> 
>                       On Wed, Sep 1, 2010 at 2:03 PM, nadav s
> <[email protected]> wrote:
> 
> 
>                               Great, thanks.
> 
> 
>                               On Wed, Sep 1, 2010 at 7:56 PM, nadav s
> <[email protected]> wrote:
> 
> 
>                                       You know the internals of NHibernate much,
> much, much better than me, and I won't get into an implementation argument
> with you, but it is possible to implement.
> 
>                                       With subselect (again, I'm talking about
> subselect because I didn't do any research on batch size, but I guess the
> idea is similar because it works the same way, only batch size issues a good
> query and subselect issues an evil one), as I've noticed, there is a special
> one-to-many collection persister which knows that, once the collection is
> accessed, it should use a subselect batcher that loads the collections of
> all the owners that were returned by the initial query.
> 
>                                       If the persister could be set, or
> modified, for a specific instance of a collection, it would be possible: you
> could set the batch size/subselect for a specific query, which in turn would
> set a different persister for the collections whose persisters need
> modification, and then, when a collection is accessed, the persister would
> do its thing.
> 
>                                       Of course, I'm not sure that's the proper
> way of implementing it, but as an idea (telling the specific collections that
> are created for the entities of a specific query to do something other than
> the default), it is possible.
> 
> 
>                                       On Wed, Sep 1, 2010 at 7:49 PM, John
> Davidson <[email protected]> wrote:
> 
> 
>                                               I think nadav is saying that
> subselect from NHibernate is an issue, but the implementation he is
> proposing will fix that problem
> 
>                                               John Davidson
> 
> 
>                                               On Wed, Sep 1, 2010 at 12:46 PM,
> Fabio Maulo <[email protected]> wrote:
> 
> 
>                                                       LOL!!
>                                                       Your first assertion: "btw,
> i don't really get what is the problem with subselect"
>                                                       Your second assertion: "the
> sub select is always inefficient"
> 
>                                                       On Wed, Sep 1, 2010 at 1:42
> PM, nadav s <[email protected]> wrote:
> 
> 
>                                                               The sub select is
> always inefficient, especially when there is an initial complex query (with
> subqueries in it), and it's a killer when it's a two-level tree (when
> fetching the grandchildren). Fixing it was really, really easy, and I can't
> see any downside to it.
> 
>                                                               Different use cases
> in a web app:
> 
>                                                               Use case 1: subselect/batch
> size is NOT desired
> 
>                                                                  The user searches
> for car companies by some criteria. The user will then choose one of the
> companies (by double-clicking a grid row or something) to see it in full
> detail. Each company has one-to-many car types (Mazda -> Mazda 3, Mazda 5,
> Mazda 6...) and each car type will be displayed in its own tab, where at
> first one car type is selected (the newest or the most expensive one, it
> doesn't matter).
>                                                                  Each car type has
> its models: a 2008 Mazda 3 isn't the same as a 2010 one (I don't know that
> much about cars and I'm not sure the years are correct, but there are
> differences between the models).
> 
>                                                                  The result: if
> carType.Models is mapped with some batch size, say 10, the models of 10 of
> the car types are fetched, although the user only views the models of one
> of the car types. If there can be lots of models for each car type, this
> slows down the first tab and makes the other tabs faster, because their
> models are already loaded, but that's not what is desired, because the user
> is expected to click on only one of the other tabs, or something like that.
> 
>                                                                Use case 2: desired
> 
>                                                                   The user wants
> to see some custom-developed report (a UI that could be implemented with
> MRS/Cognos or any other reporting framework; we have all kinds of reports
> that fit this definition, and for good reasons). For the report, the user
> searches for car companies by some criteria (some search form) and then
> expects to see the returned companies, paged of course, but with all of
> their car types, and for each car type, all of its models. Here, a
> subselect or batch fetching is a must, or else we'll get a CP (Cartesian
> product) with join fetching, or N^2 + 1 queries if we do regular lazy
> loading (as we wanted to do in the first situation).
> 
>                                                               Of course we can work
> around that, and that's exactly what we do, using a generic mechanism that,
> for reports, eagerly fetches the associations it was asked to fetch using
> subselects rather than joins. For the regular queries, it just uses the
> default, which is regular lazy loading.
> 
>                                                               It would have been
> really, really nice if I could have set, for the report query,
> query.SetFetchMode("CarTypes", FetchMode.SubSelect)
>                                                               or, if you will,
> query.SetBatchSize("CarTypes", 20)
>                                                               and the same for models:
> 
>       query.SetFetchMode("CarTypes.Models", FetchMode.SubSelect) or
> 
>       query.SetBatchSize("CarTypes.Models", int.MaxValue).
> 
>                                                               It must be max value
> because I want all the models, and can't possibly know how many car types
> are going to be there. Of course it won't be a lot, because the "query" is
> going to use paging, but I don't really know if it's 20, 40, or something
> else.
> 
> 
>                                                               Batch size currently
> makes me choose between the use cases, slowing down one of them, or makes me
> query and connect the associations myself. The same goes for subselect, which
> also issues an inefficient query for CarTypes and a killer query for the
> Models.
>                                                               Before my fix it
> would have been:
>                                                               select ...
>                                                               from Models m
>                                                               where m.CarTypeId in
>                                                                  (select c.Id
>                                                                   from CarTypes c
>                                                                   where c.CompanyId in
>                                                                      (select company.Id
>                                                                       from Companies company
>                                                                       where <could be some crazy criteria: this is the same where clause as in the very original query>))
> 
> 
> 
>                                                               (I was able to make it
> ... the inefficiency of the query
> 
> 
> 
> 
>                                                               On Wed, Sep 1, 2010
> at 6:58 PM, Fabio Maulo <[email protected]> wrote:
> 
> 
>                                                                       I don't know
> what the problem is... you said that there is a problem and you want to change
> it using the same technique used by batch-size (using the loaded IDs) because
> subselect seems inefficient in some cases.
> 
> 
>                                                                       On Wed, Sep 1,
> 2010 at 12:48 PM, nadav s <[email protected]> wrote:
> 
> 
>                                                                               Btw, I
> don't really get what is the problem with subselect, as it lets you fetch a
> whole object graph for the N fathers that were fetched in some query in the
> most efficient way possible.
> 
> 
>       On Wed, Sep 1, 2010 at 6:46 PM, nadav s <[email protected]> wrote:
> 
> 
> 
>       I don't think it's that low priority, because it is actually something
> people expect to happen when they set a fetch mode to Eager. At least, I've
> seen a lot of situations where people really thought that's what was going to
> happen (later finding out it killed their query with a CP).
> 
> 
>       As for when it is helpful: exactly in the situations Diego described.
> There are two use cases: in one of them you query the fathers and are gonna
> need only one father's collection, and in the other you're gonna need all of
> their collections.
> 
>       It gets more complicated when there are grandchildren involved: in one
> of the situations you want the grandchildren of only one of the children,
> and in the other, because you load an object graph, you're gonna need all
> of them.
> 
> 
>       Now, either you implement the loading of the collections yourself
> (similar to what Diego said), or you're gonna have to live with the batch
> size slowing down the first situation, where you would have preferred lazy
> loading without batching.
> 
> 
> 
>       On Wed, Sep 1, 2010 at 5:22 PM, Diego Mijelshon
> <[email protected]> wrote:
> 
> 
> 
>       I have entities where batch loading helps in some use cases but it
> loads lots of unneeded entities/collections in other complex use cases,
> where I have many proxies but only use a few.
> 
>       My current workaround is doing "manual batch loading" (i.e. dummy
> query) in the cases where I need it.
> 
> 
>       It would definitely be a low-priority but nice-to-have feature.
> 
> 
> 
>           Diego
> 
> 
> 
> 
>       On Wed, Sep 1, 2010 at 10:12, Fabio Maulo <[email protected]>
> wrote:
> 
> 
> 
> 
>               It is possible for the batcher (INSERT, UPDATE, DELETE).
> 
>               I don't understand where it is useful for collection/relation
> batch-size.
> 
> 
> 
>               On Wed, Sep 1, 2010 at 9:37 AM, Diego Mijelshon
> <[email protected]> wrote:
> 
> 
> 
> 
>                       Being able to override batch-size would be useful.
> Implementing it requires messing with more than one part of the
> infrastructure, though.
> 
> 
> 
>                           Diego
> 
>                                                                       --
>                                                                       Fabio Maulo
>
