Re: [nhibernate-development] A new scheme for eager loading

nadav s Sun, 20 Feb 2011 14:46:58 -0800

implementing subselect as a fetch mode is something i wanted to do a long
time ago.
instead i was advised to make batch size enabled to be overridden in a
session context.
If it helps you, vote for it:


http://216.121.112.228/browse/NH-2316
<http://216.121.112.228/browse/NH-2316>

On Sat, Feb 19, 2011 at 3:39 AM, Niels Kühnel <[email protected]>wrote:

> That could be really cool. Although not beeing decent before the "Denali"
> release, SQL server supports paging by means of sub queries.
>
> In general eager loading performance in NHibernate could be improved if,
> offhand, something like this could be implemented in dialects:
>
> bool SupportsReusableSubSelectFilter
> SqlString GetResuableSubSelectFilter(idFields, query, alias)
> SqlString GetSubSelectFilterReference(alias)
>
> and the FetchMode enum had:
> SubSelect
> SubSelectPreFilter (expert option to use when pages are small or where
> conditions complex)
>
> Of course, this would require quite some effort but just including
> SubSelect in FetchMode and implementing paging using the existing
> GetLimitString from dialects would be an awesome start :)
>
> Cheers,
> Niels
>
>
> On 17 February 2011 13:32, Fabio Maulo <[email protected]> wrote:
>
>> The matter is not how do the query but how hydrate the model.
>> Some years ago, I and Oren have talked about the introduction of a new
>> fetch mode for relations and collections: fetch="batchquery"...
>> btw, perhaps, we can put some more effort in "subselect" fetching at least
>> for those RDBMS supporting a decent paging syntax.
>>
>> On Thu, Feb 17, 2011 at 8:51 AM, Niels Kühnel <[email protected]>wrote:
>>
>>> All right. I underestimated the penalty from the increased size of the
>>> result set.
>>> When not paging subselects are faster than my idea unless you have some
>>> really complicated where clauses
>>> When paging batching is faster unless you have big pages.
>>> This means that it is not worth the effort.
>>>
>>> Thanks for your input.
>>>
>>> Now, subselects would be faster in all cases if the where clause could be
>>> reused.
>>>
>>> DECLARE @IDs TABLE (ID bigint primary key);
>>> INSERT @IDs select TOP 100 ID FROM [Person] WHERE {tricky stuff}
>>>
>>> SELECT {all fields} FROM [Person] WHERE ID IN (SELECT ID FROM @IDs)
>>> SELECT {all fields} FROM [Tag] tags0_ WHERE tags0_.Person_id in (select
>>> ID FROM @IDs)
>>> SELECT {all fields} FROM [Phonenumber] phonenumbe0_ WHERE
>>> phonenumbe0_.Person_id in (select ID FROM @IDs)
>>>
>>> What do you think is the best way to make queries like this?
>>>
>>> Thanks,
>>> Niels
>>>
>>>
>>>
>>> On 16 February 2011 21:01, Niels Kühnel <[email protected]> wrote:
>>>
>>>> I'm not sure why i used INNER JOIN .. 1=1. Cross join is more clear,
>>>> thanks.
>>>>
>>>> Anyway, I like the power in making general expressions as:
>>>> "I want to show these companies in my list, and for all of them I need
>>>> these relations for what I'm going to show. Please give me them as fast as
>>>> possible, but I don't really care how"
>>>>
>>>> _session.QueryOver<Company>().Skip(20).Take(20).OrderBy(x=>x.Name).Asc)
>>>>                 .EagerFetchMany(x => x.PushReportInfos)
>>>>                 .EagerFetch(x => x.AgreementStatus)
>>>>                 .EagerFetchMany(x => x.SecurityTokens)
>>>>                 .EagerFetchMany(x => x.NaceEntries)
>>>>                 .EagerFetchMany(x => x.NaceEntries.First().NaceEntry)
>>>>                 .Where "some really complicated stuff with EXISTS and
>>>> stuff"
>>>>
>>>> You can get that with subselects too, but in that case "some really
>>>> complicated stuff" will be repeated for each sub select.
>>>> In my approach filtering is only done once, and the database engine
>>>> simply joins the tables on the resulting root entities
>>>>
>>>> If you follow an approach whith a query for only the root entities and
>>>> then get the rest by using some kind of " WHERE IN (root ids)" approach
>>>> you're essentially doing the same as me, except that your need more code
>>>> and you are sending a lot of ids to and fro.
>>>> You are especially likely to get into trouble if you need nested
>>>> relations or if you for some reason need to consider thousands of entities.
>>>>
>>>> I still need to do some performance comparisons though, as everything
>>>> else is just hand waving :)
>>>>
>>>> On 16 February 2011 20:36, nadav s <[email protected]> wrote:
>>>>
>>>>> fetch=subselect does a subselect for the original query that isn't
>>>>> really neccessary because you have the ids in the session, and that
>>>>> subselect might be a big one sometimes.
>>>>> I think batch size is the best approach.
>>>>> The thing is that currently you have to set the value in the
>>>>> mappings...
>>>>> there is a patch i've uploaded to jira, i know fabio doesn't like it,
>>>>> still don't really understand why exactly, but it enables you to override
>>>>> the batch size that was set in the mappings per session, and i think it
>>>>> could be useful in plenty of use cases.
>>>>>
>>>>>
>>>>> On Wed, Feb 16, 2011 at 7:41 PM, Fabio Maulo <[email protected]>wrote:
>>>>>
>>>>>> You can try even fetch="subselect" to upload each collection in just
>>>>>> one roundtrip
>>>>>> For instance given a Company with a collection of Contacts when you
>>>>>> get the first page of Companies NH will fetch all contacts of the first 
>>>>>> page
>>>>>> in one roundtrip.
>>>>>> fetch="subselect" may fail in some circumstances under MSSQL thank to
>>>>>> its nice way to get pagination (perhaps in 2012 the will realize that
>>>>>> result-sets are paged in web and that developers are humans).
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 16, 2011 at 2:21 PM, Diego Mijelshon <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Niels,
>>>>>>>
>>>>>>> Your concepts are interesting.
>>>>>>> A simple question: why are you using INNER JOIN (...) ON 1=1 instead
>>>>>>> of CROSS JOIN?
>>>>>>>
>>>>>>> Anyway: while the whole idea might work, it would be interesting to
>>>>>>> measure it against existing approaches using real use cases. I suspect
>>>>>>> (read: guess) the gains do not offset the loses.
>>>>>>> Also, my general approach is to NEVER fetch collections. I prefer to
>>>>>>> use load batching, or a couple roundtrips.
>>>>>>> As applied to paging, this will get even easier when we introduce HQL
>>>>>>> syntax for paging.
>>>>>>>
>>>>>>> But as said: all research is useful, even if we don't end up using
>>>>>>> the exact original idea.
>>>>>>>
>>>>>>>     Diego
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 16, 2011 at 10:15, Niels Kühnel 
>>>>>>> <[email protected]>wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm new on this list, so I'm very sorry if I break all your rules...
>>>>>>>>
>>>>>>>> I have devised a new way for eager loading collections, collections'
>>>>>>>> collections etc. at once without making the Cartesian product. I've
>>>>>>>> done a proof of concept implementation but I think it's solid enough
>>>>>>>> for further investigation
>>>>>>>>
>>>>>>>> The basic idea is to add a "fork" to the query to add the missing
>>>>>>>> structure when two unrelated tables are joined as:
>>>>>>>>
>>>>>>>> SELECT A.ID AS A, Forks.Fork, B.ID AS B, C.ID AS C FROM A
>>>>>>>>    INNER JOIN (SELECT 1 AS Fork UNION ALL SELECT 2) AS Forks ON 1=1
>>>>>>>>    LEFT JOIN B ON B.A = A.ID AND Forks.Fork = 1
>>>>>>>>    LEFT JOIN C ON C.A = A.ID AND Forks.Fork = 2
>>>>>>>>
>>>>>>>> While spending some time with eager loading I've also improved
>>>>>>>> paging
>>>>>>>> in Sql server 2005+/CE 4.0 so that you can get root entities, say,
>>>>>>>> 10
>>>>>>>> to 20 even though the query has eagerly fetched collections. I
>>>>>>>> haven't
>>>>>>>> yet looked into if the same method can be used with other databases
>>>>>>>>
>>>>>>>> If you have the time, you can read more about my ideas at
>>>>>>>>
>>>>>>>> http://kuhnel.wordpress.com/2010/10/15/the-eager-fork-%E2%80%93-taming-cartesian-explosions-in-databases/
>>>>>>>> and
>>>>>>>>
>>>>>>>> http://kuhnel.wordpress.com/2011/02/13/paging-eager-fetched-nhibernate-collections-efficiently/
>>>>>>>>
>>>>>>>> and some proof of concept code at
>>>>>>>> https://bitbucket.org/nielskuhnel/eager-nh-extensions
>>>>>>>>
>>>>>>>> I still have to do a proper performance comparison against lazy
>>>>>>>> loading and futures but my ad hoc results are encouraging. I've also
>>>>>>>> studied execution plans in SQL Server and the "fork" tables doesn't
>>>>>>>> seem to upset it in any way.
>>>>>>>> It's implemented as a custom dialect so no changes to the core are
>>>>>>>> needed per se.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Niels Kühnel
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Fabio Maulo
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Fabio Maulo
>>
>>
>

Re: [nhibernate-development] A new scheme for eager loading

Reply via email to