I haven't traced it all the way through.
*org.h2.result.LocalResultImpl#addRowsToDisk*
is a good starting point.

By the way, I was using the word "serialize" loosely; it's not using Java 
serialization to serialize the result set. From a quick code review, it's 
placing the result sets into

*org.h2.mvstore.db.MVPlainTempResult*
*org.h2.mvstore.db.MVSortedTempResult*

So it's using these MVStore classes to temporarily persist the ResultSet to 
disk.
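
One relevant knob here: the LAZY_QUERY_EXECUTION setting mentioned further 
down the thread bypasses this temp-result materialization for simple queries. 
It goes on the connection URL (the database path below is just an example):

```
jdbc:h2:~/test;LAZY_QUERY_EXECUTION=1
```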

Based on the information here (http://www.h2database.com/html/mvstore.html), it 
may be possible to start returning rows from the MVStore while the full 
content is still being written. That's probably easier than replacing this 
whole implementation with a new mechanism like CSV or ASCII, as the MVStore 
already contains the ResultSet metadata (types, sizes, etc.) which would 
otherwise need a brand-new custom encoding in CSV/ASCII.
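
Independent of MVStore internals, the "start handing out rows while the 
writer is still appending" idea can be sketched with a bounded 
producer/consumer queue. This is a minimal illustration of the concept only; 
none of these class or method names are H2 code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch: a bounded queue lets a reader consume rows while the
// writer is still producing them -- the same shape as returning rows from a
// partially written temp result. All names here are made up, not H2 code.
public class StreamingRows {
    static final String END = "<end-of-result>"; // poison pill marking completion

    // Consume rows as they arrive, without waiting for the producer to finish.
    public static List<String> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<String> rows = new ArrayList<>();
        for (String row = queue.take(); !row.equals(END); row = queue.take()) {
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Small buffer, so the reader overlaps the writer instead of waiting for it.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        Thread writer = new Thread(() -> {
            try {
                for (int i = 1; i <= 10; i++) {
                    queue.put("row-" + i);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        List<String> rows = consume(queue); // starts returning rows immediately
        writer.join();
        System.out.println(rows.size()); // 10
    }
}
```

A bounded buffer gives back-pressure: the reader sees the first rows 
immediately and the writer never gets more than a few rows ahead, much like 
streaming rows out of a still-growing MVStore map would.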




On Tuesday, November 10, 2020 at 3:03:23 AM UTC-5 
and...@manticore-projects.com wrote:

> Noel and Val, good morning.
>
> Would you like to give us a hint, where exactly this 
> "temporary serialization of a resultset" happens in the code?
> We could try to modify it so it writes Flat ASCII files or CSV files 
> instead of binary blobs (on demand, per hint), which would enable us to 
> read line by line and access the first rows while the full content is still 
> written.
>
> Best regards
> Andreas
>
>
> On Tue, 2020-11-10 at 08:52 +0200, Noel Grandin wrote:
>
>  > On Tuesday, November 10, 2020 at 12:03:30 AM UTC-5 val wrote:
>  >     I'm guessing this is how the h2 client/driver (not the db engine 
>  >     itself) behaves to handle a ResultSet. It will fully serialize the 
>  >     contents of the ResultSet to disk first and then serve that. Perhaps 
>  >     this is necessary to
>
> No, it's just the easiest way to handle multiple resultsets performantly, 
> because then the engine doesn't have to 
> remember any state about the resultset.
>
> On 2020/11/10 7:31 am, val wrote:
>
> Adding *LAZY_QUERY_EXECUTION=1* to the connection URL is closer to what I 
> was looking for.
>
>
> Yeah, that is currently our best answer for extremely large result sets. 
> Note that it only works for relatively simple 
> queries, and it will likely be quite bad performance-wise if you have more 
> than one connection attempting to hit the same table.
>
> The alternative is to grab chunks of the resultset at a time, using 
> OFFSET...LIMIT....
>
>
> I wonder where the bottleneck is. What would it take to bring this 
> 20-minute total iteration time down to the 6 minutes 
> it takes if you prebuild the ResultSet first?
>
>
> You'd have to run a profiler, either our simple one (see our performance 
> page), or one of the other ones, to see where 
> the bottleneck is.
>
> TBH, H2 is mostly optimised for small quick queries, not really large ones.
>
>
>
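
For reference, Noel's OFFSET/LIMIT chunking suggestion could look roughly 
like this; the helper and table names are hypothetical, and the paging uses 
H2's standard OFFSET ... ROWS FETCH NEXT ... ROWS ONLY syntax:

```java
// Hypothetical helper sketching the chunked-fetch approach. The class,
// method, and table names are made up for illustration, not H2 code.
public class ChunkedFetch {
    // Append paging clauses to a base query; the base query should have a
    // stable ORDER BY so chunks neither overlap nor skip rows.
    public static String buildChunkQuery(String baseQuery, long offset, int limit) {
        return baseQuery + " OFFSET " + offset + " ROWS FETCH NEXT " + limit + " ROWS ONLY";
    }

    public static void main(String[] args) {
        // Walk a large table in fixed-size chunks; each chunk is a small,
        // quick query, which is the pattern H2 is optimised for.
        for (long offset = 0; offset < 300; offset += 100) {
            System.out.println(buildChunkQuery("SELECT * FROM big_table ORDER BY id", offset, 100));
        }
    }
}
```

The trade-off is extra query overhead per chunk, but each individual query 
stays small and the engine keeps no state between chunks.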

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/h2-database/87f5020a-3ca4-46ad-b88d-ddc2a656d106n%40googlegroups.com.
