Well, here I go into a monologue, answering my own question.

After many fruitless hours of searching, I finally stumbled on a useful 
post here: https://groups.google.com/g/h2-database/c/rgpD_y3Xp7w

Adding *LAZY_QUERY_EXECUTION=1* to the connection URL is closer to what I 
was looking for.
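
For anyone who hits the same thing later, the change is just a setting on the 
JDBC URL. A minimal sketch, assuming an embedded file database (the path, user 
and password below are placeholders for my real setup):

import java.sql.Connection;
import java.sql.DriverManager;

public class LazyConnection {
    public static void main(String[] args) throws Exception {
        // LAZY_QUERY_EXECUTION=1 asks H2 to hand rows back as they are produced,
        // instead of materializing the full result set first.
        // It can also be enabled per session with: SET LAZY_QUERY_EXECUTION 1
        String url = "jdbc:h2:./data/mydb;LAZY_QUERY_EXECUTION=1";
        try (Connection conn = DriverManager.getConnection(url, "sa", "")) {
            System.out.println("Connected with lazy query execution enabled");
        }
    }
}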

TTFR is now 0.017 seconds.

Interestingly enough, the overall elapsed time to iterate over all 
300 million rows was ~20 minutes. That's comparable to the previous TTFR of 
23 minutes, which is how long it took to first serialize the whole ResultSet 
to disk. In that case the total elapsed time was 29 minutes: 23 minutes to 
build the result set and 6 minutes to iterate the 300 million records.

I wonder where the bottleneck is. What would it take to bring this 20-minute 
total iteration time down to the 6 minutes it takes when the ResultSet is 
prebuilt first?
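
For reference, a sketch of one way to measure TTFR versus total iteration 
time (not my exact harness; the URL and mylargetable are placeholders for 
the real setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScanTiming {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:h2:./data/mydb;LAZY_QUERY_EXECUTION=1";
        try (Connection conn = DriverManager.getConnection(url, "sa", "");
             // the default Statement is forward-only/read-only, which is all I need
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM mylargetable")) {
            long start = System.nanoTime();
            long rows = 0;
            long firstRowNanos = -1;
            while (rs.next()) {
                if (rows == 0) {
                    firstRowNanos = System.nanoTime() - start; // time to first row
                }
                rows++; // do nothing with the row, so only H2/JDBC is being timed
            }
            long totalNanos = System.nanoTime() - start;
            System.out.printf("rows=%d TTFR=%.3fs total=%.1fs%n",
                    rows, firstRowNanos / 1e9, totalNanos / 1e9);
        }
    }
}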


On Tuesday, November 10, 2020 at 12:03:30 AM UTC-5 val wrote:

> On further inspection I can see that H2 is writing a large temp file 
> before my results start flowing: 20+ GB of data in about 23 minutes, 
> likely the whole contents of my large table rewritten to disk.
>
> I'm guessing this is how the H2 client/driver (not the DB engine itself) 
> behaves to handle a ResultSet: it fully serializes the contents of the 
> ResultSet to disk first and then serves that. Perhaps this is necessary to 
> support scrollable ResultSets? If this were not taking place on an SSD it 
> would have taken hours to see a result. And I don't need the ResultSet to 
> support cursors/scrolling.
>
> I guess I expected something more like ResultSet streaming; perhaps H2 
> doesn't support that.
>
> And before anyone says it is bad practice to have such a large result 
> set: this is being used for machine learning. I'm dumping the data into a 
> neural network. 
>
> 330,180,000 records 
> Time to First Row: 23 minutes, 27.473 seconds 
>
>
> On Monday, November 9, 2020 at 3:56:17 PM UTC-5 val wrote:
>
>> I'm playing around with a large H2 data set. The DB in question is 40 GB+ 
>> and the table has 300+ million records. Access is via JDBC.
>>
>> When I do a simple "select * from mylargetable", it takes 10-15 minutes 
>> before I see the first row and the data starts flowing. 
>>
>> This was unexpected as H2 has been extremely performant in all other 
>> situations.
>>
>> Is there something I can do to reduce this Time To First Row? If I add a 
>> limit to the query, the results start coming back faster: e.g. LIMIT 10000 
>> comes back sub-second, LIMIT 1000000 comes back in a few seconds, etc. The 
>> TTFR keeps growing as the limit increases.
>>
>> Is H2 trying to fully fill some buffer before starting to return data?
>>
>>
>>
>>
