Re: [h2] A single update can 10x the filesize of the database

2024-03-17 Thread Vincent
Hello,

I am not a maintainer of H2 but I was curious and I have quickly looked at 
your test case.
If I print the filestore stats after the inserts/update, I think the 
behavior is quite normal/expected actually.

After the inserts, we have:
info.FILL_RATE=100
info.CHUNKS_FILL_RATE=99
info.CHUNKS_FILL_RATE_RW=100
info.CHUNK_COUNT=148

Meaning the file is quite packed: No empty space in the file 
(FILL_RATE=100%) and all chunks in the file are fully used (no dead pages, 
CHUNKS_FILL_RATE=99%)

Directly after the update:
info.FILL_RATE=100
info.CHUNKS_FILL_RATE=10
info.CHUNKS_FILL_RATE_RW=100
info.CHUNK_COUNT=1062

As expected the number of chunks has greatly increased: The update takes a 
while and the filestore autocommit is triggered regularly (after a very 
small period of time or if pending changes size is bigger than a threshold, 
cf. WRITE_DELAY, MVStore autoCommitBufferSize), creating a lot of new 
chunks at the end of the file. -> Chunks fill rate has dropped to 10%, 
meaning most of the chunks contain mostly dead pages. And that has explains 
why you see a ~10 times file size increase btw.

But then H2 has a background housekeeping that kicks in (after some time 
because you use the default RETENTION_TIME of 45 sec. meaning it cannot 
rewrite immediately the chunks).
We end up with those stats:
info.FILL_RATE=11
info.CHUNKS_FILL_RATE=98
info.CHUNKS_FILL_RATE_RW=99
info.CHUNK_COUNT=223

So H2 did rewrite most of the chunks and then deleted the chunks containing 
only dead pages: There was a big decrease of the numbers of chunks from 
1062 containing mainly dead pages (CHUNK_FILL_RATE=10%) to 223 chunks with 
mainly alive pages (CHUNK_FILL_RATE=98%).
Now the file contains a lot of "empty space" (FILL_RATE=11%), as dead 
chunks are considered as free/empty space and alive chunks are probably at 
the end of the file.
That means that while the file remains quite large, any new chunks that 
will be created will use the freed space and the file will not increase 
anymore.
After additional writes, you could even probably end up in a situation 
where new chunks are written in the empty space at the beginning of the 
file, freeing the chunks at the end of the file and that would trigger a 
truncation of the file (=> file size reduction)

In summary, it looks to me that H2 filestore is not designed/optimized to 
keep the file as small as possible but it will somehow limit the file size 
and make sure the space can be reused. If you really want to keep the file 
size as minimal, I guess some tuning could be performed (mainly on 
autocommit parameters ?).

One of maintainer (Andrei Tokar ?) can probably share more on this topic 
(and also assess whether what have explained in this comment is correct or 
not :) )

Best regards,
Vincent

Le dimanche 17 mars 2024 à 13:15:39 UTC+1, Alexander Kainz a écrit :

> As I said I was able to avoid the issue, but figured a h2 db developer 
> might want to know there is an issue.
>
> I think an UPDATE should not increase the filesize by 10x or the behaviour 
> should at least be documented, I think.
>
>
>
> On Sunday 17 March 2024 at 14:13:36 UTC+7 Andreas Reichel wrote:
>
>> Greetings!
>>
>> Please try to either a) partitioning you update into smaller 
>> chunks/commits or b) use a CTAS instead (although I am not a big fan of 
>> this).
>> Good luck
>>
>> Andreas
>>
>>
>> On Sat, 2024-03-16 at 22:11 -0700, Alexander Kainz wrote:
>>
>> Hi,
>>
>> I wanted to mention this issue that took me a while to identify.
>>
>> Here's the output from my test code (https://github.com/akainz/h2db)
>>
>> Table created successfully.12 KB 
>> Rows inserted successfully. File 986 MB actual data 983 MB 
>> Rows updated successfully.* 9 GB*
>>
>> the code sets up a table, then inserts 100K rows and the does an 
>> UPDATE sample_table SET COUNT = 0
>> and the db file just 10x's in size.
>>
>> That seems to be excessive and forced me to stop updating all rows, maybe 
>> you want to look into it.
>>
>> Alex
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "H2 Database" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to h2-database...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/h2-database/91a34196-107b-4fc7-860b-0cc9c3ce523dn%40googlegroups.com
>>  
>> 
>> .
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/h2-database/95585e26-d348-43ad-b64b-78a4f99ff9a8n%40googlegroups.com.


Re: [h2] A single update can 10x the filesize of the database

2024-03-17 Thread Alexander Kainz
As I said I was able to avoid the issue, but figured a h2 db developer 
might want to know there is an issue.

I think an UPDATE should not increase the filesize by 10x or the behaviour 
should at least be documented, I think.



On Sunday 17 March 2024 at 14:13:36 UTC+7 Andreas Reichel wrote:

> Greetings!
>
> Please try to either a) partitioning you update into smaller 
> chunks/commits or b) use a CTAS instead (although I am not a big fan of 
> this).
> Good luck
>
> Andreas
>
>
> On Sat, 2024-03-16 at 22:11 -0700, Alexander Kainz wrote:
>
> Hi,
>
> I wanted to mention this issue that took me a while to identify.
>
> Here's the output from my test code (https://github.com/akainz/h2db)
>
> Table created successfully.12 KB 
> Rows inserted successfully. File 986 MB actual data 983 MB 
> Rows updated successfully.* 9 GB*
>
> the code sets up a table, then inserts 100K rows and the does an 
> UPDATE sample_table SET COUNT = 0
> and the db file just 10x's in size.
>
> That seems to be excessive and forced me to stop updating all rows, maybe 
> you want to look into it.
>
> Alex
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "H2 Database" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to h2-database...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/h2-database/91a34196-107b-4fc7-860b-0cc9c3ce523dn%40googlegroups.com
>  
> 
> .
>
>
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/h2-database/e8642d26-9e3c-4d41-af65-68b02839f17fn%40googlegroups.com.


Re: [h2] A single update can 10x the filesize of the database

2024-03-17 Thread Andreas Reichel
Greetings!

Please try to either a) partitioning you update into smaller
chunks/commits or b) use a CTAS instead (although I am not a big fan of
this).
Good luck

Andreas


On Sat, 2024-03-16 at 22:11 -0700, Alexander Kainz wrote:
> Hi,
>
> I wanted to mention this issue that took me a while to identify.
>
> Here's the output from my test code (https://github.com/akainz/h2db)
>
> Table created successfully.12 KB 
> Rows inserted successfully. File 986 MB actual data 983 MB 
> Rows updated successfully. 9 GB
>
> the code sets up a table, then inserts 100K rows and the does an
> UPDATE sample_table SET COUNT = 0
> and the db file just 10x's in size.
>
> That seems to be excessive and forced me to stop updating all rows,
> maybe you want to look into it.
>
> Alex
> --
> You received this message because you are subscribed to the Google
> Groups "H2 Database" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to h2-database+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/h2-database/91a34196-107b-4fc7-860b-0cc9c3ce523dn%40googlegroups.com
> .

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/h2-database/519379422a74235fd7b5d92897fe55c14cee59ce.camel%40manticore-projects.com.


[h2] A single update can 10x the filesize of the database

2024-03-17 Thread Alexander Kainz
Hi,

I wanted to mention this issue that took me a while to identify.

Here's the output from my test code (https://github.com/akainz/h2db)

Table created successfully.12 KB 
Rows inserted successfully. File 986 MB actual data 983 MB 
Rows updated successfully.* 9 GB*

the code sets up a table, then inserts 100K rows and the does an 
UPDATE sample_table SET COUNT = 0
and the db file just 10x's in size.

That seems to be excessive and forced me to stop updating all rows, maybe 
you want to look into it.

Alex

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/h2-database/91a34196-107b-4fc7-860b-0cc9c3ce523dn%40googlegroups.com.