Hi Andrei,

As I said, I'm coming from a place where I don't need help: I got rid of 
the UPDATE and that fixed it; RETENTION_TIME did not fix any of it. 
I'm happy to have access to great open-source software, and this is more 
about me giving back by trying to point out a problem.

If you or the community think that a 10x file size increase is acceptable 
behavior on an update, for example as a trade-off between speed and 
file size, I'm totally OK with it.
I'm not making the decisions here. But I do suggest putting it somewhere in 
the documentation, at least as a warning, because I can't be the only one 
who will run into this issue.
"Be careful about updating all the rows in a table, as it can massively 
increase the size on disk", for example.

I don't really understand why the update has to cause such a massive 
increase in file size, though. I would totally understand doubling the 
size, but 10x does seem a bit much to me as a total noob on DB 
architectures. HSQLDB does not seem to have that issue, but I think they 
do not have multi-version concurrency turned on by default?

Can multi-version concurrency be turned off, as a trade-off between speed 
and file size?

Thanks,

Alex

On Tuesday 19 March 2024 at 22:58:34 UTC+7 Andrei Tokar wrote:

> Hi Alexander,
> Your statement that "an UPDATE should not increase the filesize by 10x" is 
> based on the assumption that data is overwritten in place, and that is 
> definitely not the case.
> It's not just H2; I think it applies to any database with multi-version 
> concurrency (or even any app using an SSD, although that fact is hidden 
> from us by the OS file system).
> How much slack is allowed, and what the strategy should be to minimize it, 
> is still an area of ongoing development in H2; see 
> https://github.com/h2database/h2database/issues/3909.
> As far as tuning goes, I would decrease RETENTION_TIME to 1000, or even 0 
> (don't be scared by the name; these days chunks are retained as long as 
> needed, regardless), and do the update in batches, as was already 
> suggested.
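>
> A minimal sketch of both suggestions combined; the JDBC URL and the 
> indexed ID column used for batching are hypothetical:
>
> import java.sql.*;
>
> public class BatchedUpdate {
>     public static void main(String[] args) throws SQLException {
>         try (Connection conn = DriverManager.getConnection("jdbc:h2:./test")) {
>             // shorten how long old chunks are retained (milliseconds)
>             conn.createStatement().execute("SET RETENTION_TIME 1000");
>             conn.setAutoCommit(false);
>             try (PreparedStatement ps = conn.prepareStatement(
>                     "UPDATE sample_table SET COUNT = 0 WHERE ID BETWEEN ? AND ?")) {
>                 for (int lo = 1; lo <= 100_000; lo += 10_000) {
>                     ps.setInt(1, lo);
>                     ps.setInt(2, lo + 9_999);
>                     ps.executeUpdate();
>                     conn.commit(); // commit each batch so dead pages can be reclaimed
>                 }
>             }
>         }
>     }
> }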
>
> Vincent,
> Thank you for this thorough explanation; I think your assessment is 100% 
> correct.
>
> On Sunday, March 17, 2024 at 5:12:20 PM UTC-4 Vincent wrote:
>
>> Hello,
>>
>> I am not a maintainer of H2, but I was curious and had a quick look at 
>> your test case.
>> If I print the filestore stats after the inserts/update, the behavior 
>> actually looks quite normal/expected.
>>
>> After the inserts, we have:
>> info.FILL_RATE=100                        
>> info.CHUNKS_FILL_RATE=99
>> info.CHUNKS_FILL_RATE_RW=100
>> info.CHUNK_COUNT=148
>>
>> This means the file is quite packed: there is no empty space in the file 
>> (FILL_RATE=100%), and all chunks in the file are fully used (almost no 
>> dead pages, CHUNKS_FILL_RATE=99%).
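>>
>> For reference, a sketch of how these stats can be printed, assuming H2 
>> 2.x, which exposes them via INFORMATION_SCHEMA.SETTINGS:
>>
>> // imports: java.sql.*
>> try (Connection conn = DriverManager.getConnection("jdbc:h2:./test");
>>      Statement st = conn.createStatement();
>>      ResultSet rs = st.executeQuery(
>>          "SELECT SETTING_NAME, SETTING_VALUE FROM INFORMATION_SCHEMA.SETTINGS"
>>              + " WHERE SETTING_NAME LIKE 'info.%'")) {
>>     while (rs.next()) {
>>         System.out.println(rs.getString(1) + "=" + rs.getString(2));
>>     }
>> }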
>>
>> Directly after the update:
>> info.FILL_RATE=100
>> info.CHUNKS_FILL_RATE=10
>> info.CHUNKS_FILL_RATE_RW=100
>> info.CHUNK_COUNT=1062
>>
>> As expected, the number of chunks has greatly increased: the update takes 
>> a while, and the filestore autocommit is triggered regularly (after a very 
>> small period of time, or when the size of the pending changes exceeds a 
>> threshold; cf. WRITE_DELAY and the MVStore autoCommitBufferSize), creating 
>> a lot of new chunks at the end of the file. The chunks fill rate has 
>> dropped to 10%, meaning most chunks contain mostly dead pages, and that 
>> explains why you see a ~10x file size increase, by the way.
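>>
>> Both knobs can be adjusted; a sketch with purely illustrative values 
>> (the file name is hypothetical):
>>
>> import org.h2.mvstore.MVStore;
>>
>> // At the SQL level: SET WRITE_DELAY 500  (maximum delay in ms before
>> // committed changes are flushed to disk)
>> // When using the MVStore API directly, the auto-commit buffer is the
>> // amount of unsaved changes (in KB) that triggers an automatic commit:
>> MVStore store = new MVStore.Builder()
>>         .fileName("test.mv.db")
>>         .autoCommitBufferSize(8 * 1024) // ~8 MB (value is in KB)
>>         .open();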
>>
>> But then H2's background housekeeping kicks in (after some time, because 
>> you use the default RETENTION_TIME of 45 seconds, meaning the chunks 
>> cannot be rewritten immediately).
>> We end up with those stats:
>> info.FILL_RATE=11
>> info.CHUNKS_FILL_RATE=98
>> info.CHUNKS_FILL_RATE_RW=99
>> info.CHUNK_COUNT=223
>>
>> So H2 rewrote most of the chunks and then deleted the chunks containing 
>> only dead pages: there was a big decrease in the number of chunks, from 
>> 1062 containing mainly dead pages (CHUNKS_FILL_RATE=10%) to 223 chunks 
>> with mainly live pages (CHUNKS_FILL_RATE=98%).
>> Now the file contains a lot of "empty space" (FILL_RATE=11%), as dead 
>> chunks are considered free/empty space and the live chunks are probably at 
>> the end of the file.
>> That means that while the file remains quite large, any new chunks that 
>> are created will use the freed space, and the file will not grow any 
>> further.
>> After additional writes, you could probably even end up in a situation 
>> where new chunks are written in the empty space at the beginning of the 
>> file, freeing the chunks at the end of the file, which would trigger a 
>> truncation of the file (=> file size reduction).
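>>
>> If shrinking the file on disk right away matters more than reusing the 
>> space, H2 also has SHUTDOWN COMPACT (and SHUTDOWN DEFRAG); a sketch:
>>
>> // imports: java.sql.*
>> try (Connection conn = DriverManager.getConnection("jdbc:h2:./test")) {
>>     // closes the database and fully compacts the file
>>     conn.createStatement().execute("SHUTDOWN COMPACT");
>> }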
>>
>> In summary, it looks to me like the H2 filestore is not designed/optimized 
>> to keep the file as small as possible, but it will limit the file size 
>> growth and make sure the space can be reused. If you really want to keep 
>> the file size minimal, I guess some tuning could be performed (mainly on 
>> the autocommit parameters?).
>>
>> One of the maintainers (Andrei Tokar?) can probably share more on this 
>> topic (and also assess whether what I have explained in this comment is 
>> correct or not :) )
>>
>> Best regards,
>> Vincent
>>
>> On Sunday, March 17, 2024 at 1:15:39 PM UTC+1, Alexander Kainz wrote:
>>
>>> As I said, I was able to avoid the issue, but I figured an H2 developer 
>>> might want to know there is an issue.
>>> I think an UPDATE should not increase the file size by 10x, or the 
>>> behaviour should at least be documented.
>>>
>>>
>>>
>>> On Sunday 17 March 2024 at 14:13:36 UTC+7 Andreas Reichel wrote:
>>>
>>>> Greetings!
>>>>
>>>> Please try either a) partitioning your update into smaller 
>>>> chunks/commits, or b) using a CTAS (CREATE TABLE ... AS SELECT) instead 
>>>> (although I am not a big fan of the latter); see the sketch below.
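>>>>
>>>> For (b), a sketch with hypothetical column names; the new table is 
>>>> written once, densely packed, and then swapped in:
>>>>
>>>> // imports: java.sql.*
>>>> try (Connection conn = DriverManager.getConnection("jdbc:h2:./test");
>>>>      Statement st = conn.createStatement()) {
>>>>     st.execute("CREATE TABLE sample_table_new AS"
>>>>             + " SELECT ID, 0 AS COUNT, DATA FROM sample_table");
>>>>     st.execute("DROP TABLE sample_table");
>>>>     st.execute("ALTER TABLE sample_table_new RENAME TO sample_table");
>>>> }
>>>>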
>>>> Good luck
>>>>
>>>> Andreas
>>>>
>>>>
>>>> On Sat, 2024-03-16 at 22:11 -0700, Alexander Kainz wrote:
>>>>
>>>> Hi,
>>>>
>>>> I wanted to mention this issue that took me a while to identify.
>>>>
>>>> Here's the output from my test code (https://github.com/akainz/h2db)
>>>>
>>>> Table created successfully. 12 KB 
>>>> Rows inserted successfully. File 986 MB, actual data 983 MB 
>>>> Rows updated successfully. 9 GB
>>>>
>>>> The code sets up a table, then inserts 100K rows, and then does an 
>>>> UPDATE sample_table SET COUNT = 0
>>>> and the DB file just 10x's in size.
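>>>>
>>>> Roughly, with a hypothetical schema (in the real test each row carries 
>>>> enough payload for the inserts to reach ~1 GB):
>>>>
>>>> // imports: java.sql.*
>>>> try (Connection conn = DriverManager.getConnection("jdbc:h2:./test");
>>>>      Statement st = conn.createStatement()) {
>>>>     st.execute("CREATE TABLE sample_table(ID INT PRIMARY KEY,"
>>>>             + " COUNT INT, DATA VARCHAR)");
>>>>     // ... insert 100K rows here ...
>>>>     st.execute("UPDATE sample_table SET COUNT = 0"); // file grows ~10x
>>>> }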
>>>>
>>>> That seems excessive, and it forced me to stop updating all rows; 
>>>> maybe you want to look into it.
>>>>
>>>> Alex
>>>>
