Heh... or a possibly simpler explanation: is 400 bytes the minimum overhead of a row in h2, or is it related in an unfortunate way to a block size? I just found that the 2x problem goes away when I change the 400 to 4000.
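For what it's worth, the way I'm checking the per-row-overhead idea (just a sketch; the table names and row counts are only for illustration) is to load the same total payload chunked two ways and compare the resulting file sizes. A fixed per-row cost should hit the 400-byte version roughly 10x harder than the 4000-byte one:

    -- Same total payload, different chunk sizes.
    CREATE TABLE scores_400  (id INT NOT NULL PRIMARY KEY, scores VARBINARY(400)  NOT NULL);
    CREATE TABLE scores_4000 (id INT NOT NULL PRIMARY KEY, scores VARBINARY(4000) NOT NULL);
    -- Load e.g. 100M of data as 250,000 rows of 400 bytes into scores_400 and as
    -- 25,000 rows of 4000 bytes into scores_4000, then flush and compare the
    -- sizes of the database files on disk.
    CHECKPOINT;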
On Friday, June 20, 2014 9:20:04 PM UTC-7, Brian Craft wrote:
> Following up on the large db files, I ran a few tests, loading into this table:
>
> CREATE TABLE IF NOT EXISTS `scores` (`id` INT NOT NULL PRIMARY KEY, `scores` VARBINARY(400) NOT NULL)
>
> I varied the settings of LOG and UNDO_LOG, used one csvread or a sequence of INSERT statements, inside a transaction or without, and a variety of sizes.
>
> In all cases the h2 data file is over 2x the size of the input data. 10M of binary data, partitioned and stored in the varbinary scores field, results in 20M and change on disk. This is a huge increase.
>
> Additionally, with LOG=1 (rather than LOG=0), there is another 100% increase in size, though I only saw this with larger data sets (say 100M). Some examples:
>
> csvread of 1G with LOG=1 becomes 3.6G on disk, with or without a transaction (I'm guessing they are equivalent).
> csvread of 1G with LOG=0 becomes 2.5G on disk.
> csvread of 100M with LOG=0 becomes 250M on disk, etc.
> inserts of 100M with LOG=0 becomes 250M on disk.
>
> The 2x size increase seems like a hard limit, and I'm wondering if this could be an encoding problem. Does h2 store varbinary as binary, or is it converting it to hex, or something?
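(For reference, the csvread-style loads above look roughly like this; the file name and the HEXTORAW conversion are my guesses about how the binary column is represented in the CSV, not the exact commands from the test.)

    SET LOG 0;
    SET UNDO_LOG 0;
    -- CSVREAD returns all columns as VARCHAR, so cast/convert on the way in.
    INSERT INTO scores (id, scores)
      SELECT CONVERT(id, INT), HEXTORAW(scores)
      FROM CSVREAD('scores.csv');
    CHECKPOINT;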