Hi,

Using "varchar(300)" does not use less memory or disk space than
"varchar(30000000)" or simply "varchar". The declared length is a
_constraint_, and does not affect memory or disk space usage. See also the
documentation.
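To illustrate the point, a minimal sketch (table names are made up; the
declared length only affects what you may insert, not how values are stored):

```sql
-- Both tables store the value 'abc' the same way; the declared length
-- acts only as a constraint on inserts.
CREATE TABLE t_short(name VARCHAR(300));
CREATE TABLE t_wide(name VARCHAR);

INSERT INTO t_short VALUES ('abc');
INSERT INTO t_wide VALUES ('abc');

-- A value longer than the declared length is rejected, not truncated:
-- INSERT INTO t_short VALUES (REPEAT('x', 301));  -- fails: value too long
```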
One option would be to use a "file system in memory", for example
"jdbc:h2:memFS:test" or "jdbc:h2:memLZF:test". This is slower than a pure
in-memory database, but needs less heap memory. You can use defrag and so on
there.

> create a schema first

"create table ... as select" is usually faster. Best is to create the table
with the right data types, as in "create table(id int, name varchar) as
select ... from csvread(...)".

> why does h2 use so much more memory in "mem" mode vs the file size on
> disk, for the same amount of data?

It's hard to explain, but yes, it needs much more memory. Part of the reason
is that Java uses a lot of memory for objects (java.lang.Integer,
java.lang.String, and so on). Internally, a String uses an array of char,
which is 2 bytes per character. On disk, only one byte per character is used
for English text.

> why would the mem database perform so much quicker than a file database
> stored in ram?

Disk access is slow.

> if I am required to use the "mem" mode for the performance that we need

For your case, I would create indexes, even for in-memory databases.

Regards,
Thomas

On Tue, Jun 2, 2015 at 8:19 AM, Christian MICHON
<[email protected]> wrote:

> Are you using some commands like "create table mytable as select * from
> csvread('myfile.csv')"?
>
> If so, I have a few possible pieces of advice:
> - Create a schema first, with the right types expected for the attributes;
>   otherwise each attribute will be cast as text with a 2 GB maximum
>   allocation.
> - Insert like this example:
>   INSERT INTO LANG_CHINESE (ISO, NAME) SELECT * FROM
>   CSVREAD('CHINESE.CSV', NULL, 'UTF8');
> - Do not create indexes first; create indexes after the csv import is
>   completed.
>
> Can your data be shared?
>
> On Friday, May 29, 2015 at 7:45:53 PM UTC+2, John Smith wrote:
>>
>> I have several csv files I'm loading up into an h2 instance (running the
>> beta version of h2, ver 1.4.187).
>>
>> Total bytes on disk for csv files is approx 4 GB.
>>
>> After importing this into h2 and adding a few indexes, the size of my h2
>> mv.db database file is 3.6 GB. With the same csv files loaded into an h2
>> "mem" database, all in RAM, Windows Server shows usage of 85 GB. The
>> memory database just explodes in size compared to the file database; all
>> else between the databases remains the same (tables, indexes, etc.).
>>
>> Does that make sense? The huge difference in size there? Even accounting
>> for some of that 85 GB being used as a cache of some sort that might also
>> be present with the file database (I forgot to test how much RAM the h2
>> process consumed when using the file database), why is there such a huge
>> increase in RAM usage? My csv files themselves are only 4 GB, and I add
>> one or two indexes on each table (there are two tables); I don't see how
>> 4 GB of csv translates to 85 GB once loaded in RAM.
>>
>> Second thing is speed. I moved my file database to a RAM / memory disk,
>> then loaded up h2 and executed a query. It took 8.4 minutes. It was a
>> simple query, no joins, but used one ORDER BY on an unindexed column. The
>> same query using the h2 mem database returned in 15 seconds; same thing,
>> no index on the ORDER BY column. I get that I could add an index to cut
>> down the query time, but that's not the point; I wanted to test how much
>> difference in performance there is between the native mem mode and the
>> h2 file database stored on a RAM disk.
>>
>> So to sum up:
>> 1: why does h2 use so much more memory in "mem" mode vs the file size on
>> disk, for the same amount of data?
>> 2: why would the mem database perform so much quicker than a file
>> database stored in RAM?
>> 3: if I am required to use the "mem" mode for the performance that we
>> need, is there any way I can "defrag" the database while it's in RAM?
>> After several drops, deletes, updates, etc., the mem-based database
>> continues to grow and I can't shrink it using "shutdown defrag" like I
>> can with a file-based database.
>>
>> Thanks in advance for any thoughts on all this.

-- 
You received this message because you are subscribed to the Google Groups
"H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.
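The import advice in this thread can be condensed into one short SQL sketch
(the table and file names follow Christian's example; the column types and
lengths are assumptions to adapt to the actual data):

```sql
-- Connect with an in-memory file system URL to reduce heap usage, e.g.
--   jdbc:h2:memFS:test    (uncompressed)
--   jdbc:h2:memLZF:test   (LZF-compressed, slower but needs less heap)

-- Create the table with explicit column types while importing, so each
-- column is not treated as text with a 2 GB maximum allocation:
CREATE TABLE lang_chinese(iso VARCHAR(2), name VARCHAR(100))
    AS SELECT * FROM CSVREAD('CHINESE.CSV', NULL, 'UTF8');

-- Create indexes only after the import has completed:
CREATE INDEX idx_lang_chinese_iso ON lang_chinese(iso);
```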
