Re: [sqlite] How to set memory usage as high as possible -- but not too high?

2008-04-24 Thread Ralf Junker
Jay A. Kreibich wrote:

>> >Are you using a 32-bit or 64-bit process?
>> 
>> 32, but it does not matter to the problem.
>
>  When you give examples like "PRAGMA cache_size = 10000000;", or 10M
>  pages which would take something on the order of 15 GB of memory to
>  service with the default 1K page size, a few of us start to wonder.

Good point. But I believe you misunderstood my intention. I was not interested 
in absolute but in relative numbers, regardless of a 32 or 64 bit system. All I 
need to know is related to the amount of RAM available when the application 
starts. The aim is to use as much RAM as possible, but never more than 
reasonably available. The last bit is the difficult one.

>> >I am curious why you think memory is the bottleneck anyway!
>> 
>> It has often been pointed out on this list that inserts into
>> indexed tables (regular or FTS) run faster with a high page cache.
>> indexed tables (regular or FTS) run faster with a high page cache.
>> My own tests second this. A few hundred MB more or less can make a
>> difference of more than 100%.
>
>  Given that the default page cache is 2000 pages, or on the order of
>  3MB, it seems that you're hitting some serious limits.  If hundreds
>  of megabytes (!) is giving you a return on the order of 2x, then there
>  is no magic bullet-- you aren't going to find a setting that suddenly
>  gives you a 10x speedup.  You're hitting diminishing returns in a
>  serious kind of way.

Some numbers: a 32-bit app inserting 6 million records into each of 2 tables 
(text, FTS, and blobs). The database finally grows to over 6 GB in size. As a 
last step, a simple index is created on one text field.

With the default cache size of 2000 pages (1 KB page size), this takes about a 
full day or more. Raising the page cache to some 180,000 pages uses about 270 MB 
of memory but brings the timing down to less than one hour.

My testing shows that inserts which incur lots of random disk seeks (indexes, 
FTS) benefit hugely from a large cache size, for the simple reason that it 
reduces disk IO.

>  Personally, I'd pick a number, like half your RAM size or ~1.5GB*
>  (whichever is smaller), set the page cache, and be done with it.

That's what I ended up doing. In addition, I regularly check 
sqlite3_memory_used() and reduce the page cache if it exceeds the limit. Then I 
call sqlite3_release_memory() repeatedly until the memory usage has dropped 
sufficiently.
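
In outline, the check looks roughly like this (a minimal sketch; the limit 
and the 1 MB step are of course tunable, and it assumes a build where 
sqlite3_release_memory() can actually reclaim cache pages, i.e. one compiled 
with SQLITE_ENABLE_MEMORY_MANAGEMENT):

  #include <sqlite3.h>

  /* Called between batches of inserts. If SQLite holds more than
  ** "limit" bytes, stop the cache from growing further, then hand
  ** pages back until usage drops or nothing more can be reclaimed. */
  static void enforce_limit(sqlite3 *db, sqlite3_int64 limit)
  {
    if (sqlite3_memory_used() <= limit) return;
    sqlite3_exec(db, "PRAGMA cache_size=2000;", 0, 0, 0);
    while (sqlite3_memory_used() > limit) {
      /* returns the number of bytes actually freed; 0 = give up */
      if (sqlite3_release_memory(1024*1024) == 0) break;
    }
  }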

>  It sounds like you've already found most of the other PRAGMAs that
>  are going to get you something.  You might be able to tune the size
>  of your INSERT transactions, but if you're around 1000 or so, going
>  higher isn't likely to buy you too much.

Currently I use just a single transaction for all inserts into a newly created 
database. This reduces cache flushing to a single flush at the very end, once 
all data is inserted and the memory is about to be freed anyway.
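
In skeleton form (the table name and the binding are placeholders for the 
real schema; error handling elided):

  #include <sqlite3.h>

  /* One transaction around all inserts; the statement is prepared
  ** once and rebound for each record. */
  static void bulk_insert(sqlite3 *db, int n)
  {
    sqlite3_stmt *st = 0;
    int i;
    sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    sqlite3_prepare_v2(db, "INSERT INTO t(txt) VALUES(?1);", -1, &st, 0);
    for (i = 0; i < n; i++) {
      sqlite3_bind_text(st, 1, "row data here", -1, SQLITE_STATIC);
      sqlite3_step(st);
      sqlite3_reset(st);
    }
    sqlite3_finalize(st);
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);  /* the single flush */
  }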

As another optimization I am looking forward to the new journal pragma, which 
will hopefully let me do without a journal file altogether.
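
(Assuming it ships as discussed, using it should be as simple as executing

  PRAGMA journal_mode = OFF;

right after opening the connection.)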

Thanks for the feedback and my apologies for the late response,

Ralf 



Re: [sqlite] How to set memory usage as high as possible -- but not too high?

2008-04-14 Thread Jay A. Kreibich

On Mon, Apr 14, 2008 at 09:51:40AM +0200, Ralf Junker scratched on the wall:
> >> * OPTION 1: PRAGMA cache_size = 10000000;
> >>
> >> Advantage: SQLite will use ample memory, but no more than that.
> >>
> >> Disadvantage: It is difficult to establish exact memory requirements in
> >> advance. The help states that "Each page uses about 1.5K of  
> >> memory.", but I found this to be wrong. Memory usage obviously  
> >> depends on the page size, and my measurement shows that there is an  
> >> additional small overhead of undocumented size. Is there a formula  
> >> to calculate the required memory for a cache_size of x?
> 
> I'd be curious whether you know an answer to this, too.

  Given that the default page size is 1K, it seems like the statement
  "Each page uses about 1.5K of memory" would account for most of the
  overhead.
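
  If you want an actual formula, a back-of-the-envelope version (my
  reading of that statement, nothing official) is pages * (page size
  + a fixed per-page overhead):

    #include <sqlite3.h>

    /* Estimated page-cache memory. The 512-byte per-page overhead
    ** is an assumption read off "about 1.5K" for a 1K page. */
    static sqlite3_int64 est_cache_bytes(sqlite3_int64 pages, int pgsz)
    {
      return pages * (pgsz + 512);
    }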

> >> * OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);
> >>
> >> Disadvantage: My tests indicate that SQLite slows down drastically  
> >> when it hits the memory limit. Inserts drop from a few hundred per  
> >> second to just one or two per sec.
> >
> >That is an odd result. How did you test it? 
> 
> I set up a high cache_size and a lower sqlite3_soft_heap_limit() and 
> started inserting blobs.

  Be aware that the default setting for this is "unlimited", so any
  value set is only going to reduce the amount of available memory.
  I understand you might try to do this to keep the whole memory image
  in RAM, but as long as the page cache is something realistic, I'm not
  sure any additional limits are advisable.

  Also, I'm not sure I would consider this test valid.  If the system
  runs up against the soft heap limit, it is going to recover memory from
  the page cache.  Setting up a large page cache and a low soft heap
  limit is going to cause the memory allocator and page cache to fight
  with each other.
  
  Things may really go south in this specific case, as the first pages
  the recovery system is going to go after are those that don't require
  a sync, i.e. those pages that have been read but not written to.
  That would include most of the internal B-Tree index pages that are
  providing your performance gain for the indexed inserts.


  =======================================================


On Mon, Apr 14, 2008 at 09:38:50AM +0200, Ralf Junker scratched on the wall:
> Roger Binns wrote:
> 
> >Are you using a 32-bit or 64-bit process?
> 
> 32, but it does not matter to the problem.

  When you give examples like "PRAGMA cache_size = 10000000;", or 10M
  pages which would take something on the order of 15 GB of memory to
  service with the default 1K page size, a few of us start to wonder.

  Not only is that impossible with a 32-bit system, if your database is
  only 6GB, that cache setting is much larger than the database itself,
  which only has about six million pages.

> >Also is there a requirement to create the database in the filesystem?
> 
> Yes.
> 
> > If not you could ensure your swap is sufficiently large (I use a
> > minimum of 16GB on my machines :-) and create in a tmpfs filesystem,
> > and then copy the database to persistent storage when you are done.
> 
> The aim is to avoid slow swap memory but use fast RAM only.

  Yes, but that's impossible with a 6GB database (and a 32-bit system).
  Some of it is going to sit on disk.  The big advantage of using tmpfs
  is that it is linked directly to the operating system, so it will use
  as much RAM as possible (but never more) and require very little tuning.

> >I am curious why you think memory is the bottleneck anyway!
> 
> It has often been pointed out on this list that inserts into
> indexed tables (regular or FTS) run faster with a high page cache.
> My own tests second this. A few hundred MB more or less can make a
> difference of more than 100%.

  Given that the default page cache is 2000 pages, or on the order of
  3MB, it seems that you're hitting some serious limits.  If hundreds
  of megabytes (!) is giving you a return on the order of 2x, then there
  is no magic bullet-- you aren't going to find a setting that suddenly
  gives you a 10x speedup.  You're hitting diminishing returns in a
  serious kind of way.

  Personally, I'd pick a number, like half your RAM size or ~1.5GB*
  (whichever is smaller), set the page cache, and be done with it.
  It sounds like you've already found most of the other PRAGMAs that
  are going to get you something.  You might be able to tune the size
  of your INSERT transactions, but if you're around 1000 or so, going
  higher isn't likely to buy you too much.

  * This upper limit is OS-dependent.  I'd use 1.5GB on Windows and older
    Linux systems, 2.5GB for some of the more modern Linux systems, and
    3.5GB for Mac OS X.
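
  In code, that rule might look like the sketch below.  Querying the
  installed RAM is OS-specific, so it's a parameter here; the cap is
  the Windows figure from above.

    #include <sqlite3.h>

    /* Cache budget = min(half the RAM, ~1.5GB), converted to pages.
    ** ram_bytes must come from the OS; page_size from "PRAGMA
    ** page_size;". */
    static void set_cache(sqlite3 *db, sqlite3_int64 ram_bytes,
                          int page_size)
    {
      sqlite3_int64 budget = ram_bytes / 2;
      sqlite3_int64 cap = (sqlite3_int64)1536 * 1024 * 1024;
      char sql[64];
      if (budget > cap) budget = cap;
      sqlite3_snprintf((int)sizeof(sql), sql, "PRAGMA cache_size=%lld;",
                       budget / page_size);
      sqlite3_exec(db, sql, 0, 0, 0);
    }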

  Like all things performance tuning, unless you have a really good
  idea of how the system is working, you're just shooting in the dark.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' 

Re: [sqlite] How to set memory usage as high as possible -- but not too high?

2008-04-14 Thread Ralf Junker
Dan,

many thanks for the quick response and detailed answers. However, a question or 
two still puzzle me.

>> * OPTION 1: PRAGMA cache_size = 10000000;
>>
>> Advantage: SQLite will use ample memory, but no more than that.
>>
>> Disadvantage: It is difficult to establish exact memory requirements in
>> advance. The help states that "Each page uses about 1.5K of  
>> memory.", but I found this to be wrong. Memory usage obviously  
>> depends on the page size, and my measurement shows that there is an  
>> additional small overhead of undocumented size. Is there a formula  
>> to calculate the required memory for a cache_size of x?

I'd be curious whether you know an answer to this, too.

>> * OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);
>>
>> Disadvantage: My tests indicate that SQLite slows down drastically  
>> when it hits the memory limit. Inserts drop from a few hundred per  
>> second to just one or two per sec.
>
>That is an odd result. How did you test it? 

I set up a high cache_size and a lower sqlite3_soft_heap_limit() and started 
inserting blobs.

>What was the memory limit? Any chance the machine started using swap space?

I will test again and let you know.

>> * Does "PRAGMA cache_size=some_smaller_value;" cause SQLite to free  
>> used pages and release their memory straight away?
>
>No. If the cache_size parameter is set to a value that
>is less than the number of pages currently allocated for the
>cache, no more pages will be allocated. But no existing
>pages will be freed.

Good to know. So I would reduce the cache_size and then use 
sqlite3_release_memory() to free memory, right?

Maybe this is worth documenting?

>Does SQLite really run faster with 1GB available than it would with 100MB?

Yes. If there are indexes (regular or FTS) on the table, SQLite needs quick 
access to lots of pages for searching and rearranging b-tree entries. My 
timings show that the difference between a 100 MB and a 500 MB cache can 
sometimes be more than 100%.

Richard recently talked about upcoming indexing performance improvements. I 
wonder if they are part of the performance refactoring due with the next 
release? :-)

Ralf 



Re: [sqlite] How to set memory usage as high as possible -- but not too high?

2008-04-14 Thread Ralf Junker
Roger Binns wrote:

>Are you using a 32-bit or 64-bit process?

32, but it does not matter to the problem.

>Also is there a requirement to create the database in the filesystem?

Yes.

>If not you could ensure your swap is sufficiently large (I use a minimum of
>16GB on my machines :-) and create in a tmpfs filesystem, and then copy the 
>database to
>persistent storage when you are done.

The aim is to avoid slow swap memory but use fast RAM only.

>You also didn't list turning off synchronous etc while creating the database 
>and turning it back on when done.

Performance settings are:

* PRAGMA locking_mode=exclusive;
* PRAGMA synchronous=off;
* Disable journal file :-)
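
From C, the two pragmas above are one-liners right after sqlite3_open(). 
Disabling the journal is done outside of SQL for now, as there is no pragma 
for it yet:

  sqlite3_exec(db, "PRAGMA locking_mode=exclusive;", 0, 0, 0);
  sqlite3_exec(db, "PRAGMA synchronous=off;", 0, 0, 0);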

>I am curious why you think memory is the bottleneck anyway!

It has often been pointed out on this list that inserts into indexed tables 
(regular or FTS) run faster with a high page cache. My own tests second this. 
A few hundred MB more or less can make a difference of more than 100%.

Ralf 



Re: [sqlite] How to set memory usage as high as possible -- but not too high?

2008-04-13 Thread Dan

On Apr 14, 2008, at 5:53 AM, Ralf Junker wrote:

> I need to create a huge database (about 6 GB, more than 6 million
> records, blobs, and FTS text) in as little time as possible. Since  
> memory is the key to speed, I try to use as much memory as is  
> available. However, there is the danger of running out of memory.  
> This is where memory usage control comes into play. I can see there  
> are three options:
>
>
> * OPTION 1: PRAGMA cache_size = 10000000;
>
> Advantage: SQLite will use ample memory, but no more than that.
>
> Disadvantage: It is difficult to establish exact memory requirements in
> advance. The help states that "Each page uses about 1.5K of  
> memory.", but I found this to be wrong. Memory usage obviously  
> depends on the page size, and my measurement shows that there is an  
> additional small overhead of undocumented size. Is there a formula  
> to calculate the required memory for a cache_size of x?
>
>
> * OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);
>
> Advantage: Memory limit can be set to a known value (amount of free  
> memory as returned from the OS).
>
> Disadvantage: My tests indicate that SQLite slows down drastically  
> when it hits the memory limit. Inserts drop from a few hundred per  
> second to just one or two per sec.

That is an odd result. How did you test it? What was the memory
limit? Any chance the machine started using swap space?

>
> * OPTION 3: Catch out-of-memory errors and reduce cache_size  
> accordingly (untested scenario).
>
> Advantage: Use memory up to the last bit available.
>
> Disadvantage: How to avoid data loss after the out-of-memory error.  
> Can I just call sqlite3_release_memory(some_bytes) and sqlite3_step  
> again and again until it passes without the out-of-memory error?
>
>
> This raises a few questions:
>
> * Do sqlite3_soft_heap_limit(), or "PRAGMA cache_size=x;", or both  
> establish SQLite's upper memory limit? Do they work independently  
> of each other, i.e. does the lower limit always kick in first?

Both limits can be used simultaneously. The cache_size limit is
per database cache, soft_heap_limit() sets a global parameter
that governs all sqlite connections opened by the process.
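
A sketch to make the two scopes concrete (the numbers are arbitrary):

   #include <sqlite3.h>

   int main(void){
     sqlite3 *db1, *db2;

     /* global: one soft limit for every connection in the process */
     sqlite3_soft_heap_limit(256 * 1024 * 1024);

     sqlite3_open("one.db", &db1);
     sqlite3_open("two.db", &db2);

     /* per-connection: each page cache is capped independently */
     sqlite3_exec(db1, "PRAGMA cache_size=100000;", 0, 0, 0);
     sqlite3_exec(db2, "PRAGMA cache_size=2000;", 0, 0, 0);

     sqlite3_close(db1);
     sqlite3_close(db2);
     return 0;
   }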

>
> * Does "PRAGMA cache_size=some_smaller_value;" cause SQLite to free  
> used pages and release their memory straight away?

No. If the cache_size parameter is set to a value that
is less than the number of pages currently allocated for the
cache, no more pages will be allocated. But no existing
pages will be freed.

> * Is there another runtime -- important! -- setting to establish a  
> maximum memory limit, possibly undocumented?

There is the SQLITE_MEMORY_SIZE compile-time option. But that's not
really useful for the very large memory limits you're talking about.
So soft_heap_limit() and pragma cache_size are it.
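
(That one is fixed when the library is built, e.g.

   /* build-time only: reserve a fixed pool for all SQLite allocations */
   #define SQLITE_MEMORY_SIZE 268435456   /* 256MB; example value */

defined while compiling the SQLite sources, so it cannot be changed
at runtime.)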

> In the end this boils down to a simple problem:
>
> * How to keep SQLite's memory usage as close as possible to, but not
> exceeding, the memory available to applications?

It's not really that simple. On a workstation, not all memory is
equal. The maximum amount of memory available to an application
is all of the RAM + all of the swap space. Best performance
probably comes by using up all of the RAM and never using the swap.

Realistically, you should probably just set a large cache_size as
in option 1. Does SQLite really run faster with 1GB available than
it would with 100MB?

Dan.






Re: [sqlite] How to set memory usage as high as possible -- but not too high?

2008-04-13 Thread Roger Binns

Ralf Junker wrote:
> I need to create a huge database (about 6 GB, more than 6 million records, blobs,
> and FTS text) in as little time as possible. Since memory is the key to 
> speed, I try to use as much memory as is available. However, there is the 
> danger of running out of memory. This is where memory usage control comes 
> into play. I can see there are three options:

Are you using a 32-bit or 64-bit process?  Also is there a requirement
to create the database in the filesystem?  If not you could ensure your
swap is sufficiently large (I use a minimum of 16GB on my machines :-)
and create in a tmpfs filesystem, and then copy the database to
persistent storage when you are done.

You also didn't list turning off synchronous etc while creating the
database and turning it back on when done.  I am curious why you think
memory is the bottleneck anyway!

Roger


[sqlite] How to set memory usage as high as possible -- but not too high?

2008-04-13 Thread Ralf Junker
I need to create a huge database (about 6 GB, more than 6 million records, blobs,
and FTS text) in as little time as possible. Since memory is the key to speed, 
I try to use as much memory as is available. However, there is the danger of 
running out of memory. This is where memory usage control comes into play. I 
can see there are three options:


* OPTION 1: PRAGMA cache_size = 10000000;

Advantage: SQLite will use ample memory, but no more than that.

Disadvantage: It is difficult to establish exact memory requirements in advance. The
help states that "Each page uses about 1.5K of memory.", but I found this to be 
wrong. Memory usage obviously depends on the page size, and my measurement 
shows that there is an additional small overhead of undocumented size. Is there 
a formula to calculate the required memory for a cache_size of x?


* OPTION 2: sqlite3_soft_heap_limit(max_mem_bytes);

Advantage: Memory limit can be set to a known value (amount of free memory as 
returned from the OS).

Disadvantage: My tests indicate that SQLite slows down drastically when it hits 
the memory limit. Inserts drop from a few hundred per second to just one or two 
per sec.


* OPTION 3: Catch out-of-memory errors and reduce cache_size accordingly 
(untested scenario).

Advantage: Use memory up to the last bit available.

Disadvantage: How to avoid data loss after the out-of-memory error. Can I just 
call sqlite3_release_memory(some_bytes) and sqlite3_step again and again until 
it passes without the out-of-memory error?
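
What I have in mind is roughly this (untested!):

  #include <sqlite3.h>

  /* On SQLITE_NOMEM, hand cache memory back and retry the step. */
  static int step_with_retry(sqlite3_stmt *stmt)
  {
    int rc;
    for (;;) {
      rc = sqlite3_step(stmt);
      if (rc != SQLITE_NOMEM) return rc;
      if (sqlite3_release_memory(1024*1024) == 0)
        return rc;              /* nothing left to reclaim */
      sqlite3_reset(stmt);      /* then try the statement again */
    }
  }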


This raises a few questions:

* Do sqlite3_soft_heap_limit(), or "PRAGMA cache_size=x;", or both establish 
SQLite's upper memory limit? Do they work independently of each other, i.e. 
does the lower limit always kick in first?

* Does "PRAGMA cache_size=some_smaller_value;" cause SQLite to free used pages 
and release their memory straight away?

* Is there another runtime -- important! -- setting to establish a maximum 
memory limit, possibly undocumented?


In the end this boils down to a simple problem:

* How to keep SQLite's memory usage as close as possible to, but not exceeding, 
the memory available to applications?

I will be very grateful for any suggestion!

Many thanks,

Ralf
