Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
John Machin wrote:
> On 16/03/2009 5:48 PM, Pierre Chatelier wrote:
> [full exchange snipped; John's message is quoted in full below in
> this thread]
>
> Compression? You tell us. What percentage compression do you get with
> these 300KB BLOBs with (say) bz2? How long does it take to read in a
> bz2-compressed BLOB and uncompress it compared to reading in an
> uncompressed BLOB?

I'm doing nearly the exact same thing, except my BLOBs are about 8k,
and they compressed to about 2k. It turned out in my testing to be
significantly faster to skip the compression step (disk space is cheap,
right?) and write the data directly to the database.

As to your other questions, I have no answers, and indeed, I would be
interested in them if you come up with some!

Mark
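For anyone wanting to reproduce Mark's comparison, the
compress-before-insert step one would time against a plain insert might
look roughly like this -- a minimal sketch using zlib's one-shot
compress(); the function name and buffer handling are illustrative:

/* Compress one block before binding it; time this path against a
 * plain sqlite3_bind_blob() of the raw data to see which wins. */
#include <stdlib.h>
#include <zlib.h>

/* Returns a malloc'd compressed copy of src, or NULL on failure;
 * *dst_len receives the compressed size. Caller frees the result. */
static unsigned char *compress_block(const unsigned char *src,
                                     uLong src_len, uLongf *dst_len)
{
    uLongf bound = compressBound(src_len);  /* worst-case output size */
    unsigned char *dst = malloc(bound);
    if (!dst) return NULL;

    if (compress(dst, &bound, src, src_len) != Z_OK) {
        free(dst);
        return NULL;
    }
    *dst_len = bound;   /* compress() updated it to the actual size */
    return dst;
}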
Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
Hello!

On Monday 16 March 2009 09:48:50 Pierre Chatelier wrote:
> But we are not answering my initial question!
>
> Can I expect some gain in
> - recompiling SQLite (which options/DEFINEs would help?)
> - using custom memory allocators (I am on Win32, in a multi-threaded
>   environment, and yes, "it's bad")
> - using compression

I have a repository with a tuned SQLite build, a zlib compression
extension, and more.

Debian etch:
deb http://mobigroup.ru/debian/ etch main contrib non-free
deb-src http://mobigroup.ru/debian/ etch main contrib non-free

Debian lenny:
deb http://mobigroup.ru/debian/ lenny main contrib non-free
deb-src http://mobigroup.ru/debian/ lenny main contrib non-free

Keyring:
sudo aptitude install debian-mobigroup-keyring

I don't have a Windows build, but you can compile it with MinGW.

Best regards.
Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
On 16/03/2009 5:48 PM, Pierre Chatelier wrote:
>> A few hundred blocks of raw data? Blocksize approx 300K bytes?
>> Database created and dropped by the same process? 500 blocks is
>> approx 150M bytes; why not keep it in a hash table in memory? If you
>> keep it in a database or the file system, it's going to be washed
>> through your real memory and pagefile-aka-swap-partition anyway, so
>> just cut out the middlemen :-)
>
> You're right, but who said I have only 1 DB at a time :-) ?

You didn't say anything at all about how many DBs you have, so it
wasn't you.

> In fact, I have several DBs and I do not know in advance what size it
> will represent.

What is "it"?

> Perhaps 500MB. And I need RAM for other stuff, so the
> simplest thing is to use "normal" DBs.

You've lost me now. You need RAM for the working set of whatever you
are accessing at the time; it doesn't matter whether it came from a
file or a DB (which is just a structured file, probably not optimised
for 300KB BLOBs) or you built it in memory, and what's not being used
at the time will be in your filesystem or in your swap partition.

Please re-read what I wrote, to which your response was "You're right",
then consider that the total amount of data is not very relevant: what
matters is the size of your working set, mostly irrespective of its
source.

However, the overhead of packing/unpacking 300KB blobs into/out of a
database can't be overlooked.

I would suggest giving serious thought to a variant of an earlier
poster's suggestion: have the BLOBs each in its own file in the file
system, but mmap them.

> Using memory DBs and swapping
> them afterwards would not be smooth.
>
> But we are not answering my initial question!
>
> Can I expect some gain in
> - recompiling SQLite (which options/DEFINEs would help?)
> - using custom memory allocators (I am on Win32, in a multi-threaded
>   environment, and yes, "it's bad")
> - using compression

Compression? You tell us. What percentage compression do you get with
these 300KB BLOBs with (say) bz2? How long does it take to read in a
bz2-compressed BLOB and uncompress it, compared to reading in an
uncompressed BLOB?

Cheers,
John
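For concreteness, John's one-file-per-BLOB-plus-mmap variant might look
roughly like this -- a minimal sketch assuming a POSIX platform (on
Win32 the rough equivalents are CreateFileMapping/MapViewOfFile); the
function name and error handling are illustrative:

/* Map the BLOB stored in "path" read-only; the caller munmap()s it
 * when done. Pages are faulted in on demand, so nothing is copied
 * through a database layer. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static const void *map_blob(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE,
                   fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}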
Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
> A few hundred blocks of raw data? Blocksize approx 300K bytes?
> Database created and dropped by the same process? 500 blocks is
> approx 150M bytes; why not keep it in a hash table in memory? If you
> keep it in a database or the file system, it's going to be washed
> through your real memory and pagefile-aka-swap-partition anyway, so
> just cut out the middlemen :-)

You're right, but who said I have only 1 DB at a time :-) ?

In fact, I have several DBs and I do not know in advance what size it
will represent. Perhaps 500MB. And I need RAM for other stuff, so the
simplest thing is to use "normal" DBs. Using memory DBs and swapping
them afterwards would not be smooth.

But we are not answering my initial question!

Can I expect some gain in
- recompiling SQLite (which options/DEFINEs would help?)
- using custom memory allocators (I am on Win32, in a multi-threaded
  environment, and yes, "it's bad")
- using compression

Regards,

Pierre Chatelier
Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
On 13/03/2009 11:24 PM, Mike Eggleston wrote:
> On Fri, 13 Mar 2009, Pierre Chatelier might have said:
>> [original post snipped; quoted in full below in this thread]
>
> Why do you not use the int converted to a hex (sprintf("%08x", id))
> as a file name and just use the file system?

A few hundred blocks of raw data? Blocksize approx 300K bytes? Database
created and dropped by the same process? 500 blocks is approx 150M
bytes; why not keep it in a hash table in memory? If you keep it in a
database or the file system, it's going to be washed through your real
memory and pagefile-aka-swap-partition anyway, so just cut out the
middlemen :-)
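Since the identifiers are plain ints and there are only a few hundred
blocks, a flat pointer table indexed by id can stand in for the hash
table John mentions -- a minimal sketch, with MAX_BLOCKS and the struct
purely illustrative:

/* Keep the blocks in process memory and skip the database entirely. */
#include <stdlib.h>
#include <string.h>

#define MAX_BLOCKS 1024          /* illustrative upper bound on ids */

typedef struct { void *data; size_t len; } Block;
static Block g_blocks[MAX_BLOCKS];

static int store_block(int id, const void *data, size_t len)
{
    if (id < 0 || id >= MAX_BLOCKS) return -1;
    void *copy = malloc(len);
    if (!copy) return -1;
    memcpy(copy, data, len);
    free(g_blocks[id].data);     /* replace any previous copy */
    g_blocks[id].data = copy;
    g_blocks[id].len  = len;
    return 0;
}

static const Block *fetch_block(int id)
{
    return (id >= 0 && id < MAX_BLOCKS && g_blocks[id].data)
         ? &g_blocks[id] : NULL;
}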
Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
On Fri, 13 Mar 2009, Pierre Chatelier might have said:
> Hello,
>
> I am using SQLITE to store and retrieve raw data blocks that are
> basically ~300KB. Each block has an int identifier, so that
> insert/select are easy. [rest of original post snipped; quoted in
> full below in this thread]

Why do you not use the int converted to a hex (sprintf("%08x", id))
as a file name and just use the file system?

Mike
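Mike's scheme in a few lines -- a minimal sketch; the "blocks/"
directory and ".bin" suffix are illustrative:

/* Name each block's file after the hex form of its int id and let the
 * file system do the indexing. */
#include <stdio.h>

static int write_block_file(int id, const void *data, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "blocks/%08x.bin", id);

    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(data, 1, len, f);
    fclose(f);
    return n == len ? 0 : -1;
}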
Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
Hello,

> Have you considered storing the blob data in a file? Just keep a
> record of the blob in the sqlite.db. If you store all the blobs in a
> single file, then you'll need a filename, begin, and length. Or if
> one blob per file, just the filename. Just an idea. Not really sure
> which would be faster.

I do not do that, because I will save the db for querying it later, and
it is far simpler for archiving to manage one file only.

Regards,

Pierre Chatelier
Re: [sqlite] Advices to get max performance with SQLITE and BLOBS
Pierre,

Have you considered storing the blob data in a file? Just keep a record
of the blob in the sqlite.db. If you store all the blobs in a single
file, then you'll need a filename, begin, and length. Or if one blob
per file, just the filename.

Just an idea. Not really sure which would be faster.

--- On Fri, 3/13/09, Pierre Chatelier wrote:

> From: Pierre Chatelier
> Subject: [sqlite] Advices to get max performance with SQLITE and BLOBS
> To: sqlite-users@sqlite.org
> Date: Friday, March 13, 2009, 3:49 AM
>
> [original post snipped; quoted in full below in this thread]
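A rough sketch of the single-data-file variant: append each block to
one big file and record where it landed in a small SQLite index table.
The table name, column names, and the pre-prepared statement are
assumptions for illustration:

/* Index table assumed created beforehand as:
 *   CREATE TABLE blob_index(id INTEGER PRIMARY KEY,
 *                           offset INTEGER, length INTEGER);
 * stmt is assumed prepared from:
 *   "INSERT INTO blob_index(id, offset, length) VALUES(?,?,?)" */
#include <stdio.h>
#include <sqlite3.h>

static int append_block(FILE *datafile, sqlite3_stmt *stmt,
                        int id, const void *data, size_t len)
{
    if (fseek(datafile, 0, SEEK_END) != 0) return -1;
    long offset = ftell(datafile);
    if (fwrite(data, 1, len, datafile) != len) return -1;

    sqlite3_bind_int(stmt, 1, id);
    sqlite3_bind_int64(stmt, 2, offset);
    sqlite3_bind_int64(stmt, 3, (sqlite3_int64)len);
    int rc = sqlite3_step(stmt);
    sqlite3_reset(stmt);
    return rc == SQLITE_DONE ? 0 : -1;
}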
[sqlite] Advices to get max performance with SQLITE and BLOBS
Hello,

I am using SQLITE to store and retrieve raw data blocks that are
basically ~300KB. Each block has an int identifier, so that
insert/select are easy. This is a very basic use: I do not use complex
queries, only "INSERT/SELECT where index=...".

Now, I am thinking about performance, for writing a sequence of a few
hundred 300KB blocks as fast as possible. Obviously, I use bind_blob(),
blob_read() and blob_write(). I have already tuned the PRAGMAs for
journal/synchronous/page_size/cache, so that it's rather efficient. I
do not DELETE any content, and the whole database is dropped after use:
VACUUM is not important.

There are other ways to optimize, but I wonder whether they are worth
it, or whether the gain would be only marginal for what I am doing:
1) recompile SQLite? Which compile options would help in this case?
2) use other memory allocators? I am not sure that writing big data
   blocks triggers many calls to malloc().
3) use compression? zlib could help, but since my data does not
   compress very well (let's say an average of 20% space can be saved
   per block), I am not sure that the compression time will balance
   the writing time.

Of course, I am only asking for advice based on your experience; there
is certainly no exact answer, and it will always depend on my data.

Regards,

Pierre Chatelier
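For later readers: the insert path Pierre describes might look roughly
like this in the C API -- a minimal sketch, where the table name
"blocks", the PRAGMA values, and the error handling are illustrative
assumptions, not his actual settings. Note that page_size only takes
effect on a new, empty database.

#include <stddef.h>
#include <sqlite3.h>

static int store_blocks(sqlite3 *db, const void **blocks,
                        const size_t *lens, int nblocks)
{
    /* Illustrative tuning; the poster says he has already set these. */
    sqlite3_exec(db, "PRAGMA journal_mode=OFF;"
                     "PRAGMA synchronous=OFF;"
                     "PRAGMA page_size=32768;"
                     "PRAGMA cache_size=2000;", 0, 0, 0);

    sqlite3_stmt *stmt;
    if (sqlite3_prepare_v2(db,
            "INSERT INTO blocks(id, data) VALUES(?,?)",
            -1, &stmt, 0) != SQLITE_OK)
        return -1;

    /* One transaction around all inserts avoids a journal round-trip
     * per 300KB block. */
    sqlite3_exec(db, "BEGIN", 0, 0, 0);
    for (int i = 0; i < nblocks; i++) {
        sqlite3_bind_int(stmt, 1, i);
        /* SQLITE_STATIC: the caller's buffer outlives the step, so
         * SQLite does not take an extra copy of the block */
        sqlite3_bind_blob(stmt, 2, blocks[i], (int)lens[i],
                          SQLITE_STATIC);
        if (sqlite3_step(stmt) != SQLITE_DONE) break;
        sqlite3_reset(stmt);
    }
    sqlite3_exec(db, "COMMIT", 0, 0, 0);
    sqlite3_finalize(stmt);
    return 0;
}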