Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
Thanks for everyone's help and thoughts on this issue. My finding on Windows 7 Pro 64-bit with a PCI-based SSD is that, for my smallish images, BLOBs were faster than individual files, basically in line with the table that someone posted earlier in this thread. After many experiments with many variations on grouping writes into transactions, the single writer proves to be the most limiting factor.
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
I'm not sure exactly what you're asking here, but if the question is whether to use database blobs vs files, then you might be interested in this technical report from Microsoft: arxiv.org/ftp/cs/papers/0701/0701168.pdf

--
Robert S. Sciuk        http://www.controlq.com    259 Simcoe St. S.
Control-Q Research     tel: 905.706.1354          Oshawa, Ont.
r...@controlq.com                                 Canada, L1H 4H3
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
On Fri, 29 Mar 2013 15:17:52 -0400, Jeff Archer <jsarc...@nanotronicsimaging.com> wrote:

> I have a large amount of intermediate data (way exceeds physical memory on
> my 24GB machine). So, I need to store it temporarily on disk until getting
> to the next phase of processing. [...] So, the question is: What is the way
> to get the fastest possible performance from SQLite when I am willing to
> give up all recoverability guarantees? Or, is it simply that I should just
> write directly to a file myself?

Piping through gzip -6 or xz -2 will minimize the bytes to write. If you are working with 5D images, xz (the 7-zip LZMA fork) will do the best. For processing you do:

    zcat file | processing_application

or

    xzcat file | processing_application

---
Eduardo Morras
emorr...@yahoo.es
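[Editorial note: a minimal sketch of driving that compression pipe from inside a program, assuming a POSIX system (popen/pclose) and a hypothetical output file name; Eduardo's zcat/xzcat commands above are the read-side shell equivalent.]

    #include <stdio.h>

    int main(void) {
        /* Stream raw bytes through gzip -6 instead of writing them uncompressed. */
        FILE *out = popen("gzip -6 > intermediate.bin.gz", "w");
        if (!out) { perror("popen"); return 1; }

        unsigned char buf[4096] = {0};      /* stand-in for real image data */
        for (int i = 0; i < 1024; i++)      /* ~4 MB of payload */
            fwrite(buf, 1, sizeof buf, out);

        pclose(out);                        /* waits for gzip to finish */
        return 0;
    }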
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
On 29.03.2013 20:17, Jeff Archer wrote:

> Each item to be stored is 9 DWORDs, 4 doubles and 2 variable sized BLOBs
> which are images. [...] So, the question is: What is the way to get the
> fastest possible performance from SQLite when I am willing to give up all
> recoverability guarantees? Or, is it simply that I should just write
> directly to a file myself?

Suggestion: put the fixed, small-sized data into an SQLite database. You won't be searching inside the blobs with a database engine, and the amount of data you have to process is large, so to make it fast you should write the image data into files. The other data needed for processing (ordering, indexing, searching, comparison) is best stored in an SQLite database.

To improve the speed of access to your images:

* Use full pages: pad smaller images out to the next page boundary (for example 4 KB, 8 KB, ...).
* Split long files into smaller, sequentially numbered clusters (16 to 64 MB). This makes OS file operations faster, because the block index has to be cached while opening and processing a file, and the positions can be indexed in SQLite.

I have a similar application for vectorized digitalization of handwritten old scripts, and I use a database for the searchable information while using external files (split as described) for the raster images and vector files. SQLite would be too slow for blobs like you need them; put them outside, but keep the indexes inside. Another advantage of this approach is that you can process many binary files simultaneously, whereas by putting them inside a database like SQLite you have only one writer.

Using transactions makes inserting data faster, especially when you have indexes. Also try to create your indexes after fully inserting your data, because that makes the process faster.
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
Those measurements assume that you store each blob in a separate file, so raw file access seems slower for the smaller blob sizes. If you use external blob storage, do it in raw clusters as I suggested in a previous post (size limit 32/64 MB) and store your blobs on page boundaries (page size 4 KB, 8 KB, and so on). This will always be faster, because instead of B-tree pages, which are always fragmented, you have sequentially stored image data. Don't use file sizes larger than 32/64 MB, because the prefetch cache of modern hard disks may read the whole file even if you only ask for a part of it, and fopen gets slower on large files because the page list has to be read into an internal library buffer. The results at the given link only hold if you store each blob in a separate file. For my similar project I use the raw cluster model, and it is several times faster than storing image data in a B-tree organized database file. A sketch of the layout is below.
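[Editorial note: a minimal sketch of that page-aligned cluster layout, under stated assumptions: 4 KB pages, 64 MB clusters, each blob fits in one cluster, and hypothetical file names like img-0000.dat. The (cluster, offset) pair returned is what would be indexed in SQLite. None of this code is from the original post.]

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SIZE    4096u                 /* blobs start on page boundaries */
    #define CLUSTER_SIZE (64u * 1024u * 1024u) /* 64 MB per sequentially numbered file */

    /* Round a size up to the next page boundary. */
    static uint64_t page_align(uint64_t n) {
        return (n + PAGE_SIZE - 1) & ~(uint64_t)(PAGE_SIZE - 1);
    }

    /* Append one blob; returns its location so it can be indexed in SQLite. */
    static void append_blob(const void *blob, uint64_t len,
                            uint64_t *next_free,   /* global page-aligned write cursor */
                            uint32_t *cluster_out, uint64_t *offset_out) {
        uint64_t padded = page_align(len);

        /* Never let a blob straddle a cluster boundary: skip to the next cluster. */
        if (*next_free / CLUSTER_SIZE != (*next_free + padded - 1) / CLUSTER_SIZE)
            *next_free = ((*next_free / CLUSTER_SIZE) + 1) * CLUSTER_SIZE;

        *cluster_out = (uint32_t)(*next_free / CLUSTER_SIZE);
        *offset_out  = *next_free % CLUSTER_SIZE;

        char name[32];
        snprintf(name, sizeof name, "img-%04u.dat", *cluster_out);
        FILE *f = fopen(name, "ab");           /* sequentially numbered cluster file */
        if (f) {
            static const char zeros[PAGE_SIZE]; /* zero padding up to the boundary */
            fwrite(blob, 1, len, f);
            fwrite(zeros, 1, padded - len, f);  /* keep the next blob page-aligned */
            fclose(f);
        }
        *next_free += padded;
    }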
[sqlite] How to achieve fastest possible write performance for a strange and limited case
I have previously made an apparently bad assumption about this, so now I would like to go back to the beginning of the problem and ask the most basic question first, without any preconceived ideas.

This use case is from an image processing application. I have a large amount of intermediate data (way exceeds physical memory on my 24GB machine). So, I need to store it temporarily on disk until getting to the next phase of processing. I am planning to use a large SSD dedicated to holding this temporary data. I do not need any recoverability in case of hardware, power or other failure. Each item to be stored is 9 DWORDs, 4 doubles and 2 variable-sized BLOBs which are images.

I could write directly to a file myself. But I would need to provide some minimal indexing, some amount of housekeeping to manage the variable-sized BLOBs, and some minimal synchronization so that multiple instances of the same application could operate simultaneously on a single set of data. So then I thought that SQLite could manage these things nicely for me, so that I don't have to write and debug indexing and housekeeping code that already exists in SQLite.

So, the question is: What is the way to get the fastest possible performance from SQLite when I am willing to give up all recoverability guarantees? Or, is it simply that I should just write directly to a file myself?
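[Editorial note: one plausible SQLite schema for the record described above (9 DWORDs, 4 doubles, 2 image blobs), just as a sketch; the table and column names are hypothetical, not from the post. INTEGER PRIMARY KEY aliases the rowid, so no extra index is built, a point that comes up later in the thread.]

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void) {
        sqlite3 *db;
        if (sqlite3_open("scratch.db", &db) != SQLITE_OK) return 1;

        /* One row per item: 9 DWORDs, 4 doubles, 2 variable-sized image blobs. */
        const char *ddl =
            "CREATE TABLE item("
            "  id INTEGER PRIMARY KEY,"
            "  d1 INT, d2 INT, d3 INT, d4 INT, d5 INT,"
            "  d6 INT, d7 INT, d8 INT, d9 INT,"
            "  f1 REAL, f2 REAL, f3 REAL, f4 REAL,"
            "  img1 BLOB, img2 BLOB)";
        char *err = 0;
        if (sqlite3_exec(db, ddl, 0, 0, &err) != SQLITE_OK) {
            fprintf(stderr, "%s\n", err);
            sqlite3_free(err);
        }
        sqlite3_close(db);
        return 0;
    }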
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
On 29 Mar 2013, at 7:17pm, Jeff Archer <jsarc...@nanotronicsimaging.com> wrote:

> This use case is from an image processing application. [...] Each item to
> be stored is 9 DWORDs, 4 doubles and 2 variable sized BLOBs which are
> images. I could write directly to a file myself.

That's what I thought: just blat it straight out to a file in append mode, and organise it later. But if you don't want to do that ...

> But I would need to provide some minimal indexing, some amount of
> housekeeping to manage variable sized BLOBs and some minimal
> synchronization so that multiple instances of the same application could
> operate simultaneously on a single set of data.

Locking between different simultaneous apps is going to kill you. Do you really need, at this stage, one app to access data written by different writers? If not, have each app write to a different database. Then use a different app running on a different computer (or at least in a different process) to merge the data into one big file, either simultaneously or after data collection is finished.

> So, the question is: What is the way to get the fastest possible
> performance from SQLite when I am willing to give up all recoverability
> guarantees? Or, is it simply that I should just write directly to a file
> myself?

http://www.sqlite.org/pragma.html

Journalling off and synchronous off pragmas. And I note you're asking for 'fastest' whereas you really just want 'fast enough'.

However, I note that you seem to have 24GB of data to write. At that size, the speed of SQLite isn't your limiting factor; instead you have to worry about the speed of your storage medium. With 4K sectors, writing 24GB of data means you're writing about 6 million sectors. If you're writing to rotating hard disks, that means you'll be waiting for 6 million sectors to be rotated into the right place. Even at 10,000 RPM that's a lot of waiting. Your biggest speed increase isn't going to come from clever programming, it's going to come from moving to SSD.

Simon.
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
On Fri, Mar 29, 2013 at 12:17 PM, Jeff Archer <jsarc...@nanotronicsimaging.com> wrote:

> So, the question is: What is the way to get the fastest possible
> performance from SQLite when I am willing to give up all recoverability
> guarantees?

Use:

    pragma journal_mode = off;
    pragma synchronous = off;
    pragma locking_mode = exclusive;

In addition to that, you may issue a BEGIN statement at the beginning of the application and never COMMIT.

Pavel
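[Editorial note: a minimal sketch of that setup through the SQLite C API; the PRAGMAs are exactly the ones Pavel gives above, while the database file name is hypothetical.]

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void) {
        sqlite3 *db;
        if (sqlite3_open("scratch.db", &db) != SQLITE_OK) return 1;

        /* Trade all crash recoverability for speed, then open one
           transaction for the life of the process. */
        const char *setup =
            "PRAGMA journal_mode = off;"
            "PRAGMA synchronous = off;"
            "PRAGMA locking_mode = exclusive;"
            "BEGIN;";
        char *err = 0;
        if (sqlite3_exec(db, setup, 0, 0, &err) != SQLITE_OK) {
            fprintf(stderr, "%s\n", err);
            sqlite3_free(err);
            return 1;
        }

        /* ... all inserts happen here, inside the single open transaction ... */

        /* Exiting without COMMIT, per the advice above; with the journal
           off, recoverability was given up anyway. */
        sqlite3_close(db);
        return 0;
    }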
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
Pavel Ivanov <paiva...@gmail.com> wrote:

> Use:
>
>     pragma journal_mode = off;
>     pragma synchronous = off;
>     pragma locking_mode = exclusive;
>
> In addition to that, you may issue a BEGIN statement at the beginning of
> the application and never COMMIT.

Yes, that should be the fastest. In addition:

* Make sure that you prepare your INSERT query just once, before the insert loop, and use bind/step/reset in the insertion loop (see the sketch below).
* Avoid duplicating the implicit rowid index with another primary key, by using INTEGER PRIMARY KEY and not something like INT PRIMARY KEY.
* If you have indexes, make sure that you create them after all the inserts, rather than before.
* If you have several tables to populate, you can consider storing them in different databases, and populating them in parallel in different processes or threads. Later you can ATTACH all of them to see them as a single database.

Regards,
Dominique
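[Editorial note: a minimal sketch of that prepare-once, bind/step/reset insert loop; the table and column names are hypothetical and match the earlier schema sketch, and the blob payload is a stand-in.]

    #include <sqlite3.h>
    #include <string.h>

    /* Insert many rows through one prepared statement, per the advice above. */
    static int bulk_insert(sqlite3 *db, int nrows) {
        sqlite3_stmt *stmt;
        const char *sql = "INSERT INTO item(d1, f1, img1) VALUES(?1, ?2, ?3)";
        if (sqlite3_prepare_v2(db, sql, -1, &stmt, 0) != SQLITE_OK)
            return SQLITE_ERROR;

        unsigned char fake_image[4096];
        memset(fake_image, 0xAB, sizeof fake_image);  /* stand-in blob payload */

        for (int i = 0; i < nrows; i++) {
            sqlite3_bind_int(stmt, 1, i);
            sqlite3_bind_double(stmt, 2, i * 0.5);
            /* SQLITE_STATIC: the buffer outlives the step, so no copy is made */
            sqlite3_bind_blob(stmt, 3, fake_image, sizeof fake_image, SQLITE_STATIC);

            if (sqlite3_step(stmt) != SQLITE_DONE) break; /* one row inserted */
            sqlite3_reset(stmt);                          /* reuse the statement */
        }
        sqlite3_finalize(stmt);
        return SQLITE_OK;
    }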
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
I think many people would tell you not to store your images in your database; just store a filepath to them. That will speed things up quite a bit, and possibly even remove the need for an SSD. With the filepath, your processing apps can use file locking too, if you need it.
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
On Fri, Mar 29, 2013 at 4:41 PM, Michael Black <mdblac...@yahoo.com> wrote:

> I think many people would tell you not to store your images in your
> database; just store a filepath to them. That will speed things up quite a
> bit, and possibly even remove the need for an SSD.

Lots of people think this. But experiments suggest otherwise, depending on the size of your images. For BLOBs smaller than about 100KB, it is actually faster to read them out of an SQLite database than it is out of a file; presumably the extra overhead of the open/close is what makes the difference. For BLOBs larger than 100KB it is faster to store them in a file.

That's for reading - I've never run the experiment for writing. Note also that the 100KB threshold is highly operating-system, file-system and storage-media dependent, so your mileage may vary.

--
D. Richard Hipp
d...@sqlite.org
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
On Fri, Mar 29, 2013 at 4:46 PM, Richard Hipp <d...@sqlite.org> wrote:

> For BLOBs smaller than about 100KB, it is actually faster to read them out
> of an SQLite database than it is out of a file. [...] For BLOBs larger
> than 100KB it is faster to store them in a file.

FWIW, my tests are definitely indicating that BLOBs in the database are significantly faster.
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
In case you haven't seen this: http://www.sqlite.org/intern-v-extern-blob.html

On Friday, 29 March, 2013 at 13:52, Jeff Archer wrote:

> FWIW, my tests are definitely indicating that BLOBs in the database are
> significantly faster.
Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case
On Fri, Mar 29, 2013 at 8:17 PM, Jeff Archer <jsarc...@nanotronicsimaging.com> wrote:

> I could write directly to a file myself. But I would need to provide some
> minimal indexing, some amount of housekeeping to manage variable sized
> BLOBs and some minimal synchronization so that multiple instances of the
> same application could operate simultaneously on a single set of data.

Have you looked at HDF5? Aside from synchronization, HDF5 gives you the indexing and book-keeping. It was designed for large structured binary data. By structured I mean storing n-D arrays of structs that can be chunked or tiled and compressed as desired, transparently, and accessed in whole or in parts (hyperslabs) as desired, with the access optimized for you.

An HDF5 file is basically like a file system, with folders, where the files are datasets, and HDF5 knows what's inside each dataset (say a 3D array of {x,y,z} doubles in a Point struct). It's structured storage for binary scientific data. You can associate attributes (key/value pairs, with the value's type, primitive or aggregate, known to HDF5 just as for datasets) with datasets, which are kept in B-trees, and the whole thing is paged with a B-tree of pages and a page cache, similar to sqlite3. HDF5 is fast.

I've often wished for the ability to type my sqlite blobs similarly to how everything is typed in HDF5. You can in fact stuff HDF5 inside a blob, since HDF5 has a VFL (again just like sqlite3; the parallels are many), but you can't do much with that in-blob data using sqlite SQL, and since custom functions can only return scalars, there's little point. Might as well store the HDF5 files on disk next to the sqlite db file. But I guess I'm digressing here :)

We've used a mix of SQLite3 and HDF5, which worked well for us, though we didn't have to worry about concurrent access.

--DD
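[Editorial note: a minimal sketch of writing one typed dataset with the HDF5 C API, just to give the flavor of the self-describing storage mentioned above; the file and dataset names are hypothetical and the code is not from the original post.]

    #include <hdf5.h>

    int main(void) {
        /* A 4x6 integer array, stored as a typed, self-describing dataset. */
        int data[4][6];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 6; j++)
                data[i][j] = i * 6 + j;

        hid_t file  = H5Fcreate("scratch.h5", H5F_ACC_TRUNC,
                                H5P_DEFAULT, H5P_DEFAULT);
        hsize_t dims[2] = {4, 6};
        hid_t space = H5Screate_simple(2, dims, NULL);   /* 2-D dataspace */
        hid_t dset  = H5Dcreate2(file, "/intermediate", H5T_NATIVE_INT, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* HDF5 records the element type and shape, so readers need no
           side schema. */
        H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }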