Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-04-04 Thread Jeff Archer
Thanks for everyone's help and thoughts on this issue.

My findings on Windows 7 Pro 64 using a PCI based SSD is that for my
smallish image the BLOBs were faster than individual files.
Basically, in line with a table that someone posted earlier in this
thread.

After many experiments, with many variations on grouping writes into
transactions, the single writer proves to be the most limiting factor.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-04-03 Thread Rob Sciuk


I'm not sure exactly what you're asking here, but if the question is 
whether to use database blobs vs files, then you might be interested in 
this technical report from Microsoft:


arxiv.org/ftp/cs/papers/0701/0701168.pdf


--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=
Robert S. Sciuk http://www.controlq.com 259 Simcoe St. S.
Control-Q Research  tel: 905.706.1354   Oshawa, Ont.
r...@controlq.com   Canada, L1H 4H3
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-04-02 Thread Eduardo Morras
On Fri, 29 Mar 2013 15:17:52 -0400
Jeff Archer  wrote:

> I have previously made an apparently bad assumption about this so now I
> would like to go back to the beginning of the problem and ask the most
> basic question first without any preconceived ideas.
> 
> This use case is from an image processing application.  I have a large
> amount of intermediate data (way exceeds physical memory on my 24GB
> machine).  So, I need to store it temporarily on disk until getting to next
> phase of processing.  I am planning to use a large SSD dedicated to holding
> this temporary data.  I do not need any recoverability in case of hardware,
> power or other failure.   Each item to be stored is 9 DWORDs, 4 doubles and
> 2 variable sized BLOBS which are images.
> 
> I could write directly to a file myself.  But I would need to provide some
> minimal indexing, some amount of housekeeping to manage variable
> sized BLOBS and some minimal synchronization so that multiple instances of
> the same application could operate simultaneously on a single set of data.
> 
> So, then I though that SQLite could manage these things nicely for me so
> that I don't have to write and debug indexing and housekeeping code that
> already exists in SQLite.
> 
> So, question is:  What is the way to get the fastest possible performance
> from SQLite when I am willing to give up all recoverability guarantees?
> Or, is it simple that I should just write directly to file myself?

Piping through gzip -6 or xz -2 will minimize bytes to write. If you are 
working with 5D images xz (7-zip LZMA fork) will do the best. 

For processing you do zcat file | processing_application or xzcat file | 
processing_application

---   ---
Eduardo Morras 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-30 Thread ibrahim
Those measurements asume that you store each blob in a seperate file. So 
the raw file access seems slower for smaller blob sizes.


If you use external blob storage do it in raw clusters like i suggested 
in a previous post (size limit 32/64 MB) and store your blobs on page 
boundaries (page size 4 k 8 k aso) this will always be faster cause you 
have no b-tree pages which are always fragmented but sequential stored 
image data.


Don't use file sizes larger than 32/64 MB because the pre fetch cache of 
modern HD's can read the whole file even if you only ask for a port of 
it and the fopen command will get slower when you open large files cause 
you have to read the pagelist into an internal library buffer.


The given link is only true if you store each blob in a seperate file.

I use for my similar project raw cluster modell and thats x times faster 
than storing image data in a b-tree organized database file.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-30 Thread ibrahim

On 29.03.2013 20:17, Jeff Archer wrote:

I have previously made an apparently bad assumption about this so now I
would like to go back to the beginning of the problem and ask the most
basic question first without any preconceived ideas.

This use case is from an image processing application.  I have a large
amount of intermediate data (way exceeds physical memory on my 24GB
machine).  So, I need to store it temporarily on disk until getting to next
phase of processing.  I am planning to use a large SSD dedicated to holding
this temporary data.  I do not need any recoverability in case of hardware,
power or other failure.   Each item to be stored is 9 DWORDs, 4 doubles and
2 variable sized BLOBS which are images

I could write directly to a file myself.  But I would need to provide some
minimal indexing, some amount of housekeeping to manage variable
sized BLOBS and some minimal synchronization so that multiple instances of
the same application could operate simultaneously on a single set of data.

So, then I though that SQLite could manage these things nicely for me so
that I don't have to write and debug indexing and housekeeping code that
already exists in SQLite.

So, question is:  What is the way to get the fastest possible performance
from SQLite when I am willing to give up all recoverability guarantees?
Or, is it simple that I should just write directly to file myself?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Suggestion : Put the fixed Data with small sizes into a sqlite 
database.  You won't search in the blobs with a database engine and the 
amount of data you have to process is large to make it fast you should 
write the image data into files. The other data which is necessary for 
processing ordering, indexing, searching comparision is best stored in a 
sqlite database.


To improve the speed of access for your images use full pages fill 
lesser images to the next page boundaries (as an example 4 k, 8 k ...) 
splitt long files into smaller clusters (16 to 64 MB) sequentially 
numbered this makes OS file operations faster because you have to cache 
the block index while opening and processing a file the positions can be 
indexed in sqlite.


I have a similar application for vectorized digitalization of 
handwritten old scripts and i use a database for searchable information 
while using external files (splitt as described) for raster images and 
vector files sqlite would be to slow for blobs like you need them put 
them outside but the indexes inside. Another advantage of this approach 
is that you can process many binary files simultanously while by putting 
them inside a database like sqlite you have only one writer.


The use of transactions makes inserting of data faster especially when 
you have indexes. Then try to create your indexes after fully inserting 
your data because that makes the process faster.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Dominique Devienne
On Fri, Mar 29, 2013 at 8:17 PM, Jeff Archer <
jsarc...@nanotronicsimaging.com> wrote:

> I could write directly to a file myself.  But I would need to provide some
> minimal indexing, some amount of housekeeping to manage variable
> sized BLOBS and some minimal synchronization so that multiple instances of
> the same application could operate simultaneously on a single set of data.
>

Have you looked at HDF5? Aside from synchronization, HDF5 gives you the
indexing and book-keeping. It was designed for large "structured" binary
data. By structured I mean storing n-D arrays of "structs", that can be
"chunked" or "tilled"  and compressed as desired, transparently, and
accessed in whole or parts (hyperslabs) as desired, optimizing the access
for you. An HDF5 file is basically like a file-system, with folders, and
the "files" are "datasets", and HDF5 knows what's inside each dataset (say
a 3D array of {x,y,z} doubles in a Point struct). It's structured data for
binary scientific data. You can associate "attributes" (key, value pairs,
the value's type, primitive or aggregate, known to HDF5 like for datasets)
to datasets, which are in B-trees, and the whole thing is "paged" with a
B-tree of pages and a page cache, similar to sqlite3. HDF5 is fast. I've
often wished for the ability to "type" my sqlite blobs similar to how
everything is typed in HDF5. You can in fact stuff HDF5 inside a blob,
since HDF5 as a VFL (again just like sqlite3. the parallels are many), but
you can't do much with that in-blob data using sqlite SQL, and since custom
functions can only return scalars, there's little point. Might as well
store the HDF5 files on disk next to the sqlite db file. But I guess I'm
digressing here :)

We've used a mix of SQLite3 and HDF5, which worked well for us, but we
didn't have to worry about concurrent access though. --DD
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread David King
In case you haven't seen this: http://www.sqlite.org/intern-v-extern-blob.html 


On Friday, 29 March, 2013 at 13:52, Jeff Archer wrote:

> On Fri, Mar 29, 2013 at 4:46 PM, Richard Hipp  (mailto:d...@sqlite.org)> wrote:
> > On Fri, Mar 29, 2013 at 4:41 PM, Michael Black  > (mailto:mdblac...@yahoo.com)> wrote:
> > 
> > > I think many people would tell you not to store your images in your
> > > database.
> > > Just store a filepath to them.
> > > That will speed things up quite a bit and even possibly prevent having to
> > > use an SSD.
> > > 
> > > With the filepath your processing apps can use file locking too if you 
> > > need
> > > it.
> > > 
> > 
> > 
> > Lots of people think this. But experiments suggest otherwise, depending on
> > the size of your images.
> > 
> > For BLOBs smaller than about 100KB, it is actually faster to read them out
> > of an SQLite database than it is out of a file. Presumably the extra
> > overhead of the open/close is what makes the difference. For BLOBs larger
> > than 100KB it is faster to store them in a file. That's for reading - I've
> > never run the experiment for writing. Note also that the 100KB is highly
> > operating-system and file-system and storage-media dependent and so your
> > mileage may vary.
> > 
> 
> 
> FWIW. My tests are definitely indicating that BLOBs in database are
> significantly faster.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org (mailto:sqlite-users@sqlite.org)
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Jeff Archer
On Fri, Mar 29, 2013 at 4:46 PM, Richard Hipp  wrote:
> On Fri, Mar 29, 2013 at 4:41 PM, Michael Black  wrote:
>
>> I think many people would tell you not to store your images in your
>> database.
>> Just store a filepath to them.
>> That will speed things up quite a bit and even possibly prevent having to
>> use an SSD.
>>
>> With the filepath your processing apps can use file locking too if you need
>> it.
>>
>
> Lots of people think this.  But experiments suggest otherwise, depending on
> the size of your images.
>
> For BLOBs smaller than about 100KB, it is actually faster to read them out
> of an SQLite database than it is out of a file.  Presumably the extra
> overhead of the open/close is what makes the difference.  For BLOBs larger
> than 100KB it is faster to store them in a file.  That's for reading - I've
> never run the experiment for writing.  Note also that the 100KB is highly
> operating-system and file-system and storage-media dependent and so your
> mileage may vary.
>

FWIW.  My tests are definitely indicating that BLOBs in database are
significantly faster.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Richard Hipp
On Fri, Mar 29, 2013 at 4:41 PM, Michael Black  wrote:

> I think many people would tell you not to store your images in your
> database.
> Just store a filepath to them.
> That will speed things up quite a bit and even possibly prevent having to
> use an SSD.
>
> With the filepath your processing apps can use file locking too if you need
> it.
>

Lots of people think this.  But experiments suggest otherwise, depending on
the size of your images.

For BLOBs smaller than about 100KB, it is actually faster to read them out
of an SQLite database than it is out of a file.  Presumably the extra
overhead of the open/close is what makes the difference.  For BLOBs larger
than 100KB it is faster to store them in a file.  That's for reading - I've
never run the experiment for writing.  Note also that the 100KB is highly
operating-system and file-system and storage-media dependent and so your
mileage may vary.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Michael Black
I think many people would tell you not to store your images in your
database.
Just store a filepath to them.
That will speed things up quite a bit and even possibly prevent having to
use an SSD.

With the filepath your processing apps can use file locking too if you need
it.


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jeff Archer
Sent: Friday, March 29, 2013 2:18 PM
To: SQLite-user.org
Subject: [sqlite] How to achieve fastest possible write performance for a
strange and limited case

I have previously made an apparently bad assumption about this so now I
would like to go back to the beginning of the problem and ask the most
basic question first without any preconceived ideas.

This use case is from an image processing application.  I have a large
amount of intermediate data (way exceeds physical memory on my 24GB
machine).  So, I need to store it temporarily on disk until getting to next
phase of processing.  I am planning to use a large SSD dedicated to holding
this temporary data.  I do not need any recoverability in case of hardware,
power or other failure.   Each item to be stored is 9 DWORDs, 4 doubles and
2 variable sized BLOBS which are images.

I could write directly to a file myself.  But I would need to provide some
minimal indexing, some amount of housekeeping to manage variable
sized BLOBS and some minimal synchronization so that multiple instances of
the same application could operate simultaneously on a single set of data.

So, then I though that SQLite could manage these things nicely for me so
that I don't have to write and debug indexing and housekeeping code that
already exists in SQLite.

So, question is:  What is the way to get the fastest possible performance
from SQLite when I am willing to give up all recoverability guarantees?
Or, is it simple that I should just write directly to file myself?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Dominique Pellé
Pavel Ivanov  wrote:

> On Fri, Mar 29, 2013 at 12:17 PM, Jeff Archer
>  wrote:
>> I have previously made an apparently bad assumption about this so now I
>> would like to go back to the beginning of the problem and ask the most
>> basic question first without any preconceived ideas.
>>
>> This use case is from an image processing application.  I have a large
>> amount of intermediate data (way exceeds physical memory on my 24GB
>> machine).  So, I need to store it temporarily on disk until getting to next
>> phase of processing.  I am planning to use a large SSD dedicated to holding
>> this temporary data.  I do not need any recoverability in case of hardware,
>> power or other failure.   Each item to be stored is 9 DWORDs, 4 doubles and
>> 2 variable sized BLOBS which are images.
>>
>> I could write directly to a file myself.  But I would need to provide some
>> minimal indexing, some amount of housekeeping to manage variable
>> sized BLOBS and some minimal synchronization so that multiple instances of
>> the same application could operate simultaneously on a single set of data.
>>
>> So, then I though that SQLite could manage these things nicely for me so
>> that I don't have to write and debug indexing and housekeeping code that
>> already exists in SQLite.
>>
>> So, question is:  What is the way to get the fastest possible performance
>> from SQLite when I am willing to give up all recoverability guarantees?
>
> Use
> pragma journal_mode = off;
> pragma synchronous = off;
> pragma locking_mode = exclusive;
>
> In addition to that you may issue BEGIN statement at the beginning of
> the application and never COMMIT.


Yes that should be the fastest.

In addition:

* make sure that you prepare your INSERT query just once
  before the insert loop, and use bind/step/reset in the
  insertion loop.

* avoid duplicating the implicit uid index with another
  primary key by using INTEGER PRIMARY KEY
  and not something like INT PRIMARY KEY

* if you have indexes, make sure that you create them
  after all the inserts, rather than before.

* if you have several tables to populate, you can consider
  storing them in different databases, and populate them
  in parallel in different processes or threads. Then later
  you can ATTACH all of them to see them as a unique
  database.

Regards
Dominique
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Pavel Ivanov
On Fri, Mar 29, 2013 at 12:17 PM, Jeff Archer
 wrote:
> I have previously made an apparently bad assumption about this so now I
> would like to go back to the beginning of the problem and ask the most
> basic question first without any preconceived ideas.
>
> This use case is from an image processing application.  I have a large
> amount of intermediate data (way exceeds physical memory on my 24GB
> machine).  So, I need to store it temporarily on disk until getting to next
> phase of processing.  I am planning to use a large SSD dedicated to holding
> this temporary data.  I do not need any recoverability in case of hardware,
> power or other failure.   Each item to be stored is 9 DWORDs, 4 doubles and
> 2 variable sized BLOBS which are images.
>
> I could write directly to a file myself.  But I would need to provide some
> minimal indexing, some amount of housekeeping to manage variable
> sized BLOBS and some minimal synchronization so that multiple instances of
> the same application could operate simultaneously on a single set of data.
>
> So, then I though that SQLite could manage these things nicely for me so
> that I don't have to write and debug indexing and housekeeping code that
> already exists in SQLite.
>
> So, question is:  What is the way to get the fastest possible performance
> from SQLite when I am willing to give up all recoverability guarantees?

Use
pragma journal_mode = off;
pragma synchronous = off;
pragma locking_mode = exclusive;

In addition to that you may issue BEGIN statement at the beginning of
the application and never COMMIT.


Pavel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Simon Slavin

On 29 Mar 2013, at 7:17pm, Jeff Archer  wrote:

> This use case is from an image processing application.  I have a large
> amount of intermediate data (way exceeds physical memory on my 24GB
> machine).  So, I need to store it temporarily on disk until getting to next
> phase of processing.  I am planning to use a large SSD dedicated to holding
> this temporary data.  I do not need any recoverability in case of hardware,
> power or other failure.   Each item to be stored is 9 DWORDs, 4 doubles and
> 2 variable sized BLOBS which are images.
> 
> I could write directly to a file myself.

That's what I thought: just blat it straight out to a text file in append mode. 
 Organise it later.  But if you don't want to do that ...

> But I would need to provide some
> minimal indexing, some amount of housekeeping to manage variable
> sized BLOBS and some minimal synchronization so that multiple instances of
> the same application could operate simultaneously on a single set of data.

Locking between different simultaneous apps is going to kill you.  Do you 
really need, at this stage, one app to access data written by different writers 
?  If not, have each app write to a different database.  Then use a different 
app running on a different computer (or at least in a different process) to 
merge the data into one big file, either simultaneously or after 
data-collection is finished.

> So, then I though that SQLite could manage these things nicely for me so
> that I don't have to write and debug indexing and housekeeping code that
> already exists in SQLite.
> 
> So, question is:  What is the way to get the fastest possible performance
> from SQLite when I am willing to give up all recoverability guarantees?
> Or, is it simple that I should just write directly to file myself?



Journalling off and Synchronous off pragmas.  And I note you're asking for 
'fastest' whereas you really just want 'fast enough'.

However I note that you seem to have >24Gig of data to write.  At that size, 
the speed of SQLite isn't your limiting factor.  Instead you have to worry 
about the speed of your storage medium.  With 4K sectors, writing 24Gig of data 
means you're writing 3 million sectors.  If you're writing to rotating hard 
disks that means you'll be waiting for 3 million sectors to be rotated into the 
right place.  Even at 10,000 RPM that's a lot of waiting.  Your biggest speed 
increase isn't going to come from clever programming, it's going to come from 
moving to SSD.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread Jeff Archer
I have previously made an apparently bad assumption about this so now I
would like to go back to the beginning of the problem and ask the most
basic question first without any preconceived ideas.

This use case is from an image processing application.  I have a large
amount of intermediate data (way exceeds physical memory on my 24GB
machine).  So, I need to store it temporarily on disk until getting to next
phase of processing.  I am planning to use a large SSD dedicated to holding
this temporary data.  I do not need any recoverability in case of hardware,
power or other failure.   Each item to be stored is 9 DWORDs, 4 doubles and
2 variable sized BLOBS which are images.

I could write directly to a file myself.  But I would need to provide some
minimal indexing, some amount of housekeeping to manage variable
sized BLOBS and some minimal synchronization so that multiple instances of
the same application could operate simultaneously on a single set of data.

So, then I though that SQLite could manage these things nicely for me so
that I don't have to write and debug indexing and housekeeping code that
already exists in SQLite.

So, question is:  What is the way to get the fastest possible performance
from SQLite when I am willing to give up all recoverability guarantees?
Or, is it simple that I should just write directly to file myself?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users