Re: [sqlite] Blob incremental i/o via Python

2008-02-15 Thread Brian Smith
Norman Young wrote:
> D. Richard Hipp wrote:
> > It turns out to be faster to read images directly out of
> > SQLite BLOBs until the image gets up to about 15KB on
> > Windows and up to about 60KB on Linux.
> > And even for much larger images, the performance difference
> > between reading from SQLite and reading from a file is not
> > that great, so it is a reasonable thing to do to read from
> > SQLite if transactions are important to you or if it is
> > just more convenient.

> Reliability is paramount. If there is no way to achieve 
> transactional semantics using files outside of SQLite, then 
> any discussions about performance and programmer convenience 
> are purely academic. The application must behave deterministically.

I have done this as follows:

1. Every file is given a unique name using mkstemp(). This is not fast.

2. Every file is written only once, and never overwritten. If a file
needs to be replaced, a new file (with a new name) is created, and the
link to the file in the SQLite database is updated.

3. The file must be fsync()'d, and its directory must be fsync()'d,
before any attempt is made to update the links that are in the SQLite
database. Then, the SQLite database is committed, which results in at
least two more fdatasync()s. So, basically, you have to call the slowest
filesystem operation four times every time you update a file.

4. File deletion is slow, so I have a separate process that periodically
runs to delete all the replaced files asynchronously with any requests.
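
To make the recipe concrete, here is a minimal Python sketch of steps
1-3 (the function, table, and column names are illustrative, and it
assumes a POSIX filesystem where a directory can be fsync()'d through a
read-only descriptor):

import os
import tempfile

def store_file(con, key, data, storage_dir):
    # 'con' is an open sqlite3 connection; files(key, path) is a
    # hypothetical link table.
    # Step 1: create a uniquely named file (mkstemp, as noted, is slow).
    fd, path = tempfile.mkstemp(dir=storage_dir)
    try:
        # Step 2: write once; a replacement gets a brand-new file.
        os.write(fd, data)
        # Step 3: fsync the file itself...
        os.fsync(fd)
    finally:
        os.close(fd)
    # ...and fsync its directory, so the new name is durable on disk.
    dirfd = os.open(storage_dir, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
    # Only now update the link in SQLite; the commit itself adds at
    # least two more syncs.
    con.execute("UPDATE files SET path = ? WHERE key = ?", (path, key))
    con.commit()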

All of this makes inserting/updating files very slow compared to just
stuffing them into SQLite. It will only have a raw performance advantage
over storing files in SQLite when the files are large, when I cache file
descriptors (so I don't have to repeatedly call open() to read files),
and when I am using optimized I/O calls like sendfile() or mmap()ing
into shared memory. It *does* have the advantage that you can
significantly reduce the amount of time the database is locked when
files are inserted/updated.
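
For what it's worth, the descriptor cache plus sendfile() path might
look roughly like this (hypothetical names; note that os.sendfile()
only appeared in Python 3.3, so at the time this meant dropping down
to C or ctypes):

import os

_fd_cache = {}  # path -> open descriptor; a real cache needs eviction

def send_to_socket(sock, path):
    fd = _fd_cache.get(path)
    if fd is None:
        # Cache the descriptor so repeated reads skip open().
        fd = _fd_cache[path] = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        # sendfile() copies file pages to the socket inside the kernel,
        # avoiding a round trip through user-space buffers.
        offset += os.sendfile(sock.fileno(), fd, offset, size - offset)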

- Brian



Re: [sqlite] Blob incremental i/o via Python

2008-02-14 Thread Samuel Neff
Thanks for the correction.

Sam


On Thu, Feb 14, 2008 at 6:19 PM, <[EMAIL PROTECTED]> wrote:

> "Samuel Neff" <[EMAIL PROTECTED]> wrote:
> > If the images you're storing are larger than the defined page size
> > for the database (which is most likely the case) then you can get
> > better performance and reduced memory consumption by storing the
> > images in the file system and storing only paths to the files in
> > the database.  This means reading the large amount of data directly
> > from the file system instead of from SQLite's linked list of pages,
> > bypassing the page-caching layer (which you probably don't want for
> > images anyway), and freeing up more of the page cache for real
> > database data.
> >
>
> One would think.  And yet experiments suggest otherwise.  It
> turns out to be faster to read images directly out of SQLite
> BLOBs until the image gets up to about 15KB on Windows and
> up to about 60KB on Linux.  And even for much larger images,
> the performance difference between reading from SQLite and
> reading from a file is not that great, so it is a reasonable
> thing to do to read from SQLite if transactions are important
> to you or if it is just more convenient.
>
> --
> D. Richard Hipp <[EMAIL PROTECTED]>


Re: [sqlite] Blob incremental i/o via Python

2008-02-14 Thread Norman Young
On Thu, Feb 14, 2008 at 6:19 PM, <[EMAIL PROTECTED]> wrote:

> "Samuel Neff" <[EMAIL PROTECTED]> wrote:
> > If the images you're storing are larger than the defined page size
> > for the database (which is most likely the case) then you can get
> > better performance and reduced memory consumption by storing the
> > images in the file system and storing only paths to the files in
> > the database.  This means reading the large amount of data directly
> > from the file system instead of from SQLite's linked list of pages,
> > bypassing the page-caching layer (which you probably don't want for
> > images anyway), and freeing up more of the page cache for real
> > database data.
> >
>
> One would think.  And yet experiments suggest otherwise.  It
> turns out to be faster to read images directly out of SQLite
> BLOBs until the image gets up to about 15KB on Windows and
> up to about 60KB on Linux.  And even for much larger images,
> the performance difference between reading from SQLite and
> reading from a file is not that great, so it is a reasonable
> thing to do to read from SQLite if transactions are important
> to you or if it is just more convenient.
>
> --
> D. Richard Hipp <[EMAIL PROTECTED]>

How much larger? (Have the experiments been published?)

These are exactly the factors we're trading off: performance versus
reliability and programmer convenience.

Reliability is paramount. If there is no way to achieve transactional
semantics using files outside of SQLite, then any discussions about
performance and programmer convenience are purely academic. The application
must behave deterministically.

Performance is next most important. Our images are typically 20k JPEGs, as
captured with the current Nokia Tablet's VGA camera. Our audio streams are
encoded with Speex at about 2k/second, so a typical 1-minute recording is
about 120k. However, audio recordings can grow arbitrarily large, so
performance with 10-minute (1.2M) or 100-minute (12M) recordings could be
relevant. Also, we can anticipate video recordings in the future, with
an even steeper growth in data size.

Naturally, we are willing to sacrifice programmer convenience for the sake
of end-user reliability and performance. In this case, it appears that
reliability and programmer convenience are congruent, with transactions.

Thanks for a great database, Dr. Hipp. It suits burgeoning portable
applications perfectly. Its transactional incremental I/O enables rich
media on portable devices.

Norm.


Re: [sqlite] Blob incremental i/o via Python

2008-02-14 Thread drh
"Samuel Neff" <[EMAIL PROTECTED]> wrote:
> If the images you're storing are larger than the defined page size for
> the database (which is most likely the case) then you can get better
> performance and reduced memory consumption by storing the images in the
> file system and storing only paths to the files in the database.  This
> means reading the large amount of data directly from the file system
> instead of from SQLite's linked list of pages, bypassing the page-caching
> layer (which you probably don't want for images anyway), and freeing up
> more of the page cache for real database data.
> 

One would think.  And yet experiments suggest otherwise.  It
turns out to be faster to read images directly out of SQLite
BLOBs until the image gets up to about 15KB on Windows and
up to about 60KB on Linux.  And even for much larger images,
the performance difference between reading from SQLite and
reading from a file is not that great, so it is a reasonable
thing to do to read from SQLite if transactions are important
to you or if it is just more convenient.

--
D. Richard Hipp <[EMAIL PROTECTED]>



Re: [sqlite] Blob incremental i/o via Python

2008-02-14 Thread Samuel Neff
If the images you're storing are larger than the defined page size for
the database (which is most likely the case) then you can get better
performance and reduced memory consumption by storing the images in the
file system and storing only paths to the files in the database.  This
means reading the large amount of data directly from the file system
instead of from SQLite's linked list of pages, bypassing the page-caching
layer (which you probably don't want for images anyway), and freeing up
more of the page cache for real database data.
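
As a point of reference, the page size can be checked or set with a
PRAGMA; a small sketch using the sqlite3 module, with a hypothetical
media.db (page_size only takes effect on a database that has no content
yet, or after a VACUUM):

import sqlite3

con = sqlite3.connect('media.db')
con.execute('PRAGMA page_size = 4096')       # default is 1024 bytes
print(con.execute('PRAGMA page_size').fetchone()[0])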

HTH,

Sam


On Mon, Feb 11, 2008 at 10:29 PM, Norman Young <[EMAIL PROTECTED]>
wrote:

> The documentation outlines the C interface for incremental blob
> input/output, and mentions the C typedef for a blob handle.
>
> http://www.sqlite.org/c3ref/blob_open.html
> http://www.sqlite.org/c3ref/blob.html
>
> typedef struct sqlite3_blob sqlite3_blob;
>
>
> Can this same interface be accessed in Python?
>
> My application manipulates image and audio content. However, on my
> constrained platform (Python 2.5, sqlite3, Maemo, Nokia Internet Tablet),
> reading and writing the media data via the conventional blob interface (as
> illustrated in the following www.initd.org SnippetsBlobs.py example)
> could consume excessive memory. Specifically, all of the binary data
> are read into the blobdata object at once, via the read() call. Since
> the media files can be arbitrarily large, this call could easily
> exceed available memory.
>
> import sqlite3 as sqlite
>
> blobdata = open('c:\\sqlite3\\img.jpg', 'rb').read()
>
> con = sqlite.connect(':memory:')
> cur = con.cursor()
>
> cur.execute("CREATE TABLE picture_table(images)")
> cur.execute("INSERT INTO picture_table(images) VALUES (?)",
>             (sqlite.Binary(blobdata),))
> con.commit()
>
>
> Instead, I need to incrementally read and write media data to a blob, to
> avoid consuming arbitrary memory.
>
> Can this be done from Python via the sqlite3 module? Can you point me to
> examples?
>
> Thanks.
>
> Norm.
>
> www.nbyoung.com
> www.personalsyndication.com


Re: [sqlite] Blob incremental i/o via Python

2008-02-14 Thread Norman Young
On Wed, Feb 13, 2008 at 7:54 PM, Roger Binns <[EMAIL PROTECTED]> wrote:

> Norman Young wrote:
> > We had to exclude references to sqlite3_enable_load_extension and
> > sqlite3_load_extension from within apsw.c, in order to avoid undefined
> > symbol references during the build (python setup.py install).
>
> You should have modified the setup.py to include a define of
> SQLITE_OMIT_LOAD_EXTENSION.
>
> The general theory is that you should create/compile SQLite with
> whatever flags you want (you can exclude many features) and then provide
> the same flags when compiling apsw.  Unfortunately I don't know of any
> way of automatically finding out what flags/inclusions/exclusions SQLite
> was compiled with.
>
> Roger
>

Hey, hey. Right you are!

In the apsw setup.py:
define_macros.append( ('SQLITE_OMIT_LOAD_EXTENSION', '1') )
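
For context, a minimal sketch of how such a define feeds into a
distutils build (illustrative only; the real apsw setup.py is more
involved than this):

from distutils.core import setup, Extension

define_macros = []
define_macros.append(('SQLITE_OMIT_LOAD_EXTENSION', '1'))

setup(name='apsw',
      ext_modules=[Extension('apsw', ['apsw.c'],
                             define_macros=define_macros)])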

We had discovered this line of reasoning earlier, but we must have made a
mistake in making the adjustment. Thanks for the prompting. We now have a
properly-compiled apsw module.

Norm.


Re: [sqlite] Blob incremental i/o via Python

2008-02-13 Thread Roger Binns
Norman Young wrote:
> We had to exclude references to sqlite3_enable_load_extension and
> sqlite3_load_extension from within apsw.c, in order to avoid undefined
> symbol references during the build (python setup.py install).

You should have modified the setup.py to include a define of
SQLITE_OMIT_LOAD_EXTENSION.

The general theory is that you should create/compile SQLite with
whatever flags you want (you can exclude many features) and then provide
the same flags when compiling apsw.  Unfortunately I don't know of any
way of automatically finding out what flags/inclusions/exclusions SQLite
was compiled with.

Roger



Re: [sqlite] Blob incremental i/o via Python

2008-02-13 Thread Norman Young
On Feb 12, 2008 4:04 AM, Roger Binns <[EMAIL PROTECTED]> wrote:

>
> Norman Young wrote:
> > Can this same interface be accessed in Python?
>
> That API has to be wrapped.
>
> > Can this be done from Python via the sqlite3 module? Can you point me to
> > examples?
>
> I have implemented it in APSW in the development code (i.e. not released
> yet).  You can find the source on the python sqlite site
> http://www.initd.org/tracker/pysqlite/wiki/APSW#Developmentversion
> ... 
> Roger


Thanks for your quick response, Roger.

We have compiled the apsw module in the Maemo OS2008 (Chinook) SDK. We
successfully imported the Python module, both within the SDK and on the
Tablet. We have not yet specifically tested the incremental blob I/O
access, but I expect that to go well.

We had to exclude references to sqlite3_enable_load_extension and
sqlite3_load_extension from within apsw.c, in order to avoid undefined
symbol references during the build (python setup.py install).

static PyObject *
Connection_enableloadextension(Connection *self, PyObject *enabled)
{
#if 0
...
#endif
  Py_RETURN_NONE;  /* body compiled out; must still return a value */
}
...
static PyObject *
Connection_loadextension(Connection *self, PyObject *args)
{
#if 0
...
#endif
  Py_RETURN_NONE;  /* body compiled out; must still return a value */
}

We presume that the symbol resolution failure arose from shortcomings in the
SDK cross-compilation environment. Since we do not need sqlite3 to load
extensions, this work-around serves our purposes. Perhaps more ambitious
Maemo SDK users will investigate further than we have.

Thanks for the great module. We look forward to using blob incremental i/o
via Python in our continuing development. It enables an important
improvement in our design.

Norm.


Re: [sqlite] Blob incremental i/o via Python

2008-02-12 Thread Roger Binns
Norman Young wrote:
> Can this same interface be accessed in Python?

That API has to be wrapped.

> Can this be done from Python via the sqlite3 module? Can you point me to
> examples?

I have implemented it in APSW in the development code (i.e. not released
yet).  You can find the source on the python sqlite site:
http://www.initd.org/tracker/pysqlite/wiki/APSW#Developmentversion

The API makes it look like a regular Python-style file object (i.e. with
read/write/close/seek/tell methods).  To open one, there is a blobopen
method on connections which essentially takes the same parameters as the
C function: http://sqlite.org/c3ref/blob_open.html
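
For illustration, a usage sketch based on that description (argument
order follows sqlite3_blob_open, and file-like end-of-blob behaviour is
assumed, so check the APSW docs for the exact semantics; the database,
table, and rowid are hypothetical):

import apsw

con = apsw.Connection('media.db')

# Arguments mirror sqlite3_blob_open: database name, table, column,
# rowid, and a writeable flag.
blob = con.blobopen('main', 'picture_table', 'images', 1, False)
try:
    out = open('img.jpg', 'wb')
    while True:
        chunk = blob.read(65536)   # 64KB at a time, never the whole value
        if not chunk:              # assumed file-like: empty read at end
            break
        out.write(chunk)
    out.close()
finally:
    blob.close()

Note that the underlying C API cannot change a blob's length, so
incremental writes require the row to be created at its full size first
(for example with SQLite's zeroblob(N) function).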

Roger


[sqlite] Blob incremental i/o via Python

2008-02-11 Thread Norman Young
The documentation outlines the C interface for incremental blob
input/output, and mentions the C typedef for a blob handle.

http://www.sqlite.org/c3ref/blob_open.html
http://www.sqlite.org/c3ref/blob.html

typedef struct sqlite3_blob sqlite3_blob;


Can this same interface be accessed in Python?

My application manipulates image and audio content. However, on my
constrained platform (Python 2.5, sqlite3, Maemo, Nokia Internet Tablet),
reading and writing the media data via the conventional blob interface (as
illustrated in the following www.initd.org SnippetsBlobs.py example) could
consume excessive memory. Specifically, all of the binary data are read
into the blobdata object at once, via the read() call. Since the media
files can be arbitrarily large, this call could easily exceed available
memory.

import sqlite3 as sqlite

blobdata = open('c:\\sqlite3\\img.jpg', 'rb').read()

con = sqlite.connect(':memory:')
cur = con.cursor()

cur.execute("CREATE TABLE picture_table(images)")
cur.execute("INSERT INTO picture_table(images) VALUES (?)",
            (sqlite.Binary(blobdata),))
con.commit()


Instead, I need to incrementally read and write media data to a blob, to
avoid consuming arbitrary memory.

Can this be done from Python via the sqlite3 module? Can you point me to
examples?

Thanks.

Norm.

www.nbyoung.com
www.personalsyndication.com