[sqlite] SQLite performance graph not updated + 3 typos

2018-04-09 Thread Dominique Pellé
Hi

The SQLite performance graph at https://www.sqlite.org/cpu.html
has not been updated since SQLite-3.21 (the current release being
SQLite-3.23).

Was it forgotten, or is the graph updated only once in a while?

Also here are 3 typos found at https://www.sqlite.org/cpu.html
in the same sentence:

Recent *version* *so* SQLite use less *then* a third of the CPU [...]

->
Recent versions of SQLite use less than a third of the CPU [...]

Regards
Dominique


[sqlite] Sqlite performance on windows vs linux

2016-03-01 Thread Simon Slavin

On 1 Mar 2016, at 1:54pm, Alvi, Tariq  wrote:

> We have been using sqlite on a windows server for a while and the traffic has 
> been increasing steadily, meaning incoming data volume (inserts) and user 
> queries (select) have both increased. I was wondering if there have been any 
> performance comparisons between windows and linux using similar hardware? 

Yeah.  I've done them.  Windows vs. Linux vs. Mac OS X.

There are differences between them but they are extremely small.  The 
differences caused by different hard disks, different motherboards and 
different CPUs are far greater.  I even spotted different timings depending on 
which version of Windows you're using.  I think it was Windows XP 64-bit vs. 
Windows 7.

Simon.


[sqlite] Sqlite performance on windows vs linux

2016-03-01 Thread Alvi, Tariq
We have been using sqlite on a windows server for a while and the traffic has
been increasing steadily, meaning incoming data volume (inserts) and user
queries (select) have both increased. I was wondering if there have been any
performance comparisons between windows and linux using similar hardware? This
would actually determine how we scale going forward, as well as what kind of
developers we hire, so any feedback on benchmarks would be greatly appreciated.


Re: [sqlite] SQLite performance with NOR FLASH

2014-06-26 Thread Simon Slavin

On 26 Jun 2014, at 7:42am, Vivek Ranjan  wrote:

> Code looks like this:

Thanks.  I was wondering whether you called _step() with a strange value, but
you're calling it with -1, which seems to be the best thing to do in your case.
And I don't see anything else wrong with your code.

I hope one of the devs can look into this.  Writing any disk file four bytes at 
a time is going to be bad not only for Flash in mini devices but also for 
computers which use SSD for main filestore.

Simon.


Re: [sqlite] SQLite performance with NOR FLASH

2014-06-26 Thread Vivek Ranjan
Hello Simon,

Code looks like this:


   /* Move content from the source database to the destination database
      using the online-backup API.  destDBHandle, sourceDBHandle, result
      and Tr_Err() are defined in the surrounding code. */
   sqlite3_backup *pBackup;

   pBackup = sqlite3_backup_init(destDBHandle, "main", sourceDBHandle, "main");

   if (pBackup)
   {
      int sqlErrno;
      /* -1 asks sqlite3_backup_step() to copy the whole database in one call */
      if ((sqlErrno = sqlite3_backup_step(pBackup, -1)) != SQLITE_DONE)
      {
         Tr_Err("SwapDB sqlite3_backup_step sqlErrno %d", sqlErrno);
         result = false;
      }
      if ((sqlErrno = sqlite3_backup_finish(pBackup)) != SQLITE_OK)
      {
         Tr_Err("SwapDB sqlite3_backup_finish sqlErrno %d", sqlErrno);
         result = false;
      }
   }
   else
   {
      Tr_Err("SwapDB could not be init");
      result = false;
   }


Br
Vivek




-- 
Vivek(Mobile) :  +49 17655 040018


Re: [sqlite] SQLite performance with NOR FLASH

2014-06-25 Thread Simon Slavin

On 25 Jun 2014, at 9:14pm, Vivek Ranjan  wrote:

> sqlite3_backup_init()
> 
> sqlite3_backup_step()
> 
> sqlite3_backup_finish()
> 
> 
> SQLite writes data in very small chunks, i.e. 1 or 4 bytes, which causes the
> poor performance while writing to Flash

What parameter are you passing to _step() as a number of pages?

Simon.
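
For reference, a minimal sketch of the backup loop with an explicit page
count per step (an editor's illustration, not code from this thread; the
function and variable names are invented):

  #include <sqlite3.h>

  /* Copy src into dst in chunks of pagesPerStep pages per call. */
  static int backup_in_steps(sqlite3 *dst, sqlite3 *src, int pagesPerStep)
  {
      sqlite3_backup *b = sqlite3_backup_init(dst, "main", src, "main");
      int rc, rc2;
      if (b == 0) return sqlite3_errcode(dst);
      do {
          rc = sqlite3_backup_step(b, pagesPerStep);  /* e.g. 100 pages */
          /* a production loop would sleep briefly on SQLITE_BUSY/LOCKED */
      } while (rc == SQLITE_OK || rc == SQLITE_BUSY || rc == SQLITE_LOCKED);
      rc2 = sqlite3_backup_finish(b);  /* releases the backup handle */
      return rc == SQLITE_DONE ? rc2 : rc;
  }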



[sqlite] SQLite performance with NOR FLASH

2014-06-25 Thread Vivek Ranjan
Dear All,

I am using SQLite on NOR Flash with SQLite version 3.8.5 with default
settings, i.e. sector and page size. Writing to NOR Flash takes around 10
seconds (SQLite file size of 400KB), which is somehow not acceptable. Upon
inserting additional debug prints, I found that since I use the following
API

sqlite3_backup_init()

sqlite3_backup_step()

sqlite3_backup_finish()


SQLite writes data in very small chunks, i.e. 1 or 4 bytes, which causes the
poor performance while writing to Flash.

Please share any experience with fine-tuning writes to NOR
Flash.

Kind Regards
Vivek
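
One knob worth checking here, sketched below as an editor's illustration
(not advice from the thread; the 4096 value and the names are only
examples): the SQLite page size can be aligned with the flash sector
size, but only while the database file is still uninitialised.

  #include <sqlite3.h>

  /* The page size must be set before the first table is created;
     after the first write to the file it is fixed. */
  static int create_db_for_flash(const char *path, sqlite3 **db)
  {
      int rc = sqlite3_open(path, db);
      if (rc != SQLITE_OK) return rc;
      rc = sqlite3_exec(*db, "PRAGMA page_size = 4096;", 0, 0, 0);
      if (rc == SQLITE_OK)
          rc = sqlite3_exec(*db,
              "CREATE TABLE IF NOT EXISTS t(id INTEGER PRIMARY KEY, v BLOB);",
              0, 0, 0);
      return rc;
  }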


Re: [sqlite] sqlite performance with large and multiple attached databases

2010-03-30 Thread Simon Slavin

On 30 Mar 2010, at 9:02am, Akbar Syed wrote:

> Unfortunately, I could not think about any other option than to keep
> the devices attached all the time.

I hope you don't have to handle many attachable devices, because SQLite can't 
handle more than 30 simultaneous ATTACHes.  See section 11 of
http://www.sqlite.org/limits.html

Simon.
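
As an aside, the per-connection ATTACH ceiling can be inspected at run
time with sqlite3_limit(); a minimal sketch (an editor's illustration,
not from the thread):

  #include <sqlite3.h>

  /* Passing -1 queries the current SQLITE_LIMIT_ATTACHED value without
     changing it; it can never be raised above the compile-time maximum. */
  static int max_attached(sqlite3 *db)
  {
      return sqlite3_limit(db, SQLITE_LIMIT_ATTACHED, -1);
  }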


[sqlite] sqlite performance with large and multiple attached databases

2010-03-30 Thread Akbar Syed
>>Make one big 'working' database by reading all the data from the databases on 
>>the independent devices and putting it into one big table.  Once you have 
>>sucked out the data you can disconnect from the device.  Then do all your 
>>enquiries on this big table.  There's a special convenient form of the INSERT 
>>command which gets its data directly from a SELECT command:

>>INSERT INTO bigTable SELECT * FROM attachedDB.myTable

>>So, in a simplified example, for each pluggable device do this:

>>ATTACH "deviceFile.s3" AS attachedDB
>>INSERT INTO allDeviceTable SELECT * FROM attachedDB.myTable
>>DETACH attachedDB

Then just query allDeviceTable.

Thanks Simon for your suggestions. Unfortunately, I cannot detach the
devices in my application.
The data to be inserted in each database could be live, i.e. on every
device where a database is
created, there could be new live data which needs to be inserted in
the database as and when it is added
to the device, and made available for a single select. Your suggestions
would have worked if it was not live data.
Sorry I forgot to mention this in my previous mail.
Unfortunately, I could not think of any other option than to keep
the devices attached all the time. I am trying to
find some optimum cache values for each connection. I tried not to use
many connections. I just have a single connection
open for a device and its database and one attach connection. As of
now I am using 500K of default cache for each open connection,
with 100K for attached connections and a 500K default temp-cache, and
I also have temp-store set to a file. It might hamper my
performance, yet I don't have any other option.

Regards,
-Akbar



Re: [sqlite] sqlite performance with large and multiple attached databases

2010-03-26 Thread Simon Slavin

On 26 Mar 2010, at 10:47am, Akbar Syed wrote:

> Unfortunately, my application restricts me to use independent
> databases than to a single  database
> as each database exists on a different device and contains the info of
> that device in the database.
> Multiple devices are allowed to connect to my server app which instead
> gathers the records from the
> independent databases by attaching the databases together.

Make one big 'working' database by reading all the data from the databases on 
the independent devices and putting it into one big table.  Once you have 
sucked out the data you can disconnect from the device.  Then do all your 
enquiries on this big table.  There's a special convenient form of the INSERT 
command which gets its data directly from a SELECT command:

INSERT INTO bigTable SELECT * FROM attachedDB.myTable

So, in a simplified example, for each pluggable device do this:

ATTACH "deviceFile.s3" AS attachedDB
INSERT INTO allDeviceTable SELECT * FROM attachedDB.myTable
DETACH attachedDB

Then just query allDeviceTable.

Simon.


[sqlite] sqlite performance with large and multiple attached databases

2010-03-26 Thread Akbar Syed
Hello,

 >>If you want to improve memory utilization, just put everything in one
 >> database.  Each attached database in each connection has its own cache.
 >> With that many data sources you're going to get a very uneven distribution
 >> of cache utilization, and very high worst-case usage.

Unfortunately, my application restricts me to using independent
databases rather than a single database,
as each database exists on a different device and contains the info of
that device in the database.
Multiple devices are allowed to connect to my server app, which
gathers the records from the
independent databases by attaching the databases together.
Unfortunately, I could not find a better
solution than ATTACH to join these databases, as after gathering the
information together I need to sort
the collected records and present them together. That's why sometimes I
try to select a lot of records
with a single query.

Maybe someone can suggest a better option for this situation. Is there a
way to clean the cache before I make
a select, to ensure I have enough cache for the select? I do use the soft
heap limit feature already.

>>"Just 250 records" implies a sort.  Unless an index is available, a
>>  sort requires doing the whole query and sorting the results.  That's
>>  the large lead time.  After that it is just popping records off the
>> stack, so to speak.

I do have a primary index on my very first column, which is nothing but
the rowid itself. I did not
use other indexes as that may again be an overhead on memory. My
implementations for the databases are simple,
whereby I create a database with a single table of 65 columns and
write data into it. I attach all the
databases together and make selects, of course ordering not
on the primary index but on some other
column of type TEXT. Do you think this will slow things down?

Regards,
-Akbar




Re: [sqlite] sqlite performance with large and multiple attached databases

2010-03-25 Thread Jay A. Kreibich
On Thu, Mar 25, 2010 at 05:22:04PM +0100, Akbar Syed scratched on the wall:
> I have been trying to improve the performance and memory usage for my
> application, whereby I have a maximum of 30 databases attached. In total I have
> 31 databases, with 30 databases attached to the first one. Each database has
> a single table with approx 65 columns, and the records in each table may go
> up to 50,000. I have 31 connections for 31 databases, i.e. 1 connection for
> one database. For each connection I have a cache size of 500 pages (1 page =
> 1KB), a temporary cache of 500 pages, and for each attached connection a
> cache size of 100 pages. I try to minimize memory usage as much as I can, and
> the speed of reading is tolerable. I don't mind the writing speed, but I do
> care about reading speed. In one attempt, I select all the records from all
> the databases; that is the purpose for which I am using attached databases
> with a single query.

  If you want to improve memory utilization, just put everything in one
  database.  Each attached database in each connection has its own cache.
  With that many data sources you're going to get a very uneven distribution
  of cache utilization, and very high worst-case usage.

> In one attempt I tried to fetch just 250 records of 65 columns from 31
> databases, and I observed that I spend approx 1-5 seconds in the first call
> to the sqlite3_step() function, while the subsequent calls to sqlite3_step()
> take some microseconds.

  "Just 250 records" implies a sort.  Unless an index is available, a
  sort requires doing the whole query and sorting the results.  That's
  the large lead time.  After that it is just popping records off the
  stack, so to speak.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
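
A minimal sketch of Jay's point (an editor's illustration; the table and
column names are invented, not from this thread): an index matching the
ORDER BY column lets SQLite walk rows already in sorted order, so the
first sqlite3_step() no longer pays for sorting the whole result.

  #include <sqlite3.h>

  /* After this, "SELECT * FROM t ORDER BY label LIMIT 250" can read the
     index in order instead of building a temporary sort b-tree. */
  static int add_order_by_index(sqlite3 *db)
  {
      return sqlite3_exec(db,
          "CREATE INDEX IF NOT EXISTS idx_t_label ON t(label);", 0, 0, 0);
  }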


[sqlite] sqlite performance with large and multiple attached databases

2010-03-25 Thread Akbar Syed
I have been trying to improve the performance and memory usage for my
application, whereby I have a maximum of 30 databases attached. In total I have
31 databases, with 30 databases attached to the first one. Each database has
a single table with approx 65 columns, and the records in each table may go
up to 50,000. I have 31 connections for 31 databases, i.e. 1 connection for
one database. For each connection I have a cache size of 500 pages (1 page =
1KB), a temporary cache of 500 pages, and for each attached connection a
cache size of 100 pages. I try to minimize memory usage as much as I can, and
the speed of reading is tolerable. I don't mind the writing speed, but I do
care about reading speed. In one attempt, I select all the records from all
the databases; that is the purpose for which I am using attached databases
with a single query.

In one attempt I tried to fetch just 250 records of 65 columns from 31
databases, and I observed that I spend approx 1-5 seconds in the first call
to the sqlite3_step() function, while the subsequent calls to sqlite3_step()
take some microseconds.
Can anyone shed some light on this? I appreciate your suggestions to
further improve my speed and memory usage.

-Akbar


Re: [sqlite] SQLite performance with lots of data

2009-10-05 Thread Simon Slavin

On 5 Oct 2009, at 8:02am, Keith Roberts wrote:

> On Sun, 4 Oct 2009, Simon Slavin wrote:
>
>> But note that the fields of the row are stored in (more or less) a
>> list.  So accessing the 20th column takes twice (-ish) as long as
>> accessing the 10th column.  If you make a table with 100 columns it
>> can take a long time to access the 100th column.
>
> Could this not be implemented as a B-tree search algo as
> well? Maybe something like a FAT (RAT?) index stuck
> on the front of the table rows?

Certainly could.  But it would be slower and a waste of filespace for  
the vast majority of databases out there.

However, it's generally felt that any solution which involves a table  
with 100 columns shows that you didn't understand the question.  You  
should be able to hold all the columns of a table in your mind at  
once.  If you notice that you're numbering your columns, then you're  
probably better off making those numbers a key column.  Otherwise you  
might consider reorganising the data as a proper list.

Simon.


Re: [sqlite] SQLite performance with lots of data

2009-10-05 Thread Keith Roberts
On Sun, 4 Oct 2009, Simon Slavin wrote:

> To: General Discussion of SQLite Database <sqlite-users@sqlite.org>
> From: Simon Slavin <slav...@hearsay.demon.co.uk>
> Subject: Re: [sqlite] SQLite performance with lots of data
> 
>
> On 4 Oct 2009, at 6:11pm, Cory Nelson wrote:
>
>> On Fri, Oct 2, 2009 at 12:34 PM, Cory Nelson <phro...@gmail.com>
>> wrote:
>>> On Fri, Oct 2, 2009 at 9:45 AM, Francisc Romano <fran...@gmail.com>
>>> wrote:
>>>> Wow. I did not expect such a quick answer...
>>>> Is there somewhere I can read exactly how fast and how big
>>>> databases SQLite
>>>> can take, please?
>>>
>>> SQLite uses a b+tree internally, which is logarithmic in complexity.
>>> Every time your dataset doubles in size, worst-case performance will
>>> be halved.
>>
>> Woops, I of course meant to say performance halves every time the size
>> of your dataset is squared.
>
> But note that the fields of the row are stored in (more or less) a
> list.  So accessing the 20th column takes twice (-ish) as long as
> accessing the 10th column.  If you make a table with 100 columns it
> can take a long time to access the 100th column.
>
> Simon.

Could this not be implemented as a B-tree search algo as 
well? Maybe something like a FAT (RAT?) index stuck 
on the front of the table rows?

Keith

-
Websites:
http://www.php-debuggers.net
http://www.karsites.net
http://www.raised-from-the-dead.org.uk

All email addresses are challenge-response protected with
TMDA [http://tmda.net]
-


Re: [sqlite] SQLite performance with lots of data

2009-10-04 Thread Simon Slavin

On 4 Oct 2009, at 6:11pm, Cory Nelson wrote:

> On Fri, Oct 2, 2009 at 12:34 PM, Cory Nelson   
> wrote:
>> On Fri, Oct 2, 2009 at 9:45 AM, Francisc Romano   
>> wrote:
>>> Wow. I did not expect such a quick answer...
>>> Is there somewhere I can read exactly how fast and how big  
>>> databases SQLite
>>> can take, please?
>>
>> SQLite uses a b+tree internally, which is logarithmic in complexity.
>> Every time your dataset doubles in size, worst-case performance will
>> be halved.
>
> Woops, I of course meant to say performance halves every time the size
> of your dataset is squared.

But note that the fields of the row are stored in (more or less) a  
list.  So accessing the 20th column takes twice (-ish) as long as  
accessing the 10th column.  If you make a table with 100 columns it  
can take a long time to access the 100th column.

Simon.


Re: [sqlite] SQLite performance with lots of data

2009-10-04 Thread Cory Nelson
On Fri, Oct 2, 2009 at 12:34 PM, Cory Nelson  wrote:
> On Fri, Oct 2, 2009 at 9:45 AM, Francisc Romano  wrote:
>> Wow. I did not expect such a quick answer...
>> Is there somewhere I can read exactly how fast and how big databases SQLite
>> can take, please?
>
> SQLite uses a b+tree internally, which is logarithmic in complexity.
> Every time your dataset doubles in size, worst-case performance will
> be halved.

Woops, I of course meant to say performance halves every time the size
of your dataset is squared.

-- 
Cory Nelson
http://int64.org


Re: [sqlite] SQLite performance with lots of data

2009-10-02 Thread Cory Nelson
On Fri, Oct 2, 2009 at 9:45 AM, Francisc Romano  wrote:
> Wow. I did not expect such a quick answer...
> Is there somewhere I can read exactly how fast and how big databases SQLite
> can take, please?

SQLite uses a b+tree internally, which is logarithmic in complexity.
Every time your dataset doubles in size, worst-case performance will
be halved.

-- 
Cory Nelson
http://int64.org


Re: [sqlite] SQLite performance with lots of data

2009-10-02 Thread Francisc Romano
Very good idea! Thank you!


Re: [sqlite] SQLite performance with lots of data

2009-10-02 Thread Roger Binns

Francisc Romano wrote:
> how big databases SQLite can take, please?

Someone told me recently they have 37GB and 66 million rows in their data
set.  Another user is using the virtual table functionality together with
synthetic indices to optimise working with large amounts of 3D data.

Quite simply the best thing to do is to generate test data that is
representative of what you want to do and the kind of queries you want to
execute.  Generate your test data and queries as a text file and feed it to
the SQLite shell.

Roger
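
A minimal sketch of that workflow (an editor's illustration; the schema
and row count are invented): emit the test data as SQL text and pipe it
into the shell, e.g.  ./gen | sqlite3 test.db

  #include <stdio.h>

  /* Writes a CREATE TABLE plus a million INSERTs, wrapped in a single
     transaction, to stdout for the sqlite3 shell to consume. */
  int main(void)
  {
      int i;
      puts("CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT);");
      puts("BEGIN;");
      for (i = 0; i < 1000000; i++)
          printf("INSERT INTO t(v) VALUES('row %d');\n", i);
      puts("COMMIT;");
      return 0;
  }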


Re: [sqlite] SQLite performance with lots of data

2009-10-02 Thread P Kishor
On Fri, Oct 2, 2009 at 11:45 AM, Francisc Romano  wrote:
> Wow. I did not expect such a quick answer...
> Is there somewhere I can read exactly how fast and how big databases SQLite
> can take, please?

See http://www.sqlite.org/limits.html for "how big." You will have to
do your own measurements for "how fast."

A simple answer is "fast enough," as that question, standing alone,
makes no sense at all. How fast for what? On which machine? What else
is going on in that machine concurrently? In which application? How
well have you programmed?

-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===
Sent from Madison, WI, United States


Re: [sqlite] SQLite performance with lots of data

2009-10-02 Thread Francisc Romano
Wow. I did not expect such a quick answer...
Is there somewhere I can read exactly how fast and how big databases SQLite
can take, please?


Re: [sqlite] SQLite performance with lots of data

2009-10-02 Thread P Kishor
On Fri, Oct 2, 2009 at 11:42 AM, Francisc Romano  wrote:
> Hello!
>
> I am not entirely certain this is the right way to proceed, but I haven't
> been able to find the appropriate official SQLite forum (if one exists).
> I want to create a rather complex AIR application that will have to deal
> with a massive database.
> My question is:
>
> Is SQLite capable of dealing with large amounts of data and still perform
> quickly?
>

yes.

> Thank you!
> Francisc
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Puneet Kishor


[sqlite] SQLite performance with lots of data

2009-10-02 Thread Francisc Romano
Hello!

I am not entirely certain this is the right way to proceed, but I haven't
been able to find the appropriate official SQLite forum (if one exists).
I want to create a rather complex AIR application that will have to deal
with a massive database.
My question is:

Is SQLite capable of dealing with large amounts of data and still perform
quickly?

Thank you!
Francisc


Re: [sqlite] SQlite performance on multi process env

2009-05-28 Thread Griggs, Donald

Subject: Re: [sqlite] SQlite performance on multi process env

Hello, Zhrahman,

Regarding:   "... kindly suggest how to properly have the database
shared in memory among n number of processes, so they can execute select
operations (read only, no update on the
database) efficiently."


If the large number of processes are unavoidable, you might want to
consider:
   1) Writing a server layer that would be the only process to open the
DB file and service the requests for the other processes.
   2) Modifying such a layer already written by others.
   3) Using a database designed for lots of simultaneous clients, such
as Postgres or MySql.  (I believe the succinct advice has been something
like, "Think of sqlite as a replacement for fwrite, not as a replacement
for Oracle.")


Re: [sqlite] SQlite performance on multi process env

2009-05-28 Thread Jay A. Kreibich
On Thu, May 28, 2009 at 10:53:34AM -0700, zhrahman scratched on the wall:
> 
> Few other bits of info:
> I am running it on Linux. So to make a long story short, kindly suggest
> how to properly have the database shared in memory among n number of
> processes.

  You can't.  ":memory:" databases cannot be shared.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson


Re: [sqlite] SQlite performance on multi process env

2009-05-28 Thread John Elrick
zhrahman wrote:
> Few other bits of info:
> I am running it on Linux. So to make a long story short, kindly suggest
> how to properly have the database shared in memory among n number of
> processes. So they can execute select operations (read only, no update on the
> database) efficiently.
>   

Multiprocessing is a magic trick.  The sleight of hand makes the system 
appear to be faster running multiple tasks by relying upon different 
processes or threads needing access to different interrupts in a given 
period of time.

We can go back to the ancient days of punched cards and line printing to 
make a simple example.  The card reader takes time to process data.  
While a given process is waiting on the card reader, another process can 
chunk through some computations.  Meanwhile, those computations can be 
interrupted to let another process send data to a printer buffer which 
soon fills up and that process has to wait.  Back to computations.  Now 
the card reader buffer is full and can be processed so we switch over 
there.  You get the idea.  From the outside it looks like the system as 
a whole runs faster, but it's an illusion.  We're just making use of 
dead time.

If you throw nearly identical processes into the mix, such as you are 
doing, you defeat the magic.  Suddenly, you're back to no better than 
serially processing the data, with the penalty of task switching added 
in.  Given a quad core processor, you would likely be further ahead with 
three forced processes dividing the work among three cores -- leaving 
the fourth core for operating system processing.  You gain nothing by 
throwing more processes at the problem.


John Elrick


Re: [sqlite] SQlite performance on multi process env

2009-05-28 Thread zhrahman

Few other bits of info:
I am running it on Linux. So to make a long story short, kindly suggest
how to properly have the database shared in memory among n number of
processes, so they can execute select operations (read only, no update on the
database) efficiently.





Re: [sqlite] SQlite performance on multi process env

2009-05-28 Thread zhrahman

So yes, the hard disk is shared. I even tried to load the database in memory.
It is still horribly slow. I want to understand this. If I load the database
in memory, how can I make the memory sharable among 100 processes? I am
running in a quad core environment. My goal here is to load the database
in memory; each process will open a read-only connection to it and execute the
select operations. For doing 1 select operations a single process was
taking 1 second, and when 100 processes were running each took 10+ seconds. What
are the ways to share the database among all the processes in memory? I hope
some shared memory approach is used here. But what I see is that unless I
recreate all the tables individually for each process, the memory sharing
doesn't happen. Please help me out here.
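
For reference, a minimal sketch (an editor's illustration; the file name
is invented) of what a per-process read-only connection looks like in the
C API:

  #include <sqlite3.h>

  /* Each reader process opens its own handle; SQLITE_OPEN_READONLY
     fails if the file does not already exist. */
  static int open_reader(const char *path, sqlite3 **db)
  {
      return sqlite3_open_v2(path, db, SQLITE_OPEN_READONLY, 0);
  }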




Re: [sqlite] SQlite performance on multi process env

2009-05-28 Thread Sylvain Pointeau
The hard disk is shared, so it is a critical resource; likewise, 100
processes doesn't seem realistic on a single processor, dual core or not.

So I can understand your results; I find them not too bad, even ...

Cheers,
Sylvain



[sqlite] SQlite performance on multi process env

2009-05-27 Thread zhrahman

I am doing a simple test on sqlite where, in step one, I just write 10,000
records to a simple database having three fields. In step 2, after the write
is done, I want 100 processes reading from the database and doing simple
SQL, no update on the database, just simple select read operations. I am
getting horrible performance when I increase the number of processes. Now,
each process only opens the database connection in read-only mode. Still I
am guessing some lock is placed on the database, and that is why even the read
operation is slowing down drastically with the increase in the number of
processes. Any idea what I need to do to boost performance? For 100
processes doing reads it takes like 10 seconds to do 1 SQL by each
process. If I reduce the number of processes from 1000 to 1 it only finishes
in a second.



[sqlite] sqlite performance issues on Linux running on MIPS (BIG ENDIAN)

2009-05-02 Thread chandan r
Hi,
I am planning to use sqlite in an application program. The application
will be running in a Linux environment on a MIPS processor (big endian). Are
there any sqlite performance issues when running on a MIPS processor with
Linux as the OS?  In particular, are there any issues regarding non-aligned
memory access (e.g. BUS errors)?


[sqlite] sqlite performance tuning and optimization

2009-04-30 Thread Kalyani Phadke
I am using the SQLite3 database and my query is running slow
sometimes; sometimes it runs fast. Is there any SQLite profiler available,
just like SQL Profiler for SQL Server?

My query is built by string concatenation in the host language (the
concatenated values are elided here) and looks like:

 SELECT ID, EventClassName, EventClassRef, TransitionTime, Message,
 MonitoredRef, EventRef, ToState, Priority, Acked FROM Events
 WHERE Events.NotificationTime >= ... AND ID > ...
 ORDER BY ID DESC LIMIT 100

Recordset CursorLocation = 2, CursorType = 2, LockType = 3

 

My Events table has 2259205 records. ID is my primary key, so there is an
index on that field. NotificationTime also has an index on it.

The page running this query executes in 312 ms - 2.5 seconds on a
Windows XP machine.

The page running this query executes in 366 ms - 2.8 minutes on a
Windows Web Server 2008 machine.

Is there any way I can figure out why it's taking so long to run?

Thanks,

-KPH
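
There is no GUI profiler bundled with SQLite, but the C API does offer
sqlite3_profile() (an experimental interface) for per-statement
wall-clock timings. A minimal sketch, as an editor's illustration only:

  #include <stdio.h>
  #include <sqlite3.h>

  /* Invoked once per completed statement; elapsed time is in nanoseconds. */
  static void profile_cb(void *ctx, const char *sql, sqlite3_uint64 ns)
  {
      (void)ctx;
      fprintf(stderr, "%llu ms: %s\n",
              (unsigned long long)(ns / 1000000), sql);
  }

  /* After opening the handle: sqlite3_profile(db, profile_cb, 0); */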

 

 



Re: [sqlite] SQLite performance on select/update with 11GB database file

2009-04-09 Thread Kees Nuyt
On Wed, 08 Apr 2009 23:17:02 +0200, Florian Nigsch
 wrote:

>Hi all,
>
>I have been trying to implement a couple of things in SQLite because I  
>only need it for myself (so no concurrency issues here).
>
>I am working on Arch Linux (uname: 2.6.28-ARCH #1 SMP PREEMPT Sun Mar  
>8 10:18:28 UTC 2009 i686 Intel(R) Xeon(TM) CPU 3.40GHz). I have a flat  
>text file that I want to put into SQLite to be able to query it and  
>also update fields if required. This raw text file has 7.2GB, roughly  
>11 million entries and 60 fields. (Perhaps not the best design, but  
>that shall not be the issue here)

We can't ignore that. If you have 11 million entries and 60
fields, I bet there is a lot of redundant data in the
database, which makes it big.
Normalization can make a lot of difference here.

>Getting it into SQLite was not as easy as I anticipated. At first, I  
>used the execute() method for every insert, and committed every 1  
>rows. This run didn't finish overnight.
>
>Then a commit every 100,000 rows: at around 2.5 million entries it  
>slowed down so much that it was unacceptable. I used PRAGMA  
>journal_mode = OFF which improved a bit, but not enough.
>
>The final rescue was to use executemany() for every 100,000 records  
>combined with a commit(). That put the whole file in the DB in approx  
>17 min. Creation of indices where required took another 40 minutes.  
>The final database file has roughly 11GB.
>
>Now, I have 3 spare fields in the table that I want to populate with  
>data from another file. I loop over the file in question, from there I  
>get the ID of the record to update in the table and the info to  
>populate the empty field with. Again, I used executemany() every  
>100,000 records, assuming that it would work. But no.
>
>Even when I use executemany() every 10,000 rows, these 10,000 updates  
>take around 5 minutes or so (I haven't properly timed it). That is  
>also with PRAGMA journal_mode = OFF.

I don't recognize executemany() or commit(). They are not
part of the sqlite3_* API. Can I assume they are your own
wrapper functions? 

You did tell you commit() your transactions, but do you also
BEGIN them? (I apologize for asking the obvious, but someone
has to ask it anyway).

>When I just get the IDs of the entries that I need to update, look  
>them up in the table (select a,b,c from table where id in (...)), and  
>retrieve them with fetchall() and do nothing with them (no printing,  
>etc.) then this works fine for the first 10,000 records. After that,  
>again, it gets so slow that it's basically unusable. The ID field  
>that I use for the lookups has an index on it. The fields that I am  
>updating do not have indices on them. It's just text fields that are  
>being populated with strings up to approx 150 characters.
>
>Why are the retrievals that slow, and why are they getting slower the  
>further I go? And why are the updates so slow?
>
>Is it because the database file is too big?
> 
>Any help or suggestions would be highly appreciated!
>
>Regards,
>
>Flo

-- 
  (  Kees Nuyt
  )
c[_]
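
In C API terms the pattern Kees is asking about looks like the sketch
below (an editor's illustration; the poster's executemany()/commit()
appear to be wrapper methods around something similar):

  #include <sqlite3.h>

  /* Group a large batch of UPDATEs into one explicit transaction so the
     whole batch costs a single journal sync instead of one per row. */
  static int run_batch(sqlite3 *db)
  {
      int rc = sqlite3_exec(db, "BEGIN;", 0, 0, 0);
      if (rc != SQLITE_OK) return rc;
      /* ... bind and step a prepared UPDATE statement many times here ... */
      return sqlite3_exec(db, "COMMIT;", 0, 0, 0);
  }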


[sqlite] SQLite performance on select/update with 11GB database file

2009-04-08 Thread Florian Nigsch
Hi all,

I have been trying to implement a couple of things in SQLite because I  
only need it for myself (so no concurrency issues here).

I am working on Arch Linux (uname: 2.6.28-ARCH #1 SMP PREEMPT Sun Mar  
8 10:18:28 UTC 2009 i686 Intel(R) Xeon(TM) CPU 3.40GHz). I have a flat  
text file that I want to put into SQLite to be able to query it and  
also update fields if required. This raw text file has 7.2GB, roughly  
11 million entries and 60 fields. (Perhaps not the best design, but  
that shall not be the issue here)

Getting it into SQLite was not as easy as I anticipated. At first, I  
used the execute() method for every insert, and committed every 1  
rows. This run didn't finish overnight.

Then a commit every 100,000 rows: at around 2.5 million entries it  
slowed down so much that it was unacceptable. I used PRAGMA  
journal_mode = OFF which improved a bit, but not enough.

The final rescue was to use executemany() for every 100,000 records  
combined with a commit(). That put the whole file in the DB in approx  
17 min. Creation of indices where required took another 40 minutes.  
The final database file has roughly 11GB.

Now, I have 3 spare fields in the table that I want to populate with  
data from another file. I loop over the file in question, from there I  
get the ID of the record to update in the table and the info to  
populate the empty field with. Again, I used executemany() every  
100,000 records, assuming that it would work. But no.

Even when I use executemany() every 10,000 rows, these 10,000 updates  
take around 5 minutes or so (I haven't properly timed it). That is  
also with PRAGMA journal_mode = OFF.

When I just get the IDs of the entries that I need to update, look  
them up in the table (select a,b,c from table where id in (...)), and  
retrieve them with fetchall() and do nothing with them (no printing,  
etc.) then this works fine for the first 10,000 records. After that,  
again, it gets so slow that it's basically unusable. The ID field  
that I use for the lookups has an index on it. The fields that I am  
updating do not have indices on them. It's just text fields that are  
being populated with strings up to approx 150 characters.

Why are the retrievals that slow, and why are they getting slower the  
further I go? And why are the updates so slow?

Is it because the database file is too big?

Any help or suggestions would be highly appreciated!

Regards,

Flo




Re: [sqlite] SQLite performance woe

2008-12-04 Thread Brown, Daniel
Hello Donald,

I have managed to speed up some of my SQLite queries by about 300% by
rewriting them from joins to subqueries.  I am comparing SQLite against a
custom SQL implementation we have here. It looks like it is our cursor
implementation that is so much faster than SQLite; the regular queries
are roughly comparable.  I'm going to step through our custom SQL cursor
code and see how it is managing that sort of speed.

I can't really post my table schema without first obscuring it, like I did
with the original test queries, as it contains sensitive information, but
I'll try taking a look at the schema output.

Daniel



Re: [sqlite] SQLite performance woe

2008-12-03 Thread Griggs, Donald
Hi again, Daniel,

So I guess you're still having certain queries that take about 200x
longer than with your custom code, right?

There's nothing magical about sqlite, so it's not surprising that code
customized for an application can outperform a generalized sql engine,
but a factor of 200 does seem pretty large.

If it's neither especially proprietary nor private, I'd again urge you
to post here your schema (and repost the slow queries).

You can dump this easily using the command-line utility program, e.g.:

 sqlite3 myDatabase.db
 .output mySchema
 .schema
 .quit

The schema should then be in the file "mySchema"

If the database itself is not too private, you may want to compress it
(with something like 7zip) and upload it somewhere on the web and
provide a link to it.


Re: [sqlite] SQLite performance woe

2008-12-02 Thread Brown, Daniel
Thank you for the explanation!

I now have primary keys and indices added to my tables; my exporter was
not previously exporting primary keys correctly to SQLite, and I just
added index exporting.  However, I am not seeing any performance gain!
Using EXPLAIN QUERY PLAN mostly comes back as using PRIMARY KEY or
sometimes an index, which is good, but I would have expected some sort of
performance gain.  Every table should have at least one primary key and
one or more indices now.

The file size has increased by about 25-30% too, which is unavoidable I
guess; I've tried vacuuming but that does not seem to affect the file
size much. The sqlite3_analyzer gives the following report summary:
Page size in bytes 1024  
Pages in the whole file (measured) 2628  
Pages in the whole file (calculated).. 2628  
Pages that store data. 2628   100.0% 
Pages on the freelist (per header) 00.0% 
Pages on the freelist (calculated) 00.0% 
Pages of auto-vacuum overhead. 00.0% 
Number of tables in the database.. 77
Number of indices. 152   
Number of named indices... 77
Automatically generated indices... 75
Size of the file in bytes. 2691072   
Bytes of user payload stored.. 1501970 55.8%


Re: [sqlite] SQLite performance woe

2008-12-02 Thread John Stanton
To efficiently execute the SQL "SELECT * FROM mytab WHERE myid = '1234'"
you must have an index on the "myid" column.  Each row has an index
which uses a rowid as a key, and that is how the row is accessed.

A "primary key" is a column which is indexed and which has a unique
value; duplicates are not encouraged.

You may have many indices on a table, and an index may have as its key
values from more than one column.
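
A minimal sketch of the above, reusing John's example names (an editor's
illustration):

  #include <sqlite3.h>

  /* With this index, "SELECT * FROM mytab WHERE myid = '1234'" becomes
     an index lookup rather than a full-table scan. */
  static int add_myid_index(sqlite3 *db)
  {
      return sqlite3_exec(db,
          "CREATE INDEX idx_mytab_myid ON mytab(myid);", 0, 0, 0);
  }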

Brown, Daniel wrote:
> I may be confused, but indices sound similar to what I understand primary
> keys do; I already have primary keys on each table.  Unless I'm mistaken
> as to what primary keys are?  From your explanation I guess I'm slightly
> confused about the difference between primary keys and indices, and that I
> need to implement indices to speed up my queries. Are there any general
> guidelines for creating indices?  Is it as simple as creating an index
> per primary key in a table?  
> 
> How do I interpret the output from EXPLAIN QUERY PLAN?
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of John Stanton
> Sent: Tuesday, December 02, 2008 2:20 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] SQLite performance woe
> 
> Databases work by using indices.  A search for a row in a table of 1 
> million rows goes from having to do as many as a million row reads to a 
> handful of index node accesses, from minutes to milliseconds.  Note that
> 
> Sqlite is "lite" and only uses one index at a time so thoughtful schema 
> design and query layout is necessary for optimal results.
> 
> Brown, Daniel wrote:
> 
>>Hello Donald & Others,
>>
>>I have primary keys set for each of the tables but no indices (that I am
>>aware of) as I simply converted the data from our existing database
>>system which does not support indices.  As my current system only
>>implements primary keys I have no real experience dealing with indices,
>>are they like some sort of extra key column?  Are there any guides to
>>optimising SQLite performance with indices?  
>>
>>I tried EXPLAIN QUERY PLAN for the following:
>>"SELECT * FROM test_item INNER JOIN test_container ON
>>test_item.container_code = test_container.container_code"
>>
>>The output was:
>>0|0|TABLE test_item
>>1|1|TABLE test_container
>>
>>Is there a guide I can check for understanding this output?
>>
>>Daniel
>>
>>-Original Message-
>>From: [EMAIL PROTECTED]
>>[mailto:[EMAIL PROTECTED] On Behalf Of Griggs, Donald
>>Sent: Tuesday, December 02, 2008 9:52 AM
>>To: General Discussion of SQLite Database
>>Subject: Re: [sqlite] SQLite performance woe
>>
>>Hi Daniel,
>>
>>Regarding:
>>   "What I'd like to know is  if there is anything we can do with
>>our queries, SQLite set-up or library configuration to improve the
>>speed? " 
>>
>>Unless indices would be inappropriate, did you mention whether you've
>>defined any indices and does EXPLAIN QUERY PLAN show that the proper
>>index is being used?


Re: [sqlite] SQLite performance woe

2008-12-02 Thread sqlite
All:

For comparison I tried several combinations of query orders and indices.
 I found both indices and the join clause sequence make significant
differences in execution time.  Using SQLiteSpy with SQLite v3.6.1 on
Windows XP.

I have two tables:

GPFB with 34830 rows, 10 columns with a 4-column primary key
SET_1 with 320 rows, 10 columns with a 2-column primary key

Indices added on two columns (GID,EID) common to both tables:

create index idx_gpfb_gid_eid on GPFB(GID,EID);
create index idx_set1_gid_eid on SET_1(GID,EID);

(The combination of GID and EID are not unique in either of the tables.)

My basic query:

select SETID,SID,CUT,X,sum(t1*Kx) as Px,sum(t2*Ky) as Py,sum(t3*Kz) as
Pz,sum(R2*Ky+T1*Kx*Z) as My 
from  GPFB join SET_1
where GPFB.GID=SET_1.GID and GPFB.EID=SET_1.EID 
group by SETID,SID,CUT 
order by SETID,SID,CUT;

I also executed the query reversing the join clause to "from SET_1 join
GPFB".

800 rows were returned:

"from GPFB join Set_1" with no indices: 3.3 seconds
"from GPFB join Set_1" with indices: 109.7 ms
"from SET_1 join GPFB" with no indices: 5.5 seconds
"from SET_1 join GPFB" with indices: 55.9 ms

In this example, EXPLAIN QUERY PLAN seems to indicate only the joined
table index is used.  There was no significant time delta in either
query if the from table index was dropped.
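
A sketch of that check (the EXPLAIN QUERY PLAN output format varies from
one SQLite release to the next):

   EXPLAIN QUERY PLAN
   select SETID,SID,CUT,X,sum(t1*Kx) as Px,sum(t2*Ky) as Py,sum(t3*Kz) as Pz
   from SET_1 join GPFB
   where GPFB.GID=SET_1.GID and GPFB.EID=SET_1.EID
   group by SETID,SID,CUT;
   -- expected to mention idx_gpfb_gid_eid for GPFB, the joined (inner)
   -- table; the index on the "from" table goes unused.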

Russ Royal

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brown, Daniel
Sent: Tuesday, December 02, 2008 5:03 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] SQLite performance woe

Hello Donald & Others,

I have primary keys set for each of the tables but no indices (that I am
aware of) as I simply converted the data from our existing database
system which does not support indices.  As my current system only
implements primary keys I have no real experience dealing with indices,
are they like some sort of extra key column?  Are there any guides to
optimising SQLite performance with indices?  

I tried EXPLAIN QUERY PLAN for the following:
"SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code"

The output was:
0|0|TABLE test_item
1|1|TABLE test_container

Is there a guide I can check for understanding this output?

Daniel

-

Daniel,

I don't know the sizes of your tables nor the cardinality of your joined
items (i.e., how much variation in the values), but you might try
creating an index or two, especially on the larger table, e.g.:
CREATE INDEX idx_ti_ccode ON test_item(container_code); 
CREATE INDEX idx_tc_ccode ON test_container(container_code);

Then run the EXPLAIN QUERY PLAN again and see if one of the indices is
mentioned.

It might even help a little to VACUUM the database afterwards.

If the rows are rather large (i.e. if your avg row is measured in Kbytes
rather than bytes) then be sure "container_code" is one of the first
columns in each table.

Note that when benchmarking, your first run may be markedly slower than
repeated runs due to caching.

Is this running on a workstation/laptop/pc type of machine, or some
embedded gizmo with limited resources?

Let us know the results.

Donald 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-02 Thread Griggs, Donald
 

-Original Message-
Subject: Re: [sqlite] SQLite performance woe

I may be confused, but indices sound similar to what I understand primary
keys to do, and I already have primary keys on each table.  Unless I'm mistaken
as to what primary keys are?  From your explanation I guess I'm slightly
confused about the difference between primary keys and indices and whether I
need to implement indices to speed up my queries. Are there any general
guidelines for creating indices?  Is it as simple as creating an index
per primary key in a table?  

How do I interpret the output from EXPLAIN QUERY PLAN?

-

Sorry, emails crossed.

Regarding "explaining EXPLAIN":

I'm an anti-expert here, and the author writes:

   "The output from EXPLAIN and EXPLAIN QUERY PLAN is intended for
interactive analysis and troubleshooting only. The details of the output
format are subject to change from one release of SQLite to the next.
Applications should not use EXPLAIN or EXPLAIN QUERY PLAN since their
behavior is undocumented, unspecified, and variable."
   (above is from:  http://www.sqlite.org/lang_explain.html )

Nonetheless -- 
   If I run EXPLAIN QUERY PLAN on a query and get an output such as the
following, then I know that my query is using the index named "myIndex"
in accessing table "client_logs."

0   1   TABLE clients
1   0   TABLE client_logs AS cl WITH INDEX myIndex
0   0   TABLE lastContact
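
For contrast, once an index exists on the join column, a plan for the
earlier test_item/test_container join might look like this (hypothetical
index name; the exact format varies by release):

0   0   TABLE test_item
1   1   TABLE test_container WITH INDEX idx_tc_ccode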

You may want to post here the commands you use to create your tables
(schema).

Since sqlite shows "benign neglect" to most data type specifications, I
wondered if it's possible that your definitions of primary keys are not
being honored?

You may also be interested in:
http://www.sqlite.org/lang_createindex.html
http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuning
http://www.sqlite.org/lang_createtable.html
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-02 Thread Griggs, Donald
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brown, Daniel
Sent: Tuesday, December 02, 2008 5:03 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] SQLite performance woe

Hello Donald & Others,

I have primary keys set for each of the tables but no indices (that I am
aware of) as I simply converted the data from our existing database
system which does not support indices.  As my current system only
implements primary keys I have no real experience dealing with indices,
are they like some sort of extra key column?  Are there any guides to
optimising SQLite performance with indices?  

I tried EXPLAIN QUERY PLAN for the following:
"SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code"

The output was:
0|0|TABLE test_item
1|1|TABLE test_container

Is there a guide I can check for understanding this output?

Daniel

-

Daniel,

I don't know the sizes of your tables nor the cardinality of your joined
items (i.e., how much variation in the values), but you might try
creating an index or two, especially on the larger table, e.g.:
CREATE INDEX idx_ti_ccode ON test_item(container_code); 
CREATE INDEX idx_tc_ccode ON test_container(container_code);

Then run the EXPLAIN QUERY PLAN again and see if one of the indices is
mentioned.

It might even help a little to VACUUM the database afterwards.

If the rows are rather large (i.e. if your avg row is measured in Kbytes
rather than bytes) then be sure "container_code" is one of the first
columns in each table.

Note that when benchmarking, your first run may be markedly slower than
repeated runs due to caching.

Is this running on a workstation/laptop/pc type of machine, or some
embedded gizmo with limited resources?

Let us know the results.

Donald 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-02 Thread Brown, Daniel
I may be confused, but indices sound similar to what I understand primary
keys to do, and I already have primary keys on each table.  Unless I'm mistaken
as to what primary keys are?  From your explanation I guess I'm slightly
confused about the difference between primary keys and indices and whether I
need to implement indices to speed up my queries. Are there any general
guidelines for creating indices?  Is it as simple as creating an index
per primary key in a table?  

How do I interpret the output from EXPLAIN QUERY PLAN?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Stanton
Sent: Tuesday, December 02, 2008 2:20 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] SQLite performance woe

Databases work by using indices.  A search for a row in a table of 1 
million rows goes from having to do as many as a million row reads to a 
handful of index node accesses, from minutes to milliseconds.  Note that 
Sqlite is "lite" and only uses one index at a time so thoughtful schema 
design and query layout is necessary for optimal results.

Brown, Daniel wrote:
> Hello Donald & Others,
> 
> I have primary keys set for each of the tables but no indices (that I am
> aware of) as I simply converted the data from our existing database
> system which does not support indices.  As my current system only
> implements primary keys I have no real experience dealing with indices,
> are they like some sort of extra key column?  Are there any guides to
> optimising SQLite performance with indices?  
> 
> I tried EXPLAIN QUERY PLAN for the following:
> "SELECT * FROM test_item INNER JOIN test_container ON
> test_item.container_code = test_container.container_code"
> 
> The output was:
> 0|0|TABLE test_item
> 1|1|TABLE test_container
> 
> Is there a guide I can check for understanding this output?
> 
> Daniel
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Griggs, Donald
> Sent: Tuesday, December 02, 2008 9:52 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] SQLite performance woe
> 
> Hi Daniel,
> 
> Regarding:
>    "What I'd like to know is if there is anything we can do with
> our queries, SQLite set-up or library configuration to improve the
> speed? " 
> 
> Unless indices would be inappropriate, did you mention whether you've
> defined any indices and does EXPLAIN QUERY PLAN show that the proper
> index is being used?
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-02 Thread John Stanton
Databases work by using indices.  A search for a row in a table of 1 
million rows goes from having to do as many as a million row reads to a 
handful of index node accesses, from minutes to milliseconds.  Note that 
Sqlite is "lite" and only uses one index at a time so thoughtful schema 
design and query layout is necessary for optimal results.
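
To make the distinction concrete, a minimal sketch (names are
illustrative, not Daniel's actual schema):

   CREATE TABLE test_item (
     item_code      INTEGER PRIMARY KEY,  -- indexed automatically
     container_code INTEGER,              -- join column: not indexed
     max            INTEGER
   );
   -- without this, a join on container_code scans every row of test_item:
   CREATE INDEX idx_item_container ON test_item(container_code);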

Brown, Daniel wrote:
> Hello Donald & Others,
> 
> I have primary keys set for each of the tables but no indices (that I am
> aware of) as I simply converted the data from our existing database
> system which does not support indices.  As my current system only
> implements primary keys I have no real experience dealing with indices,
> are they like some sort of extra key column?  Are there any guides to
> optimising SQLite performance with indices?  
> 
> I tried EXPLAIN QUERY PLAN for the following:
> "SELECT * FROM test_item INNER JOIN test_container ON
> test_item.container_code = test_container.container_code"
> 
> The output was:
> 0|0|TABLE test_item
> 1|1|TABLE test_container
> 
> Is there a guide I can check for understanding this output?
> 
> Daniel
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Griggs, Donald
> Sent: Tuesday, December 02, 2008 9:52 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] SQLite performance woe
> 
> Hi Daniel,
> 
> Regarding:
>    "What I'd like to know is if there is anything we can do with
> our queries, SQLite set-up or library configuration to improve the
> speed? " 
> 
> Unless indices would be inappropriate, did you mention whether you've
> defined any indices and does EXPLAIN QUERY PLAN show that the proper
> index is being used?
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-02 Thread Brown, Daniel
Hello Donald & Others,

I have primary keys set for each of the tables but no indices (that I am
aware of) as I simply converted the data from our existing database
system which does not support indices.  As my current system only
implements primary keys I have no real experience dealing with indices,
are they like some sort of extra key column?  Are there any guides to
optimising SQLite performance with indices?  

I tried EXPLAIN QUERY PLAN for the following:
"SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code"

The output was:
0|0|TABLE test_item
1|1|TABLE test_container

Is there a guide I can check for understanding this output?

Daniel

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Griggs, Donald
Sent: Tuesday, December 02, 2008 9:52 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] SQLite performance woe

Hi Daniel,

Regarding:
   "What I'd like to know is  if there is anything we can do with
our queries, SQLite set-up or library configuration to improve the
speed? " 

Unless indices would be inappropriate, did you mention whether you've
defined any indices and does EXPLAIN QUERY PLAN show that the proper
index is being used?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-02 Thread Griggs, Donald
Hi Daniel,

Regarding:
   "What I'd like to know is  if there is anything we can do with
our queries, SQLite set-up or library configuration to improve the
speed? " 

Unless indices would be inappropriate, did you mention whether you've
defined any indices and does EXPLAIN QUERY PLAN show that the proper
index is being used?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-02 Thread Ken



> 
> I am not using the amalgamation version of the source as I
> have our own
> VFS implementations for two of the platforms I work with
> based on the
> original win_os.c VFS and the amalgamation does not provide
> the
> necessary header files (os_common.h and sqliteInt.h) to
> make VFS
> integration possible.  Other than by inserting the source
> for my VFS
> implementations directly into the amalgamation source,
> which I'd rather
> not do as it would make upgrading to new SQLite versions
> much more
> complex.
> 

Try using both the full SQLite source tree and the amalgamation:
use the full source to build your VFS implementation and libraries,
then use the amalgamated version to build the sqlite libs. 

Probably will only be a 10% gain max though.

Can you tell us how your "custom" database works? I'm going to guess that 
it does not do any parsing and is static, based upon your types and structures?

So that's kind of like comparing apples to oranges.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite performance woe

2008-12-01 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Brown, Daniel wrote:
> I am not using the amalgamation version of the source as I have our own
> VFS implementations for two of the platforms I work with based on the
> original win_os.c VFS and the amalgamation does not provide the
> necessary header files (os_common.h and sqliteInt.h) to make VFS
> integration possible.  Other than by inserting the source for my VFS
> implementations directly into the amalgamation source, which I'd rather
> not do as it would make upgrading to new SQLite versions much more
> complex.

You can do what I do which is the other way round.  I #include the
sqlite3.c amalgamation into my one file that does database access,
implements vtables and vfs etc.

You can make all the SQLite symbols private even:

/* See SQLite ticket 2554 */
#define SQLITE_API static
#define SQLITE_EXTERN static
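
Concretely, the single translation unit might be laid out like this (a
sketch; the file name is illustrative):

/* mydb.c -- compiles the amalgamation plus the application code */
#define SQLITE_API static       /* see SQLite ticket 2554 */
#define SQLITE_EXTERN static
#include "sqlite3.c"            /* the amalgamation itself */

/* VFS and vtable implementations follow in this same translation unit,
   so they can see internal definitions and the compiler is free to
   inline across the SQLite/application boundary. */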

Using the amalgamation vs individual source files gives a reported 5-10%
performance improvement.  I haven't measured what you then get when
#including the amalgamation above your database code, but did notice
when using the debugger that the compiler inlined quite a few methods.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkk05GEACgkQmOOfHg372QSPXgCfcyym8qyxmz452C01OKnUOJaJ
eycAoKyFi6kjHkua96crnZl2NcKZlCHl
=Ox7s
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLite performance woe

2008-12-01 Thread Brown, Daniel
Good evening list,

 

I have been profiling the performance of SQLite version 3.6.1 against my
current custom (hacktastic) runtime database solution (which I am hoping
to replace with SQLite) and I just got a nasty and unexpected result:
SQLite is a lot slower!   I am running SQLite completely in memory
during profiling using an in memory database (:memory:) and I am setting
temp_store=MEMORY and journal_mode=MEMORY.  

 

I have ten test statements that select data from a database that I
load completely into memory from file before each test begins.   I
iterate over the results summing the values of a column (usually max)
and record the duration over ten iterations to get a rough mean, high
and low time count:

1.   "SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code"

2.   "SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code  WHERE
(test_item.item_code > '100')"

3.   "SELECT * FROM test_item  WHERE (test_item.item_code > '100')"

4.   "SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code  WHERE
(test_item.item_code > '100')  AND (test_item.max > '50')  AND
(test_item.initial > '30')"

5.   "SELECT max  FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code";

6.   "SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code ORDER BY max"

7.   "SELECT * FROM test_item INNER JOIN test_container ON
test_item.container_code = test_container.container_code WHERE
(test_item.max > '50') ORDER BY max"

8.   "SELECT max FROM test_item"

9.   "SELECT * FROM test_item"

 

SQLite: in memory.

Test 1 Mean 288.470825, Low 286.080383, High 308.041931. (milliseconds)

Test 2 Mean 270.140808, Low 267.758209, High 289.795166. (milliseconds)

Test 3 Mean 68.888512, Low 66.573952, High 88.525116. (milliseconds)

Test 4 Mean 255.051758, Low 251.905319, High 273.752533. (milliseconds)

Test 5 Mean 103.347633, Low 101.023598, High 123.561203. (milliseconds)

Test 6 Mean 2050.301025, Low 2047.153442, High 2069.840088.
(milliseconds)

Test 7 Mean 1927.293213, Low 1923.600952, High 1944.863159.
(milliseconds)

Test 8 Mean 16.426598, Low 13.990897, High 37.434727. (milliseconds)

Test 9 Mean 68.630898, Low 66.280098, High 88.732208. (milliseconds)

 

Custom DB: in memory.

Test 1 (milliseconds) mean,low,high, 10.120694,9.463699,10.540760 

Test 2 (milliseconds) mean,low,high, 10.226122,9.593320,10.600039 

Test 3 (milliseconds) mean,low,high, 9.077908,8.492440,9.602920 

Test 4 (milliseconds) mean,low,high, 10.259830,9.591120,10.668540 

Test 5 (milliseconds) mean,low,high, 3.890896,3.626060,4.679040 

Test 6 (milliseconds) mean,low,high, 10.156658,9.529779,10.696919 

Test 7 (milliseconds) mean,low,high, 10.371894,9.598040,11.083039 

Test 8 (milliseconds) mean,low,high, 1.155094,1.010600,2.416280 

Test 9 (milliseconds) mean,low,high, 1.120254,1.002940,2.032100

 

As you can see the performance difference is significant, profiling
indicates that 75%-80% of the test execution for SQLite is being spent
inside sqlite3VdbeExec.  What I'd like to know is if that is normal and
if there is anything we can do with our queries, SQLite set-up or
library configuration to improve the speed?  So far I have found force
inlining sqlite3VdbeSerialTypeLen helped the performance a bit.  

 

I am not using the amalgamation version of the source as I have our own
VFS implementations for two of the platforms I work with based on the
original win_os.c VFS and the amalgamation does not provide the
necessary header files (os_common.h and sqliteInt.h) to make VFS
integration possible.  Other than by inserting the source for my VFS
implementations directly into the amalgamation source, which I'd rather
not do as it would make upgrading to new SQLite versions much more
complex.

 

I love SQLite's feature set, flexibility, tools and syntax compared to
my current solution but I need SQLite to be at least as fast as my
current solution to make it worth the switch.

 

Cheers,

 

Daniel Brown | Software Engineer

"The best laid schemes o' mice an' men, gang aft agley"

 

 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite Performance

2007-04-18 Thread Hugh Gibson
>   SELECT * FROM tetragrams 
>   WHERE word1 = 'x' AND word2||'' = 'y' 
>   ORDER BY occs;

Better as

   SELECT * FROM tetragrams 
   WHERE word1 = 'x' AND +word2 = 'y' 
   ORDER BY occs;

See http://www.sqlite.org/optoverview.html section 6.
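
The unary plus disqualifies word2 from index lookup, steering the planner
to the (word1, occs) index so the ORDER BY needs no separate sort.  A
sketch of verifying that (output format varies between releases):

   sqlite> EXPLAIN QUERY PLAN
      ...> SELECT * FROM tetragrams
      ...> WHERE word1 = 'x' AND +word2 = 'y'
      ...> ORDER BY occs;
   0|0|TABLE tetragrams WITH INDEX tet_a ORDER BY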

Hugh

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite Performance

2007-04-18 Thread Dan Kennedy
On Wed, 2007-04-18 at 11:06 +0100, Alberto Simões wrote:
> On 4/17/07, Alberto Simões <[EMAIL PROTECTED]> wrote:
> > On 4/17/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > "=?ISO-8859-1?Q?Alberto_Sim=F5es?=" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Consider the following database schema:
> > > > CREATE TABLE tetragrams (word1 INTEGER, word2 INTEGER, word3 INTEGER,
> > > > word4 INTEGER, occs INTEGER, PRIMARY KEY (word1, word2, word3,
> > > > word4));
> > > > CREATE INDEX tet_b ON tetragrams (word2);
> > > > CREATE INDEX tet_c ON tetragrams (word3);
> > > > CREATE INDEX tet_d ON tetragrams (word4);
> > > >
> > > > The problem is that I want to use:
> > > >
> > > > SELECT * FROM tetragrams WHERE word1=6 ORDER BY occs DESC LIMIT 10;
> > > >
> > > > and it takes.. five minutes and did not give the result yet...
> > > >
> > > > Is there anything I can do to make it speed up CONSIDERABLY?
> > >
> > > CREATE INDEX tet_e ON tetragrams(word1, occs);
> 
> I tried:
> CREATE INDEX tet_a ON tetragrams(word1, occs);
> CREATE INDEX tet_b ON tetragrams(word2, occs);
> CREATE INDEX tet_c ON tetragrams(word3, occs);
> CREATE INDEX tet_d ON tetragrams(word4, occs);
> and PRIMARY KEY(word1,word2,word3,word4)
> 
> This works fairly well if I need
> SELECT * from tetragrams WHERE word1 = 'x' ORDER BY occs;
> 
> The problem is that I need as well
> SELECT * from tetragrams WHERE word1 = 'x' AND word2 = 'y' ORDER BY occs;
> and this is QUITE slow.

You could see if this one is faster:

  SELECT * FROM tetragrams 
  WHERE word1 = 'x' AND word2||'' = 'y' 
  ORDER BY occs;

Should take about the same time as the one that works "fairly well".
But return fewer rows. See also the ANALYZE command.

Dan.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite Performance

2007-04-18 Thread Alberto Simões

On 4/17/07, Alberto Simões <[EMAIL PROTECTED]> wrote:

On 4/17/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> "=?ISO-8859-1?Q?Alberto_Sim=F5es?=" <[EMAIL PROTECTED]> wrote:
> >
> > Consider the following database schema:
> > CREATE TABLE tetragrams (word1 INTEGER, word2 INTEGER, word3 INTEGER,
> > word4 INTEGER, occs INTEGER, PRIMARY KEY (word1, word2, word3,
> > word4));
> > CREATE INDEX tet_b ON tetragrams (word2);
> > CREATE INDEX tet_c ON tetragrams (word3);
> > CREATE INDEX tet_d ON tetragrams (word4);
> >
> > The problem is that I want to use:
> >
> > SELECT * FROM tetragrams WHERE word1=6 ORDER BY occs DESC LIMIT 10;
> >
> > and it takes.. five minutes and did not give the result yet...
> >
> > Is there anything I can do to make it speed up CONSIDERABLY?
>
> CREATE INDEX tet_e ON tetragrams(word1, occs);


I tried:
CREATE INDEX tet_a ON tetragrams(word1, occs);
CREATE INDEX tet_b ON tetragrams(word2, occs);
CREATE INDEX tet_c ON tetragrams(word3, occs);
CREATE INDEX tet_d ON tetragrams(word4, occs);
and PRIMARY KEY(word1,word2,word3,word4)

This works fairly well if I need
SELECT * from tetragrams WHERE word1 = 'x' ORDER BY occs;

The problem is that I need as well
SELECT * from tetragrams WHERE word1 = 'x' AND word2 = 'y' ORDER BY occs;
and this is QUITE slow.

and more (try 16 combinations).
That means I need 16 indexes? :-S

Just to let you have an idea, this schema:
CREATE TABLE tetragrams (word1 INTEGER, word2 INTEGER, word3 INTEGER,
word4 INTEGER, occs INTEGER, PRIMARY KEY (word1, word2, word3,
word4));
CREATE INDEX tet_a ON tetragrams (word1,occs);
CREATE INDEX tet_b ON tetragrams (word2,occs);
CREATE INDEX tet_c ON tetragrams (word3,occs);
CREATE INDEX tet_d ON tetragrams (word4,occs);

is already taking 1.8GB of disk :)

TIA
Alberto

--
Alberto Simões

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite Performance

2007-04-17 Thread Alberto Simões

On 4/17/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

"=?ISO-8859-1?Q?Alberto_Sim=F5es?=" <[EMAIL PROTECTED]> wrote:
>
> Consider the following database schema:
> CREATE TABLE tetragrams (word1 INTEGER, word2 INTEGER, word3 INTEGER,
> word4 INTEGER, occs INTEGER, PRIMARY KEY (word1, word2, word3,
> word4));
> CREATE INDEX tet_b ON tetragrams (word2);
> CREATE INDEX tet_c ON tetragrams (word3);
> CREATE INDEX tet_d ON tetragrams (word4);
>
> The problem is that I want to use:
>
> SELECT * FROM tetragrams WHERE word1=6 ORDER BY occs DESC LIMIT 10;
>
> and it takes.. five minutes and did not give the result yet...
>
> Is there anything I can do to make it speed up CONSIDERABLY?

CREATE INDEX tet_e ON tetragrams(word1, occs);


Hmms, Yes, it works as expected and speeds up to about one second, or less :)
Thank you,
Alberto
--
Alberto Simões

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite Performance

2007-04-17 Thread drh
"=?ISO-8859-1?Q?Alberto_Sim=F5es?=" <[EMAIL PROTECTED]> wrote:
> 
> Consider the following database schema:
> CREATE TABLE tetragrams (word1 INTEGER, word2 INTEGER, word3 INTEGER,
> word4 INTEGER, occs INTEGER, PRIMARY KEY (word1, word2, word3,
> word4));
> CREATE INDEX tet_b ON tetragrams (word2);
> CREATE INDEX tet_c ON tetragrams (word3);
> CREATE INDEX tet_d ON tetragrams (word4);
> 
> The problem is that I want to use:
> 
> SELECT * FROM tetragrams WHERE word1=6 ORDER BY occs DESC LIMIT 10;
> 
> and it takes.. five minutes and did not give the result yet...
> 
> Is there anything I can do to make it speed up CONSIDERABLY? 

CREATE INDEX tet_e ON tetragrams(word1, occs);
--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] SQLite Performance

2007-04-17 Thread Dan Kennedy
On Tue, 2007-04-17 at 11:53 +0100, Alberto Simões wrote:
> Hi
> 
> I've found SQLite faster than MySQL and Postgres for small/medium
> databases. Now I have big ones and I really do not want to change, but
> I have some performance issues.
> 
> Consider the following database schema:
> CREATE TABLE tetragrams (word1 INTEGER, word2 INTEGER, word3 INTEGER,
> word4 INTEGER, occs INTEGER, PRIMARY KEY (word1, word2, word3,
> word4));
> CREATE INDEX tet_b ON tetragrams (word2);
> CREATE INDEX tet_c ON tetragrams (word3);
> CREATE INDEX tet_d ON tetragrams (word4);
> 
> And the following database size:
> sqlite> SELECT COUNT(*) from tetragrams;
> 18397532
> 
> Now, a query like
> SELECT * FROM tetragrams WHERE word1 = 6;
> returns 166579 rows;
> 
> This query takes some time, but starts returning rows as soon as I type the query.
> The problem is that I want to use:
> 
> SELECT * FROM tetragrams WHERE word1=6 ORDER BY occs DESC LIMIT 10;
> 
> and it takes.. five minutes and did not give the result yet...
> 
> Is there anything I can do to make it speed up CONSIDERABLY? I mean,
> this is to be used in a CGI and each CGI query will make 11 queries
> like the one above to the database.

You might need an index like:

  CREATE INDEX tet_e ON tetragrams(word1, occs);

Otherwise you have to sort the 166000 items each time the query 
is made.

Dan.



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] SQLite Performance

2007-04-17 Thread Alberto Simões

Hi

I've found SQLite faster than MySQL and Postgres for small/medium
databases. Now I have big ones and I really do not want to change, but
I have some performance issues.

Consider the following database schema:
CREATE TABLE tetragrams (word1 INTEGER, word2 INTEGER, word3 INTEGER,
word4 INTEGER, occs INTEGER, PRIMARY KEY (word1, word2, word3,
word4));
CREATE INDEX tet_b ON tetragrams (word2);
CREATE INDEX tet_c ON tetragrams (word3);
CREATE INDEX tet_d ON tetragrams (word4);

And the following database size:
sqlite> SELECT COUNT(*) from tetragrams;
18397532

Now, a query like
SELECT * FROM tetragrams WHERE word1 = 6;
returns 166579 rows;

This query takes some time, but starts returning rows as soon as I type the query.
The problem is that I want to use:

SELECT * FROM tetragrams WHERE word1=6 ORDER BY occs DESC LIMIT 10;

and it takes.. five minutes and did not give the result yet...

Is there anything I can do to make it speed up CONSIDERABLY? I mean,
this is to be used in a CGI and each CGI query will make 11 queries
like the one above to the database.

Thank you in advance,
Alberto
--
Alberto Simões

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-16 Thread John Stanton
Dennis's insight is apt.  Just as you would eliminate "common 
sub-expressions" in your programs (or have an optimizer do it for you) it is 
a good idea to do the same with systems.


Writing numbers in an endian-agnostic manner can easily be done on your 
flat file.


Dennis Cote wrote:

Ken wrote:

 I do have the option of either putting the data into sqlite at 
the start (when it's read) or putting it into a flat file and then 
later loading it into a sqlite db via a downstream job.
   A great deal of the data columns are simple numeric values and 
that's where sqlite really shines, in that it is portable between systems 
having differing endianness.


  Here is a summary of the entire processing where a1/b1 are different 
processes and probably different host platforms.
 a1, read from producing system, generate output data (be it flat 
file or sqlite).

  a2. Compress data file.
  a3. Transfer  compressed datafile to target system.
  a4. Goto a1
 b1. Receive datafile and uncompress.
  b2. Read datafile and load into Master DB. (This might just be a 
simple attach).

  b3. Massage data in Master db ???
  b4. Read and process data from MasterDb. Delete or mark as deleteable.
  b5. Delete processed data from MasterDb. (this could be in a 
seperate thread).

  b6. Goto step b1.
 The nice thing about simply attaching as a DB in step b2 is that 
when all data is processed from step b4 then step b5 to purge is a 
simple detach and operating system unlink for the underlying datafile, 
which I suspect will be infinitely faster than a sql delete command.



Ken,

If you are always going to move the data into an sqlite database at some 
point, then I would think you would want to do that on system a when the 
data is read. Unless system a is so much slower than system b that the 
extra work is a burden on system a, whereas it is insignificant on 
system b.


In either case your timing of the exchange file write function in system 
a only is incomplete. You are neglecting the time it takes system b to 
open and read the exchange file. If you sum those two times then you can 
see which solution provides the fastest overall delivery of the data to 
system b in the required format.


Dennis Cote






-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-16 Thread Eduardo Morras

At 15:51 16/03/2007, you wrote:

Dennis,

  Yes the data will be read later by downstream processing.

  I do have the option of either putting the data into sqlite at 
the start (when it's read) or putting it into a flat file and then 
later loading it into a sqlite db via a downstream job.


  A great deal of the data columns are simple numeric values and 
that's where sqlite really shines, in that it is portable between 
systems having differing endianness.


  Here is a summary of the entire processing where a1/b1 are 
different processes and probably different host platforms.


  a1, read from producing system, generate output data (be it flat 
file or sqlite).

  a2. Compress data file.
  a3. Transfer  compressed datafile to target system.
  a4. Goto a1

  b1. Receive datafile and uncompress.
  b2. Read datafile and load into Master DB. (This might just be a 
simple attach).

  b3. Massage data in Master db ???
  b4. Read and process data from MasterDb. Delete or mark as deleteable.
  b5. Delete processed data from MasterDb. (this could be in a 
seperate thread).

  b6. Goto step b1.

  The nice thing about simply attaching as a DB in step b2 is that 
when all data is processed from step b4 then step b5 to purge is a 
simple detach and operating system unlink for the underlying 
datafile, which I suspect will be infinitely faster than a sql delete command.


  Thanks,
  Ken


Ken, I have the same scenario: the producing system generates data 
and it's read by an SBC card. But as Dennis says in his last message, 
have sqlite running on that card system, and from time to time dump 
the database to system b for backup. It's a lot easier than 
read/compress/transfer/insert on different machines.





---
Useful Acronymous : DMCA = Don't Make Content Accessible 



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-16 Thread Dennis Cote

Ken wrote:
   
  I do have the option of either putting the data into sqlite at the start (when it's read) or putting it into a flat file and then later loading it into a sqlite db via a downstream job.
  
   
  A great deal of the data columns are simple numeric values and that's where sqlite really shines, in that it is portable between systems having differing endianness.


  Here is a summary of the entire processing where a1/b1 are different 
processes and probably different host platforms.
   
  a1, read from producing system, generate output data (be it flat file or sqlite).

  a2. Compress data file.
  a3. Transfer  compressed datafile to target system.
  a4. Goto a1
   
  b1. Receive datafile and uncompress.

  b2. Read datafile and load into Master DB. (This might just be a simple 
attach).
  b3. Massage data in Master db ???
  b4. Read and process data from MasterDb. Delete or mark as deleteable.
  b5. Delete processed data from MasterDb. (this could be in a seperate thread).
  b6. Goto step b1.
   
  The nice thing about simply attaching as a DB in step b2 is that when all data is processed from step b4 then step b5 to purge is a simple detach and operating system unlink for the underlying datafile, which I suspect will be infinitely faster than a sql delete command.
   
 

Ken,

If you are always going to move the data into an sqlite database at some 
point, then I would think you would want to do that on system a when the 
data is read. Unless system a is so much slower than system b that the 
extra work is a burden on system a, whereas it is insignificant on 
system b.


In either case your timing of the exchange file write function in system 
a only is incomplete. You are neglecting the time it takes system b to 
open and read the exchange file. If you sum those two times then you can 
see which solution provides the fastest overall delivery of the data to 
system b in the required format.


Dennis Cote 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-16 Thread Ken
Dennis,
   
  Yes the data will be read later by downstream processing. 
   
  I do have the option of either putting the data into sqlite at the start 
(when it's read) or putting it into a flat file and then later loading it into a 
sqlite db via a downstream job.
   
  A great deal of the data columns are simple numeric values and that's where 
sqlite really shines, in that it is portable between systems having differing 
endianness.

  Here is a summary of the entire processing where a1/b1 are different 
processes and probably different host platforms.
   
  a1, read from producing system, generate output data (be it flat file or 
sqlite).
  a2. Compress data file.
  a3. Transfer  compressed datafile to target system.
  a4. Goto a1
   
  b1. Receive datafile and uncompress.
  b2. Read datafile and load into Master DB. (This might just be a simple 
attach).
  b3. Massage data in Master db ???
  b4. Read and process data from MasterDb. Delete or mark as deleteable.
  b5. Delete processed data from MasterDb. (this could be in a seperate thread).
  b6. Goto step b1.
   
  The nice thing about simply attaching as a DB in step b2 is that when all 
data is processed from step b4 then step b5 to purge is a simple detach and 
operating system unlink for the underlying datafile, which I suspect will be 
infinitely faster than a sql delete command.
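
  A sketch of steps b2/b5 in SQL (file and table names hypothetical):

   ATTACH DATABASE 'incoming.db' AS incoming;             -- step b2
   INSERT INTO main.events SELECT * FROM incoming.events; -- steps b3/b4
   DETACH DATABASE incoming;                              -- step b5
   -- then unlink incoming.db at the OS level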
   
  Thanks,
  Ken
   
  
Dennis Cote <[EMAIL PROTECTED]> wrote:
  Ken wrote:
> 
> This is a write-only app. 100% insert. 
> 
> 
Ken,

Why bother putting the data into a database if you are never going to 
read it back out? Other formats, such as a flat text file are much 
better for logs or archives.

If, in fact, you will be reading the data at some point then you need to 
also compare the time it takes to retrieve the data of interest from 
your flat file vs the sqlite database.

Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




Re: [sqlite] sqlite Performance

2007-03-16 Thread Dennis Cote

Ken wrote:
 
 This is a write-only app. 100% insert. 
 
 

Ken,

Why bother putting the data into a database if you are never going to 
read it back out? Other formats, such as a flat text file are much 
better for logs or archives.


If, in fact, you will be reading the data at some point then you need to 
also compare the time it takes to retrieve the data of interest from 
your flat file vs the sqlite database.


Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-15 Thread Michael Scharf

Hi Ken,

you can get the exact insert speed of the flatfile.dat:
- dump your data into the flat file
- create a virtual table implementation for your flat file
   http://www.sqlite.org/cvstrac/wiki?p=VirtualTables&1150734307
- and use it from SQLite
   http://www.sqlite.org/lang_createvtab.html
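
For orientation, the skeleton of such a module might look like this (a
heavily simplified sketch: it serves the numbers 1..10 from memory; a
real module would open and parse flatfile.dat in xFilter/xNext):

#include <sqlite3.h>
#include <string.h>

typedef struct { sqlite3_vtab base; } demo_vtab;
typedef struct { sqlite3_vtab_cursor base; int i; } demo_cursor;

static int xConnect(sqlite3 *db, void *pAux, int argc,
                    const char *const *argv, sqlite3_vtab **ppVtab, char **pzErr){
  int rc = sqlite3_declare_vtab(db, "CREATE TABLE t(value INTEGER)");
  if( rc!=SQLITE_OK ) return rc;
  *ppVtab = sqlite3_malloc(sizeof(demo_vtab));
  if( *ppVtab==0 ) return SQLITE_NOMEM;
  memset(*ppVtab, 0, sizeof(demo_vtab));
  return SQLITE_OK;
}
static int xDisconnect(sqlite3_vtab *p){ sqlite3_free(p); return SQLITE_OK; }
static int xBestIndex(sqlite3_vtab *p, sqlite3_index_info *info){ return SQLITE_OK; }
static int xOpen(sqlite3_vtab *p, sqlite3_vtab_cursor **ppCur){
  demo_cursor *c = sqlite3_malloc(sizeof(demo_cursor));
  if( c==0 ) return SQLITE_NOMEM;
  memset(c, 0, sizeof(demo_cursor));
  *ppCur = &c->base;
  return SQLITE_OK;
}
static int xClose(sqlite3_vtab_cursor *cur){ sqlite3_free(cur); return SQLITE_OK; }
static int xFilter(sqlite3_vtab_cursor *cur, int idxNum, const char *idxStr,
                   int argc, sqlite3_value **argv){
  ((demo_cursor*)cur)->i = 1;   /* rewind; a real module would fopen() here */
  return SQLITE_OK;
}
static int xNext(sqlite3_vtab_cursor *cur){ ((demo_cursor*)cur)->i++; return SQLITE_OK; }
static int xEof(sqlite3_vtab_cursor *cur){ return ((demo_cursor*)cur)->i > 10; }
static int xColumn(sqlite3_vtab_cursor *cur, sqlite3_context *ctx, int col){
  sqlite3_result_int(ctx, ((demo_cursor*)cur)->i);   /* single column */
  return SQLITE_OK;
}
static int xRowid(sqlite3_vtab_cursor *cur, sqlite3_int64 *pRowid){
  *pRowid = ((demo_cursor*)cur)->i;
  return SQLITE_OK;
}
static sqlite3_module demo_module = {
  1,                        /* iVersion */
  xConnect, xConnect,       /* xCreate, xConnect */
  xBestIndex,
  xDisconnect, xDisconnect, /* xDisconnect, xDestroy */
  xOpen, xClose, xFilter, xNext, xEof, xColumn, xRowid,
  0, 0, 0, 0, 0, 0, 0       /* xUpdate .. xRename: read-only module */
};
/* register with:  sqlite3_create_module(db, "demo", &demo_module, 0);
   then:           CREATE VIRTUAL TABLE nums USING demo;
                   SELECT * FROM nums;   -- rows 1 through 10 */

Once registered, the flat file is queryable with ordinary SQL, so both
storage formats can be timed with identical statements.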

Michael


I'm looking for suggestions on improving performance of my sqlite application.
 
 Here are system timings for a run where the sqlite db has been replaced with a flat file output.

 real    0m1.459s
 user    0m0.276s
 sys     0m0.252s
 
 This is a run when using sqlite as the output format.

 real    0m3.095s
 user    0m1.956s
 sys     0m0.160s
 
 As you can see sqlite takes twice as long and almost 8 times the user time.
 
 Output size for flat file:   13,360,504   flatfile.dat

 Output size of sqlite file:  11,042,816   sqlt.db
 
 Sqlite db has the following pragmas set.
 
  PRAGMA default_synchronous=FULL

  PRAGMA temp_store=memory
  PRAGMA page_size=4096
  PRAGMA cache_size=2000
 
 Any ideas how to get the sqlite output timings to a more respectable level would be appreciated.
 
 Thanks

 Ken
 
 






--
http://MichaelScharf.blogspot.com/


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-15 Thread drh
Ken <[EMAIL PROTECTED]> wrote:
>  
> When the DB is closed when in synchronous mode,
> is it then persistent at the OS level even from power failures etc? 
>  

You don't have to close the DB.  All you have to do is
commit.  Before the commit finishes, all of your data
is guaranteed to be on oxide.**

**Note:  this assumes that the fsync() system call (or its
equivalent on windows) really works as advertised.  In practice,
I am told, this assumption is false, because almost every disk 
controller lies and says that it has committed a sector to oxide
before it really has.
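
In practical terms (a sketch, assuming synchronous=FULL and a
hypothetical table named "log"):

   BEGIN;
   INSERT INTO log VALUES('...');
   COMMIT;  -- does not return until the journal and database are synced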

--
D. Richard Hipp  <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-15 Thread Ken
DRH,
 Thanks for your valuable insight. 
 
 When the DB is closed in synchronous mode, is it then persistent at the 
OS level even through power failures etc.? 
 
 
 

[EMAIL PROTECTED] wrote: Ken  wrote:
>  
>  I should be able to run with synchronous=off. Since 
> the application maintains state in a seperate DB elsewhere.
>  

Just to clarify the implications here: if you run with
synchronous=off and you take a power failure or an OS
crash in the middle of a transaction, then you run a
serious risk of ending up with a corrupted database
file. 

However, (key point->) an application crash will not hurt 
the database.  

Since you have an alternate source of authoritative data
and since power failures and OS crashes are reasonably
infrequent, your decision to run with synchronous=off
may well be the right choice.

--
D. Richard Hipp  


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




Re: [sqlite] sqlite Performance

2007-03-15 Thread Ken
Tito,
 Its even better now!
 
 Synchronous=normal and No primary keys (except 1 table) for auto increment. 
 
 real    0m1.975s
 user    0m1.436s
 sys     0m0.140s
 
 Vs flat file test case:
 
 real    0m0.862s
 user    0m0.228s
 sys     0m0.188s
 
 This is now very respectable. 
 
 Thanks,
 Ken
 

Ken <[EMAIL PROTECTED]> wrote: Tito,
 
 There are no indices built besides the default ones. Hmm, maybe I should try 
this by dropping the primary keys... I'll give that a try as well, GOOD idea!
 
 The entire batch of inserts (about 8 tables) is done in a single transaction.
 
 As an Oracle DBA, I'm pretty familiar with tuning. This was definitely an I/O 
issue and related to the code path, vs say a select where the optimizer picked 
an incorrect plan. 
 
 Regards,
 Ken
 

Tito Ciuro  wrote: Hello,

IIRC (it was a while ago), one way to speed up insertion for large  
data sets is to drop the indexes, do the inserts (wrapped around a  
transaction) and then rebuild the indexes. For smaller data sets, the  
drop/rebuild indexes solution doesn't make sense because the time it  
takes to do that invalidates the performance gain. However, larger  
data sets seemed to benefit greatly. Again... that was a while  
ago... :-)

I should dust-off my test app and see what the results are with the  
latest sources. I'll let you know what I find out.

-- Tito

On Mar 15, 2007, at 11:42 AM, John Stanton wrote:

> There are no free lunches.  When Sqlite stores your data item it  
> not only writes it into a linked list of pages in a file but also  
> inserts at least one key into a B-Tree index.  It does it quite  
> efficiently so what you are seeing is the inevitable overhead of  
> storing the data in a structured form.  The value of the structure  
> becomes obvious when you are retrieving a single item from a set of  
> millions and the index allows you to access it in a tiny fraction  
> of the time it would take to search an unstructured list like a  
> flat file.
>
> The ACID implementation in Sqlite provides data security but it  
> does involve a significant overhead.  You pay a price for not  
> losing data in a system crash.
>
> Like all things in life "you pays your money and you takes your  
> choice".  It is somewhat simpler with Sqlite in that you don't pay  
> your money, you just take your choice.
>
> If you want faster Sqlite performance use faster disks.  The  
> latency is important so 15,000 rpm disks will be better than 5,400  
> rpm ones.
>
> Ken wrote:
>> To answer your question:  Yes I can use a flat file at this stage,  
>> but eventually it needs to be imported into some type of  
>> structure. So to that end I decided early on to use sqlite to  
>> write the data out.   I was hoping for better performance. The raw  
>> I/O to read the data and process is around .75 seconds (no write  
>> i/o).  So using a flat file output costs about .7 seconds.
>>   Using sqlite to do the output costs about 2.25 seconds. My  
>> question is why? And what can be done to improve this  
>> performance?   John Stanton  wrote: Ken wrote:
>>> I'm looking for suggestions on improving performance of my sqlite  
>>> application.
>>> Here are system timings for a run where the sqlite db has been  
>>> replaced with a flat file output.
>>> real    0m1.459s
>>> user    0m0.276s
>>> sys     0m0.252s
>>> This is a run when using sqlite as the output format.
>>> real    0m3.095s
>>> user    0m1.956s
>>> sys     0m0.160s
>>> As you can see sqlite takes twice as long and almost 8 times the  
>>> user time.
>>> Output size for flat file:   13,360,504   flatfile.dat
>>> Output size of sqlite file:  11,042,816   sqlt.db
>>> Sqlite db has the following pragmas set.
>>>  PRAGMA default_synchronous=FULL
>>>  PRAGMA temp_store=memory
>>>  PRAGMA page_size=4096
>>>  PRAGMA cache_size=2000
>>> Any ideas how to get the sqlite output timings to a more  
>>> respectable level would be appreciated.
>>> Thanks
>>> Ken
>> If you want flat file performance, use a flat file.  Sqlite is  
>> built on top of a flat file and cannot be faster or even as fast.   
>> If your application can use a flat file, why use anything more  
>> complex?


-
To unsubscribe, send email to [EMAIL PROTECTED]
-





RE: [sqlite] sqlite Performance

2007-03-15 Thread GBanschbach


"Griggs, Donald" wrote on 03/15/2007 01:49:30 PM:

> Regarding:
>  Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
>  Flat file is 13 MB, sqlite db is 11 MB.
>
>  "Any ideas how to get the sqlite output timings to a more respectable
> level would be appreciated. "
>
I think you may be looking at this as a one dimensional problem, or looking
at it from the wrong angle. [I am using upper case here for mild emphasis -
not yelling].  Really, there is a greater difference in gains at the other
end.  In other words, there IS a certain amount of overhead, BUT, what you
get back from sqlite over a flat file is: The ability to use SQL, where a
flat file is flat and brain dead. The ability to create indexes to help
find specific data, where your other choices may require you to write code
(which equals time which equals labor and money).  Finally, businesses
often throw hardware at performance, when tuning has already been tried.
Good Luck.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-15 Thread drh
Ken <[EMAIL PROTECTED]> wrote:
>  
>  I should be able to run with synchronous=off. Since 
> the application maintains state in a seperate DB elsewhere.
>  

Just to clarify the implications where, if you run with
synchronous=off and you take a power failure or an OS
crash in the middle of a transaction, then you run a
serious risk of ending up with a corruption database
file. 

However, (key point->) an application crash will not hurt 
the database.  

Since you have an alternate source of authoritative data
and since power failures and OS crashes are reasonably
infrequent, your decision to run with synchronous=off
may well be the right choice.

--
D. Richard Hipp  <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite Performance

2007-03-15 Thread Ken
Tito,
 
 There are no indices built besides the default ones. Hmm, maybe I should try 
this by dropping the primary keys... I'll give that a try as well, GOOD idea!
 
 The entire batch of inserts (about 8 tables) is done in a single transaction.
 
 As an Oracle DBA, I'm pretty familiar with tuning. This was definitely an I/O 
issue and related to the code path, vs say a select where the optimizer picked 
an incorrect plan. 
 
 Regards,
 Ken
 

Tito Ciuro <[EMAIL PROTECTED]> wrote: Hello,

IIRC (it was a while ago), one way to speed up insertion for large  
data sets is to drop the indexes, do the inserts (wrapped around a  
transaction) and then rebuild the indexes. For smaller data sets, the  
drop/rebuild indexes solution doesn't make sense because the time it  
takes to do that invalidates the performance gain. However, larger  
data sets seemed to benefit greatly. Again... that was a while  
ago... :-)

I should dust-off my test app and see what the results are with the  
latest sources. I'll let you know what I find out.

-- Tito

On Mar 15, 2007, at 11:42 AM, John Stanton wrote:

> There are no free lunches.  When Sqlite stores your data item it  
> not only writes it into a linked list of pages in a file but also  
> inserts at least one key into a B-Tree index.  It does it quite  
> efficiently so what you are seeing is the inevitable overhead of  
> storing the data in a structured form.  The value of the structure  
> becomes obvious when you are retrieving a single item from a set of  
> millions and the index allows you to access it in a tiny fraction  
> of the time it would take to search an unstructured list like a  
> flat file.
>
> The ACID implementation in Sqlite provides data security but it  
> does involve a significant overhead.  You pay a price for not  
> losing data in a system crash.
>
> Like all things in life "you pays your money and you takes your  
> choice".  It is somewhat simpler with Sqlite in that you don't pay  
> your money, you just take your choice.
>
> If you want faster Sqlite performance use faster disks.  The  
> latency is important so 15,000 rpm disks will be better than 5,400  
> rpm ones.
>
> Ken wrote:
>> To answer your question:  Yes I can use a flat file at this stage,  
>> but eventually it needs to be imported into some type of  
>> structure. So to that end I decided early on to use sqlite to  
>> write the data out.   I was hoping for better performance. The raw  
>> I/O to read the data and process is around .75 seconds (no write  
>> i/o).  So using a flat file output costs about .7 seconds.
>>   Using sqlite to do the output costs about 2.25 seconds. My  
>> question is why? And what can be done to improve this  
>> performance?   John Stanton  wrote: Ken wrote:
>>> I'm looking for suggestions on improving performance of my sqlite  
>>> application.
>>> Here are system timings for a run where the sqlite db has been  
>>> replaced with a flat file output.
>>> real    0m1.459s
>>> user    0m0.276s
>>> sys     0m0.252s
>>> This is a run when using sqlite as the output format.
>>> real    0m3.095s
>>> user    0m1.956s
>>> sys     0m0.160s
>>> As you can see sqlite takes twice as long and almost 8 times the  
>>> user time.
>>> Output size for flat file:   13,360,504   flatfile.dat
>>> Output size of sqlite file:  11,042,816   sqlt.db
>>> Sqlite db has the following pragmas set.
>>>  PRAGMA default_synchronous=FULL
>>>  PRAGMA temp_store=memory
>>>  PRAGMA page_size=4096
>>>  PRAGMA cache_size=2000
>>> Any ideas how to get the sqlite output timings to a more  
>>> respectable level would be appreciated.
>>> Thanks
>>> Ken
>> If you want flat file performance, use a flat file.  Sqlite is  
>> built on top of a flat file and cannot be faster or even as fast.   
>> If your application can use a flat file, why use anything more  
>> complex?


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




RE: [sqlite] sqlite Performance

2007-03-15 Thread Ken
Donald,
 
 I set the PRAGMA synchronous= OFF and here are the results:
 
 real    0m2.258s
 user    0m1.736s
 sys     0m0.168s
 
 --
 Pragma synchronous= NORMAL
 real    0m2.395s
 user    0m1.520s
 sys     0m0.128s
 
 Pragma synchronous= FULL
 real    0m3.228s
 user    0m2.276s
 sys     0m0.136s
 
 
 Running with synchronous=off is 43% faster !!!
 Running with Synchrounous=normal is 33 % faster.
 
 I should be able to run with synchronous=off. Since the application maintains 
state in a seperate DB elsewhere.
 
 
 Thanks for your valuable input.
 Ken
 
 

"Griggs, Donald" <[EMAIL PROTECTED]> wrote: Regarding: 
 Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
 Flat file is 13 MB, sqlite db is 11 MB.

 "Any ideas how to get the sqlite output timings to a more respectable
level would be appreciated. "

I may be way off base if I'm not understanding correctly, but how can
one call these values less than respectable?

To create an sqlite database (or any other) the system must do the same
things it does for the flat file, plus maintain a paging structure,
create indices (presumably), rollback journals, etc., etc.

To take only twice as long seems great (but I'm no expert here).  I'm
guessing it might have taken sqlite even longer except that maybe
compression of numeric values allowed it to actually need fewer disk
writes for the final file (not counting journalling, though).

That being said, if the data you're writing out is saved elsewhere (i.e.
you can repeat the whole process if it should fail) then you can try
turning synchronous OFF, or, if you have to be more conservative, to
NORMAL.
 
 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




Re: [sqlite] sqlite Performance

2007-03-15 Thread Tito Ciuro

Hello,

IIRC (it was a while ago), one way to speed up insertion for large  
data sets is to drop the indexes, do the inserts (wrapped around a  
transaction) and then rebuild the indexes. For smaller data sets, the  
drop/rebuild indexes solution doesn't make sense because the time it  
takes to do that invalidates the performance gain. However, larger  
data sets seemed to benefit greatly. Again... that was a while  
ago... :-)
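
A sketch of the pattern (table and index names hypothetical):

   DROP INDEX IF EXISTS idx_data_key;    -- shed index maintenance up front
   BEGIN;
   -- ...bulk INSERTs into data...
   COMMIT;
   CREATE INDEX idx_data_key ON data(key);  -- one sorted build at the end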


I should dust-off my test app and see what the results are with the  
latest sources. I'll let you know what I find out.


-- Tito

On Mar 15, 2007, at 11:42 AM, John Stanton wrote:

There are no free lunches.  When Sqlite stores your data item it  
not only writes it into a linked list of pages in a file but also  
inserts at least one key into a B-Tree index.  It does it quite  
efficiently so what you are seeing is the inevitable overhead of  
storing the data in a structured form.  The value of the structure  
becomes obvious when you are retrieving a single item from a set of  
millions and the index allows you to access it in a tiny fraction  
of the time it would take to search an unstructured list like a  
flat file.


The ACID implementation in Sqlite provides data security but it  
does involve a significant overhead.  You pay a price for not  
losing data in a system crash.


Like all things in life "you pays your money and you takes your  
choice".  It is somewhat simpler with Sqlite in that you don't pay  
your money, you just take your choice.


If you want faster Sqlite performance use faster disks.  The  
latency is important so 15,000 rpm disks will be better than 5,400  
rpm ones.


Ken wrote:
To answer your question:  Yes I can use a flat file at this stage,  
but eventually it needs to be imported into some type of  
structure. So to that end I decided early on to use sqlite to  
write the data out.   I was hoping for better performance. The raw  
I/O to read the data and process is around .75 seconds (no write  
i/o).  So using a flat file output costs about .7 seconds.
  Using sqlite to do the output costs about 2.25 seconds. My  
question is why? And what can be done to improve this  
performance?   John Stanton <[EMAIL PROTECTED]> wrote: Ken wrote:
I'm looking for suggestions on improving performance of my sqlite  
application.
Here are system timings for a run where the sqlite db has been  
replaced with a flat file output.

real    0m1.459s
user    0m0.276s
sys     0m0.252s
This is a run when using sqlite as the output format.
real    0m3.095s
user    0m1.956s
sys     0m0.160s
As you can see sqlite takes twice as long and almost 8 times the  
user time.

Output size for flat file:   13,360,504   flatfile.dat
Output size of sqlite file:  11,042,816   sqlt.db
Sqlite db has the following pragmas set.
 PRAGMA default_synchronous=FULL
 PRAGMA temp_store=memory
 PRAGMA page_size=4096
 PRAGMA cache_size=2000
Any ideas how to get the sqlite output timings to a more  
respectable level would be appreciated.

Thanks
Ken
If you want flat file performance, use a flat file.  Sqlite is  
built on top of a flat file and cannot be faster or even as fast.   
If your application can use a flat file, why use anything more  
complex?



Re: [sqlite] sqlite Performance

2007-03-15 Thread John Stanton
There are no free lunches.  When Sqlite stores your data item it not 
only writes it into a linked list of pages in a file but also inserts at 
least one key into a B-Tree index.  It does it quite efficiently so what 
you are seeing is the inevitable overhead of storing the data in a 
structured form.  The value of the structure becomes obvious when you 
are retrieving a single item from a set of millions and the index allows 
you to access it in a tiny fraction of the time it would take to search 
an unstructured list like a flat file.
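
For illustration, a sketch of such a point lookup with the C API,
assuming a hypothetical table t(a, b) with an index on column a:

    #include <stdio.h>
    #include <sqlite3.h>

    /* db is an open handle; the index on t.a turns this lookup into a
       B-Tree search instead of a full scan of the table. */
    void lookup(sqlite3 *db, int key)
    {
        sqlite3_stmt *q;
        sqlite3_prepare_v2(db, "SELECT b FROM t WHERE a = ?;", -1, &q, 0);
        sqlite3_bind_int(q, 1, key);
        if (sqlite3_step(q) == SQLITE_ROW)
            printf("%s\n", (const char *)sqlite3_column_text(q, 0));
        sqlite3_finalize(q);
    }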


The ACID implementation in Sqlite provides data security but it does 
involve a significant overhead.  You pay a price for not losing data in 
a system crash.


Like all things in life "you pays your money and you takes your choice". 
 It is somewhat simpler with Sqlite in that you don't pay your money, 
you just take your choice.


If you want faster Sqlite performance use faster disks.  The latency is 
important so 15,000 rpm disks will be better than 5,400 rpm ones.


Ken wrote:
To answer your question: 
 Yes I can use a flat file at this stage, but eventually it needs to be imported into some type of structure. So to that end I decided early on to use sqlite to write the data out. 
 
 I was hoping for better performance. The raw I/O to read the data and process is around .75 seconds (no write i/o). So using a flat file output costs about .7 seconds.
 
 Using sqlite to do the output costs about 2.25 seconds. My question is why? And what can be done to improve this performance? 
 
 


John Stanton <[EMAIL PROTECTED]> wrote: Ken wrote:


I'm looking for suggestions on improving performance of my sqlite application.

Here are system timings for a run where the sqlite db has been replaced with a 
flat file output.
real  0m1.459s
user  0m0.276s
sys   0m0.252s

This is a run when using sqlite as the output format.
real  0m3.095s
user  0m1.956s
sys   0m0.160s

As you can see sqlite takes twice as long and almost 8 times the user time.

Output size for flat file:    13,360,504   flatfile.dat
Output size for sqlite file:  11,042,816   sqlt.db

Sqlite db has the following pragmas set.

 PRAGMA default_synchronous=FULL
 PRAGMA temp_store=memory
 PRAGMA page_size=4096
 PRAGMA cache_size=2000

Any ideas how to get the sqlite output timings to a more respectable level 
would be appreciated.

Thanks
Ken



If you want flat file performance, use a flat file.  Sqlite is built on 
top of a flat file and cannot be faster or even as fast.  If your 
application can use a flat file, why use anything more complex?





Re: [sqlite] sqlite Performance

2007-03-15 Thread Ken
Scott, 
 
 The whole job is wrapped in an explicit transaction.
 Variables are bound and statements prepared only once, using reset. 
 
 This is a write-only app. 100% insert. 
 
 Ken
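
A minimal sketch of the pattern Ken describes (one transaction, one
statement prepared once and reused via reset); the table name and the
input arrays are hypothetical:

    #include <sqlite3.h>

    void bulk_insert(sqlite3 *db, int n, const int *a, const char **b)
    {
        sqlite3_stmt *stmt;
        sqlite3_exec(db, "BEGIN;", 0, 0, 0);
        sqlite3_prepare_v2(db, "INSERT INTO t(a, b) VALUES(?, ?);", -1, &stmt, 0);
        for (int i = 0; i < n; i++) {
            sqlite3_bind_int(stmt, 1, a[i]);
            sqlite3_bind_text(stmt, 2, b[i], -1, SQLITE_TRANSIENT);
            sqlite3_step(stmt);    /* SQLITE_DONE expected for an INSERT */
            sqlite3_reset(stmt);   /* keep the compiled statement for reuse */
        }
        sqlite3_finalize(stmt);
        sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    }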
 

Scott Hess <[EMAIL PROTECTED]> wrote: Are you using explicit transactions at 
all?  If not, as a quick test,
put the _entire_ job in a transaction and see what happens.

-scott


On 3/15/07, Ken  wrote:
> To answer your question:
> Yes I can use a flat file at this stage, but eventually it needs to be 
> imported into some type of structure. So to that end I decided early on to 
> use sqlite to write the data out.
>
> I was hoping for better performance. The raw I/O to read the data and process 
> is around .75 seconds (no write i/o). So using a flat file output costs 
> about .7 seconds.
>
> Using sqlite to do the output costs about 2.25 seconds. My question is why? 
> And what can be done to improve this performance?
>
>
>
> John Stanton  wrote: Ken wrote:
> > I'm looking for suggestions on improving performance of my sqlite 
> > application.
> >
> >  Here are system timings for a run where the sqlite db has been replaced 
> > with a flat file output.
> >  real  0m1.459s
> >  user  0m0.276s
> >  sys   0m0.252s
> >
> >  This is a run when using sqlite as the output format.
> >  real  0m3.095s
> >  user  0m1.956s
> >  sys   0m0.160s
> >
> >  As you can see sqlite takes twice as long and almost 8 times the user time.
> >
> >  Output size for flat file:    13,360,504   flatfile.dat
> >  Output size for sqlite file:  11,042,816   sqlt.db
> >
> >  Sqlite db has the following pragmas set.
> >
> >   PRAGMA default_synchronous=FULL
> >   PRAGMA temp_store=memory
> >   PRAGMA page_size=4096
> >   PRAGMA cache_size=2000
> >
> >  Any ideas how to get the sqlite output timings to a more respectable level 
> > would be appreciated.
> >
> >  Thanks
> >  Ken
> >
> If you want flat file performance, use a flat file.  Sqlite is built on
> top of a flat file and cannot be faster or even as fast.  If your
> application can use a flat file, why use anything more complex?
>





RE: [sqlite] sqlite Performance

2007-03-15 Thread Ken
ok my bad for poor wording... 
 
 I'll try with Synchronous off. I may also try disabling the journal file since 
I can easily recreate the data if it is not successful.
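
A sketch of both settings, assuming an open sqlite3 *db; note that
PRAGMA journal_mode is only available in newer SQLite versions, and
that with no journal a crash mid-write can corrupt the database, so
this is only reasonable when the data can be regenerated:

    sqlite3_exec(db, "PRAGMA synchronous=OFF;", 0, 0, 0);
    sqlite3_exec(db, "PRAGMA journal_mode=OFF;", 0, 0, 0);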
 
 Thanks,
 Ken
 

"Griggs, Donald" <[EMAIL PROTECTED]> wrote: Regarding: 
 Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
 Flat file is 13 MB, sqlite db is 11 MB.

 "Any ideas how to get the sqlite output timings to a more respectable
level would be appreciated. "

I may be way off base if I'm not understanding correctly, but how can
one call these values less than respectable?

To create an sqlite database (or any other) the system must do the same
things it does for the flat file, plus maintain a paging structure,
create indices (presumably), rollback journals, etc., etc.

To take only twice as long seems great (but I'm no expert here).  I'm
guessing it might have taken sqlite even longer except that maybe
compression of numeric values allowed it to actually need fewer disk
writes for the final file (not counting journalling, though).

That being said, if the data you're writing out is saved elsewhere (i.e.
you can repeat the whole process if it should fail) then you can try
turning synchronous OFF, or, if you have to be more conservative, to
NORMAL.
 
 






Re: [sqlite] sqlite Performance

2007-03-15 Thread Ken
To answer your question: 
 Yes I can use a flat file at this stage, but eventually it needs to be 
imported into some type of structure. So to that end I decided early on to use 
sqlite to write the data out. 
 
 I was hoping for better performance. The raw I/O to read the data and process 
is around .75 seconds (no write i/o). So using a flat file output costs about 
.7 seconds.
 
 Using sqlite to do the output costs about 2.25 seconds. My question is why? 
And what can be done to improve this performance? 
 
 

John Stanton <[EMAIL PROTECTED]> wrote: Ken wrote:
> I'm looking for suggestions on improving performance of my sqlite application.
>  
>  Here are system timings for a run where the sqlite db has been replaced with 
> a flat file output.
>  real  0m1.459s
>  user  0m0.276s
>  sys   0m0.252s
>  
>  This is a run when using sqlite as the output format.
>  real  0m3.095s
>  user  0m1.956s
>  sys   0m0.160s
>  
>  As you can see sqlite takes twice as long and almost 8 times the user time.
>  
>  Output size for flat file:    13,360,504   flatfile.dat
>  Output size for sqlite file:  11,042,816   sqlt.db
>  
>  Sqlite db has the following pragmas set.
>  
>   PRAGMA default_synchronous=FULL
>   PRAGMA temp_store=memory
>   PRAGMA page_size=4096
>   PRAGMA cache_size=2000
>  
>  Any ideas how to get the sqlite output timings to a more respectable level 
> would be appreciated.
>  
>  Thanks
>  Ken
>  
If you want flat file performance, use a flat file.  Sqlite is built on 
top of a flat file and cannot be faster or even as fast.  If your 
application can use a flat file, why use anything more complex?





RE: [sqlite] sqlite Performance

2007-03-15 Thread Griggs, Donald
Regarding: 
 Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
 Flat file is 13 MB, sqlite db is 11 MB.

 "Any ideas how to get the sqlite output timings to a more respectable
level would be appreciated. "

I may be way off base if I'm not understanding correctly, but how can
one call these values less than respectable?

To create an sqlite database (or any other) the system must do the same
things it does for the flat file, plus maintain a paging structure,
create indices (presumably), rollback journals, etc., etc.

To take only twice as long seems great (but I'm no expert here).  I'm
guessing it might have taken sqlite even longer except that maybe
compression of numeric values allowed it to actually need fewer disk
writes for the final file (not counting journalling, though).

That being said, if the data you're writing out is saved elsewhere (i.e.
you can repeat the whole process if it should fail) then you can try
turning synchronous OFF, or, if you have to be more conservative, to
NORMAL.
 
 





Re: [sqlite] sqlite Performance

2007-03-15 Thread John Stanton

Ken wrote:

I'm looking for suggestions on improving performance of my sqlite application.
 
 Here are system timings for a run where the sqlite db has been replaced with a flat file output.

 real  0m1.459s
 user  0m0.276s
 sys   0m0.252s
 
 This is a run when using sqlite as the output format.

 real  0m3.095s
 user  0m1.956s
 sys   0m0.160s
 
 As you can see sqlite takes twice as long and almost 8 times the user time.
 
 Output size for flat file:    13,360,504   flatfile.dat

 Output size for sqlite file:  11,042,816   sqlt.db
 
 Sqlite db has the following pragmas set.
 
  PRAGMA default_synchronous=FULL

  PRAGMA temp_store=memory
  PRAGMA page_size=4096
  PRAGMA cache_size=2000
 
 Any ideas how to get the sqlite output timings to a more respectable level would be appreciated.
 
 Thanks

 Ken
 
If you want flat file performance, use a flat file.  Sqlite is built on 
top of a flat file and cannot be faster or even as fast.  If your 
application can use a flat file, why use anything more complex?





[sqlite] sqlite Performance

2007-03-15 Thread Ken
I'm looking for suggestions on improving performance of my sqlite application.
 
 Here are system timings for a run where the sqlite db has been replaced with a 
flat file output.
 real  0m1.459s
 user  0m0.276s
 sys   0m0.252s
 
 This is a run when using sqlite as the output format.
 real  0m3.095s
 user  0m1.956s
 sys   0m0.160s
 
 As you can see sqlite takes twice as long and almost 8 times the user time.
 
 Output size for flat file:    13,360,504   flatfile.dat
 Output size for sqlite file:  11,042,816   sqlt.db
 
 Sqlite db has the following pragmas set.
 
  PRAGMA default_synchronous=FULL
  PRAGMA temp_store=memory
  PRAGMA page_size=4096
  PRAGMA cache_size=2000
 
 Any ideas how to get the sqlite output timings to a more respectable level 
would be appreciated.
 
 Thanks
 Ken
 
 



Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Roger Binns

Gerry Snyder wrote:
> The wiki is there and open to all.
> 
> I look forward to reading your additions to it.

To be fair, only some of the documentation is in the wiki.  The
remainder is generated.  For example you can't edit any of the pages
listed under:

  http://sqlite.org/docs.html

Roger




Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Gerry Snyder

Emerson Clarke wrote:



Fix the out of date documentation


The wiki is there and open to all.

I look forward to reading your additions to it.


Gerry




Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Emerson Clarke

For anyone who is interested i have created a standalone test case
which demonstrates the threading behaviour that i had, or as close as
i can get it.  Feel free to use the code for whatever purposes you see
fit.

It will compile on linux and windows, and comes with 4 versions of the
sqlite libraries which can be found under "test/lib/sqlite/lib".  Each
platform has two versions, one built with THREADSAFE=0, and the other
with THREADSAFE=1.

http://ul1.rapidshare.com/files/10511098/test.zip

It seems that i was incorrect in thinking the cause was
sqlite3_step(). In replicating the problem from scratch i found that
it was actually caused by sqlite3_finalize() being called without
synchronisation.
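
A minimal sketch of the synchronisation being described: one
process-wide pthread mutex around every call that touches a shared
statement, including the final sqlite3_finalize():

    #include <pthread.h>
    #include <sqlite3.h>

    static pthread_mutex_t db_mutex = PTHREAD_MUTEX_INITIALIZER;

    void step_and_finalize(sqlite3_stmt *stmt)
    {
        pthread_mutex_lock(&db_mutex);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            /* ... read columns here ... */
        }
        sqlite3_finalize(stmt);  /* inside the same mutex as the steps */
        pthread_mutex_unlock(&db_mutex);
    }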

What does this mean, well firstly it means that the entire problem was
avoidable had i simply synchronised correctly.  But i think more
worryingly, it highlights the fact that there is a great deal of
confusion as to what it means for something to be thread safe on this
mailing list.

This confusion is also not aided in any way by the obtuse and stubborn
attitudes that some people have demonstrated.  If a lot of people are
having regular problems with sqlite and threading as has been
indicated then perhaps its time to review the situation.

It is incredibly frustrating that the documentation and the various
answers that i have received on this topic have been contradictory.
At no time has anyone actually stood up and said that the software can
safely be used in a multithreaded environment as long as access to the
api's is synchronised.

Instead you have a situation where the software has inbuilt misuse
detection which may or may not alert the programmer to a potential
error and a THREADSAFE macro which makes 5% of it threadsafe but
leaves the other 95% exposed.  (By way of example using the code i
supplied and the safe versions of the libraries it is possible to get
away with calling sqlite3_finalize() outside the mutex, but if you
take it one step further and use sqlite3_step() outside the mutex it
will fail)

So an inconsistent and unpredictable internal approach to thread
safety combined with a policy of righteousness on this mailing list
means that people like me have little chance of ever getting it right.

Why is it so hard to simply nail down the characteristics of the
software and update the documentation so that everyone is on the same
page ?

If sqlite is thread safe, and by that i mean it can be used safely
with synchronisation, then why not say so.  And be consistent in
saying so.  That way when someone comes here with a problem you can
confidently describe how the api can be used and the user can go away
and attempt to resolve their problem with confidence.

I guess in part i owe people an apology, but then, had i been given a
rational explanation or had access to clear documentation i guess this
would not have gone on so long.  And to be quite honest, my confidence
in the software has now been eroded to such an extent that i dont know
what is up or down.

Please consider what im saying rationally and dont take offence.  I
know a lot of you are frustrated at this debate, and so you should be.
I am too. So why not take steps to ensure that it doesnt happen
again.  Be clear about the threading behaviour and remove the
inconsistencies.

Fix the out of date documentation, and give a consistent answer,
sqlite can be used with multiple threads provided that access to the
api is synchronised.

The safety checks and macro options do more harm than good because
they create confusion.  Yes if you do X,Y, and Z you might get lucky,
but thats probably why so many people have trouble.  Its not
consistent and its complicated.  Unless your going to document the
exact sequence of calls which is safe to make without synchronisation,
you shouldnt be encouraging it.

Advising people to create complicated designs where each thread has
its own connection doesnt help either, because it will not work with
transactions.  It should be either 100% safe, or not at all.

Perhaps it would also be useful to document the various strategies and
ways in which sqlite can be used with safety.  Stop telling people
that mutlithreading results in random unpredictable behavior. Be open
minded and don't hide behind whatever doctrine has been followed up
till now.

Emerson




RE: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Fred Williams
Sounds like a H-- of a plan to me!  

I'll call Larry Ellison and warn him to put the 12 meter yacht up for sale, 
he's about to get steam rolled! :-)

Fred

> -Original Message-
> From: Bill King [mailto:[EMAIL PROTECTED]
> Sent: Saturday, January 06, 2007 1:15 AM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] sqlite performance, locking & threading
> 
> 
> Roger Binns wrote:
> 
> >>Im sorry for being so harsh, and i know im not winning any friends
> >>here,
> >>
> >>
> >
>So far no one has agreed with you :-)
> >
> >  
> >
> This would be incorrect. The correct statement is "so far no one has 
> vocally agreed with you".
> 
> If people didn't agree, this whole once a month people having 
> multi-threading issues would not be the case. Surely the number of 
> people with exactly the same issues, month after month, 
> should point to 
> something? Perchance illogic?
> 
> Time I think to go off and write my own simple sql engine. 
> People here 
> are far too closed minded, and far too quick to attack others because 
> they don't follow DRH's line of thought, and decisions.
> 





Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread John Stanton

Bill King wrote:

Roger Binns wrote:


Im sorry for being so harsh, and i know im not winning any friends
here,
  



So far no one has agreed with you :-)

 

This would be incorrect. The correct statement is "so far no one has 
vocally agreed with you".


If people didn't agree, this whole once a month people having 
multi-threading issues would not be the case. Surely the number of 
people with exactly the same issues, month after month, should point to 
something? Perchance illogic?


Time I think to go off and write my own simple sql engine. People here 
are far too closed minded, and far too quick to attack others because 
they don't follow DRH's line of thought, and decisions.


Good idea.  Would you share your proposed architecture with us?






Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Bill King

Roger Binns wrote:


Im sorry for being so harsh, and i know im not winning any friends
here,
   



So far no one has agreed with you :-)

 

This would be incorrect. The correct statement is "so far no one has 
vocally agreed with you".


If people didn't agree, this whole once a month people having 
multi-threading issues would not be the case. Surely the number of 
people with exactly the same issues, month after month, should point to 
something? Perchance illogic?


Time I think to go off and write my own simple sql engine. People here 
are far too closed minded, and far too quick to attack others because 
they don't follow DRH's line of thought, and decisions.





Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Emerson Clarke

Ken,

 Thanks for your comments.  I have coded and tested a module just like
test_server.c and by disabling the safety checks i have also been able
to code and test an example which uses a single connection, single
transaction, single table with up to 50 threads doing
insert/update/delete with no errors.

Granted it was synchronised, so all execution of sqlite3_ api routines
was done serially, but in my experience thats quite a normal
requirement for multithreading.

Its hard to make something which doesnt require synchronisation,
especially when internal state and files are concerned.  But thats not
a problem, because if the synchronisation is granular enough it will
only ever block as frequently as it would normally crash.  So in
other words, blocking becomes a race condition, and performance is not
hindered very much at all.

If you would like i can code up a test case which demonstrates this
and send it to you, but i think i should stop posting to this topic.
Some people are getting a bit too defensive and annoyed...

I am grateful for all the time people have given me, and for the
software itself.  At the end of the day i only brought this up because
i wanted to help improve it.

Emerson

On 1/5/07, Ken <[EMAIL PROTECTED]> wrote:

Emerson,

 I agree with you somewhat. Not 100% convinced but, like you, I am a little disappointed how sqlite handles
"threadsafe" and multiple connections. Even the "test_server.c" module is not
"concurrent": as it serializes all processing to a single thread, this is not concurrent processing.

 Now if i could take that server thread and create a pool of workers and hand 
off the working set request to one of those, then this would be concurrent 
since each thread would have the capability to run independently of the other 
and at the same time on an SMP hardware system such as Solaris,Hpux etc... But 
this is specifically disallowed since the connection when using the shared 
cache must be used by the creating thread.

 So far I've not been able to successfully find any threadsafe examples of 
sqlite that actually work in a concurrent manner.  Even the threadtest2.c 
crashed (bug fix coming in 3.3.10) but even it does not start up a transaction 
and run multiple inserts/step commands.

 I would like to see a working example of how sqlite can be concurrently 
accessed via multiple threads performing Write (insert/update/delete) and by 
other threads performing selects against a single table.

 IMHO, if the sqlite3_step function can only be executed serially (ie must be wrapped
in a mutex), it should do this mutexing internally and not be exposed and left in the hands
of the user. If one compiles the code with --enable-threadsafe, shouldn't the API be able
to handle areas such as these internally and not generate "misuse" errors?



Emerson Clarke <[EMAIL PROTECTED]> wrote: Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i haven't heard anything
other than the usual "thats the way its always been" responses, or the
"its designed that way because its easier" argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

There's not much point forking and maintaining the code if no one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns  wrote:
>
> Emerson Clarke wrote:
> > I have to admit i am a little disappointed.  As the primary author of
> > the software i would have thought that you would have a good
> > understanding of what the thread safety characteristics of your own
> > api were.
>
> He does!  It is well documented and tested.  *You* are the one who wants
> to do various things differently.
>
> > Suggesting that suppressing the safety checks will result in random
> > and non reproducable failures seems a little extreme, if not
> > superstitious.
>
> Err, you are using threading.  Explain exactly how if you got a failure
> due to threading you could reproduce it exactly.  You do know that you
> can't run x86 processors in lockstep because at the cycle level they
> have effectively random behaviour due to cache eviction and replacement
> policies.  That's one reason for the Itanium, if you need two processors to
> have exactly the same behaviour on the same code at the cycle level.
>
> > As i was discussing with Nicholas, api's are not by default thread
> > unsafe.  In 

Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread bartsmissaert
> not to spark a debate

Although the majority of this thread is as clear as mud, it is still
interesting, even for simple VBA programmers like me that have no chance
(maybe via a VB6 ActiveX exe) to use multi-threading.

RBS


> Emerson, one posts to a forum like this to get help and other ideas, not
> to spark a debate.  Many talented people gave you some of their time to
> help you solve your problem and one in particular gave you a well
> conceived and executed piece of software free of charge.  Appreciate
> their charity.
>
> If you have some insights which have escaped the rest of us, implement
> them and present the results to the world and dazzle us.
>
> BTW, a major advantage of Open Source software is that you do not need
> to have people explain it to you.  You have the source and that explains
> everything.  That is particularly so with Sqlite, which is clearly
> written and consequently the source reads like a book.  A few minutes
> with the source and grep and you have your answers.
>
> Emerson Clarke wrote:
>> John,
>>
>> Um, alright then...
>>
>> But i think you're preaching to the converted, simplifying things is
>> what i always try to do.  And not just when there's a problem.
>>
>> If you followed the thread fully you would realise that there was
>> never a problem with my design, though that didnt stop many people
>> from chirping in and offering suggestions.
>>
>> The problem i had was with sqlite not being compatible with the simple
>> design that i wanted.  I did try several alternate designs, but only
>> as a way of working around the problem i had with sqlite.  It took a
>> long time but eventually i managed to get someone to explain why
>> sqlite had that particular problem, and i was able to modify the
>> sqlite source to resolve the issue.
>>
>> Unfortunately no one has yet commented on my solution, or the problem
>> which it addresses.   Basically sqlite has thread safety checking
>> routines which work a little like mutexes.  Every time you enter a
>> bit of code which is potentially thread unsafe it sets a magic number,
>> then resets it when it comes out.  This is an attempt to detect when
>> two threads are accessing the same bit of code at the same time.
>>
>> Clearly its not 100% reliable, and is subject to all kinds of thread
>> races, but it does provide some measure of protection.  Unfortunately
>> though, the way it has been coded, an unbalanced safety check is
>> performed in the sqlite3_step() function.
>>
>> This is equivalent to entering a mutex but never leaving, which causes
>> deadlock in a multithreaded program.  Only in this situation sqlite
>> throws a misuse error any time two or more threads use sqlite3_step()
>> at the same time, even if those threads are synchronised and perfectly
>> safe.
>>
>> The easy solution is to disable the safety checks, the proper
>> solution is to balance out the checks in sqlite3_step() so that users
>> who actually wish to use sqlite in a multithreaded program are free to
>> synchronise access to the api without error and there is still a
>> reasonable level of safety checking for users who do not synchronise
>> properly.
>>
>>
>> Emerson
>>
>> On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:
>>
>>> Work on turning "reasonable" into "adequate" or "good" and it will help
>>> you get an intuitive feel for the design of programs such as yours.
>>> Then your programs will be simple, fast and robust, as Einstein
>>> counselled - "Make it as simple as possible, but no simpler".
>>>
>>> I also suggest that you take Niklaus Wirth's advice and when you run
>>> into a difficulty backtrack your work and scrap everything until you
>>> reach a point where there are no problems and start again from that
>>> point taking a different approach having learned a lesson from your
>>> previous attempt.
>>>
>>> By the way, I doubt whether you are using a modern operating system, it
>>> is most likely to be old technology like Windows or Linux.  Neither
>>> supports much in the way of parallelism.
>>>
>>







Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread dcharno

Can we please stop this thread?

John Stanton wrote:
Emerson, one posts to a forum like this to get help and other ideas, not 
to spark a debate.  Many talented people gave you some of their time to 
help you solve your problem and one in particular gave you a well 
conceived and executed piece of software free of charge.  Appreciate 
their charity.


If you have some insights which have escaped the rest of us, implement 
them and present the results to the world and dazzle us.


BTW, a major advantage of Open Source software is that you do not need 
to have people explain it to you.  You have the source and that explains 
everything.  That is particularly so with Sqlite, which is clearly 
written and consequently the source reads like a book.  A few minutes 
with the source and grep and you have your answers.


Emerson Clarke wrote:

John,

Um, alright then...

But i think you're preaching to the converted, simplifying things is
what i always try to do.  And not just when there's a problem.

If you followed the thread fully you would realise that there was
never a problem with my design, though that didnt stop many people
from chirping in and offering suggestions.

The problem i had was with sqlite not being compatible with the simple
design that i wanted.  I did try several alternate designs, but only
as a way of working around the problem i had with sqlite.  It took a
long time but eventually i managed to get someone to explain why
sqlite had that particular problem, and i was able to modify the
sqlite source to resolve the issue.

Unfortunately no one has yet commented on my solution, or the problem
which it addresses.   Basically sqlite has thread safety checking
routines which work a little like mutexes.  Every time you enter a
bit of code which is potentially thread unsafe it sets a magic number,
then resets it when it comes out.  This is an attempt to detect when
two threads are accessing the same bit of code at the same time.

Clearly its not 100% reliable, and is subject to all kinds of thread
races, but it does provide some measure of protection.  Unfortunately
though, the way it has been coded, an unbalanced safety check is
performed in the sqlite3_step() function.

This is equivalent to entering a mutex but never leaving, which causes
deadlock in a multithreaded program.  Only in this situation sqlite
throws a misuse error any time two or more threads use sqlite3_step()
at the same time, even if those threads are synchronised and perfectly
safe.

The easy solution is to disable the safety checks, the proper
solution is to balance out the checks in sqlite3_step() so that users
who actually wish to use sqlite in a multithreaded program are free to
synchronise access to the api without error and there is still a
reasonable level of safety checking for users who do not synchronise
properly.


Emerson

On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:


Work on turning "reasonable" into "adequate" or "good" and it will help
you get an intuitive feel for the design of programs such as yours.
Then your programs will be simple, fast and robust, as Einstein
counselled - "Make it as simple as possible, but no simpler".

I also suggest that you take Niklaus Wirth's advice and when you run
into a difficulty backtrack your work and scrap everything until you
reach a point where there are no problems and start again from that
point taking a different approach having learned a lesson from your
previous attempt.

By the way, I doubt whether you are using a modern operating system, it
is most likely to be old technology like Windows or Linux.  Neither
supports much in the way of parallelism.















Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread John Stanton
Emerson, one posts to a forum like this to get help and other ideas, not 
to spark a debate.  Many talented people gave you some of their time to 
help you solve your problem and one in particular gave you a well 
conceived and executed piece of software free of charge.  Appreciate 
their charity.


If you have some insights which have escaped the rest of us, implement 
them and present the results to the world and dazzle us.


BTW, a major advantage of Open Source software is that you do not need 
to have people explain it to you.  You have the source and that explains 
everything.  That is particularly so with Sqlite, which is clearly 
written and consequently the source reads like a book.  A few minutes 
with the source and grep and you have your answers.


Emerson Clarke wrote:

John,

Um, alright then...

But i think you're preaching to the converted, simplifying things is
what i always try to do.  And not just when there's a problem.

If you followed the thread fully you would realise that there was
never a problem with my design, though that didnt stop many people
from chirping in and offering suggestions.

The problem i had was with sqlite not being compatible with the simple
design that i wanted.  I did try several alternate designs, but only
as a way of working around the problem i had with sqlite.  It took a
long time but eventually i managed to get someone to explain why
sqlite had that particular problem, and i was able to modify the
sqlite source to resolve the issue.

Unfortunately no one has yet commented on my solution, or the problem
which it addresses.   Basically sqlite has thread safety checking
routines which work a little like mutexes.  Every time you enter a
bit of code which is potentially thread unsafe it sets a magic number,
then resets it when it comes out.  This is an attempt to detect when
two threads are accessing the same bit of code at the same time.

Clearly its not 100% reliable, and is subject to all kinds of thread
races, but it does provide some measure of protection.  Unfortunately
though, the way it has been coded, an unbalanced safety check is
performed in the sqlite3_step() function.

This is equivalent to entering a mutex but never leaving, which causes
deadlock in a multithreaded program.  Only in this situation sqlite
throws a misuse error any time two or more threads use sqlite3_step()
at the same time, even if those threads are synchronised and perfectly
safe.

The easy solution is to disable the safety checks, the proper
solution is to balance out the checks in sqlite3_step() so that users
who actually wish to use sqlite in a multithreaded program are free to
synchronise access to the api without error and there is still a
reasonable level of safety checking for users who do not synchronise
properly.


Emerson

On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:


Work on turning "reasonable" into "adequate" or "good" and it will help
you get an intuitive feel for the design of programs such as yours.
Then your programs will be simple, fast and robust, as Einstein
counselled - "Make it as simple as possible, but no simpler".

I also suggest that you take Niklaus Wirth's advice and when you run
into a difficulty backtrack your work and scrap everything until you
reach a point where there are no problems and start again from that
point taking a different approach having learned a lesson from your
previous attempt.

By the way, I doubt whether you are using a modern operating system, it
is most likely to be old technology like Windows or Linux.  Neither
supports much in the way of parallelism.












Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Ken
Emerson,
 
 I agree with you somewhat. Not 100% convinced but, like you, I am a little
disappointed how sqlite handles "threadsafe" and multiple connections. Even
the "test_server.c" module is not "concurrent": as it serializes all processing
to a single thread, this is not concurrent processing.
 
 Now if i could take that server thread and create a pool of workers and hand 
off the working set request to one of those, then this would be concurrent 
since each thread would have the capability to run independently of the other 
and at the same time on an SMP hardware system such as Solaris,Hpux etc... But 
this is specifically disallowed since the connection when using the shared 
cache must be used by the creating thread.
 
 So far I've not been able to successfully find any threadsafe examples of 
sqlite that actually work in a concurrent manner.  Even the threadtest2.c 
crashed (bug fix coming in 3.3.10) but even it does not start up a transaction 
and run multiple inserts/step commands.
 
 I would like to see a working example of how sqlite can be concurrently 
accessed via multiple threads performing Write (insert/update/delete) and by 
other threads performing selects against a single table. 
 
 IMHO, if the sqlite3_step function can only be executed serially (ie must
be wrapped in a mutex), it should do this mutexing internally and not be exposed
and left in the hands of the user. If one compiles the code with
--enable-threadsafe, shouldn't the API be able to handle areas such as these
internally and not generate "misuse" errors?
 
 
 
Emerson Clarke <[EMAIL PROTECTED]> wrote: Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i haven't heard anything
other than the usual "thats the way its always been" responses, or the
"its designed that way because its easier" argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

There's not much point forking and maintaining the code if no one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns  wrote:
>
> Emerson Clarke wrote:
> > I have to admit i am a little disappointed.  As the primary author of
> > the software i would have thought that you would have a good
> > understanding of what the thread safety characteristics of your own
> > api were.
>
> He does!  It is well documented and tested.  *You* are the one who wants
> to do various things differently.
>
> > Suggesting that suppressing the safety checks will result in random
> > and non reproducable failures seems a little extreme, if not
> > superstitious.
>
> Err, you are using threading.  Explain exactly how if you got a failure
> due to threading you could reproduce it exactly.  You do know that you
> can't run x86 processors in lockstep because at the cycle level they
> have effectively random behaviour due to cache eviction and replacement
> policies.  That's one reason for the Itanium, if you need two processors to
> have exactly the same behaviour on the same code at the cycle level.
>
> > As i was discussing with Nicholas, api's are not by default thread
> > unsafe.  In fact it is the other way around, most api's should be
> > perfectly thread safe if access to them is synchronised.
>
> This all comes down to semantics.  For trivial libraries, sure you can
> just wrap a mutex around it all.  For non-trivial libraries, best
> practise is for the author to design and implement for whatever thread
> usage they believe is best suited for the library and the platforms it
> operates on.
>
> > Im sorry for being so harsh, and i know im not winning any friends
> > here,
>
> So far no one has agreed with you :-)
>
> > I have tested the changes i made with some 50 threads performing
> > probably thousands of statements a second, and had no errors.
>
> What does that prove?  You cannot prove threading by running stuff and
> saying "see it didn't crash".  The only way to prove threading is by
> reasoning about the design and verifying the implementation matches the
> design. drh already did that years ago.
>
> > If you are willing to work through the problem, i am willing to assist
> you in any way that i can.  Whether it be writing test cases or making
> > modifications to the source.
>
> You can always maintain your own forked version of SQLite and link to it
> from the wiki.
>
> Roger

Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Emerson Clarke

Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i haven't heard anything
other than the usual "thats the way its always been" responses, or the
"its designed that way because its easier" argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

There's not much point forking and maintaining the code if no one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns <[EMAIL PROTECTED]> wrote:


Emerson Clarke wrote:
> I have to admit i am a little disappointed.  As the primary author of
> the software i would have thought that you would have a good
> understanding of what the thread safety characteristics of your own
> api were.

He does!  It is well documented and tested.  *You* are the one who wants
to do various things differently.

> Suggesting that suppressing the safety checks will result in random
> and non reproducable failures seems a little extreme, if not
> superstitious.

Err, you are using threading.  Explain exactly how if you got a failure
due to threading you could reproduce it exactly.  You do know that you
can't run x86 processors in lockstep because at the cycle level they
have effectively random behaviour due to cache eviction and replacement
policies.  That's one reason for the Itanium, if you need two processors to
have exactly the same behaviour on the same code at the cycle level.

> As i was discussing with Nicholas, api's are not by default thread
> unsafe.  In fact it is the other way around, most api's should be
> perfectly thread safe if access to them is synchronised.

This all comes down to semantics.  For trivial libraries, sure you can
just wrap a mutex around it all.  For non-trivial libraries, best
practise is for the author to design and implement for whatever thread
usage they believe is best suited for the library and the platforms it
operates on.

> Im sorry for being so harsh, and i know im not winning any friends
> here,

So far no one has agreed with you :-)

> I have tested the changes i made with some 50 threads performing
> probably thousands of statements a second, and had no errors.

What does that prove?  You cannot prove threading by running stuff and
saying "see it didn't crash".  The only way to prove threading is by
reasoning about the design and verifying the implementation matches the
design. drh already did that years ago.

> If you are willing to work through the problem, i am willing to assist
> you in any way that i can.  Whether it be writing test cases or making
> modifications to the source.

You can always maintain your own forked version of SQLite and link to it
from the wiki.

Roger








Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Roger Binns

Emerson Clarke wrote:
> I have to admit i am a little disappointed.  As the primary author of
> the software i would have thought that you would have a good
> understanding of what the thread safety characteristics of your own
> api were.

He does!  It is well documented and tested.  *You* are the one who wants
to do various things differently.

> Suggesting that suppressing the safety checks will result in random
> and non reproducable failures seems a little extreme, if not
> superstitious.

Err, you are using threading.  Explain exactly how if you got a failure
due to threading you could reproduce it exactly.  You do know that you
can't run x86 processors in lockstep because at the cycle level they
have effectively random behaviour due to cache eviction and replacement
policies.  That's one reason for the Itanium, if you need two processors to
have exactly the same behaviour on the same code at the cycle level.

> As i was discussing with Nicholas, api's are not by default thread
> unsafe.  In fact it is the other way around, most api's should be
> perfectly thread safe if access to them is synchronised.

This all comes down to semantics.  For trivial libraries, sure you can
just wrap a mutex around it all.  For non-trivial libraries, best
practise is for the author to design and implement for whatever thread
usage they believe is best suited for the library and the platforms it
operates on.

> Im sorry for being so harsh, and i know im not winning any friends
> here,

So far no one has agreed with you :-)

> I have tested the changes i made with some 50 threads performing
> probably thousands of statements a second, and had no errors.

What does that prove?  You cannot prove threading by running stuff and
saying "see it didn't crash".  The only way to prove threading is by
reasoning about the design and verifying the implementation matches the
design. drh already did that years ago.

> If you are willing to work through the problem, i am willing to assist
> you in any way that i can.  Whether it be writing test cases or making
> modifications to the source.

You can always maintain your own forked version of SQLite and link to it
from the wiki.

Roger




Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Emerson Clarke

Richard,

I have to admit i am a little disappointed.  As the primary author of
the software i would have thought that you would have a good
understanding of what the thread safety characteristics of your own
api were.

Suggesting that suppressing the safety checks will result in random
and non reproducable failures seems a little extreme, if not
superstitious.

As i was discussing with Nicholas, api's are not by default thread
unsafe.  In fact it is the other way around, most api's should be
perfectly thread safe if access to them is synchronised.

It is my understanding that sqlite3_stmt * represents an instance of
the VM, and that most of the internal virtual machine operations are
reentrant, simply operating on the supplied VM and modifying its
internal state.

If this is the case, i cant see how you would ever have thread safety
problems so long as no two threads are modifying the same sqlite3_stmt
* at the same time.

Surely this is something that can be reasoned through ?

Adding deliberate safety check imbalances to sqlite3_step because it
is the only routine that can be called multiple times in sequence is a
bit extreme.  Why not allow users who don't believe in witches and
goblins to go ahead and use the api in a synchronised way.  The safety
checks will still be there, just applied less aggressively.

You should also consider that the safety checks only provide a
warning for the user.  Given that they are subject to race conditions,
you can never guarantee that the safety checks will trap all errors.
In fact i would argue that you are better off throwing an int 3
instruction than returning a misuse error.  Giving the programmer the
false idea that they can continue to run their program without stack
corruption is not a good idea, after all its just encouraging them to
try again.  Rather, according to what you are saying they should be
re-writing their code to access the api with a single thread.

So on the one hand you are actively disallowing users like myself from
synchronising thread access to the api, but on the other hand
encouraging other users to not fix the actual problems.  It doesnt
make sense...

Im sorry for being so harsh, and i know im not winning any friends
here, but i dont give up easily.  I think sqlite is an excellent piece
of software and i believe there is a way which it can be made to
accomodate these concerns.

I have tested the changes i made with some 50 threads performing
probably thousands of statements a second, and had no errors.

If you are willing to work through the problem, i am willing to assist
you in any way that i can.  Whether it be writing test cases or making
modifications to the source.

If there are problems as you say, then there is no reason why they
can't at least be documented and verified.

Emerson



On 1/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
>
> The problem i had was with sqlite not being compatible with the simple
> design that i wanted.  I did try several alternate designs, but only
> as a way of working around the problem i had with sqlite.  It took a
> long time but eventually i managed to get someone to explain why
> sqlite had that particular problem, and i was able to modify the
> sqlite source to resolve the issue.
>

I remain unconvinced that there is any problem with SQLite in
regard to thread safety.  It is my belief that by disabling the
safety check routines, you are opening yourself up to lots of
problems.  Those routines are there to protect you, the programmer,
and to make your programming errors obvious.  By disabling those
checks, you have not fixed the problem.  You have merely
suppressed the symptoms so that you will get rare, random
failures that cannot be easily reproduced.  On your own head
be it.

--
D. Richard Hipp  <[EMAIL PROTECTED]>









Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread drh
"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> 
> The problem i had was with sqlite not being compatible with the simple
> design that i wanted.  I did try several alternate designs, but only
> as a way of working around the problem i had with sqlite.  It took a
> long time but eventually i managed to get someone to explain why
> sqlite had that particular problem, and i was able to modify the
> sqlite source to resolve the issue.
> 

I remain unconvinced that there is any problem with SQLite in
regard to thread safety.  It is my belief that by disabling the
safety check routines, you are opening yourself up to lots of
problems.  Those routines are there to protect you, the programmer,
and to make your programming errors obvious.  By disabling those
checks, you have not fixed the problem.  You have merely 
suppressed the symptoms so that you will get rare, random
failures that cannot be easily reproduced.  On your own head
be it.

--
D. Richard Hipp  <[EMAIL PROTECTED]>





Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Emerson Clarke

John,

Um, alright then...

But i think you're preaching to the converted, simplifying things is
what i always try to do.  And not just when there's a problem.

If you followed the thread fully you would realise that there was
never a problem with my design, though that didnt stop many people
from chirping in and offering suggestions.

The problem i had was with sqlite not being compatible with the simple
design that i wanted.  I did try several alternate designs, but only
as a way of working around the problem i had with sqlite.  It took a
long time but eventually i managed to get someone to explain why
sqlite had that particular problem, and i was able to modify the
sqlite source to resolve the issue.

Unfortunately no one has yet commented on my solution, or the problem
which it addresses.   Basically sqlite has thread safety checking
routines which work a little like mutexes.  Every time you enter a
bit of code which is potentially thread unsafe it sets a magic number,
then resets it when it comes out.  This is an attempt to detect when
two threads are accessing the same bit of code at the same time.

Clearly its not 100% reliable, and is subject to all kinds of thread
races, but it does provide some measure of protection.  Unfortunately
though, the way it has been coded, an unbalanced safety check is
performed in the sqlite3_step() function.

This is equivalent to entering a mutex but never leaving, which causes
deadlock in a multithreaded program.  Only in this situation sqlite
throws a misuse error any time two or more threads use sqlite3_step()
at the same time, even if those threads are synchronised and perfectly
safe.

The easy solution is to disable the safety checks, the proper
solution is to balance out the checks in sqlite3_step() so that users
who actually wish to use sqlite in a multithreaded program are free to
synchronise access to the api without error and there is still a
reasonable level of safety checking for users who do not synchronise
properly.


Emerson

On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:

Work on turning "reasonable" into "adequate" or "good" and it will help
you get an intuitive feel for the design of programs such as yours.
Then your programs will be simple, fast and robust, as Einstein
counselled - "Make it as simple as possible, but no simpler".

I also suggest that you take Niklaus Wirth's advice and when you run
into a difficulty backtrack your work and scrap everything until you
reach a point where there are no problems and start again from that
point taking a different approach having learned a lesson from your
previous attempt.

By the way, I doubt whether you are using a modern operating system, it
is most likely to be old technology like Windows or Linux.  Neither
supports much in the way of parallelism.






Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread John Stanton
Work on turning "reasonable" into "adequate" or "good" and it will help 
you get an intuitive feel for the design of programs such as yours. 
Then your programs will be simple, fast and robust, as Einstein 
counselled - "Make it as simple as possible, but no simpler".


I also suggest that you take Niklaus Wirth's advice and when you run 
into a difficulty backtrack your work and scrap everything until you 
reach a point where there are no problems and start again from that 
point taking a different approach having learned a lesson from your 
previous attempt.


By the way, I doubt whether you are using a modern operating system; it 
is most likely to be old technology like Windows or Linux.  Neither 
supports much in the way of parallelism.


Emerson Clarke wrote:

John,

I have a reasonable understanding of the PC architecture, and more
appropriately the architecture which the operating system presents to
software.  The PC may be a serial device, but a modern operating
system with its multitasking scheduler attempts to emulate a
non-serial environment.  It devotes a certain amount of time to each
thread and then moves on.

Whether or not you are working on a highly physically parallel
architecture makes no difference; the design principles are the
same.  You should still build your software so that it is capable of
taking advantage of the environment that the operating system
presents.  It is the operating system you should be designing for, not
the hardware...

As it happens, the newest generation of PCs are all multi-core, and I
have been working on multi-processor environments for many years.

Emerson

On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote:


If Emerson intuitively understood the essential architecture of the PC
he is using, he would not be having difficulty with his concept of how
to use it.  It is essentially a serial, multi-tasking device, and
parallelism in the forms of threading and multiprocessing is a
sophistication added with a high overhead.

I recollect an insightful CS professor impressing on his students
the concept by explaining to them that the machines on their desks were
descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parallel processing
would be very different.

Michael Ruck wrote:
> Hi Emerson,
>
> I just hope you don't reinvent the wheel ;) I haven't yet had the need to
> index things the way you describe it.  Maybe I should take that as one of
> my next pet projects, to get a handle on this type of task.
>
> The problem as I see it is basically that, any way you design this: if the
> storage tasks take 90% of your indexing time, then any parallelization may
> be a waste of effort.  Even if you use a synchronization object you're
> essentially serializing things in a (complicated) multithreaded way...
>
> As far as static initialization: that it occurs before main() and is out
> of your control was the point I was getting across.  That's why I wrote
> that this type of initialization should be avoided, unless there's no
> better design for it.
>
> Michael
>
> -----Original Message-----
> From: Emerson Clarke [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 3 January 2007 20:31
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] sqlite performance, locking & threading
>
> Michael,
>
> Thanks for the advice.  During the indexing process I need to select and
> optionally insert records into a table, so I can't ignore the outcomes.
>
> Basically the indexing process does compression, so for each document it
> inserts words into a table and looks up keys.  Every word in the document
> gets swapped with a key, and new keys are inserted as needed.
>
> There are some problems with splitting the work up in a different way as
> you suggested.  I would either end up with a lot of queues, or I would
> have to stagger the work so that the entire data set gets processed in
> stages, which doesn't scale very well and isn't particularly fault
> tolerant.  When building an index, you want the structure to be built up
> progressively, so that you can pause the process and resume it later on
> whilst still having useful results.
>
> I would be worried that in a queued design, the overhead and bottlenecks
> caused by the buffering, message passing, and context switching would
> reduce the performance to that of a single thread.  Especially since the
> database operations represent 90% of the work, all you would really be
> doing is attempting to serialise things in a multithreaded way.
>
> I'm sure, having worked on multithreaded systems, you appreciate that
> sometimes simple designs are better, and I think I have a pretty good
> handle on what it is that I'm trying to do.

Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread Emerson Clarke

John,

I have a reasonable understanding of the PC architecture, and more
appropriately the architecture which the operating system presents to
software.  The PC may be a serial device, but a modern operating
system with its multitasking scheduler attempts to emulate a
non-serial environment.  It devotes a certain amount of time to each
thread and then moves on.

Whether or not you are working on a highly physically parallel
architecture makes no difference; the design principles are the
same.  You should still build your software so that it is capable of
taking advantage of the environment that the operating system
presents.  It is the operating system you should be designing for, not
the hardware...

As it happens, the newest generation of PCs are all multi-core, and I
have been working on multi-processor environments for many years.

Emerson

On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote:

If Emerson intuitively understood the essential architecture of the PC
he is using, he would not be having difficulty with his concept of how
to use it.  It is essentially a serial, multi-tasking device, and
parallelism in the forms of threading and multiprocessing is a
sophistication added with a high overhead.

I recollect an insightful CS professor impressing on his students
the concept by explaining to them that the machines on their desks were
descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parallel processing
would be very different.

Michael Ruck wrote:
> Hi Emerson,
>
> I just hope you don't reinvent the wheel ;) I haven't yet had the need to
> index things the way you describe it.  Maybe I should take that as one of my
> next pet projects, to get a handle on this type of task.
>
> The problem as I see it is basically that, any way you design this: if the
> storage tasks take 90% of your indexing time, then any parallelization may
> be a waste of effort.  Even if you use a synchronization object you're
> essentially serializing things in a (complicated) multithreaded way...
>
> As far as static initialization: that it occurs before main() and is out of
> your control was the point I was getting across.  That's why I wrote that
> this type of initialization should be avoided, unless there's no better
> design for it.
>
> Michael
>
> -----Original Message-----
> From: Emerson Clarke [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 3 January 2007 20:31
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] sqlite performance, locking & threading
>
> Michael,
>
> Thanks for the advice.  During the indexing process I need to select and
> optionally insert records into a table, so I can't ignore the outcomes.
>
> Basically the indexing process does compression, so for each document it
> inserts words into a table and looks up keys.  Every word in the document
> gets swapped with a key, and new keys are inserted as needed.
>
> There are some problems with splitting the work up in a different way as you
> suggested.  I would either end up with a lot of queues, or I would have to
> stagger the work so that the entire data set gets processed in stages, which
> doesn't scale very well and isn't particularly fault tolerant.  When building
> an index, you want the structure to be built up progressively, so that you
> can pause the process and resume it later on whilst still having useful
> results.
>
> I would be worried that in a queued design, the overhead and bottlenecks
> caused by the buffering, message passing, and context switching would reduce
> the performance to that of a single thread.
> Especially since the database operations represent 90% of the work, all you
> would really be doing is attempting to serialise things in a multithreaded
> way.
>
> I'm sure, having worked on multithreaded systems, you appreciate that
> sometimes simple designs are better, and I think I have a pretty good
> handle on what it is that I'm trying to do.
>
> You never have control over static initialisation; it happens before main().
> If I was writing very specific code to suit just this situation then maybe,
> as you say, I wouldn't need to worry about it.  But I'm also writing a
> database API, and that API is used for many different things.  My
> considerations are not just for this one problem, but also for the best
> general way to code the API so that it is safe and efficient in all
> circumstances.  So far the client/server design is the only way I can
> achieve true thread safety.
>
> If I could work out why sqlite3_step() causes problems across multiple
> threads, I could probably make things a little faster and I could do away
> with the need for a client/server design.
>
> Emerson
>
>
> On 1/3/07, Michael Ruck <[EMAIL PROTEC

Re: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread Emerson Clarke

Bill,

Thanks for the description; that's pretty much how I designed the
index, but with a few modifications.  The filesystem becomes the tree
structure, which is indexed by a hash of the original document URL.  It
works like a big hashtable, so it's quite scalable.




> Sorry if this has been posited before, but our solution to this was data
> normalisation.  I.e., we store only the filename and an integer ID which
> identifies the directory of the file, and of course a lookup table of IDs
> to directory names.  This helped us greatly, as well as reducing the
> overhead of the amount of data stored.




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread John Stanton
If Emerson intuitively understood the essential architecture of the PC 
he is using, he would not be having difficulty with his concept of how 
to use it.  It is essentially a serial, multi-tasking device, and 
parallelism in the forms of threading and multiprocessing is a 
sophistication added with a high overhead.

I recollect an insightful CS professor impressing on his students 
the concept by explaining to them that the machines on their desks were 
descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parallel processing 
would be very different.


Michael Ruck wrote:

Hi Emerson,

I just hope you don't reinvent the wheel ;) I haven't yet had the need to
index things the way you describe it.  Maybe I should take that as one of my
next pet projects, to get a handle on this type of task.

The problem as I see it is basically that, any way you design this: if the
storage tasks take 90% of your indexing time, then any parallelization may
be a waste of effort.  Even if you use a synchronization object you're
essentially serializing things in a (complicated) multithreaded way...

As far as static initialization: that it occurs before main() and is out of
your control was the point I was getting across.  That's why I wrote that
this type of initialization should be avoided, unless there's no better
design for it.

Michael

-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 3 January 2007 20:31

To: sqlite-users@sqlite.org
Subject: Re: [sqlite] sqlite performance, locking & threading

Michael,

Thanks for the advice.  During the indexing process I need to select and
optionally insert records into a table, so I can't ignore the outcomes.

Basically the indexing process does compression, so for each document it
inserts words into a table and looks up keys.  Every word in the document
gets swapped with a key, and new keys are inserted as needed.
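
That select-or-insert step might look roughly like this against the C
API (a sketch only; the words(id, word) table and the function name are
assumptions for illustration, not the actual schema):

    #include <sqlite3.h>

    /* Map a word to its integer key, inserting it on first sight.
       Error handling is reduced to the bare minimum. */
    static sqlite3_int64 word_to_key(sqlite3 *db, const char *word)
    {
        sqlite3_stmt *sel, *ins;
        sqlite3_int64 key = -1;

        sqlite3_prepare_v2(db, "SELECT id FROM words WHERE word=?1",
                           -1, &sel, 0);
        sqlite3_bind_text(sel, 1, word, -1, SQLITE_STATIC);
        if (sqlite3_step(sel) == SQLITE_ROW) {
            key = sqlite3_column_int64(sel, 0);       /* existing key */
        } else {
            sqlite3_prepare_v2(db, "INSERT INTO words(word) VALUES(?1)",
                               -1, &ins, 0);
            sqlite3_bind_text(ins, 1, word, -1, SQLITE_STATIC);
            if (sqlite3_step(ins) == SQLITE_DONE)
                key = sqlite3_last_insert_rowid(db);  /* new key */
            sqlite3_finalize(ins);
        }
        sqlite3_finalize(sel);
        return key;
    }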

There are some problems with splitting the work up in a different way as you
suggested.  I would either end up with a lot of queues, or I would have to
stagger the work so that the entire data set gets processed in stages, which
doesn't scale very well and isn't particularly fault tolerant.  When building
an index, you want the structure to be built up progressively, so that you
can pause the process and resume it later on whilst still having useful
results.

I would be worried that in a queued design, the overhead and bottlenecks
caused by the buffering, message passing, and context switching would reduce
the performance to that of a single thread.
Especially since the database operations represent 90% of the work, all you
would really be doing is attempting to serialise things in a multithreaded
way.

I'm sure, having worked on multithreaded systems, you appreciate that
sometimes simple designs are better, and I think I have a pretty good
handle on what it is that I'm trying to do.

You never have control over static initialisation; it happens before main().
If I was writing very specific code to suit just this situation then maybe,
as you say, I wouldn't need to worry about it.  But I'm also writing a
database API, and that API is used for many different things.  My
considerations are not just for this one problem, but also for the best
general way to code the API so that it is safe and efficient in all
circumstances.  So far the client/server design is the only way I can
achieve true thread safety.

If I could work out why sqlite3_step() causes problems across multiple
threads, I could probably make things a little faster and I could do away
with the need for a client/server design.

Emerson


On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:


Emerson,

Now I understand your current implementation.  You seemingly only
partially split up the work in your code.  I'd schedule the database
operation and not wait on the outcome, but start on the next task.
When the database finishes and has retrieved its result, schedule some
work package on a third thread, which only processes the results, etc.
Split up the work into repetitive, non-blocking tasks.  Use multiple
queues and dedicated threads for parts of the operation, or thread
pools which process queues in parallel if possible.

From what I can tell you're already halfway there.

I still don't see your static initialization problem, but that's
another story.  Actually I'd avoid using static initialization or
static (singleton) instances, unless the design really requires it.
Someone must control startup of the entire process; have that one
(probably main/WinMain) take care that the work queues are available.
Afterwards the order of thread starts doesn't matter...  Actually it is
non-deterministic anyway (unless you serialize this yourself.)


Michael

-----Original Message-----
From: Emerson Clarke [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 3 January 2007 15:14
To: sqlite-users@

Re: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread Bill King

Emerson Clarke wrote:


The indexing process works like this.

1.) Open a document and parse its contents.
2.) Look up records in the first database based on the contents of the
document, updating records where appropriate and inserting new ones.
3.) Transforming the document based on what was obtained from the
first database.
4.) Create a filesystem structure in the form of a folder and two or
more files.
5.) Look up some more records in a second database, updating and
inserting as necessary.

For architectural reasons the above steps must be performed in that
order.  This means that operations can't be separated or queued up in
the way that you suggested.  Each step is dependent on the previous
step.

But by having multiple threads and using synchronisation around the
database operations you can ensure that multiple database operations
are always pending.

Each thread will be at a different stage in the pipeline, but a few
will always be ready to perform a database operation, so the idea is to
maximise the throughput.
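
That pattern -- parse and transform in parallel, serialise only the
database steps -- might be sketched like this with pthreads (the
per-step helpers are hypothetical placeholders, stubbed out so the
sketch compiles):

    #include <pthread.h>

    static pthread_mutex_t db_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Hypothetical stand-ins for the five steps above. */
    static void parse_document(const char *d)  { (void)d; }
    static void db_lookup_words(const char *d) { (void)d; }  /* step 2 */
    static void transform(const char *d)       { (void)d; }
    static void write_files(const char *d)     { (void)d; }
    static void db_update_index(const char *d) { (void)d; }  /* step 5 */

    /* Each worker walks its own NULL-terminated list of documents. */
    static void *index_worker(void *arg)
    {
        for (const char **d = arg; *d; d++) {
            parse_document(*d);               /* step 1: unlocked */

            pthread_mutex_lock(&db_lock);     /* step 2: serialised */
            db_lookup_words(*d);
            pthread_mutex_unlock(&db_lock);

            transform(*d);                    /* step 3: unlocked */
            write_files(*d);                  /* step 4: unlocked */

            pthread_mutex_lock(&db_lock);     /* step 5: serialised */
            db_update_index(*d);
            pthread_mutex_unlock(&db_lock);
        }
        return 0;
    }

While one thread holds db_lock, the others can still be parsing,
transforming, or writing files, which is where the claimed gain over a
single thread would come from.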

As you said, the more rows per transaction, the more rows per second,
which brings us back to the original issue.  Why can't I have multiple
threads all using the same connection within a single transaction?

Of course I know the simple answer, which is that the current API does
not support this.  But I'm wondering why, and whether there are any
other ways to achieve the desired performance.

Emerson


On 12/28/06, Roger Binns <[EMAIL PROTECTED]> wrote:


Emerson Clarke wrote:
> The idea is that because I am accessing two databases, and doing
> several file system operations per document, there should be a large
> gain by using many threads.  There is no actual indexing process; the
> whole structure is the index, but if anything the database operations
> take the most time.  The filesystem operations have a very small
> amount of overhead.

That is all unclear from your original description.  Aren't you trying
to "index" several million documents and doesn't the process of indexing
consist of two parts?

1: Open the document, parse it in various ways, build index data, close it

2: Add a row to a SQLite database

My point was that #1 is way more work than #2, so you can run #1's in
multiple threads/processes and do #2 in a single thread using a
queue/pipe object for communication.

On the other hand, if #1 is way less work than #2 then you will be bound
by the speed at which you decide to make transactions in SQLite.  A 7200
rpm disk limits you to 60 transactions a second.  The more rows per
transaction, the more rows per second.

Roger
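
(The 60-per-second figure follows from the platter speed: 7200 rpm is
120 revolutions per second, and a durable commit costs on the order of
a couple of revolutions.)  The batching Roger describes is just a
matter of wrapping the inserts in one transaction, roughly like this (a
sketch; the table name is assumed and error checking is omitted):

    #include <sqlite3.h>

    /* Insert n rows inside a single transaction, so the disk pays
       one commit for the whole batch instead of one per row. */
    static int insert_batch(sqlite3 *db, const char **words, int n)
    {
        sqlite3_stmt *ins;
        int i;

        sqlite3_exec(db, "BEGIN", 0, 0, 0);
        sqlite3_prepare_v2(db, "INSERT INTO words(word) VALUES(?1)",
                           -1, &ins, 0);
        for (i = 0; i < n; i++) {
            sqlite3_bind_text(ins, 1, words[i], -1, SQLITE_STATIC);
            sqlite3_step(ins);
            sqlite3_reset(ins);     /* reuse the prepared statement */
        }
        sqlite3_finalize(ins);
        return sqlite3_exec(db, "COMMIT", 0, 0, 0);
    }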





Sorry if this has been posited before, but our solution to this was data 
normalisation.  I.e., we store only the filename and an integer ID which 
identifies the directory of the file, and of course a lookup table of IDs 
to directory names.  This helped us greatly, as well as reducing the 
overhead of the amount of data stored.
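
The schema for that kind of normalisation might look like this (an
assumed sketch for illustration, not the actual tables):

    /* Directory paths stored once; files reference them by id. */
    static const char *schema =
        "CREATE TABLE directory ("
        "  id   INTEGER PRIMARY KEY,"
        "  path TEXT UNIQUE NOT NULL);"
        "CREATE TABLE file ("
        "  name   TEXT NOT NULL,"
        "  dir_id INTEGER NOT NULL REFERENCES directory(id));";

A file's full path is then reassembled with a join against directory,
and a long path is stored only once no matter how many files share it.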


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking & threading

2007-01-03 Thread Nicolas Williams
On Thu, Jan 04, 2007 at 12:50:01AM +, Emerson Clarke wrote:
> My apologies, you're right, that explanation had been given.

OK.

> But I didn't actually take it seriously; I guess I found it hard to
> believe that its being the easier option was the only reason why this
> limitation was in place.

SQLite is a large pile of code.  Other libraries that I'm familiar with
that have taken this approach are larger still.  Retrofitting MT-safety
into these is hard, so the easiest path is often taken.  (It may be that
SQLite was always intended to be MT-safe, but I don't know that for a
fact.)

> If this is the case, then surely the fix is simple.  Given that I
> assume it is safe to have multiple sqlite3_step() calls active on a
> single connection on a single thread.  And given what you have said
> about SQLite not already checking data structures that would be shared
> by multiple threads, then surely all that needs to happen is for the
> misuse detection to be removed.

Your first assumption, as has been explained repeatedly, is incorrect.

Oh, wait.  I think I understand what's happening.  You've misunderstood
what you've been told (your next paragraph makes me think so).

You *can* use sqlite3_step() with the same db context in multiple
threads; you just have to synchronize so this doesn't happen
*concurrently*.

If you remove the misuse detection but don't synchronize I believe
you'll find that your application will crash or worse.
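
In code, the synchronisation being described is nothing more than an
external lock taken around each call, along these lines (a pthread
sketch; the wrapper is an illustration, not part of the SQLite API):

    #include <pthread.h>
    #include <sqlite3.h>

    static pthread_mutex_t step_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Serialise every sqlite3_step() on a shared connection.  The
       mutex, not SQLite, is what guarantees no two calls overlap. */
    static int locked_step(sqlite3_stmt *stmt)
    {
        int rc;
        pthread_mutex_lock(&step_lock);
        rc = sqlite3_step(stmt);
        pthread_mutex_unlock(&step_lock);
        return rc;
    }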

> Since there is usually nothing which needs to be done to specifically
> make any API thread safe other than synchronising access to it.  If
> by synchronising access to the API calls I can ensure that no two
> threads use any data structure at the same time, everything should
> work fine, right?

Yes.

Nico
-- 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking & threading

2007-01-03 Thread Emerson Clarke

Nicolas,

My apologies, you're right, that explanation had been given.

But I didn't actually take it seriously; I guess I found it hard to
believe that its being the easier option was the only reason why this
limitation was in place.

If this is the case, then surely the fix is simple.  Given that I
assume it is safe to have multiple sqlite3_step() calls active on a
single connection on a single thread.  And given what you have said
about SQLite not already checking data structures that would be shared
by multiple threads, then surely all that needs to happen is for the
misuse detection to be removed.

Since there is usually nothing which needs to be done to specifically
make any API thread safe other than synchronising access to it.  If
by synchronising access to the API calls I can ensure that no two
threads use any data structure at the same time, everything should
work fine, right?

This gets to the very core of the original issue I had, and why I said
SQLite was "actively" thread unsafe.  Because it terminates itself
with misuse errors even when, in theory, it is perfectly safe to use
it from multiple threads provided the user synchronises on every API
call, which is easily achieved.

Am I making any sense?

Emerson


On 1/3/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:

On Wed, Jan 03, 2007 at 11:22:36PM +, Emerson Clarke wrote:
> Ok,
>
> Well can you do me a favour and forward me the email where this was
> supposedly explained in all its technical glory.

Technical glory?  No.  At a high level it's this: it is easier to
make an API like SQLite's thread-safe with exceptions like "only one
thread active in any given context object at any time" than it is to
put mutexes and whatnot around data structures that would be shared by
multiple threads if this exception were not stated.  Unless and until
you try to do it the other way you'll find this rationale to be
subjective.

Like I said, good luck.



-
To unsubscribe, send email to [EMAIL PROTECTED]
-


