Re: [sqlite] threads and transactions

2011-02-06 Thread Pavel Ivanov
> i'm not sure at all it's practical to create an 80MB string with one giant 
> SQL statement in it to send all that data at once.

80MB string is not too bad after all (probably even less than 1% of
the whole memory). So you better do it this way. BTW, it won't be one
SQL statement, it will be 40k SQL statements which will be executed
one-by-one when the parsing is over.


Pavel

On Sun, Feb 6, 2011 at 1:07 PM, David M. Cotter  wrote:
> bummrz.
>
> here's what i'm trying to do
>
> my software has two music stores
> it downloads an XML for each music store, both at the same time
> it then parses the XML into SQLite
> each song has about 7 to 10 bits of data (columns)
> and there may be 40k songs.
> i need to be able to add all 40k songs to the DB in a way that can be rolled 
> back if a parse error occurs or if the user hits the stop sign or if the user 
> quits during the parse phase (which takes about 30 seconds)
> see?
>
> and the two stores operate independently, on different threads
>
> i'm not sure at all it's practical to create an 80MB string with one giant 
> SQL statement in it to send all that data at once.
>
> so here's my new idea:
>
> is there a way to merge two databases that have already been created?
> so each store, during the update phase, would write to it's own personal 
> database
> only when it's complete and there are no errors or aborts do i then just 
> "merge" the store DB with the main DB, but this depends on the ability to run 
> a single "merge" command?
>
> is there such a thing?
>
> On Feb 6, 2011, at 9:56 AM, Simon Slavin wrote:
>
>>
>> On 6 Feb 2011, at 5:42pm, David M. Cotter wrote:
>>
 If you don't need this behaviour because you're confident you'll never get 
 a clash, then you could accumulate your INSERTs in memory, then blast 
 through them when you would previously have just done the COMMIT.
>>>
>>>
>>> i will never have a clash because i manage the primary keys myself.
>>> is there an SQL way to do that or do you mean i should build my own struct 
>>> to hold the data temporarily?
>>
>> Sorry, no automated way to do it.  You'll have to do it yourself.
>>
>> If, for example, you're using sqlite3_exec() and just handing it a SQL 
>> string that starts "INSERT INTO ..." then you could make an array, store 
>> these strings in it as you're preparing them, and just flush the array to 
>> your database when you have 100 (or 1000) of them.  This will lock up your 
>> database for less time because you won't be holding it locked while you 
>> prepare the SQL commands, just enough time to retrieve them from an array.
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-06 Thread Simon Slavin

On 6 Feb 2011, at 6:07pm, David M. Cotter wrote:

> here's what i'm trying to do
> 
> my software has two music stores
> it downloads an XML for each music store, both at the same time

Is there any need to do that ?  All your threads are running on the same 
computer.  That's no faster than just querying each store in turn, processing 
the next new record (if there is one) then moving to the next store.  If you 
can do that, stop here.

> it then parses the XML into SQLite 
> each song has about 7 to 10 bits of data (columns)
> and there may be 40k songs.
> i need to be able to add all 40k songs to the DB in a way that can be rolled 
> back if a parse error occurs or if the user hits the stop sign or if the user 
> quits during the parse phase (which takes about 30 seconds)
> see?

I don't see why you have a problem here.  Each thread writes to the database, 
doing all the writing for one record in a single INSERT command.  While that 
INSERT is happening, the other thread is blocked.  So what ?  It's all 
happening on a single computer anyway: the bottleneck in the amount of time 
taken is probably how long it takes to write to your hard disk rather than 
processing time.  If this is acceptable, stop here.

By the way, if at all possible, stop using threads and use processes instead.  
Perhaps run one process for all the writing to do with each store.

> and the two stores operate independently, on different threads

Have a third thread that actually does the writing.  The two threads you 
already have just prepare the data and hand it over to the third thread, which 
is the only one which actually writes to the SQL database.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-06 Thread David M. Cotter
bummrz.

here's what i'm trying to do

my software has two music stores
it downloads an XML for each music store, both at the same time
it then parses the XML into SQLite 
each song has about 7 to 10 bits of data (columns)
and there may be 40k songs.
i need to be able to add all 40k songs to the DB in a way that can be rolled 
back if a parse error occurs or if the user hits the stop sign or if the user 
quits during the parse phase (which takes about 30 seconds)
see?

and the two stores operate independently, on different threads

i'm not sure at all it's practical to create an 80MB string with one giant SQL 
statement in it to send all that data at once.

so here's my new idea:

is there a way to merge two databases that have already been created?
so each store, during the update phase, would write to it's own personal 
database
only when it's complete and there are no errors or aborts do i then just 
"merge" the store DB with the main DB, but this depends on the ability to run a 
single "merge" command?

is there such a thing?

On Feb 6, 2011, at 9:56 AM, Simon Slavin wrote:

> 
> On 6 Feb 2011, at 5:42pm, David M. Cotter wrote:
> 
>>> If you don't need this behaviour because you're confident you'll never get 
>>> a clash, then you could accumulate your INSERTs in memory, then blast 
>>> through them when you would previously have just done the COMMIT.
>> 
>> 
>> i will never have a clash because i manage the primary keys myself.
>> is there an SQL way to do that or do you mean i should build my own struct 
>> to hold the data temporarily?
> 
> Sorry, no automated way to do it.  You'll have to do it yourself.
> 
> If, for example, you're using sqlite3_exec() and just handing it a SQL string 
> that starts "INSERT INTO ..." then you could make an array, store these 
> strings in it as you're preparing them, and just flush the array to your 
> database when you have 100 (or 1000) of them.  This will lock up your 
> database for less time because you won't be holding it locked while you 
> prepare the SQL commands, just enough time to retrieve them from an array.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-06 Thread Simon Slavin

On 6 Feb 2011, at 5:42pm, David M. Cotter wrote:

>> If you don't need this behaviour because you're confident you'll never get a 
>> clash, then you could accumulate your INSERTs in memory, then blast through 
>> them when you would previously have just done the COMMIT.
> 
> 
> i will never have a clash because i manage the primary keys myself.
> is there an SQL way to do that or do you mean i should build my own struct to 
> hold the data temporarily?

Sorry, no automated way to do it.  You'll have to do it yourself.

If, for example, you're using sqlite3_exec() and just handing it a SQL string 
that starts "INSERT INTO ..." then you could make an array, store these strings 
in it as you're preparing them, and just flush the array to your database when 
you have 100 (or 1000) of them.  This will lock up your database for less time 
because you won't be holding it locked while you prepare the SQL commands, just 
enough time to retrieve them from an array.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-06 Thread David M. Cotter
> If you don't need this behaviour because you're confident you'll never get a 
> clash, then you could accumulate your INSERTs in memory, then blast through 
> them when you would previously have just done the COMMIT.


i will never have a clash because i manage the primary keys myself.
is there an SQL way to do that or do you mean i should build my own struct to 
hold the data temporarily?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread Igor Tandetnik
David M. Cotter  wrote:
> i may not have been clear
> 
> i want to begin transactions on different threads at once

Why, if you don't mind me asking? Your hard drive has only one write head. What 
makes you feel that writing to the same file from multiple threads would be any 
faster than doing it from one thread?

In any case, you can't do with SQLite what you think you want to do. 
Personally, I'd have one thread dedicated to SQLite work, then all the other 
threads would generate the data to be inserted, and feed it to the SQLite 
worker via a producer-consumer queue.

> but does inserting data during a transaction actually block too?

Yes.

> is inserting considered a "writing transaction" if there is a "begin" before 
> it?

Yes.

> cuz it's not actually writing to the DB proper,
> it's writing to it's journal file

Incorrect. SQLite writes previous, unmodified data into journal file, prior to 
overwriting parts of actual database file with new data. Committing a 
transaction consists simply of deleting the journal file. Rolling back means 
copying the data over from the journal file back to the database file.

> i'm not using "begin immediate", just using "begin"

This just means the transaction becomes a write transaction a split second 
later, when the first INSERT statement is executed.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread Igor Tandetnik
David M. Cotter  wrote:
>> In SQLite every write is in a transaction whether you declare one with BEGIN 
>> or not.  If you don't declare a transaction, SQLite
>> invisibly surrounds each individual INSERT or UPDATE with a BEGIN and 
>> COMMIT. 
> sure, that's fine.  but if you do your own BEGIN, then any INSERT you do 
> after that, *before* you do the COMMIT isn't actually
> writing anything into the database, right? 
> 
> it's really only the COMMIT that actually writes to the database, right?

Wrong. There is a limited in-memory cache, but once the amount of data grows 
large enough, intermediate changes need to be spilled to disk and, indeed, 
written to the database file.

> but you're not addressing the case where i have manually started a BEGIN.  
> after i manually do BEGIN, does the next INSERT need
> to block, or is it the COMMIT that actually does the blocking? 

The first INSERT acquires a reserved lock on the database. No other connection 
will then be able to acquite a reserved lock, so there would be no other 
writer. For details, see http://www.sqlite.org/lockingv3.html

> "if each thread is collecting several rows that it will insert in one fell 
> swoop, why should the mere gathering of rows (without
> actually committing them yet) block another thread from gathering it's own 
> set of rows?

You are free to "gather" the data that would go into the database before 
starting the write transaction.

> it's only when all the rows are gathered
> that i actually commit the change.

There's no mechanism in SQLite to control when precisely intermediate data is 
written to the database file. It may happen before COMMIT.

> am i making sense?

Yes, but you are laboring under incorrect assumptions, and thus reaching 
incorrect conclusions.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread Simon Slavin

On 6 Feb 2011, at 1:30am, David M. Cotter wrote:

>> In SQLite every write is in a transaction whether you declare one with BEGIN 
>> or not. If you don't declare a transaction, SQLite invisibly surrounds each 
>> individual INSERT or UPDATE with a BEGIN and COMMIT.
> sure, that's fine.  but if you do your own BEGIN, then any INSERT you do 
> after that, *before* you do the COMMIT isn't actually writing anything into 
> the database, right?
> 
> it's really only the COMMIT that actually writes to the database, right?

An alternative point of view is that the transactions hit the database 
immediately, and that either ROLLBACK or failure to do a COMMIT backs them out. 
 It depends what you think 'the database' is.  Remember that it's possible to 
run SQLite without any journaling at all: even uncommitted changes go to the 
database file.  If a journal file exists, then I feel 'the database' includes 
both the database file and the journal file.

>>> i understand that one commit will block the other
>>> 
>>> but does inserting data during a transaction actually block too?
>> 
>> It is each transaction which blocks each other transaction, but the default 
>> behaviour is not to lock the file until the first command that makes a 
>> change.  So once one thread has executed an INSERT, all other threads will 
>> be blocked at their own BEGINs and will never even get to do an INSERT until 
>> the first thread has done its COMMIT.
> 
> but you're not addressing the case where i have manually started a BEGIN.  
> after i manually do BEGIN, does the next INSERT need to block, or is it the 
> COMMIT that actually does the blocking?

The first INSERT after the BEGIN starts the blocking.  Unless you specifically 
override the behaviour by doing a BEGIN IMMEDIATE (which you've previously said 
you weren't doing).

>>> is inserting considered a "writing transaction" if there is a "begin" 
>>> before it? cuz it's not actually writing to the DB proper, it's writing to 
>>> it's journal file, saving things up until the "commit" or "rollback".
>> 
>> The natural question you're asking is something like "Well, if each thread 
>> is just inserting new rows, why do they need to block each-other ?  They 
>> don't care what data the other threads are inserting."
> 
> well, i would phrase it this way:
> 
> "if each thread is collecting several rows that it will insert in one fell 
> swoop, why should the mere gathering of rows (without actually committing 
> them yet) block another thread from gathering it's own set of rows?  it's 
> only when all the rows are gathered that i actually commit the change. Since 
> the database isn't being written to during the gathering phase, they don't 
> care what other threads are gathering".

See my previous answer.  If a program tries to add a row to the database that 
conflicts with one that's already there, it needs to get an error immediately.  
Not later when it has forgotten the context of the row that's causing the 
problem and can no longer figure out how to handle the problem.  So a thread 
that's trying to do an INSERT must be blocked until all the changes that will 
be done before it have been completed.  Also there's the problem that some 
applications need to know the rowid of a freshly-inserted row.

If you don't need this behaviour because you're confident you'll never get a 
clash, then you could accumulate your INSERTs in memory, then blast through 
them when you would previously have just done the COMMIT.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread David M. Cotter
forgive my not understanding this but i'm trying to be extremely clear and i am 
not sure from your answer whether you have understood my  question.

> In SQLite every write is in a transaction whether you declare one with BEGIN 
> or not.  If you don't declare a transaction, SQLite invisibly surrounds each 
> individual INSERT or UPDATE with a BEGIN and COMMIT.
sure, that's fine.  but if you do your own BEGIN, then any INSERT you do after 
that, *before* you do the COMMIT isn't actually writing anything into the 
database, right?

it's really only the COMMIT that actually writes to the database, right?

>> i understand that one commit will block the other
>> 
>> but does inserting data during a transaction actually block too?
> 
> It is each transaction which blocks each other transaction, but the default 
> behaviour is not to lock the file until the first command that makes a 
> change.  So once one thread has executed an INSERT, all other threads will be 
> blocked at their own BEGINs and will never even get to do an INSERT until the 
> first thread has done its COMMIT.

but you're not addressing the case where i have manually started a BEGIN.  
after i manually do BEGIN, does the next INSERT need to block, or is it the 
COMMIT that actually does the blocking?

do you see what i'm asking?

>> is inserting considered a "writing transaction" if there is a "begin" before 
>> it?  cuz it's not actually writing to the DB proper, it's writing to it's 
>> journal file, saving things up until the "commit" or "rollback".
> 
> The natural question you're asking is something like "Well, if each thread is 
> just inserting new rows, why do they need to block each-other ?  They don't 
> care what data the other threads are inserting."

well, i would phrase it this way:

"if each thread is collecting several rows that it will insert in one fell 
swoop, why should the mere gathering of rows (without actually committing them 
yet) block another thread from gathering it's own set of rows?  it's only when 
all the rows are gathered that i actually commit the change. Since the database 
isn't being written to during the gathering phase, they don't care what other 
threads are gathering".

am i making sense?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread Simon Slavin

On 5 Feb 2011, at 11:00pm, David M. Cotter wrote:

> i may not have been clear
> 
> i want to begin transactions on different threads at once
> in each thread
>   begin a transaction
>   insert lots of data, this may take a long time
>   commit transaction

Okay, here's some background.

In SQLite every write is in a transaction whether you declare one with BEGIN or 
not.  If you don't declare a transaction, SQLite invisibly surrounds each 
individual INSERT or UPDATE with a BEGIN and COMMIT.

> i understand that one commit will block the other
> 
> but does inserting data during a transaction actually block too?

It is each transaction which blocks each other transaction, but the default 
behaviour is not to lock the file until the first command that makes a change.  
So once one thread has executed an INSERT, all other threads will be blocked at 
their own BEGINs and will never even get to do an INSERT until the first thread 
has done its COMMIT.

> is inserting considered a "writing transaction" if there is a "begin" before 
> it?  cuz it's not actually writing to the DB proper, it's writing to it's 
> journal file, saving things up until the "commit" or "rollback".

The natural question you're asking is something like "Well, if each thread is 
just inserting new rows, why do they need to block each-other ?  They don't 
care what data the other threads are inserting.".  The answer is that even the 
process of inserting new data needs to look at existing data.  There's the 
problem of dealing with columns marked as UNIQUE: each thread needs to be 
stopped if it tries to insert a new row that would violate database 
restrictions, and it has to be stopped on that instruction, not long after it 
has forgotten what it did that caused the problem.  And for the system to 
decide which thread should get the rejection it has to consider the operations 
in a particular order.  Which is why it only considers one transaction at a 
time.

For a fuller explanation of what causes and is blocked by locking see



Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread David M. Cotter
i may not have been clear

i want to begin transactions on different threads at once
in each thread
begin a transaction
insert lots of data, this may take a long time
commit transaction

i understand that one commit will block the other

but does inserting data during a transaction actually block too?

is inserting considered a "writing transaction" if there is a "begin" before 
it?  cuz it's not actually writing to the DB proper, it's writing to it's 
journal file, saving things up until the "commit" or "rollback".

i'm not using "begin immediate", just using "begin"


On Feb 5, 2011, at 2:53 PM, Pavel Ivanov wrote:

>> i understand that one "commit" will block all other threads from doing a 
>> "commit", "rollback" or any atomic transaction, until it's done, but are you 
>> saying i can't even add data on another thread while one has an open 
>> transaction?
> 
> There can be several simultaneous read-only transactions. But as long
> as one connection started a writing transaction (by executing "begin
> immediate" or by executing insert/update/delete after "begin") no
> other connection can start a writing transaction (it still can do
> read-only transactions for a while).
> 
> If you need a different behavior you need to use some other DBMS.
> 
> 
> Pavel
> 
> On Sat, Feb 5, 2011 at 5:48 PM, David M. Cotter  wrote:
>>> Transactions are per-connection and have nothing to do
>>> with threads. If you want different transactions in each thread you
>>> need to make one connection for each thread. But those transactions
>>> won't be able to execute simultaneously.
>> 
>> so if i open a separate connection on each thread
>> then each thread begins a transaction
>> you're saying one thread will block?
>> 
>> i understand that one "commit" will block all other threads from doing a 
>> "commit", "rollback" or any atomic transaction, until it's done, but are you 
>> saying i can't even add data on another thread while one has an open 
>> transaction?
>> 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread Pavel Ivanov
> i understand that one "commit" will block all other threads from doing a 
> "commit", "rollback" or any atomic transaction, until it's done, but are you 
> saying i can't even add data on another thread while one has an open 
> transaction?

There can be several simultaneous read-only transactions. But as long
as one connection started a writing transaction (by executing "begin
immediate" or by executing insert/update/delete after "begin") no
other connection can start a writing transaction (it still can do
read-only transactions for a while).

If you need a different behavior you need to use some other DBMS.


Pavel

On Sat, Feb 5, 2011 at 5:48 PM, David M. Cotter  wrote:
>> Transactions are per-connection and have nothing to do
>> with threads. If you want different transactions in each thread you
>> need to make one connection for each thread. But those transactions
>> won't be able to execute simultaneously.
>
> so if i open a separate connection on each thread
> then each thread begins a transaction
> you're saying one thread will block?
>
> i understand that one "commit" will block all other threads from doing a 
> "commit", "rollback" or any atomic transaction, until it's done, but are you 
> saying i can't even add data on another thread while one has an open 
> transaction?
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread David M. Cotter
> Transactions are per-connection and have nothing to do
> with threads. If you want different transactions in each thread you
> need to make one connection for each thread. But those transactions
> won't be able to execute simultaneously.

so if i open a separate connection on each thread
then each thread begins a transaction
you're saying one thread will block?

i understand that one "commit" will block all other threads from doing a 
"commit", "rollback" or any atomic transaction, until it's done, but are you 
saying i can't even add data on another thread while one has an open 
transaction?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] threads and transactions

2011-02-05 Thread Pavel Ivanov
> presuming this timeline is chronological, may i assume that step 4 is 
> committed first in the database?

You mean as a third transaction? No.

> and that steps 5 and 6 operate independently?

No.

> even when threads 1 and 2 open their individual transactions, i see only ONE 
> journal file
> logically i guess i would have expected to see one journal file per open 
> transaction per thread?

You are wrong. Transactions are per-connection and have nothing to do
with threads. If you want different transactions in each thread you
need to make one connection for each thread. But those transactions
won't be able to execute simultaneously.

> does step five commit thread 6?  does thread 6's rollback only operate on the 
> "stuff" done in thread 6?

Step 5 commits everything in thread 1, thread 1's rollback does
nothing. And btw if you checked your return codes from SQLite you'd
notice that "begin transaction" fails either in thread 2 or thread 1
(whichever comes last).


Pavel

On Sat, Feb 5, 2011 at 5:27 PM, David M. Cotter  wrote:
> i'm sure this topic has been beaten to death but i just really want to make 
> sure.
>
> i'm using ONE database, and one handle to it on all threads
>
> here's a theoretical timeline
>
> --
> 1) thread 1
> begin transaction
> do bunches of stuff
>
> 2) thread 2
> begin transaction
> do bunches of stuff
>
> 3) thread 1
> do more stuff
>
> 4) thread 0 (main thread)
> do atomic operation
>
> 5) thread 2
> do more stuff
> commit transaction
>
> 6) thread 1
> rollback transaction
> --
>
> presuming this timeline is chronological, may i assume that step 4 is 
> committed first in the database?
>
> and that steps 5 and 6 operate independently?
>
> even when threads 1 and 2 open their individual transactions, i see only ONE 
> journal file
> logically i guess i would have expected to see one journal file per open 
> transaction per thread?
>
> does step five commit thread 6?  does thread 6's rollback only operate on the 
> "stuff" done in thread 6?
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users