[sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-03 Thread Gregory Petrosyan
Hello,

I am experiancing a weird problem: sometimes (1 time in a 10-100) when
2 processes try to open the same database file (and execute something
like 'create table foo if not exists'), one of them fails with
SQLITE_BUSY — despite 1 second (or bigger) timeout.

Processes themselves produce almost no DB activity; they merely start
and initialize the database roughly at the same time.
sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout().

Unfortunately I can't write a small program that reproduces this
reliably: my test program never crashes this way (except when timeout
is really small — say, 10ms). Yet, this behaviour is rare but
reproducible, both from Python (apsw) on Windows, Python (apsw) on OS
X and native C API on OS X (in entirely different program).

Can you please point me at what can be the cause of this?

       Gregory
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-03 Thread Gregory Petrosyan
On Tue, Apr 3, 2012 at 3:53 PM, Simon Slavin  wrote:
>
> On 3 Apr 2012, at 9:53am, Dan Kennedy  wrote:
>
>> As Jay says, deadlock is not possible for implicit transactions.
>> SQLite will keep retrying until either your busy-handler returns
>> zero (if you configured a busy-handler) or the timeout is reached
>> (if you configured a timeout). It sounds like the latter in this
>> case.
>
> You seem to have a setup where your failures can be produced on demand, 
> albeit at random.  So you can test whether this is a timeout-related problem 
> but varying your timeout.  Run it, and log how many failures you get.  Then 
> multiply (or divide) your timeout setting by ten and run it again.  If you 
> get failures at the same interval, it's not a timeout-related problem.

I've managed to make *something* reproducible. Take a look at the
following Python code, I hope it is self-explanatory:

--

import os
import sys
import apsw
import multiprocessing as mp


BUSYTIMEOUT = 5000
NPROCS = 4


def rmdb():
try:
os.remove('file.db')
os.remove('file.db-wal')
os.remove('file.db-shm')
except:
pass


def child():
try:
db = apsw.Connection('file.db')
db.setbusytimeout(BUSYTIMEOUT)
try:
db.cursor().execute('PRAGMA JOURNAL_MODE=WAL')
except:
print >>sys.stderr,  "Error with PRAGMA:", sys.exc_info()[0]
db.close()
except:
print >>sys.stderr,  "Unexpected error:", sys.exc_info()[0]


if __name__ == '__main__':
while True:
rmdb()

ps = [mp.Process(target=child) for i in xrange(NPROCS)]
for p in ps:
p.daemon = True
for p in ps:
p.start()
for p in ps:
p.join()

--

It reliably results in the "deadlock" scenario described above; hence
SQLite returns _BUSY for "PRAGMA"  despite the big timeout (timeout
value seems to have no effect).

Can you please describe why SQLite does not retries the implicit
"PRAGMA" transaction here?

       Gregory
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-03 Thread Simon Slavin

On 3 Apr 2012, at 9:53am, Dan Kennedy  wrote:

> As Jay says, deadlock is not possible for implicit transactions.
> SQLite will keep retrying until either your busy-handler returns
> zero (if you configured a busy-handler) or the timeout is reached
> (if you configured a timeout). It sounds like the latter in this
> case.

You seem to have a setup where your failures can be produced on demand, albeit 
at random.  So you can test whether this is a timeout-related problem but 
varying your timeout.  Run it, and log how many failures you get.  Then 
multiply (or divide) your timeout setting by ten and run it again.  If you get 
failures at the same interval, it's not a timeout-related problem.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-03 Thread Dan Kennedy

On 04/03/2012 04:20 PM, Gregory Petrosyan wrote:

On Tue, Apr 3, 2012 at 12:53 PM, Dan Kennedy  wrote:

As a side note: why does not SQLite automatically retry implicit
transactions after invoking busy handler?


It's a race. That's what SQLITE_BUSY means.

As Jay says, deadlock is not possible for implicit transactions.
SQLite will keep retrying until either your busy-handler returns
zero (if you configured a busy-handler) or the timeout is reached
(if you configured a timeout). It sounds like the latter in this
case.


I don't follow the logic here: if deadlock is not possible with
implicit transactions, SQLite should retry them after specified
timeout, but not return _BUSY, forcing me into implementing the same
retry logic, no?


It continues attempting the operation until the length of time
since the first attempt exceeds the configured timeout. At which
point it returns SQLITE_BUSY.


If you are using WAL mode, check that connections are being closed
properly. If they are not, a *-wal file may be left on the disk even
after all active clients have disconnected. In this case, when the
next client connects it has to read and process the entire *-wal
file. This is called "recovery", and the client holds an exclusive
lock on the db file while it is underway. Any other clients
that attempt to read or write the db see an SQLITE_BUSY. Any
busy-handler (or timeout) will apply here, but if recovery takes a
while then a short timeout could be exceeded.


Am I correct that if e.g. I have 2 processes, A and B, using the same
WAL-enabled database, then following:
1. A creates the database and starts to write data (connection is open
all the time)
2. B tries to connect to the database
can easily result in SQLITE_BUSY returned to either A or B?


No.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-03 Thread Gregory Petrosyan
On Tue, Apr 3, 2012 at 12:53 PM, Dan Kennedy  wrote:
>> As a side note: why does not SQLite automatically retry implicit
>> transactions after invoking busy handler?
>
> It's a race. That's what SQLITE_BUSY means.
>
> As Jay says, deadlock is not possible for implicit transactions.
> SQLite will keep retrying until either your busy-handler returns
> zero (if you configured a busy-handler) or the timeout is reached
> (if you configured a timeout). It sounds like the latter in this
> case.

I don't follow the logic here: if deadlock is not possible with
implicit transactions, SQLite should retry them after specified
timeout, but not return _BUSY, forcing me into implementing the same
retry logic, no?

> If you are using WAL mode, check that connections are being closed
> properly. If they are not, a *-wal file may be left on the disk even
> after all active clients have disconnected. In this case, when the
> next client connects it has to read and process the entire *-wal
> file. This is called "recovery", and the client holds an exclusive
> lock on the db file while it is underway. Any other clients
> that attempt to read or write the db see an SQLITE_BUSY. Any
> busy-handler (or timeout) will apply here, but if recovery takes a
> while then a short timeout could be exceeded.

Am I correct that if e.g. I have 2 processes, A and B, using the same
WAL-enabled database, then following:
1. A creates the database and starts to write data (connection is open
all the time)
2. B tries to connect to the database
can easily result in SQLITE_BUSY returned to either A or B?

       Gregory
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-03 Thread Dan Kennedy

On 04/03/2012 12:48 PM, Gregory Petrosyan wrote:

On Mon, Apr 2, 2012 at 7:51 PM, Jay A. Kreibich  wrote:

On Mon, Apr 02, 2012 at 07:40:58PM +0400, Gregory Petrosyan scratched on the 
wall:

Hello,

I am experiancing a weird problem: sometimes (1 time in a 10-100) when
2 processes try to open the same database file (and execute something
like 'create table foo if not exists'), one of them fails with
SQLITE_BUSY ??? despite 1 second (or bigger) timeout.

Processes themselves produce almost no DB activity; they merely start
and initialize the database roughly at the same time.
sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout().

Unfortunately I can't write a small program that reproduces this
reliably: my test program never crashes this way (except when timeout
is really small ??? say, 10ms). Yet, this behaviour is rare but
reproducible, both from Python (apsw) on Windows, Python (apsw) on OS
X and native C API on OS X (in entirely different program).

Can you please point me at what can be the cause of this?


  You may be seeing a deadlock situation.  This is most commonly
  associated with explicit transactions (that are open for a longer
  period of time), but it is possible with implicit transactions.

  If two connections attempt to write to the DB at the same time, it is
  possible for them to deadlock on the file locking.  SQLite recognizes
  this and has one of the connections back off with an SQLITE_BUSY error.
  If this happens in an explicit transaction, the program needs to
  ROLLBACK the current transaction and start over.  In the case of an
  implicit transaction around a statement, you can safely re-run
  the statement.


Thanks a lot for the reply.

Shouldn't it be extremely unlikely for this situation to happen, though?

Can it be diagnosed with more certaincy? I feel uncomfortable guessing
(hoping) that the bug exists due to this race/deadlock.

As a side note: why does not SQLite automatically retry implicit
transactions after invoking busy handler?


It's a race. That's what SQLITE_BUSY means.

As Jay says, deadlock is not possible for implicit transactions.
SQLite will keep retrying until either your busy-handler returns
zero (if you configured a busy-handler) or the timeout is reached
(if you configured a timeout). It sounds like the latter in this
case.

If you are using WAL mode, check that connections are being closed
properly. If they are not, a *-wal file may be left on the disk even
after all active clients have disconnected. In this case, when the
next client connects it has to read and process the entire *-wal
file. This is called "recovery", and the client holds an exclusive
lock on the db file while it is underway. Any other clients
that attempt to read or write the db see an SQLITE_BUSY. Any
busy-handler (or timeout) will apply here, but if recovery takes a
while then a short timeout could be exceeded.

Other than that, you probably just want to increase the busy-timeout
some.











___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-02 Thread Gregory Petrosyan
On Mon, Apr 2, 2012 at 7:51 PM, Jay A. Kreibich  wrote:
> On Mon, Apr 02, 2012 at 07:40:58PM +0400, Gregory Petrosyan scratched on the 
> wall:
>> Hello,
>>
>> I am experiancing a weird problem: sometimes (1 time in a 10-100) when
>> 2 processes try to open the same database file (and execute something
>> like 'create table foo if not exists'), one of them fails with
>> SQLITE_BUSY ??? despite 1 second (or bigger) timeout.
>>
>> Processes themselves produce almost no DB activity; they merely start
>> and initialize the database roughly at the same time.
>> sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout().
>>
>> Unfortunately I can't write a small program that reproduces this
>> reliably: my test program never crashes this way (except when timeout
>> is really small ??? say, 10ms). Yet, this behaviour is rare but
>> reproducible, both from Python (apsw) on Windows, Python (apsw) on OS
>> X and native C API on OS X (in entirely different program).
>>
>> Can you please point me at what can be the cause of this?
>
>  You may be seeing a deadlock situation.  This is most commonly
>  associated with explicit transactions (that are open for a longer
>  period of time), but it is possible with implicit transactions.
>
>  If two connections attempt to write to the DB at the same time, it is
>  possible for them to deadlock on the file locking.  SQLite recognizes
>  this and has one of the connections back off with an SQLITE_BUSY error.
>  If this happens in an explicit transaction, the program needs to
>  ROLLBACK the current transaction and start over.  In the case of an
>  implicit transaction around a statement, you can safely re-run
>  the statement.

Thanks a lot for the reply.

Shouldn't it be extremely unlikely for this situation to happen, though?

Can it be diagnosed with more certaincy? I feel uncomfortable guessing
(hoping) that the bug exists due to this race/deadlock.

As a side note: why does not SQLite automatically retry implicit
transactions after invoking busy handler? This is IMO the behaviout
most people will think SQLite has (after reading the docs); if this is
not true, it is almost impossible to rely on busy handler (timeout) at
all, and it is difficult to understand the need for it — if I need to
wrap all queries into retry/sleep on my own, why have extra strange
internal timeouts in SQLite?

       Gregory
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-02 Thread Jay A. Kreibich
On Mon, Apr 02, 2012 at 07:40:58PM +0400, Gregory Petrosyan scratched on the 
wall:
> Hello,
> 
> I am experiancing a weird problem: sometimes (1 time in a 10-100) when
> 2 processes try to open the same database file (and execute something
> like 'create table foo if not exists'), one of them fails with
> SQLITE_BUSY ??? despite 1 second (or bigger) timeout.
> 
> Processes themselves produce almost no DB activity; they merely start
> and initialize the database roughly at the same time.
> sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout().
> 
> Unfortunately I can't write a small program that reproduces this
> reliably: my test program never crashes this way (except when timeout
> is really small ??? say, 10ms). Yet, this behaviour is rare but
> reproducible, both from Python (apsw) on Windows, Python (apsw) on OS
> X and native C API on OS X (in entirely different program).
> 
> Can you please point me at what can be the cause of this?

  You may be seeing a deadlock situation.  This is most commonly
  associated with explicit transactions (that are open for a longer
  period of time), but it is possible with implicit transactions.
  
  If two connections attempt to write to the DB at the same time, it is
  possible for them to deadlock on the file locking.  SQLite recognizes
  this and has one of the connections back off with an SQLITE_BUSY error.
  If this happens in an explicit transaction, the program needs to
  ROLLBACK the current transaction and start over.  In the case of an
  implicit transaction around a statement, you can safely re-run
  the statement.

 http://sqlite.org/c3ref/busy_handler.html 

 The presence of a busy handler does not guarantee that it will be
 invoked when there is lock contention. If SQLite determines that
 invoking the busy handler could result in a deadlock, it will go
 ahead and return SQLITE_BUSY or SQLITE_IOERR_BLOCKED instead of
 invoking the busy handler. Consider a scenario where one process
 is holding a read lock that it is trying to promote to a reserved
 lock and a second process is holding a reserved lock that it is
 trying to promote to an exclusive lock. The first process cannot
 proceed because it is blocked by the second and the second process
 cannot proceed because it is blocked by the first. If both
 processes invoke the busy handlers, neither will make any
 progress. Therefore, SQLite returns SQLITE_BUSY for the first
 process, hoping that this will induce the first process to release
 its read lock and allow the second process to proceed.


   -j


-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes

2012-04-02 Thread Gregory Petrosyan
Hello,

I am experiancing a weird problem: sometimes (1 time in a 10-100) when
2 processes try to open the same database file (and execute something
like 'create table foo if not exists'), one of them fails with
SQLITE_BUSY — despite 1 second (or bigger) timeout.

Processes themselves produce almost no DB activity; they merely start
and initialize the database roughly at the same time.
sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout().

Unfortunately I can't write a small program that reproduces this
reliably: my test program never crashes this way (except when timeout
is really small — say, 10ms). Yet, this behaviour is rare but
reproducible, both from Python (apsw) on Windows, Python (apsw) on OS
X and native C API on OS X (in entirely different program).

Can you please point me at what can be the cause of this?

       Gregory
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users