[sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
Hello, I am experiancing a weird problem: sometimes (1 time in a 10-100) when 2 processes try to open the same database file (and execute something like 'create table foo if not exists'), one of them fails with SQLITE_BUSY — despite 1 second (or bigger) timeout. Processes themselves produce almost no DB activity; they merely start and initialize the database roughly at the same time. sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout(). Unfortunately I can't write a small program that reproduces this reliably: my test program never crashes this way (except when timeout is really small — say, 10ms). Yet, this behaviour is rare but reproducible, both from Python (apsw) on Windows, Python (apsw) on OS X and native C API on OS X (in entirely different program). Can you please point me at what can be the cause of this? Gregory ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
On Tue, Apr 3, 2012 at 3:53 PM, Simon Slavinwrote: > > On 3 Apr 2012, at 9:53am, Dan Kennedy wrote: > >> As Jay says, deadlock is not possible for implicit transactions. >> SQLite will keep retrying until either your busy-handler returns >> zero (if you configured a busy-handler) or the timeout is reached >> (if you configured a timeout). It sounds like the latter in this >> case. > > You seem to have a setup where your failures can be produced on demand, > albeit at random. So you can test whether this is a timeout-related problem > but varying your timeout. Run it, and log how many failures you get. Then > multiply (or divide) your timeout setting by ten and run it again. If you > get failures at the same interval, it's not a timeout-related problem. I've managed to make *something* reproducible. Take a look at the following Python code, I hope it is self-explanatory: -- import os import sys import apsw import multiprocessing as mp BUSYTIMEOUT = 5000 NPROCS = 4 def rmdb(): try: os.remove('file.db') os.remove('file.db-wal') os.remove('file.db-shm') except: pass def child(): try: db = apsw.Connection('file.db') db.setbusytimeout(BUSYTIMEOUT) try: db.cursor().execute('PRAGMA JOURNAL_MODE=WAL') except: print >>sys.stderr, "Error with PRAGMA:", sys.exc_info()[0] db.close() except: print >>sys.stderr, "Unexpected error:", sys.exc_info()[0] if __name__ == '__main__': while True: rmdb() ps = [mp.Process(target=child) for i in xrange(NPROCS)] for p in ps: p.daemon = True for p in ps: p.start() for p in ps: p.join() -- It reliably results in the "deadlock" scenario described above; hence SQLite returns _BUSY for "PRAGMA" despite the big timeout (timeout value seems to have no effect). Can you please describe why SQLite does not retries the implicit "PRAGMA" transaction here? Gregory ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
On 3 Apr 2012, at 9:53am, Dan Kennedywrote: > As Jay says, deadlock is not possible for implicit transactions. > SQLite will keep retrying until either your busy-handler returns > zero (if you configured a busy-handler) or the timeout is reached > (if you configured a timeout). It sounds like the latter in this > case. You seem to have a setup where your failures can be produced on demand, albeit at random. So you can test whether this is a timeout-related problem but varying your timeout. Run it, and log how many failures you get. Then multiply (or divide) your timeout setting by ten and run it again. If you get failures at the same interval, it's not a timeout-related problem. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
On 04/03/2012 04:20 PM, Gregory Petrosyan wrote: On Tue, Apr 3, 2012 at 12:53 PM, Dan Kennedywrote: As a side note: why does not SQLite automatically retry implicit transactions after invoking busy handler? It's a race. That's what SQLITE_BUSY means. As Jay says, deadlock is not possible for implicit transactions. SQLite will keep retrying until either your busy-handler returns zero (if you configured a busy-handler) or the timeout is reached (if you configured a timeout). It sounds like the latter in this case. I don't follow the logic here: if deadlock is not possible with implicit transactions, SQLite should retry them after specified timeout, but not return _BUSY, forcing me into implementing the same retry logic, no? It continues attempting the operation until the length of time since the first attempt exceeds the configured timeout. At which point it returns SQLITE_BUSY. If you are using WAL mode, check that connections are being closed properly. If they are not, a *-wal file may be left on the disk even after all active clients have disconnected. In this case, when the next client connects it has to read and process the entire *-wal file. This is called "recovery", and the client holds an exclusive lock on the db file while it is underway. Any other clients that attempt to read or write the db see an SQLITE_BUSY. Any busy-handler (or timeout) will apply here, but if recovery takes a while then a short timeout could be exceeded. Am I correct that if e.g. I have 2 processes, A and B, using the same WAL-enabled database, then following: 1. A creates the database and starts to write data (connection is open all the time) 2. B tries to connect to the database can easily result in SQLITE_BUSY returned to either A or B? No. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
On Tue, Apr 3, 2012 at 12:53 PM, Dan Kennedywrote: >> As a side note: why does not SQLite automatically retry implicit >> transactions after invoking busy handler? > > It's a race. That's what SQLITE_BUSY means. > > As Jay says, deadlock is not possible for implicit transactions. > SQLite will keep retrying until either your busy-handler returns > zero (if you configured a busy-handler) or the timeout is reached > (if you configured a timeout). It sounds like the latter in this > case. I don't follow the logic here: if deadlock is not possible with implicit transactions, SQLite should retry them after specified timeout, but not return _BUSY, forcing me into implementing the same retry logic, no? > If you are using WAL mode, check that connections are being closed > properly. If they are not, a *-wal file may be left on the disk even > after all active clients have disconnected. In this case, when the > next client connects it has to read and process the entire *-wal > file. This is called "recovery", and the client holds an exclusive > lock on the db file while it is underway. Any other clients > that attempt to read or write the db see an SQLITE_BUSY. Any > busy-handler (or timeout) will apply here, but if recovery takes a > while then a short timeout could be exceeded. Am I correct that if e.g. I have 2 processes, A and B, using the same WAL-enabled database, then following: 1. A creates the database and starts to write data (connection is open all the time) 2. B tries to connect to the database can easily result in SQLITE_BUSY returned to either A or B? Gregory ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
On 04/03/2012 12:48 PM, Gregory Petrosyan wrote: On Mon, Apr 2, 2012 at 7:51 PM, Jay A. Kreibichwrote: On Mon, Apr 02, 2012 at 07:40:58PM +0400, Gregory Petrosyan scratched on the wall: Hello, I am experiancing a weird problem: sometimes (1 time in a 10-100) when 2 processes try to open the same database file (and execute something like 'create table foo if not exists'), one of them fails with SQLITE_BUSY ??? despite 1 second (or bigger) timeout. Processes themselves produce almost no DB activity; they merely start and initialize the database roughly at the same time. sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout(). Unfortunately I can't write a small program that reproduces this reliably: my test program never crashes this way (except when timeout is really small ??? say, 10ms). Yet, this behaviour is rare but reproducible, both from Python (apsw) on Windows, Python (apsw) on OS X and native C API on OS X (in entirely different program). Can you please point me at what can be the cause of this? You may be seeing a deadlock situation. This is most commonly associated with explicit transactions (that are open for a longer period of time), but it is possible with implicit transactions. If two connections attempt to write to the DB at the same time, it is possible for them to deadlock on the file locking. SQLite recognizes this and has one of the connections back off with an SQLITE_BUSY error. If this happens in an explicit transaction, the program needs to ROLLBACK the current transaction and start over. In the case of an implicit transaction around a statement, you can safely re-run the statement. Thanks a lot for the reply. Shouldn't it be extremely unlikely for this situation to happen, though? Can it be diagnosed with more certaincy? I feel uncomfortable guessing (hoping) that the bug exists due to this race/deadlock. As a side note: why does not SQLite automatically retry implicit transactions after invoking busy handler? It's a race. That's what SQLITE_BUSY means. As Jay says, deadlock is not possible for implicit transactions. SQLite will keep retrying until either your busy-handler returns zero (if you configured a busy-handler) or the timeout is reached (if you configured a timeout). It sounds like the latter in this case. If you are using WAL mode, check that connections are being closed properly. If they are not, a *-wal file may be left on the disk even after all active clients have disconnected. In this case, when the next client connects it has to read and process the entire *-wal file. This is called "recovery", and the client holds an exclusive lock on the db file while it is underway. Any other clients that attempt to read or write the db see an SQLITE_BUSY. Any busy-handler (or timeout) will apply here, but if recovery takes a while then a short timeout could be exceeded. Other than that, you probably just want to increase the busy-timeout some. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
On Mon, Apr 2, 2012 at 7:51 PM, Jay A. Kreibichwrote: > On Mon, Apr 02, 2012 at 07:40:58PM +0400, Gregory Petrosyan scratched on the > wall: >> Hello, >> >> I am experiancing a weird problem: sometimes (1 time in a 10-100) when >> 2 processes try to open the same database file (and execute something >> like 'create table foo if not exists'), one of them fails with >> SQLITE_BUSY ??? despite 1 second (or bigger) timeout. >> >> Processes themselves produce almost no DB activity; they merely start >> and initialize the database roughly at the same time. >> sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout(). >> >> Unfortunately I can't write a small program that reproduces this >> reliably: my test program never crashes this way (except when timeout >> is really small ??? say, 10ms). Yet, this behaviour is rare but >> reproducible, both from Python (apsw) on Windows, Python (apsw) on OS >> X and native C API on OS X (in entirely different program). >> >> Can you please point me at what can be the cause of this? > > You may be seeing a deadlock situation. This is most commonly > associated with explicit transactions (that are open for a longer > period of time), but it is possible with implicit transactions. > > If two connections attempt to write to the DB at the same time, it is > possible for them to deadlock on the file locking. SQLite recognizes > this and has one of the connections back off with an SQLITE_BUSY error. > If this happens in an explicit transaction, the program needs to > ROLLBACK the current transaction and start over. In the case of an > implicit transaction around a statement, you can safely re-run > the statement. Thanks a lot for the reply. Shouldn't it be extremely unlikely for this situation to happen, though? Can it be diagnosed with more certaincy? I feel uncomfortable guessing (hoping) that the bug exists due to this race/deadlock. As a side note: why does not SQLite automatically retry implicit transactions after invoking busy handler? This is IMO the behaviout most people will think SQLite has (after reading the docs); if this is not true, it is almost impossible to rely on busy handler (timeout) at all, and it is difficult to understand the need for it — if I need to wrap all queries into retry/sleep on my own, why have extra strange internal timeouts in SQLite? Gregory ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
On Mon, Apr 02, 2012 at 07:40:58PM +0400, Gregory Petrosyan scratched on the wall: > Hello, > > I am experiancing a weird problem: sometimes (1 time in a 10-100) when > 2 processes try to open the same database file (and execute something > like 'create table foo if not exists'), one of them fails with > SQLITE_BUSY ??? despite 1 second (or bigger) timeout. > > Processes themselves produce almost no DB activity; they merely start > and initialize the database roughly at the same time. > sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout(). > > Unfortunately I can't write a small program that reproduces this > reliably: my test program never crashes this way (except when timeout > is really small ??? say, 10ms). Yet, this behaviour is rare but > reproducible, both from Python (apsw) on Windows, Python (apsw) on OS > X and native C API on OS X (in entirely different program). > > Can you please point me at what can be the cause of this? You may be seeing a deadlock situation. This is most commonly associated with explicit transactions (that are open for a longer period of time), but it is possible with implicit transactions. If two connections attempt to write to the DB at the same time, it is possible for them to deadlock on the file locking. SQLite recognizes this and has one of the connections back off with an SQLITE_BUSY error. If this happens in an explicit transaction, the program needs to ROLLBACK the current transaction and start over. In the case of an implicit transaction around a statement, you can safely re-run the statement. http://sqlite.org/c3ref/busy_handler.html The presence of a busy handler does not guarantee that it will be invoked when there is lock contention. If SQLite determines that invoking the busy handler could result in a deadlock, it will go ahead and return SQLITE_BUSY or SQLITE_IOERR_BLOCKED instead of invoking the busy handler. Consider a scenario where one process is holding a read lock that it is trying to promote to a reserved lock and a second process is holding a reserved lock that it is trying to promote to an exclusive lock. The first process cannot proceed because it is blocked by the second and the second process cannot proceed because it is blocked by the first. If both processes invoke the busy handlers, neither will make any progress. Therefore, SQLite returns SQLITE_BUSY for the first process, hoping that this will induce the first process to release its read lock and allow the second process to proceed. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Simultaneous opening of database sometimes fails even with 1 second timeout and only 2 processes
Hello, I am experiancing a weird problem: sometimes (1 time in a 10-100) when 2 processes try to open the same database file (and execute something like 'create table foo if not exists'), one of them fails with SQLITE_BUSY — despite 1 second (or bigger) timeout. Processes themselves produce almost no DB activity; they merely start and initialize the database roughly at the same time. sqlite3_open_v2() is immediately followed by sqlite3_busy_timeout(). Unfortunately I can't write a small program that reproduces this reliably: my test program never crashes this way (except when timeout is really small — say, 10ms). Yet, this behaviour is rare but reproducible, both from Python (apsw) on Windows, Python (apsw) on OS X and native C API on OS X (in entirely different program). Can you please point me at what can be the cause of this? Gregory ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users