Hello all,

I am working on a project involving the Python SQLite3 API for SQLite (a
compile cache system), and I have started running into "database locked"
problems on the Windows 7 Pro machines some of the instances will run on;
the Windows 10 instances works without any problems.

The database is configured with WAL journaling, and when deployed will
have up to 42 active connection at a time. There are three tables, one
containing a blob with information about a source file and a time stamp
(the timestamp is updated each time the entry is used, the blob may be
updated), one with a blob of binary data, and the third have the time
stamp for the corresponding entry in the second table (updated each time
the blob entry is used). The Python SQLite3 connection is started with a
timeout of 100 seconds.

The lock problem does not appear when seeding the database, but that
process involves compiling the source first, so database updates should
be less frequent.

The problem occurs when the data is (mostly) only being pulled out of the
database for reuse, followed by an update of the time stamp the rows used (to be
used when removing old entries).

Using a locally built DLL with some printfs I found that it looks like the
problem occurs when winLock/winLockFile is called from sqlite3WalClose to
obtain an exclusive lock; it seems to fail in the Exclusive lock part of
the function. The reported windows error code is 33.

For reference, the folder where the database is stored, is excluded from
AV scans, the disk drives are all local SSDs. The size of the DB is
currently about 4GB. The SQLite DLL version from Python 3.6 (can't use
3.7) is 3.21 (no change with 3.24).

I have found a few "workarounds", most of them not remotely satisfactory:

- Increase busy_timeout to at least 40 seconds, wal_autocheckpoint to
3000+ seems to work, although I have had failures in this scenario, too.
The downside is that this takes (at least) as long as seeding the
database, and 5-6 times as long as the system it is supposed to replace,
and 10(!) times as long as the corresponding tests on Windows 10 (and 7
times as long as on my own Win10 machine with half the cores).

- Use defaults for busy_timeout, wal_autocheckpoint, and use a Windows
mutex to lock access to the database during write operation. This *is*
much faster than the previous example, but that is the best that can be
said for it. It is still 10-20% slower than the old system, takes twice as
long as the Win 10 test (and 20-40% longer than tests on my own machine,
with half the cores). Trying to use multiple mutexes only took longer.

- I tried resuming the operation after a lock, that was also slower than
both the old system and the Win 10 system, and it also ran into trouble
when a repeated insert operation broke the uniqeness constraint (the
transaction had completed and been committed before the database locked
error occurred).

- Reducing number of parallel jobs to 20 on the 32 (logical) core machine
worked (24 failed almost at the end), but the reuse case still took close
to 4 times as long as the old system (and 2/3 of the time used for a
normal seeding using). OTOH, using 20 jobs on a 40 core Win 10 machine
took only twice as long as using all cores (effectively what the old
system used).

- I also tried to rebuild the database without WAL. Seeding the database
took 3 times longer than it did with WAL, and the second stage (reuse)
seems to take as long (stopped it when it had taken longer than a normal
seed, when it was just 25% finished).

(And yes, upgrading to Win 10 may be a possibility, although at least two
of the machines was not able to use Win 10 last time I tried.)

I don't know if this is a problem caused by a problem in SQLite, or if it
as limitation in Windows 7.

That halving the number of jobs takes 4 times as long on Win 7 as the old
job (which indicates that a all cores would still require double the time
of the old system), while the Win 10 version was running just doubling (as
expected) the its original time, which was half of the old system's time,
seems to indicate that there is a performance issue with SQLite on Windows 7
(and I don't see similar issues with other applications on these machines).
Additionally, since the *mutex* option actually got the run time within
shouting range of the old system for the data reuse case, my guess is that
it is _not_ a file performance problem with Windows 7. OTOH there is still a possibility that there is a Win 7 file locking performance issue, but I doubt
it.

Does anyone have any suggestions for how to fix/work around this problem
without losing too much (or any) efficiency?

Thanks in advance.
--
Sincerely,
Yngve N. Pettersen
Vivaldi Technologies AS
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to