Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Are your updates sorted by DS? If your queries are sorted, then sequential
queries are more likely to hit the same db pages while searching the index,
resulting in higher cache usage and fewer decompression operations. This
would have less benefit if the 100k DS values of your updates are randomly
distributed through the 30m available, and more of an effect if they're
tightly clustered in some range.

If you mainly query by DS, is that the same as saying it is your primary
key? If so, have you tried declaring it so, making the table WITHOUT ROWID,
and not bothering with the index? It may help, since you wouldn't have to
decompress both the index pages and the table pages.

> On 8 Sep 2017, at 12:33 pm, Dominique Pellé wrote:
>
> Yue Wu wrote:
>
>> Hello,
>> As mentioned in the subject, our goal is to improve the performance of
>> batch SQL updates. The update SQL is as follows:
>>
>>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>>
>> We run 100,000 updates in a single transaction. The zipvfs version takes
>> about 20 min while the uncompressed version takes about 7 min.
>
> Which compression algorithm do you use with zipvfs?
> Try LZ4, it's several times faster than zlib at compression and
> decompression, at the cost of compressing a bit less:
>
> https://github.com/lz4/lz4
>
> Or try zstd, which can compress more than zlib and faster (especially
> on 64-bit architectures), though it's not as fast as LZ4:
>
> http://facebook.github.io/zstd/
>
> Also, what is your SQLite page size?
> Reducing the SQLite page size probably helps to speed up
> updates, since zipvfs compresses by pages.
>
> Dominique

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
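A minimal sketch of both suggestions above, using Python's built-in sqlite3
module. The two-column schema is a hypothetical reduction of the real ITEM
table, and the approach assumes DS values are unique (which declaring DS the
PRIMARY KEY requires):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical reduced schema: with DS as the PRIMARY KEY of a WITHOUT
# ROWID table, the table B-tree is keyed on DS directly, so a lookup by
# DS walks one tree instead of index pages plus table pages.
con.execute("""
    CREATE TABLE ITEM (
        DS     TEXT PRIMARY KEY,
        FIELD4 NUMERIC
    ) WITHOUT ROWID
""")
with con:
    con.executemany("INSERT INTO ITEM VALUES (?, ?)",
                    [(str(i), 0) for i in range(1000)])

# Sorting the batch by DS (text order, since DS is TEXT) makes consecutive
# updates land on neighbouring B-tree pages, improving cache hits (and,
# under zipvfs, avoiding repeated decompression of the same pages).
updates = sorted((10, str(ds)) for ds in range(0, 1000, 7))
with con:  # one transaction for the whole batch
    con.executemany("UPDATE ITEM SET FIELD4 = ? WHERE DS = ?", updates)

print(con.execute(
    "SELECT COUNT(*) FROM ITEM WHERE FIELD4 = 10").fetchone()[0])  # 143
```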
Re: [sqlite] Seeking advice on improving batch sql update performance on zipvfs
Yue Wu wrote:
> Hello,
> As mentioned in the subject, our goal is to improve the performance of
> batch SQL updates. The update SQL is as follows:
>
>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while the uncompressed version takes about 7 min.

Which compression algorithm do you use with zipvfs?
Try LZ4, it's several times faster than zlib at compression and
decompression, at the cost of compressing a bit less:

https://github.com/lz4/lz4

Or try zstd, which can compress more than zlib and faster (especially
on 64-bit architectures), though it's not as fast as LZ4:

http://facebook.github.io/zstd/

Also, what is your SQLite page size?
Reducing the SQLite page size probably helps to speed up
updates, since zipvfs compresses by pages.

Dominique
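Neither LZ4 nor zstd ships with Python, but as a rough stand-in, the standard
library's zlib shows the same speed-versus-ratio dial through its compression
levels; the payload here is an invented, page-like byte string:

```python
import zlib

# Invented payload: ~4 KiB of repetitive, record-like data, standing in
# for one uncompressed database page.
payload = b"FIELD4=10;DS=15;" * 256

# Level 1 favours speed, level 9 favours ratio; LZ4 and zstd push this
# trade-off further in each direction than zlib can.
sizes = {level: len(zlib.compress(payload, level)) for level in (1, 6, 9)}
print(sizes[9] <= sizes[1] < len(payload))  # True
```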
Re: [sqlite] PRAGMA journal_size_limit prevents -wal growth or only size outside a transaction?
> I use journal_mode=WAL and have periods of checkpoint starvation
> (complicated reasons), so I'm looking to prune the -wal file, but in a
> less blunt way than SQLITE_CHECKPOINT_TRUNCATE. Behaviorally I want
> SQLITE_CHECKPOINT_PASSIVE *and then*, if required (-wal content <
> journal_size_limit), to do the truncation a la SQLITE_CHECKPOINT_TRUNCATE.
>
> SQLITE_CHECKPOINT_PASSIVE gives me the best-effort checkpointing, but
> even in the best case I don't get -wal shrinkage.
> SQLITE_CHECKPOINT_TRUNCATE is more of an aggressive
> do-it-now-wait-if-necessary, which gets me the shrinkage behavior, but
> with *required* effort rather than best effort. I'd really like both --
> best-effort checkpoint AND best-effort truncate.
>
> sqlite3WalClose does exactly what I want (the key bits):
>
>   sqlite3OsLock(pWal->pDbFd, SQLITE_LOCK_EXCLUSIVE);
>   sqlite3WalCheckpoint(pWal, db, SQLITE_CHECKPOINT_PASSIVE, ...);
>   sqlite3OsFileControlHint(pWal->pDbFd, SQLITE_FCNTL_PERSIST_WAL, &bPersist);
>   if( bPersist ){
>     if( pWal->mxWalSize>=0 ){
>       walLimitSize(pWal, 0);
>     }
>   }
>
> But AFAICT this is only called when PRAGMA journal_mode changes away
> from WAL, or when the pager cache is closed via sqlite3_close(). I'm a
> long-running process with connection caching and associated prepared
> statements, so torching the connection to trigger this isn't optimal.
>
> Can I indirectly get this behavior if I open and then immediately close
> a connection?

I think so, if there are no other connections to the same database. If
there are any other connections, the checkpoint-on-close won't happen,
of course.

> I think I can sorta approximate this if I disable the busy-handler, do
> SQLITE_CHECKPOINT_TRUNCATE, ignore Busy/Locked errors, and restore the
> busy-handler before returning, but that's merely 'sorta' -- ugly on
> several levels.

Which part of the sorta is the problem?
If you run an SQLITE_CHECKPOINT_TRUNCATE checkpoint without a
busy-handler, or with the busy-handler rigged to return 0 immediately,
then it will:

  * attempt a best-effort checkpoint (same as PASSIVE),
  * if the entire wal file was checkpointed, check if any readers are
    still using it,
  * if no readers are still using it, truncate the wal file to zero
    bytes in size.

> I don't see any way to directly try to best-effort truncate the -wal
> file, e.g.
>
>   sqlite3_wal_checkpoint_v2(db, NULL, SQLITE_CHECKPOINT_PASSIVE, NULL, NULL);
>   sqlite3_file_control(db, NULL, SQLITE_FCNTL_TRUNCATE_WAL, NULL);

Is the idea that this file-control would truncate the wal file to zero
bytes in size iff it could safely do so without blocking on any other
clients?

Dan.
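The three-step behaviour described above can be observed from Python's
sqlite3 module via the PRAGMA form of the checkpoint interface; the database
path is a throwaway temp file, and the busy-timeout is zeroed so no
busy-handler is in play:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA busy_timeout=0")   # no busy-handler installed
con.execute("PRAGMA journal_mode=WAL")
with con:
    con.execute("CREATE TABLE t(x)")
    con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
assert os.path.getsize(path + "-wal") > 0  # the wal holds our frames

# With a single connection there are no readers, so TRUNCATE checkpoints
# the entire wal and then truncates it to zero bytes.
busy, log_frames, ckpt_frames = con.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(busy, os.path.getsize(path + "-wal"))  # 0 0
```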
[sqlite] PRAGMA journal_size_limit prevents -wal growth or only size outside a transaction?
> The docs are a little unclear =>
> https://sqlite.org/pragma.html#pragma_journal_size_limit
>
> I need to disable autocheckpoint@close (for other reasons), so I'm
> looking for ways to fence the -wal file.
>
> If I want to bound a database's -wal file to <= 1MB when I'm not in a
> transaction, is it just PRAGMA journal_size_limit=1048576;? Does this
> affect -wal growth *during* a transaction, i.e. if I BEGIN; ...INSERT
> lots...; COMMIT, will I successfully reach COMMIT? The
> journal_size_limit doesn't affect *growth* of the -wal, right? Just
> 'at rest' size (i.e. outside any transaction)?
>
> Is this the functional equivalent?
>
>   int limit = ...; /* current value via sqlite3_exec(db, "PRAGMA journal_size_limit;") */
>   sqlite3_exec(db, "PRAGMA journal_size_limit=0;", 0, 0, 0);
>   sqlite3_wal_checkpoint_v2(db, NULL, SQLITE_CHECKPOINT_PASSIVE, NULL, NULL);
>   /* restore the saved limit: "PRAGMA journal_size_limit=<limit>;" */

It's close, but not quite equivalent. The checkpoint operation does not
consider the journal_size_limit setting in wal mode. Instead, it is
applied when a writer writes a transaction into the very start of the
wal file.

So, say you manage to checkpoint the entire wal file (because there are
no old readers preventing this). The next connection to write to the
database will write its transaction into the start of the wal file,
overwriting old content. When the writer commits, it checks if the wal
file on disk is larger than the configured "PRAGMA journal_size_limit".
If it is, the writer truncates it - either to the configured limit, or
to the smallest possible size that doesn't truncate away part of the
transaction that was just written.

In other words - "PRAGMA journal_size_limit" is applied by the next
COMMIT after a successful checkpoint, not by the checkpoint itself.

Dan.
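Dan's explanation (the limit is applied by the next COMMIT after a
checkpoint, not by the checkpoint itself) can be demonstrated with Python's
sqlite3 module; the 4096-byte limit and row counts are arbitrary demo values:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode=WAL")
con.execute("PRAGMA journal_size_limit=4096")  # arbitrary small limit
con.execute("CREATE TABLE t(x)")
with con:
    con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5000)])
grown = os.path.getsize(path + "-wal")         # wal grew past the limit

# A PASSIVE checkpoint copies frames back into the database file but
# does not shrink the wal file itself.
con.execute("PRAGMA wal_checkpoint(PASSIVE)")
after_ckpt = os.path.getsize(path + "-wal")

# The next COMMIT restarts the wal from the top and, seeing the file is
# over journal_size_limit, truncates it.
with con:
    con.execute("INSERT INTO t VALUES (-1)")
after_commit = os.path.getsize(path + "-wal")

print(grown > 4096, after_ckpt == grown, after_commit < grown)
```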
[sqlite] Seeking advice on improving batch sql update performance on zipvfs
Hello,

As mentioned in the subject, our goal is to improve the performance of
batch SQL updates. The update SQL is as follows:

> UPDATE ITEM SET FIELD4 =? WHERE DS=?

We run 100,000 updates in a single transaction. The zipvfs version takes
about 20 min while the uncompressed version takes about 7 min. I've also
attached the results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS =
15;". Any suggestions are very helpful.

Some background: We have an app running on Android 4.2 using zipvfs.
The page size is 4096.
The cache size is -2000.

Table schema for ITEM:

> CREATE TABLE ITEM (
>   FIELD0 NUMERIC,
>   FIELD1 NUMERIC,
>   DS TEXT,
>   FIELD2 TEXT,
>   FIELD3 TEXT,
>   FIELD4 NUMERIC,
>   FIELD5 NUMERIC,
>   FIELD6 NUMERIC,
>   FIELD7 NUMERIC,
>   FIELD8 NUMERIC,
>   FIELD9 NUMERIC,
>   FIELD10 NUMERIC,
>   FIELD11 TEXT);

The third column, "DS", is what we query by almost all the time. We also
created an index:

> CREATE INDEX DS_INDEX ON ITEM(DS);

There are 30 million records. The zipvfs db is 471mb vs the uncompressed
db (the one shipped with Android 4.2) at 1.39gb. The zipvfs db uses zlib
and aes128, which are the defaults.
Thanks
--
Yue Wu | Android Developer

sqlite> EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;
addr  opcode         p1   p2   p3   p4             p5  comment
----  -------------  ---  ---  ---  -------------  --  -------------
0     Init           0    28   0                   00  Start at 28
1     Null           0    1    2                   00  r[1..2]=NULL
2     OpenWrite      0    3    0    13             08  root=3 iDb=0; ITEM
3     OpenWrite      1    4    0    k(2,,)         02  root=4 iDb=0; ds_index
4     Integer        15   16   0                   00  r[16]=15
5     Affinity       16   1    0    B              00  affinity(r[16])
6     SeekGE         1    27   16   1              00  key=r[16]
7     IdxGT          1    27   16   1              00  key=r[16]
8     DeferredSeek   1    0    0                   00  Move 0 to 1.rowid if needed
9     Rowid          0    2    0                   00  r[2]=rowid
10    IsNull         2    27   0                   00  if r[2]==NULL goto 27
11    Column         0    0    3                   00  r[3]=ITEM.FIELD0
12    Column         0    1    4                   00  r[4]=ITEM.FIELD1
13    Column         0    2    5                   00  r[5]=ITEM.DS
14    Column         0    3    6                   00  r[6]=ITEM.FIELD2
15    Column         0    4    7                   00  r[7]=ITEM.FIELD3
16    Integer        10   8    0                   00  r[8]=10
17    Column         0    6    9                   00  r[9]=ITEM.FIELD5
18    Column         0    7    10                  00  r[10]=ITEM.FIELD6
19    Column         0    8    11                  00  r[11]=ITEM.FIELD7
20    Column         0    9    12                  00  r[12]=ITEM.FIELD8
21    Column         0    10   13                  00  r[13]=ITEM.FIELD9
22    Column         0    11   14                  00  r[14]=ITEM.FIELD10
23    Column         0    12   15                  00  r[15]=ITEM.FIELD11
24    MakeRecord     3    13   17   CCBBBCCCB      00  r[17]=mkrec(r[3..15])
25    Insert         0    17   2    ITEM           07  intkey=r[2] data=r[17]
26    Next           1    7    1                   00
27    Halt           0    0    0                   00
28    Transaction    0    1    4    0              01  usesStmtJournal=0
29    Goto           0    1    0                   00
Re: [sqlite] Getting degraded performance while using multithreading
Karthi M wrote:
> "Internally SQLite uses locks to serialize calls by multiple threads."
> If this is the case, then how can we achieve concurrency?

In general, SQLite serializes accesses to the same connection object. To
get higher concurrency (for reading), use multiple connections.

Regards,
Clemens
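Clemens' advice above can be sketched as one connection per thread with
Python's sqlite3 module; the file path, row count, and thread count are demo
inventions:

```python
import os
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.db")
init = sqlite3.connect(path)
init.execute("PRAGMA journal_mode=WAL")  # WAL also lets readers overlap a writer
with init:
    init.execute("CREATE TABLE t(x)")
    init.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
init.close()

results = []

def reader():
    # One connection per thread: SQLite serializes calls made on a single
    # connection object, so separate connections avoid that bottleneck.
    con = sqlite3.connect(path)
    results.append(con.execute("SELECT SUM(x) FROM t").fetchone()[0])
    con.close()

threads = [threading.Thread(target=reader) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [4950, 4950, 4950, 4950]
```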
Re: [sqlite] Getting degraded performance while using multithreading
On 8 Sep 2017, at 10:23am, Karthi M wrote:

> Thanks for the reply.
> "Internally SQLite uses locks to serialize calls by multiple threads."
> If this is the case, then how can we achieve concurrency?
> We are using 50 threads and it is called 1,500 times over 30 min.

When making a change (COMMIT), SQLite locks the entire database, but not
for very long; perhaps a few milliseconds. If all 50 threads are trying
to write at the same time you may have problems. If just a few of them
write each minute then you should be okay.

Simon.
Re: [sqlite] Proof that a line has been modified
On Fri, Sep 8, 2017 at 12:29 AM, Nico Williams wrote:

>> Is there a solution to that?
>
> You have these choices:
>
> - hash the whole SQLite3 file and record or sign such hash values for
>   approved DB files (this will let you detect all changes)

See also https://sqlite.org/dbhash.html#overview

--DD
Re: [sqlite] Getting degraded performance while using multithreading
Hi,

Thanks for the reply.

"Internally SQLite uses locks to serialize calls by multiple threads."
If this is the case, then how can we achieve concurrency? We are using
50 threads and it is called 1,500 times over 30 min. At the initial
stage it works fine; at some point the performance degrades. Is there
any way to achieve maximum concurrency? Please suggest...

Thanks,
Karthi M

On Wed, Sep 6, 2017 at 8:53 PM, Jens Alfke wrote:
>
>> On Sep 4, 2017, at 7:22 AM, Senthil Kumar Chellappan wrote:
>>
>> I am using only the select operation, but the response time degrades
>> when I call the API through multithreading.
>
> Internally SQLite uses locks to serialize calls by multiple threads. If
> multiple threads are trying to access it at once, that will increase
> overhead since the locks have to control the access. Usually this isn't
> noticeable since the overhead for acquiring a single lock is very low,
> but it depends on how many calls are being made.
>
> You'll need to give more details of what calls you're making, on how
> many threads, and how many times they're called.
>
> —Jens
Re: [sqlite] Proof that a line has been modified
For complete tamper-proofness, you need "something" external to the
system, i.e. not in the database, not in the code, and not on the server:

- if you use signatures, then those signatures should be made by a
  secure signing service or device;
- if you use hashes (be it a merkle tree, a blockchain, or a DAG), then
  the top-level hashes need to be stored (or signed) on another system
  (and verification made against those).

Note that simply signing the individual rows would not guard you against
deletions, and if not performed by something external, the signature
would provide no date/time guarantee (i.e. even if the falsifier cannot
access the private key, if the attacker can change the signing system's
time, the attacker could trick the system into creating "historical"
records, thus greatly simplifying tampering).

To guard against deletions you need hashes at a higher level than the
row, a blockchain or DAG (like git) being the simplest solutions, i.e.
the hash of a row is the hmac of (data of that row + hash of the
previous row); storing/signing the last row's hash externally then
allows verifying all previous rows.

Eric

On Fri, Sep 8, 2017 at 1:56 AM, Michael Stephenson wrote:
> In the past, I've used the pager to secure data. This involved
> encrypting the data before writing it to disk and decrypting when
> loading from disk, but also optionally hashing the page and storing the
> hash in the extra data reserved for each page. If anyone tampered with
> the data, the hash would indicate this and an error could be thrown.
>
> Also, encrypting the page data makes it more difficult to tamper with
> the data.
>
> Products like sqlcipher do things like this (encryption, hashing), and
> it's fairly easy to see how it's done by pulling the sqlite source (not
> the amalgamation) and diffing it with the sqlcipher source.
>
> ~Mike
>
>> On Sep 7, 2017, at 6:34 PM, Jens Alfke wrote:
>>
>>> On Sep 7, 2017, at 2:47 PM, Keith Medcalf wrote:
>>>
>>> Again, this is a detection for changed data and does nothing to
>>> prevent changes being made.
>>
>> The OP did not require that it be impossible to make changes (which is
>> clearly impossible without locking down write access to the file.) He
>> specifically said that detection of changed data was OK:
>>
>>>> For security reasons, a customer wants to be sure that a database
>>>> line cannot be modified after its initial insertion (or unmodified
>>>> without being visible, with proof that the line has been modified).
>>
>> The procedures I described provide detection that a row has been
>> modified. The first one doesn't make it evident that a row has been
>> deleted, though the second one does.
>>
>> —Jens
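Eric's chaining construction (hash of a row = HMAC over the row's data plus
the previous row's hash) can be sketched in a few lines of Python; the key,
row contents, and zero-valued genesis hash are demo inventions, and in
practice the key and the head hash would live on an external signing system:

```python
import hashlib
import hmac

KEY = b"demo-key"  # in practice: held by an external signing service/device

def chain_hash(prev_hash: bytes, row_data: bytes) -> bytes:
    # hash of a row = HMAC(data of that row + hash of the previous row)
    return hmac.new(KEY, row_data + prev_hash, hashlib.sha256).digest()

def build_chain(rows):
    h = b"\x00" * 32  # genesis value for the first row
    hashes = []
    for row in rows:
        h = chain_hash(h, row)
        hashes.append(h)
    return hashes

rows = [b"row1", b"row2", b"row3"]
head = build_chain(rows)[-1]  # store or sign this hash externally

# Modifying OR deleting any earlier row changes the head hash, which is
# what defeats silent deletions that per-row signatures cannot catch:
tampered = build_chain([b"row1", b"ROWX", b"row3"])
deleted = build_chain([b"row1", b"row3"])
print(head != tampered[-1], head != deleted[-1],
      head == build_chain(rows)[-1])  # True True True
```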