Re: [sqlite] Seeking advice on improving batch SQL update performance on zipvfs

2017-09-08 Thread Barry Smith
Are your updates sorted by DS? If they are, sequential queries are more likely
to hit the same db pages while searching the index, resulting in higher cache
usage and fewer decompression operations. This has less benefit if the 100k DS
values of your updates are randomly distributed through the 30m available, and
more of an effect if they're tightly clustered in some range.
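
A minimal sketch of that pattern, assuming the 100k (ds, val) pairs are
pre-sorted by ds in application code first (ds[], val[] and n are illustrative
names, not from your post):

  #include <sqlite3.h>

  /* Run the batch with keys pre-sorted by DS so consecutive index probes
     land on neighbouring, already-decompressed pages. */
  void run_batch(sqlite3 *db, char **ds, int *val, int n){
    sqlite3_stmt *st = 0;
    sqlite3_prepare_v2(db, "UPDATE ITEM SET FIELD4=? WHERE DS=?", -1, &st, 0);
    sqlite3_exec(db, "BEGIN", 0, 0, 0);
    for(int i = 0; i < n; i++){
      sqlite3_bind_int(st, 1, val[i]);
      sqlite3_bind_text(st, 2, ds[i], -1, SQLITE_STATIC);
      sqlite3_step(st);
      sqlite3_reset(st);
    }
    sqlite3_exec(db, "COMMIT", 0, 0, 0);
    sqlite3_finalize(st);
  }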

If you mainly query by DS, is that the same as saying it is your primary key?
If so, have you tried declaring it as such, making the table WITHOUT ROWID, and
dropping the index? That may help, since you wouldn't have to decompress both
the index pages and the table pages.
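
For illustration, a sketch of that schema, assuming DS values are unique (if
they are not, DS cannot be the primary key and this doesn't apply):

  CREATE TABLE ITEM (
    DS TEXT PRIMARY KEY,
    FIELD0 NUMERIC, FIELD1 NUMERIC,
    FIELD2 TEXT, FIELD3 TEXT,
    FIELD4 NUMERIC, FIELD5 NUMERIC, FIELD6 NUMERIC,
    FIELD7 NUMERIC, FIELD8 NUMERIC, FIELD9 NUMERIC,
    FIELD10 NUMERIC, FIELD11 TEXT
  ) WITHOUT ROWID;
  -- DS_INDEX becomes unnecessary: the table itself is keyed on DS.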

> On 8 Sep 2017, at 12:33 pm, Dominique Pellé  wrote:
> 
> Yue Wu  wrote:
> 
>> Hello,
>> As mentioned in the subject, our goal is to improve the performance of
>> batch SQL updates.
>> The update SQL is as follows:
>> 
>>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>> 
>> We run 100,000 updates in a single transaction. The zipvfs version takes
>> about 20 min while uncompressed version takes about 7 min.
> 
> Which compression algorithm do you use with zipvfs?
> Try LZ4: it's several times faster than zlib at compression and
> decompression, at the cost of compressing a bit less:
> 
> https://github.com/lz4/lz4
> 
> Or try zstd, which compresses more than zlib and is faster (especially
> on 64-bit architectures), though it's not as fast as LZ4:
> 
> http://facebook.github.io/zstd/
> 
> Also, what is your SQLite page size?
> Reducing the SQLite page size probably helps to speed up
> updates, since zipvfs compresses page by page.
> 
> Dominique


Re: [sqlite] Seeking advice on improving batch SQL update performance on zipvfs

2017-09-08 Thread Dominique Pellé
Yue Wu  wrote:

> Hello,
> As mentioned in the subject, our goal is to improve the performance of
> batch SQL updates.
> The update SQL is as follows:
>
>> UPDATE ITEM SET FIELD4 =? WHERE DS=?
>
> We run 100,000 updates in a single transaction. The zipvfs version takes
> about 20 min while uncompressed version takes about 7 min.

Which compression algorithm do you use with zipvfs?
Try LZ4: it's several times faster than zlib at compression and
decompression, at the cost of compressing a bit less:

https://github.com/lz4/lz4

Or try zstd, which compresses more than zlib and is faster (especially
on 64-bit architectures), though it's not as fast as LZ4:

http://facebook.github.io/zstd/

Also, what is your SQLite page size?
Reducing the SQLite page size probably helps to speed up
updates, since zipvfs compresses page by page.
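
For example, a hedged sketch of changing the page size on an existing
database (PRAGMA page_size only takes effect when the database is rebuilt,
hence the VACUUM; check the zipvfs docs for any zipvfs-specific constraints):

  /* Shrink the page size so each UPDATE recompresses less data per page. */
  sqlite3_exec(db, "PRAGMA page_size=1024", 0, 0, 0);
  sqlite3_exec(db, "VACUUM", 0, 0, 0);  /* rebuilds the db with the new size */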

Dominique


Re: [sqlite] PRAGMA journal_size_limit prevents -wal growth or only size outside a transaction?

2017-09-08 Thread Dan Kennedy

> I use journal_mode=WAL and have periods of checkpoint starvation (complicated
> reasons) so I'm looking to prune the -wal file in a less blunt way than
> SQLITE_CHECKPOINT_TRUNCATE.
> Behaviorally I want SQLITE_CHECKPOINT_PASSIVE, *and then*, if the remaining
> -wal content < journal_size_limit, to do the truncation a la
> SQLITE_CHECKPOINT_TRUNCATE.

> SQLITE_CHECKPOINT_PASSIVE gives me the best-effort checkpointing, but even in
> the best case I don't get -wal shrinkage. SQLITE_CHECKPOINT_TRUNCATE is more
> of an aggressive do-it-now-wait-if-necessary which gets me the shrinkage
> behavior, but with *required* effort rather than best effort. I'd really like
> both -- best-effort checkpoint AND best-effort truncate.
>
> sqlite3WalClose does exactly what I want (the key bits):
>
>   sqlite3OsLock(pWal->pDbFd, SQLITE_LOCK_EXCLUSIVE)
>   sqlite3WalCheckpoint(pWal, db, SQLITE_CHECKPOINT_PASSIVE, ...)
>   sqlite3OsFileControlHint(pWal->pDbFd, SQLITE_FCNTL_PERSIST_WAL, &bPersist)
>   if( bPersist ){ if( pWal->mxWalSize>=0 ){ walLimitSize(pWal, 0) } }
>
> But AFAICT this is only called when PRAGMA journal_mode changes away from WAL
> or the pager cache is closed via sqlite3_close(). I'm a long-running process
> with connection caching and associated prepared statements, so torching the
> connection to trigger this isn't optimal.
>
> Can I indirectly get this behavior if I open then immediately close a
> connection?

I think so, if there are no other connections to the same database. If
there are any other connections, the checkpoint-on-close won't happen, of
course.
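
A sketch of that, under the (big) assumption that no other connection is open
on the database at that moment; "test.db" is just a placeholder name:

  /* Open a throwaway connection and close it again. If it is the only
     connection, sqlite3_close() takes the checkpoint-and-truncate path
     in sqlite3WalClose(). */
  sqlite3 *tmp = 0;
  if( sqlite3_open("test.db", &tmp)==SQLITE_OK ){
    /* touch the database so the wal file is actually opened */
    sqlite3_exec(tmp, "SELECT 1 FROM sqlite_master LIMIT 1", 0, 0, 0);
  }
  sqlite3_close(tmp);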



> I think I can sorta approximate this if I disable the busy-handler, do
> SQLITE_CHECKPOINT_TRUNCATE, ignore Busy/Locked errors and restore the
> busy-handler before returning, but that's merely 'sorta' -- ugly on several
> levels.


Which part of the sorta is the problem? If you run an 
SQLITE_CHECKPOINT_TRUNCATE checkpoint without a busy-handler, or with 
the busy-handler rigged to return 0 immediately, then it will:


  * attempt a best-effort checkpoint (same as PASSIVE),
  * if the entire wal file was checkpointed, check if any readers are
    still using it,
  * if no readers are still using it, truncate the wal file to zero
    bytes in size.
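
In code, that might look like the following sketch (error handling elided;
restoring busy handling afterwards is app-specific, the 5000 ms below is just
an assumption):

  /* Best-effort checkpoint AND best-effort truncate: run TRUNCATE with no
     busy-handler so it never waits on other clients. */
  int nLog = 0, nCkpt = 0;
  sqlite3_busy_handler(db, 0, 0);               /* disable waiting */
  int rc = sqlite3_wal_checkpoint_v2(db, 0, SQLITE_CHECKPOINT_TRUNCATE,
                                     &nLog, &nCkpt);
  if( rc==SQLITE_BUSY || rc==SQLITE_LOCKED ) rc = SQLITE_OK; /* best effort */
  sqlite3_busy_timeout(db, 5000);               /* restore busy handling */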



> I don't see any way to directly try a best-effort truncate of the -wal file,
> e.g.
>
>   sqlite3_wal_checkpoint_v2(db, NULL, SQLITE_CHECKPOINT_PASSIVE, NULL, NULL);
>   sqlite3_file_control(db, NULL, SQLITE_FCNTL_TRUNCATE_WAL, NULL);


Is the idea that this file-control would truncate the wal file to zero 
bytes in size iff it could safely do so without blocking on any other 
clients?


Dan.





[sqlite] PRAGMA journal_size_limit prevents -wal growth or only size outside a transaction?

2017-09-08 Thread Dan Kennedy

> The docs are a little unclear =>
> https://sqlite.org/pragma.html#pragma_journal_size_limit
> I need to disable autocheckpoint@close (for other reasons) so I'm looking for
> ways to fence the -wal file. If I want to bound a database's -wal file to
> <=1MB when I'm not in a transaction, is it just
> PRAGMA journal_size_limit=1048576;?
> Does this affect -wal growth *during* a transaction, i.e. if I BEGIN; ...INSERT
> lots...; COMMIT, will I successfully reach COMMIT? The journal_size_limit
> doesn't affect *growth* of the -wal, right? Just 'at rest' size (i.e. outside
> any transaction)?




> Is this the functional equivalent?
>
>   int limit = ...get current value via sqlite3_exec(db,
>       "PRAGMA journal_size_limit;")...
>   sqlite3_exec(db, "PRAGMA journal_size_limit=0;");
>   sqlite3_wal_checkpoint_v2(db, NULL, SQLITE_CHECKPOINT_PASSIVE, &nLog, &nCkpt);
>   sqlite3_exec(db, "PRAGMA journal_size_limit=" + limit + ";");



It's close, but not quite equivalent.

The checkpoint operation does not consider the journal_size_limit 
setting in wal mode. Instead, the limit is applied when a writer writes a 
transaction into the very start of the wal file.


So, say you manage to checkpoint the entire wal file (because there are 
no old readers preventing this). The next connection to write to the 
database will write its transaction into the start of the wal file, 
overwriting old content. When the writer commits, it checks if the wal 
file on disk is larger than the configured "PRAGMA journal_size_limit". 
If it is, the writer truncates it - either to the configured limit or to 
the smallest possible size without truncating away part of the 
transaction that was just written.


In other words - "PRAGMA journal_size_limit" is applied by the next 
COMMIT after a successful checkpoint, not by the checkpoint itself.
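
So to actually shrink the file you need a write after the checkpoint. A hedged
sketch (the _nudge scratch table is my invention, purely to force a COMMIT):

  /* journal_size_limit is applied by the next COMMIT after a successful
     checkpoint, so follow the checkpoint with a small write transaction. */
  sqlite3_exec(db, "PRAGMA journal_size_limit=1048576", 0, 0, 0);
  sqlite3_wal_checkpoint_v2(db, 0, SQLITE_CHECKPOINT_PASSIVE, 0, 0);
  sqlite3_exec(db,
    "CREATE TABLE IF NOT EXISTS _nudge(x);"
    "INSERT INTO _nudge VALUES(1);"
    "DELETE FROM _nudge;", 0, 0, 0);  /* this COMMIT may truncate the wal */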


Dan.






[sqlite] Seeking advice on improving batch SQL update performance on zipvfs

2017-09-08 Thread Yue Wu
Hello,
As mentioned in the subject, our goal is to improve the performance of
batch SQL updates.
The update SQL is as follows:

> UPDATE ITEM SET FIELD4 =? WHERE DS=?

We run 100,000 updates in a single transaction. The zipvfs version takes
about 20 min while uncompressed version takes about 7 min.
I've also attached results of "EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS
= 15;".

Any suggestions would be very helpful.

Some background:
We have an app running on Android 4.2 using zipvfs.
The page size is 4096.
The cache size is -2000.

Table schema for ITEM

> CREATE TABLE ITEM (
> FIELD0 NUMERIC,
> FIELD1 NUMERIC,
> DS TEXT,
> FIELD2 TEXT,
> FIELD3 TEXT,
> FIELD4 NUMERIC,
> FIELD5 NUMERIC,
> FIELD6 NUMERIC,
> FIELD7 NUMERIC,
> FIELD8 NUMERIC,
> FIELD9 NUMERIC,
> FIELD10 NUMERIC,
> FIELD11 TEXT);


The third column, "DS", is what we query by almost all the time. We also
created an index:

> CREATE INDEX DS_INDEX ON ITEM(DS);


There are 30 million records. The zipvfs db is 471 MB vs 1.39 GB for the
uncompressed db (the one shipped with Android 4.2).
The zipvfs db uses zlib and AES-128, which are the defaults.

Thanks
-- 

Yue Wu | Android Developer
sqlite> EXPLAIN UPDATE ITEM SET FIELD4 = 10 WHERE DS = 15;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     28    0                    00  Start at 28
1     Null           0     1     2                    00  r[1..2]=NULL
2     OpenWrite      0     3     0     13             08  root=3 iDb=0; ITEM
3     OpenWrite      1     4     0     k(2,,)         02  root=4 iDb=0; ds_index
4     Integer        15    16    0                    00  r[16]=15
5     Affinity       16    1     0     B              00  affinity(r[16])
6     SeekGE         1     27    16    1              00  key=r[16]
7       IdxGT        1     27    16    1              00  key=r[16]
8       DeferredSeek 1     0     0                    00  Move 0 to 1.rowid if needed
9       Rowid        0     2     0                    00  r[2]=rowid
10      IsNull       2     27    0                    00  if r[2]==NULL goto 27
11      Column       0     0     3                    00  r[3]=ITEM.FIELD0
12      Column       0     1     4                    00  r[4]=ITEM.FIELD1
13      Column       0     2     5                    00  r[5]=ITEM.DS
14      Column       0     3     6                    00  r[6]=ITEM.FIELD2
15      Column       0     4     7                    00  r[7]=ITEM.FIELD3
16      Integer      10    8     0                    00  r[8]=10
17      Column       0     6     9                    00  r[9]=ITEM.FIELD5
18      Column       0     7     10                   00  r[10]=ITEM.FIELD6
19      Column       0     8     11                   00  r[11]=ITEM.FIELD7
20      Column       0     9     12                   00  r[12]=ITEM.FIELD8
21      Column       0     10    13                   00  r[13]=ITEM.FIELD9
22      Column       0     11    14                   00  r[14]=ITEM.FIELD10
23      Column       0     12    15                   00  r[15]=ITEM.FIELD11
24      MakeRecord   3     13    17    CCBBBCCCB      00  r[17]=mkrec(r[3..15])
25      Insert       0     17    2     ITEM           07  intkey=r[2] data=r[17]
26    Next           1     7     1                    00
27    Halt           0     0     0                    00
28    Transaction    0     1     4     0              01  usesStmtJournal=0
29    Goto           0     1     0                    00


Re: [sqlite] Getting degraded performance while using multithreading

2017-09-08 Thread Clemens Ladisch
Karthi M wrote:
> "Internally SQLite uses locks to serialize calls by multiple threads."
> If this is the case, how can we achieve concurrency?

In general, SQLite serializes accesses to the same connection object.

To get higher concurrency (for reading), use multiple connections.
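
A minimal sketch of the one-connection-per-thread pattern (the busy timeout
and the table name are assumptions on my part, just commonly paired with this):

  #include <sqlite3.h>

  /* Each thread opens its own connection, so SQLite only serializes
     within a connection and reads can proceed in parallel. */
  void *worker(void *path){
    sqlite3 *db = 0;
    sqlite3_open((const char*)path, &db);
    sqlite3_busy_timeout(db, 5000);   /* wait briefly instead of failing */
    sqlite3_exec(db, "SELECT count(*) FROM some_table", 0, 0, 0);
    sqlite3_close(db);
    return 0;
  }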


Regards,
Clemens


Re: [sqlite] Getting degraded performance while using multithreading

2017-09-08 Thread Simon Slavin


On 8 Sep 2017, at 10:23am, Karthi M  wrote:

> Thanks for the reply.
> "Internally SQLite uses locks to serialize calls by multiple threads."
> If this is the case, how can we achieve concurrency?
> We are using 50 threads and it is called 1500 times over 30 min.

When making a change (COMMIT) SQLite locks the entire database.  But not for 
very long.  Perhaps a few milliseconds.  If all 50 threads are trying to write 
at the same time you may have problems.  If just a few of them write each 
minute then you should be okay.

Simon.


Re: [sqlite] Proof that a line has been modified

2017-09-08 Thread Dominique Devienne
On Fri, Sep 8, 2017 at 12:29 AM, Nico Williams 
wrote:

> > Is there a solution to that?
>
> You have these choices:
>
>  - hash the whole SQLite3 file and record or sign such hash values for
>    approved DB files (this will let you detect all changes)
>

See also https://sqlite.org/dbhash.html#overview --DD


Re: [sqlite] Getting degraded performance while using multithreading

2017-09-08 Thread Karthi M
HI

Thanks for the reply.
"Internally SQLite uses locks to serialize calls by multiple threads."
If this is the case, how can we achieve concurrency?
We are using 50 threads and it is called 1500 times over 30 min. At the
initial stage it works fine; at some point the performance degrades.
Is there any way to achieve maximum concurrency? Please suggest.

Thanks
Karthi M



On Wed, Sep 6, 2017 at 8:53 PM, Jens Alfke  wrote:

>
>
> > On Sep 4, 2017, at 7:22 AM, Senthil Kumar Chellappan
>  wrote:
> >
> > I am using only select operations, but the response time gets degraded
> > when I call the API through multithreading
>
> Internally SQLite uses locks to serialize calls by multiple threads. If
> multiple threads are trying to access it at once, that will increase
> overhead since the locks have to control the access. Usually this isn't
> noticeable since the overhead for acquiring a single lock is very low, but
> it depends on how many calls are being made.
>
> You'll need to give more details of what calls you're making, on how many
> threads, and how many times they're called.
>
> —Jens


Re: [sqlite] Proof that a line has been modified

2017-09-08 Thread Eric Grange
For complete tamper-proofing, you need "something" external to the system,
i.e. not in the database, not in the code, and not on the server:
- if you use signatures, then those signatures should be made by a secure
signing service or device
- if you use hashes (be it a merkle tree, a blockchain, or a DAG), then
top-level hashes need to be stored (or signed) on another system (and
verification made against those)

Note that simply signing the individual rows would not guard you against
deletions, and if not performed by something external, the signature would
provide no date/time guarantee: even if the attacker cannot access the
private key, if the attacker can change the signing system's clock, the
attacker could trick the system into creating "historical" records, thus
greatly simplifying tampering.

To guard against deletions you need hashes at a higher level than the row,
a blockchain or a DAG (like git) being the simplest solution: the hash
of a row is the HMAC of (data of that row + hash of the previous row), and
storing/signing the last row hash externally then allows verifying all
previous rows.
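
An illustrative sketch of that chaining (using OpenSSL's HMAC is my choice
here, not something prescribed in this thread; key management is out of scope):

  #include <string.h>
  #include <openssl/evp.h>
  #include <openssl/hmac.h>

  /* h[i] = HMAC(key, row_data[i] || h[i-1]). Store or sign the final
     hash externally; verifying it then covers every earlier row and
     makes deletions detectable. Assumes rowlen <= 4064. */
  void chain_hash(const unsigned char *key, int keylen,
                  const unsigned char *row, int rowlen,
                  const unsigned char prev[32], unsigned char out[32]){
    unsigned char buf[4096];
    unsigned int outlen = 32;
    memcpy(buf, row, rowlen);
    memcpy(buf + rowlen, prev, 32);
    HMAC(EVP_sha256(), key, keylen, buf, (size_t)(rowlen + 32), out, &outlen);
  }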

Eric


On Fri, Sep 8, 2017 at 1:56 AM, Michael Stephenson 
wrote:

> In the past, I've used the pager to secure data.  This involved encrypting
> the data before writing it to disk and decrypting when loading from disk,
> but also optionally hashing each page and storing the hash in the extra
> data reserved for each page.  If anyone tampered with the data, the hash
> would indicate this and an error could be thrown.
>
> Also encrypting the page data makes it more difficult to tamper with the
> data.
>
> Products like sqlcipher do things like this (encryption, hashing), and
> it's fairly easy to see how it's done by pulling the sqlite source (not the
> amalgamation) and diffing it with the sqlcipher source.
>
> ~Mike
>
> > On Sep 7, 2017, at 6:34 PM, Jens Alfke  wrote:
> >
> >
> >
> >> On Sep 7, 2017, at 2:47 PM, Keith Medcalf  wrote:
> >>
> >> Again, this is a detection for changed data and does nothing to prevent
> >> changes being made.
> >
> > The OP did not require that it be impossible to make changes (which is
> > clearly impossible without locking down write access to the file.) He
> > specifically said that detection of changed data was OK:
> >
> >>> For security reasons, a customer wants to be sure that a database line
> >>> cannot be modified after its initial insertion (or cannot be modified
> >>> without it being visible, with proof that the line has been modified).
> >
> > The procedures I described provide detection that a row has been
> > modified.  The first one doesn't make it evident that a row has been
> > deleted, though the second one does.
> >
> > —Jens