[sqlite] Order with another sequence

2015-10-10 Thread David King
> When I use ORDER BY an ? comes after a z. Is it possible to make an ? come
> after a z?
> If it is important I am using SQLite 3.8.6 and Python 3.4.1.


If you're using the Python built-in sqlite3 module, you're looking for 
create_collation: 
https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.create_collation

Something like this (adapted from the example in the docs, and updated for 
Python 3 since you're on 3.4 -- cmp() and the print statement are gone there):

import sqlite3

def collate(string1, string2):
    # stand-in for Python 2's cmp(): negative, zero, or positive, like strcmp()
    return (string1 > string2) - (string1 < string2)

con = sqlite3.connect(":memory:")
con.create_collation("mycollation", collate)

cur = con.cursor()
cur.execute("create table test(x)")
values = [("z",), ("?",)]
cur.executemany("insert into test(x) values (?)", values)

cur.execute("select x from test order by x collate mycollation")
for row in cur:
    print(row)

con.close()

That still won't sort right, but now you have the collate() function as a place 
to plug in your actual collation logic. For that, you want something like the 
locale module (https://docs.python.org/3/library/locale.html) or ICU 
(https://pypi.python.org/pypi/PyICU/).
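
For example, here's a minimal sketch plugging locale.strcoll in as the 
comparison (assuming the locale you want is installed on the machine; the 
collation name "localecoll" is just illustrative):

import locale
import sqlite3

# strcoll() already has the shape create_collation wants: it returns a
# negative, zero, or positive number comparing two strings.
locale.setlocale(locale.LC_COLLATE, "")  # the user's default locale, or name one explicitly

con = sqlite3.connect(":memory:")
con.create_collation("localecoll", locale.strcoll)

con.execute("create table test(x)")
con.executemany("insert into test(x) values (?)", [("z",), ("a",)])
for row in con.execute("select x from test order by x collate localecoll"):
    print(row)
con.close()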




[sqlite] insert or increment?

2015-03-20 Thread David King
Both of the answers here are pretty good if you have a primary key/unique 
constraint on the name 

https://stackoverflow.com/questions/3647454/increment-counter-or-insert-row-in-one-statement-in-sqlite

My personal favourite from that page is:

BEGIN;
  INSERT OR IGNORE INTO observations VALUES (:src, :dest, :verb, 0);
  UPDATE observations SET occurrences = occurrences + 1
WHERE src = :src
AND dest = :dest
AND verb = :verb;
COMMIT;
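
For completeness, a minimal sketch of the schema this assumes -- the composite 
primary key is what makes the INSERT OR IGNORE a no-op when the row already 
exists (Python here just to make it runnable; column details guessed from the 
query above):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE observations (
    src, dest, verb,
    occurrences INTEGER NOT NULL,
    PRIMARY KEY (src, dest, verb)  -- without this, INSERT OR IGNORE inserts duplicates
)""")

def observe(src, dest, verb):
    with con:  # one transaction around both statements, like the BEGIN/COMMIT above
        con.execute("INSERT OR IGNORE INTO observations VALUES (?, ?, ?, 0)",
                    (src, dest, verb))
        con.execute("UPDATE observations SET occurrences = occurrences + 1 "
                    "WHERE src = ? AND dest = ? AND verb = ?", (src, dest, verb))

observe("a", "b", "saw")
observe("a", "b", "saw")
print(con.execute("SELECT * FROM observations").fetchall())  # [('a', 'b', 'saw', 2)]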


On 20 Mar 2015, at 11:07, Felipe Gasper  wrote:

> Hello,
> 
>   I'm looking for logic like this:
> 
> INSERT data
> ON CONFLICT oldrow.col1 += oldrow.col2
> 
> Does SQLite have anything that would make this simpler than:
> 
> 1) UPDATE
> 2) if 0 rows updated, then INSERT
> 
>   Thank you!
> 
> 
> -FG




[sqlite] test for bounce detection

2015-02-13 Thread David King
Also FYI:

I sent this last one to both sqlite-users at mailinglists.sqlite.org and 
sqlite-users at sqlite.org, and the one to sqlite-users at 
mailinglists.sqlite.org bounced with the same "Relay Access Denied" but the 
other one went through.



On 13 Feb 2015, at 15:39, David King  wrote:

> Maybe unrelated, but in the last few hours the list ID header changed from 
> "sqlite-users.sqlite.org" to "sqlite-users.mailinglists.sqlite.org", which 
> messes up my mail filtering.
> 
> Of course I'll fix it locally, I only mention it because it might be related 
> to whatever server config changes have been taking place.
> 
> 
> 
> On 13 Feb 2015, at 15:27, Simon Slavin  wrote:
> 
>> This is a test message.  I just sent another message to the list and it got 
>> bounced with the message "Relay access denied" so I want to see if this one 
>> made it.
>> 
>> If it does, can the listadmin see if there's anything wrong ?  Sorry to 
>> occupy your Friday night or weekend.
>> 
>> Simon.
> 



[sqlite] test for bounce detection

2015-02-13 Thread David King
Maybe unrelated, but in the last few hours the list ID header changed from 
"sqlite-users.sqlite.org" to "sqlite-users.mailinglists.sqlite.org", which 
messes up my mail filtering.

Of course I'll fix it locally, I only mention it because it might be related to 
whatever server config changes have been taking place.



On 13 Feb 2015, at 15:27, Simon Slavin  wrote:

> This is a test message.  I just sent another message to the list and it got 
> bounced with the message "Relay access denied" so I want to see if this one 
> made it.
> 
> If it does, can the listadmin see if there's anything wrong ?  Sorry to 
> occupy your Friday night or weekend.
> 
> Simon.




Re: [sqlite] Saving PDF Files in SQlite

2015-01-15 Thread David King
> How do I save PDF files in SQLIte?

Why do you want to store PDF files in sqlite? What is the "real" problem that 
it solves for you?

There is http://www.sqlite.org/sqlar/doc/trunk/README.md which does what you're 
literally asking, but I feel that with more context we can give you better 
direction.
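
If blobs in the database really do turn out to be the right answer, a minimal 
sketch (table and file names made up):

import sqlite3

con = sqlite3.connect("docs.db")
con.execute("CREATE TABLE IF NOT EXISTS pdfs (name TEXT PRIMARY KEY, data BLOB)")

with open("report.pdf", "rb") as f:  # hypothetical file
    con.execute("INSERT OR REPLACE INTO pdfs VALUES (?, ?)", ("report.pdf", f.read()))
con.commit()

data = con.execute("SELECT data FROM pdfs WHERE name = ?", ("report.pdf",)).fetchone()[0]
con.close()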





Re: [sqlite] Uncached performance

2015-01-08 Thread David King
> SS> Is there a difference between a key being present in your database with
> NULL value, and the key not being present ?
> Surely there is. But, actually, those 1 queries are made only to keys
> with a non-NULL value (usually less than 1 Kb, and maybe 20% are about 16
> kb).

*Probably* not related to your performance concerns (since as you say, 
searching for the key doesn't look to be your issue), but for this case you may 
consider a partial index https://www.sqlite.org/partialindex.html to reduce the 
number of seeks a little bit by reducing the depth of the index tree.
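
A quick sketch of what that could look like (table and column names are 
hypothetical; partial indexes need SQLite 3.8.0 or newer):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE kv (key TEXT, value BLOB);
    -- only rows with a value land in the index, so the tree stays shallower
    CREATE INDEX kv_live ON kv(key) WHERE value IS NOT NULL;
""")

# a query whose WHERE clause implies the index's condition can use it:
con.execute("SELECT value FROM kv WHERE key = ? AND value IS NOT NULL", ("k",))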




Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-11 Thread David King
Why are you trying so hard to avoid using the backup API? It sounds like it 
does exactly what you want.
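
For reference, a minimal sketch driving the backup API from Python, where 
sqlite3.Connection.backup (Python 3.7+) wraps the same C interface:

import sqlite3

src = sqlite3.connect("live.db")
dst = sqlite3.connect("backup.db")

with dst:
    # copy 100 pages at a time so writers on live.db aren't blocked for long
    src.backup(dst, pages=100)

dst.close()
src.close()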



On 11 Dec 2014, at 12:36, Nick  wrote:

> 
> On 11 Dec 2014, at 10:43, Simon Slavin wrote:
> 
>> 
>> I don't know enough about the internals of SQLite to be sure, but various 
>> parts of me are concerned that this is a bad idea.  I don't know what WAL 
>> mode would be like without checkpointing but there has to be a reason for 
>> checkpointing and disabling it between backups sounds bad.
>> 
> 
> I would say the docs (https://www.sqlite.org/wal.html#how_to_checkpoint) do 
> not imply that application-initiated checkpoints are a bad idea.
> 
> Regards
> Nick





Re: [sqlite] Full outer joins

2014-10-21 Thread David King
> I want to find the rows that don't appear in both tables. So I think I
> need a full outer join which I understand is not supported by SQLite
> I have seen this which gives an alternative, but it seems very complex
> http://en.wikipedia.org/wiki/Join_%28SQL%29#Full_outer_join
> Is there an easier way?

It looks hairy but here's what it's doing. Given tables A,B:

1. Do the regular join (all rows with matches in both A and B)
2. Find rows in A that aren't in B
3. Find rows in B that aren't in A
4. Concatenate those 3 queries together with UNION ALL

Part of the reason the wikipedia one looks complex is that it is worried about 
types (all of the CAST stuff). But you don't have to worry about types in 
sqlite, because each cell has a type rather than each column.

> and, I am sure this has been asked before, but out of interest why are
> full outer joins not supported?

I can't speak for *why* it's not supported directly, but you can emulate it 
with the above type of query.
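
Here's the whole thing as a sketch on two made-up tables. (And note that if you 
only want the rows that *don't* appear in both tables, you only need parts 2 
and 3.)

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, x);
    CREATE TABLE B (id INTEGER PRIMARY KEY, y);
    INSERT INTO A VALUES (1, 'only in A'), (2, 'in both');
    INSERT INTO B VALUES (2, 'in both'), (3, 'only in B');
""")

full_outer = """
    SELECT A.id, A.x, B.y FROM A JOIN B ON A.id = B.id               -- 1. in both
    UNION ALL
    SELECT A.id, A.x, NULL FROM A WHERE id NOT IN (SELECT id FROM B) -- 2. only in A
    UNION ALL
    SELECT B.id, NULL, B.y FROM B WHERE id NOT IN (SELECT id FROM A) -- 3. only in B
"""
for row in con.execute(full_outer):
    print(row)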





Re: [sqlite] select 1 where 1 - 1;

2014-05-02 Thread David King
> Quick, without trying it out, what would you expect the following statement 
> to return:
> select 1 where 1 - 1;
> (a) one row
> (b) no row
> (c) syntax error
> For extra entertainment, try some variations:
> select 1 where 1;
> select 1 where 0;
> etc...
> Bonus points for a rationalization of any of the behaviors you see or don’t 
> see.

I'm not sure why this is surprising. sqlite happily uses 1 and 0 as booleans:

sqlite> select 1 = 1;
1
sqlite> select 1 = 100;
0

And the WHERE clause is just a boolean expression: 1 - 1 evaluates to 0, which 
is false, so no row comes back. Your examples are simply "select the literal 1 
where true" or "where false", which either causes the value to be selected or 
not, just down to the WHERE clause.

Am I missing something?




Re: [sqlite] Establishing a connection and downloading tables from an SQLite server

2014-04-08 Thread David King
> I guess that using a command line interface program could work. I used it
> for a local database and it worked perfectly, but the problem is that I'm
> not sure how to do it when the original file is in a remote server. Again,
> maybe I'm missing something obvious and if I am, I'm sorry.
> I need the downloads from a remote server to be scheduled. That's why I was
> thinking about using powershell - I have used it before in scheduling
> tasks. I still found your suggestion interesting, but I can't really make
> use of it effectively.

Sqlite databases are just files; it's not like mysql or postgres where you 
connect to and operate on a remote server. Once you have the file you can use 
the tools you've been using.

So how to schedule the task and how to download the file is entirely unrelated 
to sqlite; it's entirely a product of the VoIP device and environment.





Re: [sqlite] INSERT several rows

2014-04-03 Thread David King
> Wouldn't it be easier just to upgrade to SQLite 3.8.4.3?  Why do you feel
> like you need to stick with 3.7.2 which is 3.5 years old (what is that in
> internet years? 100?)

I can't speak for Roman, but the biggest thing that keeps me from using newer 
sqlite features is that I write desktop apps in Python and have to run on 
whatever version of Python happens to be installed on my target machines. It's 
the sqlite version baked into Python's sqlite3 module that I have to rely on, 
which I can't control (and is generally 3.7 or so).

I'd actually prefer to depend on APSW (a Python sqlite library that doesn't try 
to wrap sqlite in Python's database standard, so you get more direct access to 
sqlite), but there's no easy way for my app to do so because of the author's 
opinions of the standard Python ways to require packages 
(http://rogerbinns.github.io/apsw/download.html#easy-install-pip-pypi)

So unless I can control the entire compilation process of Python or of some 
external utilities, I have to use 3.7.
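
(For anyone in the same boat, checking which library version you're actually 
stuck with is one line:)

import sqlite3

print(sqlite3.sqlite_version)  # the SQLite library linked into this Python
print(sqlite3.version)         # the version of the sqlite3 module itself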





Re: [sqlite] HTML Tokenizer

2014-02-13 Thread David King
> New to Sqlite, anybody knows is there a HTML tokenizer for full text search,
> Or do I need to implement my own?

There isn't an HTML tokeniser. But the default tokeniser considers punctuation 
like <> to be word breaks, so it may already work for you, with the down side 
that something like <div class="foo">Hello!</div> will yield "div", "class", 
"foo", and "hello" as words (rather than the just "hello" that you may be 
after)
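
A quick sketch demonstrating that with the default tokeniser (fts4 here, 
assuming your build includes it; fts3 behaves the same):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
con.execute("INSERT INTO docs VALUES (?)", ('<div class="foo">Hello!</div>',))

# < > " = ! are all treated as separators, so every one of these matches:
for term in ("hello", "div", "class", "foo"):
    print(term, con.execute("SELECT * FROM docs WHERE docs MATCH ?", (term,)).fetchall())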




Re: [sqlite] REINDEX - Performance increase?

2013-12-13 Thread David King
ANALYZE can help increase performance if the query planner is making bad 
guesses for you, especially if you've never run it, or have changed patterns in 
the cardinality of index columns. It's useful to do once and probably only to 
do again if you make a lot of changes to your data patterns.

VACUUM might help if it increases the amount of effective block cache available 
to you (i.e. your DB is 50gb but you only have 2gb of data), but generally I 
wouldn't expect it to unless you're on a fits/doesn't-fit-in-memory boundary. It 
might be a maintenance task if you need to reclaim the disk space, or if your 
data patterns are something like "do inserts over the course of a month, then 
on the first of the month delete most of it" or other such wide oscillations in 
size.

REINDEX is in kind of the same boat, but even less so: if your index is very 
sparse across disk, it might help to improve locality in the same way. But 
generally I wouldn't expect this to be a routine maintenance task either.
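
If you do script a maintenance pass anyway, a minimal sketch -- note that 
VACUUM refuses to run inside a transaction (hence the autocommit connection) 
and needs free disk space on the order of the database size:

import sqlite3

con = sqlite3.connect("app.db", isolation_level=None)  # autocommit
con.execute("ANALYZE")   # refresh the planner's statistics; cheap
con.execute("VACUUM")    # rebuild the file to reclaim space; slow on 30-50GB
con.execute("REINDEX")   # rebuild every index; rarely worth it, per the above
con.close()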



On 13 Dec 2013, at 16:37, veeresh kumar  wrote:

> We are looking for database maintenance commands for the Sqlite database and 
> came across 2 commands which we felt would improve performance if used. The 
> commands are VACUUM and REINDEX. I came to know that VACUUM just helps us to 
> reclaim space and does not give any performance increase. Just wanted to 
> check if REINDEX, or any other Sqlite command, is something you suggest users 
> run every month. 
> Our database size varies from 30GB to 50GB, with various operations like 
> insert/update/delete performed on a day-to-day basis. Just wanted to know if 
> there is a need to maintain the sqlite database so that performance does not 
> degrade, or just leave it as it is. 





Re: [sqlite] Insert statement

2013-09-12 Thread David King
> What is the most efficient way to insert several records into a table which
> has a fk ref to the auto incrementing pk of another insert I need to do in the
> same statement.

Without knowing too much about your application, I'd say that it's usually fine 
to just:

1. Do the INSERT
2. Get the last_insert_rowid()
3. Do your dependent INSERT with that ID.

Usually the reason people want to combine steps #1 and #2 is that there is 
network latency, lock contention, or some other cost to separating them. But 
sqlite doesn't have that: your requests don't go over a network, it's all just 
in your process space.
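
In the Python sqlite3 module, for example, step #2 is just cursor.lastrowid. A 
minimal sketch with made-up tables:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, v TEXT);
""")

with con:  # one transaction around both inserts
    cur = con.execute("INSERT INTO parent (name) VALUES (?)", ("p1",))
    parent_id = cur.lastrowid  # wraps sqlite3_last_insert_rowid()
    con.execute("INSERT INTO child (parent_id, v) VALUES (?, ?)", (parent_id, "c1"))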

Is there another reason that you want to combine these steps?





Re: [sqlite] What pragma to use to get maximum speed (at expense of safety)?

2013-04-26 Thread David King
> The idea of temporary tables in-memory is nice, but I do not know how
> to apply it.
> I see in the documentation I can use the TEMP in CREATE TABLE, but I am not
> sure of the effect.
> Does it mean that the table is created in memory and it is lost in
> sqlite3_close?



There are a few things at work here. You can have your entire database in 
memory (lost on close) by giving sqlite3_open the filename :memory:

Additionally and unrelatedly, you can create temporary tables (CREATE TEMP 
TABLE, as you say). These are tables that disappear on close. They are 
associated with the database connection, not the database file itself. They are 
usually stored in temporary files in a different on-disk location from your 
main database file.

With the pragma temp_store set to MEMORY, the temporary tables can be entirely 
in memory, even if the rest of your database isn't. Using this, you can mix 
on-disk tables (in your regular database) and in-memory tables (temporary 
tables, when temp_store=MEMORY). That lets you easily keep some intermediate 
stuff in memory and only flush out the disk tables when you're ready. Sometimes 
(depending on your usage patterns of course) this batching can help speed 
things up.
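
A minimal sketch of that mix (table names invented):

import sqlite3

con = sqlite3.connect("app.db")
con.execute("PRAGMA temp_store = MEMORY")        # temp tables live in RAM, not temp files
con.execute("CREATE TABLE IF NOT EXISTS real_table (k, v)")
con.execute("CREATE TEMP TABLE staging (k, v)")  # per-connection; gone when you close

# cheap in-memory inserts as work arrives...
con.executemany("INSERT INTO staging VALUES (?, ?)", [("a", 1), ("b", 2)])

# ...then one batched flush to the on-disk table when ready
with con:
    con.execute("INSERT INTO real_table SELECT k, v FROM staging")
    con.execute("DELETE FROM staging")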


> 
> On Fri, Apr 26, 2013 at 8:07 PM, David King wrote:
> > auto_vacuum Turn off autovacuum and just run it yourself when you're idle
> > foreign_keys Turn off foreign keys checks (or just don't use foreign keys)
> > ignore_check_constraints Same
> > journal_mode OFF might actually be faster than MEMORY, but disables 
> > rollback support
> > locking_mode EXCLUSIVE can be mildly faster in some cases
> > 
> > 
> > secure_delete OFF
> > 
> > 
> > synchronous OFF as you said
> > 
> > 
> > cache_size beef up as you said. this won't always make everything faster 
> > though, since it can starve the other processes on your machine for memory 
> > even for rarely-used sqlite data when they could potentially make better 
> > use of the OS page cache.
> > temp_store set to MEMORY, this will help if you have queries that create 
> > temporary tables, even if you're not doing so yourself (e.g. unindexed 
> > group bys)
> > 
> > 
> > 
> > If you can, use an entirely :memory: database. This may not work for you, 
> > but if it does, great.
> > 
> > Since you're not using journal_mode = WAL this is moot for you, but if you 
> > were I'd say turn off wal_autocheckpoint and wal_checkpoint yourself when 
> > you're idle
> > 
> > For my somewhat-similar use case I find that writing intermediate changes 
> > to an in-memory table (insert into my_temp_table) and periodically flushing 
> > those to disk (insert into real_table select from my_temp_table; delete 
> > from my_temp_table) can help speed things up if a lot of index-updating is 
> > involved in the on-disk table.
> > 
> > Make sure you're doing all of your inserts in a transaction. inserting is 
> > pretty cheap, but committing a transaction is expensive, and if you're not 
> > in a transaction each insert is its own transaction
> > 
> > Make sure you're re-using your prepared statements
> > 
> > Play with page_size to get it right for your write patterns
> > 
> > Don't use a connection concurrently, it's doing internal locking anyway. If 
> > you must, use the shared page cache. If you're doing it from multiple 
> > processes, use WAL mode.
> > 
> > 
> > 
> > 
> > On Friday, 26 April, 2013 at 10:44, Paolo Bolzoni wrote:
> > 
> > > The subject pretty much says it all, I use sqlite3 as a way to save
> > > temporary results from a calculation.
> > > 
> > > In this context I do not care about safety of the data. If the program
> > > fails or there is a blackout I will just delete the sqlite3 file, 
> > > eventually
> > > fix the bug, and restart.
> > > 
> > > At the moment I use this pragmas:
> > > 
> > > PRAGMA synchronous = OFF;
> > > PRAGMA journal_mode = MEMORY;
> > > PRAGMA cache_size = -10240;
> > > 
> > > Is there anything other I can do to speed-up sqlite3 at expenses of
> > > safety?
> > > 
> > > Thanks,
> > > Paolo
> > 
> > 
> > 
> > 
> 
> 




Re: [sqlite] What pragma to use to get maximum speed (at expense of safety)?

2013-04-26 Thread David King
auto_vacuum: turn it off and just run the vacuum yourself when you're idle
foreign_keys: turn off foreign key checks (or just don't use foreign keys)
ignore_check_constraints: same
journal_mode: OFF might actually be faster than MEMORY, but disables rollback 
support
locking_mode: EXCLUSIVE can be mildly faster in some cases
secure_delete: OFF
synchronous: OFF as you said
cache_size: beef it up as you said. This won't always make everything faster 
though, since it can starve the other processes on your machine for memory, 
even for rarely-used sqlite data, when they could potentially make better use 
of the OS page cache.
temp_store: set to MEMORY; this will help if you have queries that create 
temporary tables, even if you're not doing so yourself (e.g. unindexed group 
bys)

If you can, use an entirely :memory: database. This may not work for you, but 
if it does, great.

Since you're not using journal_mode = WAL this is moot for you, but if you were 
I'd say turn off wal_autocheckpoint and run wal_checkpoint yourself when you're 
idle.

For my somewhat-similar use case I find that writing intermediate changes to an 
in-memory table (insert into my_temp_table) and periodically flushing those to 
disk (insert into real_table select from my_temp_table; delete from 
my_temp_table) can help speed things up if a lot of index-updating is involved 
in the on-disk table.

Make sure you're doing all of your inserts in a transaction. inserting is 
pretty cheap, but committing a transaction is expensive, and if you're not in a 
transaction each insert is its own transaction

Make sure you're re-using your prepared statements

Play with page_size to get it right for your write patterns

Don't use a connection concurrently, it's doing internal locking anyway. If you 
must, use the shared page cache. If you're doing it from multiple processes, 
use WAL mode.
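
Roughly, applying those from Python (values are illustrative; measure against 
your own workload):

import sqlite3

con = sqlite3.connect("scratch.db")
for pragma in (
    "PRAGMA auto_vacuum = NONE",            # only takes effect on a new (or vacuumed) database
    "PRAGMA journal_mode = OFF",            # no rollback; fine for throwaway data
    "PRAGMA synchronous = OFF",
    "PRAGMA locking_mode = EXCLUSIVE",
    "PRAGMA secure_delete = OFF",
    "PRAGMA foreign_keys = OFF",
    "PRAGMA ignore_check_constraints = ON",
    "PRAGMA temp_store = MEMORY",
    "PRAGMA cache_size = -65536",           # negative = KiB, so ~64MB of page cache
):
    con.execute(pragma)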




On Friday, 26 April, 2013 at 10:44, Paolo Bolzoni wrote:

> The subject pretty much says it all, I use sqlite3 as a way to save
> temporary results from a calculation.
> 
> In this context I do not care about safety of the data. If the program
> fails or there is a blackout I will just delete the sqlite3 file, eventually
> fix the bug, and restart.
> 
> At the moment I use this pragmas:
> 
> PRAGMA synchronous = OFF;
> PRAGMA journal_mode = MEMORY;
> PRAGMA cache_size = -10240;
> 
> Is there anything other I can do to speed-up sqlite3 at expenses of
> safety?
> 
> Thanks,
> Paolo




[sqlite] sqlite4: consistent BUS ERROR on dropping index

2013-04-25 Thread David King
Is this the right list for sqlite4 bugs?

I have sqlite4 build fde11ff78c433d66 and a largish database. When I drop one 
of the indices on it through the sqlite4 command line tool, the tool dies with 
a BUS ERROR. For this database file it reproduces 100% of the time for me. I 
have the gdb backtrace below. I can share the database but it's quite large so 
I'd rather not upload it unless someone thinks they can actually make use of it.

1.7G database.db
347M database.db-log
2.0G database.db-shm

$ gdb $(which sqlite4) --args sqlite4 databasedb/database.db "drop index 
idx_followers_main; create index idx_followers_main on followers(hash, 
next_token, count)"

This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared 
libraries .. done

(gdb) run
Starting program: /Users/dking/sync/sqlite4/sqlite4 databasedb/database.db 
drop\ index\ idx_followers_main\;\ create\ index\ idx_followers_main\ on\ 
followers\(hash,\ next_token,\ count\)
Reading symbols for shared libraries + done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 10 at address: 0x0001f7231004
treeShmalloc (pDb=0x10088, bAlign=1052536, nByte=40, pRc=0x1f7231000) at 
lsm_tree.c:683
683 lsm_tree.c: No such file or directory.
in lsm_tree.c
(gdb) bt
#0  treeShmalloc (pDb=0x10088, bAlign=1052536, nByte=40, pRc=0x1f7231000) 
at lsm_tree.c:683
#1  0x00010004b7fe in treeShmallocZero [inlined] () at lsm_tree.c:712
#2  0x00010004b7fe in newTreeNode [inlined] () at :723
#3  0x00010004b7fe in treeInsert (pDb=0x7fff5fbfe150, pCsr=0x100100f78, 
iLeftPtr=1606410576, iTreeKey=1606410576, iRightPtr=2147450820, iSlot=3) at 
lsm_tree.c:978
#4  0x00010004b741 in treeInsert (pDb=0x7fff5fbfe1b0, pCsr=0x7fff5fbfe260, 
iLeftPtr=1606410672, iTreeKey=1606410672, iRightPtr=2147450740, iSlot=3) at 
lsm_tree.c:939
#5  0x00010004b741 in treeInsert (pDb=0x7fff5fbfe210, pCsr=0x7fff5fbfe260, 
iLeftPtr=1606410768, iTreeKey=1606410768, iRightPtr=2147450688, iSlot=3) at 
lsm_tree.c:939
#6  0x00010004c115 in treeInsertLeaf [inlined] () at :1052
#7  0x00010004c115 in treeInsertEntry (pDb=0x7fff5fbfe3e0, 
flags=1606411232, pKey=0x7fff5fbfe3e0, nKey=1606411232, pVal=0x15, 
nVal=1606411232) at lsm_tree.c:1557
#8  0x00010003bb9f in doWriteOp (pDb=0x10088, bDeleteRange=1606411312, 
pKey=0x100102a98, nKey=16, pVal=0x7fff5fbfe430, nVal=1606411312) at 
lsm_main.c:696
#9  0x000100031914 in kvlsmDelete (pKVCursor=0x1001032a0) at kvlsm.c:297
#10 0x000100030474 in sqlite4KVCursorDelete (p=0x1001032a0) at kv.c:202
#11 0x000192a5 in sqlite4VdbeExec (p=0x10088) at vdbe.c:3933
#12 0x000100064b9a in sqlite4Step [inlined] () at :383
#13 0x000100064b9a in sqlite4_step (pStmt=0x1001039f0) at vdbeapi.c:444
#14 0x0001322d in shell_exec () at ctype.h:175
#15 0x00011c32 in main (argc=3, argv=0x7fff5fbfe918) at shell.c:2892
(gdb)



Re: [sqlite] sqlite4 and sqlite3_busy_timeout

2013-04-24 Thread David King
I've solved these two (in my case sqlite3_busy_timeout isn't necessary and can 
just be removed; sqlite4_result_blob and sqlite4_result_text now take an xDel 
argument that can just be zeroed out; and sqlite4_create_collation now takes a 
callback to generate keys for the k/v storage engine that AFAICT can be NULL'd 
for my use case, but YMMV).

Now I believe I've stumbled on a bug in the amalgamation builder for sqlite4. 
When I generate the amalgamation for sqlite3 it includes fts3, and compiles, and 
everyone's happy. When I generate the amalgamation for sqlite4 as of 
fde11ff78c433d66 it generates but fails to compile, because it has a #include 
"fts3.h" line (and of course fts3.h isn't present in the project including the 
amalgamation).

Am I just compiling this wrong or is it a build bug?



On Tuesday, 23 April, 2013 at 09:18, David King wrote:

> I'm experimenting with sqlite4 by porting an application that currently runs 
> sqlite3. I'm following the surprisingly straight-forward instructions at 
> http://sqlite.org/src4/doc/trunk/www/porting.wiki 
> 
> I see that sqlite4 no longer provides sqlite4_busy_timeout, is there an 
> equivalent that I should be using? If not, what's the preferred strategy here 
> now?
> 
> Also these appear to take different numbers of arguments than they used to, 
> but aren't listed in that doc as needing to take a sqlite4_env*. Should they 
> be added to that list or is there a different change?
> 
> sqlite4_result_blob
> sqlite4_result_text
> sqlite4_create_collation
> 
> I realise that sqlite4 isn't production quality (this is just a toy that I'm 
> porting) and I'm happy to just help update docs where possible 
> 
> Thanks!




[sqlite] sqlite4 and sqlite3_busy_timeout

2013-04-23 Thread David King
I'm experimenting with sqlite4 by porting an application that currently runs 
sqlite3. I'm following the surprisingly straight-forward instructions at 
http://sqlite.org/src4/doc/trunk/www/porting.wiki 

I see that sqlite4 no longer provides sqlite4_busy_timeout, is there an 
equivalent that I should be using? If not, what's the preferred strategy here 
now?

Also these appear to take different numbers of arguments than they used to, but 
aren't listed in that doc as needing to take a sqlite4_env*. Should they be 
added to that list or is there a different change?

sqlite4_result_blob
sqlite4_result_text
sqlite4_create_collation

I realise that sqlite4 isn't production quality (this is just a toy that I'm 
porting) and I'm happy to just help update docs where possible 

Thanks!


Re: [sqlite] How to achieve fastest possible write performance for a strange and limited case

2013-03-29 Thread David King
In case you haven't seen this: http://www.sqlite.org/intern-v-extern-blob.html 


On Friday, 29 March, 2013 at 13:52, Jeff Archer wrote:

> On Fri, Mar 29, 2013 at 4:46 PM, Richard Hipp wrote:
> > On Fri, Mar 29, 2013 at 4:41 PM, Michael Black wrote:
> > 
> > > I think many people would tell you not to store your images in your
> > > database.
> > > Just store a filepath to them.
> > > That will speed things up quite a bit and even possibly prevent having to
> > > use an SSD.
> > > 
> > > With the filepath your processing apps can use file locking too if you 
> > > need
> > > it.
> > > 
> > 
> > 
> > Lots of people think this. But experiments suggest otherwise, depending on
> > the size of your images.
> > 
> > For BLOBs smaller than about 100KB, it is actually faster to read them out
> > of an SQLite database than it is out of a file. Presumably the extra
> > overhead of the open/close is what makes the difference. For BLOBs larger
> > than 100KB it is faster to store them in a file. That's for reading - I've
> > never run the experiment for writing. Note also that the 100KB is highly
> > operating-system and file-system and storage-media dependent and so your
> > mileage may vary.
> > 
> 
> 
> FWIW. My tests are definitely indicating that BLOBs in database are
> significantly faster.




Re: [sqlite] Performance with journal_mode = off

2013-03-27 Thread David King
> I am populating a database with 5764 records using the exact same data set
> each time into a newly created file.
> When I use no explicit transactions (default atomic commit) it takes 17.7
> seconds.
> When I set journal_mode = off, same operation takes 5.5 seconds.
> If I do all 5764 inserts within a single transaction only 2.5 seconds.


That sounds about right, yeah. With journalling, most disk writes have to be 
done twice (once to the journal and once to the data file).


Re: [sqlite] Efficient way to store counters

2013-03-20 Thread David King
> Actually my understanding would suggest that INSERT OR REPLACE should
> execute slower than UPDATE + INSERT (or INSERT + UPDATE).


After some experimentation, I think that they turn out to be a wash in my case. 
INSERT OR REPLACE is implemented in sqlite as a DELETE of the conflicting row 
followed by an INSERT, which due to my access patterns puts the really common 
(k1, k2) pairs together at the end of the database (or at least with the 
highest row ids, which tend to group together), which increases the chances of 
sharing pages for those common pairs.

> And about journal size: imagine that you've got "lucky" and
> all those 94k rows are each in it's own page in the counters table.
> SQLite will have to save each of that pages in the journal which will
> give journal size of about 94k * 4096 ~ 400M.


Maybe it makes sense to drop my page size all the way down to the disc's 
sector size, then. That reduces the chances that a given set of pairs shares 
pages, but clearly those chances are already very low. It would at least reduce 
the number of bytes needing to be written, even if the number of seeks stays 
essentially the same.

But since it's seeks that are probably dominating the I/O, maybe it makes sense 
to increase the transaction batch size so that we're overwriting most of the DB 
on every commit rather than overwriting (say) 10%, 10% as often. Does that make 
sense?




Re: [sqlite] Efficient way to store counters

2013-03-13 Thread David King
> > > BTW, in case you don't do that yet your best performance will be if
> > > you prepare your UPDATE and INSERT statements only once and then do
> > > bind + step + reset in that 100k times loop.
> > >  
> >  
> > In principle I agree, but since the temporary-table version is blindingly 
> > fast up to the update-the-disk portion it's definitely not a bottleneck at 
> > this point
> >  
>  
> I was talking about your initial implementation when you did 100k times
> > update_counter(k1, k2, count=count+1, expires=now+count*1day)
> > if rows_updated != 1: insert_counter(k1, k2, count=1, expires=now+1day)
> >  
>  
> Not about your final version with one INSERT OR REPLACE. Was your
> statement about the same thing? If yes I didn't understand what you
> meant.
>  
>  


I just meant that the naïve way of making the prepared statements with python's 
sqlite3 module (which it may or may not cache, but I assume doesn't) was 
already so fast that I'm not worried about shaving a few milliseconds off of 
re-preparing the statements every time when the actual problem occurs at a 
lower level than that.

So yeah, preparing the statement once and re-binding it every time would speed 
things up, but so little that I'd rather solve the real problem of reducing the 
time taken by the disk-writes first



Re: [sqlite] Efficient way to store counters

2013-03-13 Thread David King
> > The logic is, "keep a given (k1, k2) pair around for one day for each
> > time it's been seen". I could calculate it when it's needed, but
> > since it's a computed value, I couldn't then have an index on it.
> 
> I'm not sure about SQLite, but in principle the query optimizer can
> often use the base table's index for a derived value. Consider
> 
> A join B on A.a = 1 + B.a
> or
> A join B on A.a < sqrt(B.a)
> 
> An index on B.a is useful to finding the values meeting the criterion. 
> 
> But perhaps you've measured this. How much faster is the updating
> process you're concerned with than the SELECT that would avoid it?


I've measured the performance gain of leaving off that column (and therefore 
index) entirely. It buys me less than my rounding error in performance for the 
updates. I only left it in the problem description for completeness.


Re: [sqlite] Efficient way to store counters

2013-03-13 Thread David King
> BTW, in case you don't do that yet your best performance will be if
> you prepare your UPDATE and INSERT statements only once and then do
> bind + step + reset in that 100k times loop.


In principle I agree, but since the temporary-table version is blindingly fast 
up to the update-the-disk portion it's definitely not a bottleneck at this 
point.




Re: [sqlite] Efficient way to store counters

2013-03-13 Thread David King
On Wednesday, 13 March, 2013 at 06:15, Michael Black wrote:
> You're simply missing the where clause on your update so you're updating the
> entire database every time you do an insert.
> update_counter(k1,k2 count=count+1,expires=now+count*1day) where field1=k1
> and field2=k2;
> 
> And a (k1,k2) index would help that update a lot.

That's pseudo code. In real life it looks more like 

UPDATE counters
SET count=count+1, expires=?+count*(24*60*60)
WHERE k1=? AND k2=?

As in the schema I listed, (k1,k2) is the primary key so there's already an 
index on it.




Re: [sqlite] Efficient way to store counters

2013-03-12 Thread David King
> > At first I was just doing something like this pseducode:
> > update_counter(k1, k2, count=count+1, expires=now+count*1day)
> > if rows_updated != 1: insert_counter(k1, k2, count=1, expires=now+1day)
>  
> Assuming these 2 statements constitute each of the 10k-100k steps you
> mentioned above and all of these steps are wrapped up in BEGIN-COMMIT
> block this is probably the most efficient way of doing this. The only
> improvement could be if you are doing creates more often than updates.
> Then you can switch and do INSERT first and then UPDATE if necessary.
> It could gain you a little time.


Yeah. I even tried keeping track of how many hits/misses I had and re-ordering 
the attempted INSERT/UPDATE as appropriate. A batch of 100k of these is done in 
a single transaction

> > but was having serious performance problems that seems to be confined to 
> > those lines. So I converted ir to INSERT OR REPLACE which had no noticeable 
> > impact on performance.
> Actually my understanding would suggest that INSERT OR REPLACE should
> execute slower than UPDATE + INSERT (or INSERT + UPDATE).

[…]  
> > Convinced the problem was in my code, I decided to offload as much as 
> > possible to sqlite. Now my code looks like:
>  
> This should be much-much slower than UPDATE + INSERT.


That's unfortunate because the overall performance was about the same ±10% 
between all three approaches :(
  
> First of all in the statement above you don't gain benefit from
> uniqueness and replace about 10k rows twice.


Are you sure? The SELECT in the INSERT OR UPDATE selects "FROM trans_counters_v 
AS c", the grouped temporary view. So it should only see any given key pair 
once before it starts doing any inserting at all

> Second with such low
> repeatability you don't gain much from doing it with such complicated
> INSERT. And about journal size: imagine that you've got "lucky" and
> all those 94k rows are each in it's own page in the counters table.
> SQLite will have to save each of that pages in the journal which will
> give journal size of about 94k * 4096 ~ 400M.


I hadn't thought about it that way, that's true. And it's probably wildly 
seeking all over the disk to do it. The reads are probably fine because the 
machine has plenty of RAM to devote to page cache, it's the random writes that 
are killing it.

> I don't think there's anything better than what you did initially.

As for the fundamental approach, I figured as much. The rearrangement into the 
giant INSERT OR REPLACE was just to prove to myself that the problem wasn't 
elsewhere in my code

For optimising it on the sqlite front, I've played with page sizes, journaling 
modes, and changing the transaction batch size without much luck. I don't have 
strong consistency requirements for e.g. power failures or OS crashes but I do 
need an application crash to not take it out so I can't just go without the 
journal altogether (which does help the problem, but isn't huge).




Re: [sqlite] Efficient way to store counters

2013-03-12 Thread David King
> > At first I was just doing something like this pseducode:
> > update_counter(k1, k2, count=count+1, expires=now+count*1day)
> > if rows_updated != 1: insert_counter(k1, k2, count=1, expires=now+1day)
> 
> Might I suggest that instead of trying to store an ever-changing value, you 
> simply figure it out when it's needed ? I don't quite understand the logic 
> you're applying to calculate your 'expires' value but I think it could be 
> calculated with a SELECT whenever you needed it rather than being stored.


The logic is, "keep a given (k1, k2) pair around for one day for each time it's 
been seen". I could calculate it when it's needed, but since it's a computed 
value, I couldn't then have an index on it. That said, I've removed this in 
several tests and it doesn't appear have any bearing on the performance issue.

> > Now the code that inserts all of the rows into the memory table executes 
> > nearly instantly, but the big INSERT takes 15+ minutes. Meanwhile the 
> > journal (in either rollback or wal mode) balloons to over 300mb in size.
> You don't list any indexes which would help with your WHERE clauses, so I 
> suspect SQLite is having to look through all the records in 'counters' in 
> order to find the rows it needs for each COALESCE. The large size of the 
> journal is because you are replacing every row in the databases.


Maybe I wasn't clear: the ---cut here--- bit is in addition to the existing 
schema (after all, the INSERT OR REPLACE updates it, so surely it must already 
exist).

In the INSERT OR REPLACE operation there is no WHERE clause. (k1, k2) is the 
primary key on the 'counters' table, so the INSERT OR REPLACE takes each value 
out of the temporary trans_counters_v view of the in-memory trans_counters 
temporary table and either replaces or inserts a value for each corresponding 
entry in 'counters'. AFAICT, no amount of indexing here would help. There's no 
reason to index the temporary table, since the sort for the GROUP BY is 
n*log(n), and updating the index for each individual entry would still sum to 
n*log(n). The temporary table is single-use and will have to be scanned every 
time regardless. (And if I isolate that GROUP BY operation, it's blindingly 
fast.). Here's the EXPLAIN QUERY PLAN for the INSERT OR REPLACE:

(0, 0, 0, u'SCAN TABLE trans_counters (132971 rows)')
(0, 0, 0, u'USE TEMP B-TREE FOR GROUP BY')
(0, 0, 0, u'EXECUTE CORRELATED SCALAR SUBQUERY 1')
(1, 0, 0, u'SEARCH TABLE counters USING INDEX sqlite_autoindex_counters_1 (k1=? 
AND k2=?) (~1 rows)')
(0, 0, 0, u'EXECUTE CORRELATED SCALAR SUBQUERY 2')
(2, 0, 0, u'SEARCH TABLE counters USING INDEX sqlite_autoindex_counters_1 (k1=? 
AND k2=?) (~1 rows)')





So no, it's not having to replace every entry in the 'counters' table. Also, if 
it were replacing every row in the database, then the journal would grow to 
equal the (VACUUM'd) size of the database, but it doesn't get that big. It gets 
to 300mb+, but not to the full size of 1.9gb.
 
> > So I feel like something about what I'm doing is fundamentally flawed given 
> > something about sqlite's performance model. All I want is a count of the 
> > number of times that I've seen each pair (k1, k2), is there a better way to 
> > do this without storing them all individually and grouping them later?
> 
> If you have a table with two columns k1, k2 in, and you want to count the 
> times each pair occurs, you can do it in software far faster by having this 
> index and using this SELECT
> CREATE INDEX myTable_keypair ON myTable (k1,k2)
> SELECT k1,k2 from myTable ORDER BY k1,k2
> you might even use one of the following if you know it will always return a 
> unique value for unique keys
> SELECT k1||k2 from myTable ORDER BY k1,k2
> SELECT k1||':'||k2 from myTable ORDER BY k1,k2
> Just count the unique values in your programming language as they go past. 
> Yes, you can use horrendous complication to make SQLite present a neatly 
> formatted return with the counts included, but defining that in SQL makes 
> SQLite do more work than your programming language would need to do.


The problem isn't grouping them to count them. If I evaluate the TEMPORARY VIEW 
that does the GROUP BY clause above it completes in just under a second for all 
100k items in the table it references. That part is very fast. The slow bit is 
incorporating those counts into the versions on disk.





[sqlite] Efficient way to store counters

2013-03-12 Thread David King
I'm trying to find an efficient way to store simple incrementing integers, but 
I'm struggling to find one.

My database looks like:

CREATE TABLE counters (
  k1, k2,
  count, -- how many we've seen
  expires,
  PRIMARY KEY (k1, k2)
);
CREATE INDEX counters_expires_idx ON counters(expires);

It is about 1.9gb and contains ~22 million of these rows. A given transaction 
updates or creates between 10k and 100k of them.

At first I was just doing something like this pseudocode:

update_counter(k1, k2, count=count+1, expires=now+count*1day)
if rows_updated != 1: insert_counter(k1, k2, count=1, expires=now+1day)
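
(Concretely, in the Python sqlite3 module that pseudocode is roughly the 
following sketch:)

import time
import sqlite3

ONE_DAY = 24 * 60 * 60

def bump(con, k1, k2):
    now = int(time.time())
    cur = con.execute(
        "UPDATE counters SET count = count + 1, expires = ? + count * ? "
        "WHERE k1 = ? AND k2 = ?",
        (now, ONE_DAY, k1, k2))
    if cur.rowcount != 1:  # no existing row: create it
        con.execute("INSERT INTO counters (k1, k2, count, expires) VALUES (?, ?, 1, ?)",
                    (k1, k2, now + ONE_DAY))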

but was having serious performance problems that seemed to be confined to those 
lines. So I converted it to INSERT OR REPLACE, which had no noticeable impact on 
performance. Convinced the problem was in my code, I decided to offload as much 
as possible to sqlite. Now my code looks like:

=== cut here =

PRAGMA synchronous=OFF;
PRAGMA temp_store=MEMORY;



CREATE TEMPORARY TABLE trans_counters(k1, k2);

-- (add all of the items to that temporary table)

CREATE TEMPORARY VIEW trans_counters_v AS
SELECT k1 AS k1,
       k2 AS k2,
       COUNT(*) AS count
FROM trans_counters
GROUP BY k1, k2;



INSERT OR REPLACE INTO counters
SELECT c.k1 AS k1,
       c.k2 AS k2,
       COALESCE((SELECT count FROM counters WHERE k1 = c.k1 AND k2 = c.k2),
                0) + c.count AS count,
       (COALESCE((SELECT count FROM counters WHERE k1 = c.k1 AND k2 = c.k2),
                 0) + c.count) * 24*60*60 + ? AS expires
FROM trans_counters_v AS c;





=== cut here =

Now the code that inserts all of the rows into the memory table executes nearly 
instantly, but the big INSERT takes 15+ minutes. Meanwhile the journal (in 
either rollback or wal mode) balloons to over 300mb in size. The temporary 
table itself is only about 1.8mb of data (102,603 rows, 94,064 unique), so where 
is all of the journal coming from? The process takes nearly 0 CPU during this 
time, the disk becomes very active (but low throughput, reading and writing 
maybe 200k/s judging by the rate of growth of the journal), and sampling the 
process with OS X's Activity Monitor while it's busy outputs:

100% 2869 _pysqlite_query_execute (in _sqlite3.so) + 1886 [0x101945e5e]
100% 2869 pysqlite_step (in _sqlite3.so) + 47 [0x10194893f]
100% 2869 sqlite3_step (in libsqlite3.dylib) + 1883 [0x7fff8d95ca5b]
100% 2869 sqlite3VdbeExec (in libsqlite3.dylib) + 3327 [0x7fff8d95e3af]
100% 2869 sqlite3BtreeMovetoUnpacked (in libsqlite3.dylib) + 761 
[0x7fff8d97ab89]
100% 2869 moveToChild (in libsqlite3.dylib) + 146 [0x7fff8d96c872]
100% 2869 sqlite3PagerAcquire (in libsqlite3.dylib) + 194 [0x7fff8d93dc22]
100% 2869 sqlite3PcacheFetch (in libsqlite3.dylib) + 475 [0x7fff8d93e02b]
100% 2869 pagerStress (in libsqlite3.dylib) + 670 [0x7fff8d9c407e]
100% 2869 pager_write_pagelist (in libsqlite3.dylib) + 149 [0x7fff8d999a35]
100% 2869 unixWrite (in libsqlite3.dylib) + 83 [0x7fff8d98bd73]
100% 2869 pwrite (in libsystem_kernel.dylib) + 10 [0x7fff8130bab6]



That is, 2869 of 2869 samples, 100% of the time, was spent in sqlite3_step 
writing the data to disk. Further samples look basically the same with an 
occasional read-path taking up to ~10% of the time.

VACUUM ANALYZE doesn't look to have any effect. I'm running sqlite 3.7.7 on Mac 
OS X 10.7.5 via the Python sqlite3 module

So I feel like something about what I'm doing is fundamentally flawed given 
something about sqlite's performance model. All I want is a count of the number 
of times that I've seen each pair (k1, k2), is there a better way to do this 
without storing them all individually and grouping them later? (This would be 
prohibitively large.) 

