Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-18 Thread James K. Lowden
On Fri, 18 Nov 2016 08:55:11 -0800
"Kevin O'Gorman"  wrote:

> All of the python code is a single thread.  The closest I come
> is a few times where I use subprocess.Popen to create what amounts to
> a pipeline, and one place where I start a number of copies of a C
> program in parallel, but each is a separate process with its own
> input and output files.  These C programs have been in use for a
> number of months for earlier stages of this project, and I regard
> them as quite reliable.  None of them uses threads, and they are
> mostly very simple filters.

As you know, a process started with Popen cannot corrupt the Python
process's memory.  If you're not doing anything to defeat the GIL, a
segfault inside the Python interpreter would be considered a bug.  

But is it happening in the interpreter, or in SQLite for that matter?
ISTM that's what you need to know.  To know that, you're going to need
to run a debug version of the interpreter under gdb.  When it faults, a
backtrace will tell you where.  That's not definitive proof; memory
corruption is often detected far from where it was caused.  But if the
fault is at a consistent place in SQLite code, for example, you can
use a hardware watchpoint to discover what's writing to it.  
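
For example, a sketch of that gdb session (the script name and the
watched address are placeholders; the real address would come from the
faulting access, and debug-interpreter package names vary):

    $ gdb --args python3-dbg load_db.py
    (gdb) run                            # reproduce the crash
    (gdb) backtrace                      # where did it fault?
    (gdb) watch *(int *) 0x7ffff7a2c010  # hardware watchpoint on the corrupted word
    (gdb) run                            # rerun; gdb stops at whatever writes there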

I don't know what more to suggest.  I would be surprised if you find a
fault in Python, in the Python standard library, or in SQLite.  I'm
sure it won't be in anything on the other side of a popen call.  Are
there non-standard libraries or Python modules in use that you haven't
mentioned?  

The most likely culprit in my mind is RAM.  You're exercising new memory
pretty hard, throwing a bunch of processes at it at full tilt.  Any
defect in the chips or DMA could explain what you're seeing.  An easy
test, not necessarily cheap, would be to replace the RAM (or, if
possible, run with some removed).  

I have two war stories related to rotten I/O hardware, where the device
appeared to work for all intents and purposes, but was actually a
high-speed bit munger. Those were both over 20 years ago.  It will
be interesting to hear if that turns out to be your issue.  

HTH.  

--jkl



Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread James K. Lowden
On Wed, 16 Nov 2016 08:59:03 -0600
Paul Egli  wrote:

> > Using the systemid sequence and the recordid sequence directly
> > however, has a 0% probability of collision, so any rational person
> > would use that directly and forgo entirely the introduction of
> > uncertainty and bugs using "UUID" type crappola will cause.
> >
> 
> As Dominique said, the issue here is decentralization... 

Decentralization, you say, but not no centralization.  If the data on 
those disconnected devices never came together, their keys would never
conflict.  

We've handled this before, more than once.  Ethernet cards have unique
addresses.  The Domain Name System supports a certain amount of
wackiness, but doesn't rely on randomness.  

Of course, those systems were designed by competent engineers ...

> and i would add, particularly in a disconnected environment and/or
> one with no central authority. The method you describe does not
> handle device rollbacks or cloning.

I don't see how the method you describe solves anything.  If two
devices represent the same data -- or data belonging to the same
entity, or whatever -- they need a shared identifier to represent that
ownership.  If that identifier is chosen randomly or taken from
the devices, it will not join the information.  The owner of the
information will at some point have to assert their commonality: will
have to apply a known, common identifier to both sets of information.  

--jkl



Re: [sqlite] sqlite3 crashes mysteriously on 3.6.20-1

2016-11-14 Thread James K. Lowden
On Mon, 14 Nov 2016 20:30:57 -0500
"pisymbol ."  wrote:

> One last thing: This is during initialization and I access the
> database through that query several times before hitting this crash.
> 
> I thought it was memory corruption but it's always the same line.

Trying to be helpful, even if it doesn't sound like it: That doesn't
exonerate your code! 

Presumably you do something with the results of those several queries.
Probably what you do is highly deterministic, maybe identical, run upon
run.  Likely you're just corrupting memory in the same way each
time.  Not corrupting as in "writing to random memory", but as in
"writing in a deterministic way to memory you don't mean to".  That the
error is repeatable suggests it's not related to a race condition, but
where threading is concerned that's never a culprit to be dismissed.  

I would run your code under valgrind first.  If that doesn't find
anything, trap the segfault in gdb and find the basis for it, even if
it's deep in the parser.  Some offset/pointer is wrong.  Find out what
it is, and set a watchpoint on it.  If it's as deterministic as you say,
I'll lay odds gdb will stop at a *very* surprising place, in your
code.  

HTH, really.  

--jkl


Re: [sqlite] Autoincrement sequence not updated by UPDATE

2016-11-03 Thread James K. Lowden
On Tue, 1 Nov 2016 11:01:24 +
Simon Slavin  wrote:

> attempts to change a value in that column using UPDATE always
> generate an error.  I didn't know that.  I looked it up.  Apparently
> Microsoft's SQLSERVER blocks it

Blocks but does not prevent.
https://msdn.microsoft.com/en-us/library/ms188059.aspx

--jkl


Re: [sqlite] Default ordering of SELECT query

2016-10-06 Thread Heather, James (ELS-LON)

> > > > Any DB update or insert or vacuum or analyze might alter the
> > > > row order produced where the ordering isn't explicit.
> I am interested only in the same ordering between two subsequent
> query executions. The things which can happen in between are:
>  * checkpointing
>  * close/open in with different journal mode: DELETE->WAL/WAL->DELETE

You can't assume that it'll give you the same ordering even if nothing
at all happens in between.

You can't assume even that two executions on exactly the same database,
under exactly the same conditions, will return in the same order.
There's no contractual requirement for the implementation to be
deterministic.

If you write code and it works today, it might not work tomorrow.
Imagine: perhaps the internals of sqlite will be rewritten tomorrow;
perhaps the query will be executed on two threads, and the results of
the two threads will be merged; perhaps they'll be merged arbitrarily
if there's no reason to return the results in a specified order. This
won't be a breaking change as far as the sqlite API is concerned; but
it'll break your code.

The only safe way to write the code is to go by the strict semantics of
the statements you're executing; i.e., trust it to fulfil its contract,
but nothing else.

James





Re: [sqlite] Why is this query much slower when I index the columns?

2016-10-01 Thread Heather, James (ELS-LON)
On Sat, 2016-10-01 at 13:57 +0200, Clemens Ladisch wrote:

Heather, James (ELS-LON) wrote:


I have ...



... asked this question elsewhere:
http://dba.stackexchange.com/questions/150858/why-is-this-sqlite-query-much-slower-when-i-index-the-columns




Yes, I didn't mean this to be subterfuge. It seemed an appropriate question for 
both fora.


James





Re: [sqlite] Why is this query much slower when I index the columns?

2016-10-01 Thread Heather, James (ELS-LON)


Without any indexes, SQLite is probably creating its own transient
index to use for the join.  And when SQLite creates an index, it will
be creating a single index on all three columns being joined.

But if you have your own (inferior, single-column) indexes, then
SQLite figures you know what you are doing and so it tries to use your
indexes rather than create its own.  But the single-column indexes are
far less useful at solving the problem.

Rather than creating 6 new indexes, I suggest creating just one index:

   CREATE INDEX gofast ON fakenames_uk (givenname, middleinitial, surname);

If that doesn't work, please run ANALYZE, then send in the output of
".fullschema" for your database and I'll look again.


Thanks very much for this.

This is odd. I had rather assumed it would be possible to speed up the 
searching using those individual indexes... in fact I know it's possible, 
because MySQL performs the same query very much faster when I add the 
individual indexes. How MySQL differs from sqlite in terms of its optimiser, 
I've no idea; but am I right that MySQL stands as an existence proof that the 
individual indexes can be used to solve the problem efficiently?



Unrelated stylistic notes:

(1) the grave accent quotes (`givenname`) are a MySQL-ism.  SQLite
understands this for compatibility, but it is not standard SQL.  No
systems other than MySQL and SQLite understand it.  Better to use
double-quotes: "givenname".


Right. In fact I didn't use any quoting when I created the indexes. When I 
grabbed the index creation statements for pasting into the email, I got 
sqlite-browser to give me the statements. I'm surprised it used backticks. I 
don't know whether that's something sqlite-browser has done for itself or not. 
(I did try all the queries from the command line, though, so I know that the 
main issue I've raised isn't a sqlite-browser one.)




(2) There exist many people in the world whose names do not parse
neatly into "givenname", "middleinitial", "surname".  And those people
tend to become grumpy when presented with a form that wants a name in
that format. :-)


Also true, but for my application, it didn't matter. I just wanted to 
demonstrate to some people in a tutorial that the queries would run much faster 
if the columns were indexed, so I grabbed some fake names etc. from a web site 
that generates them, and stuffed them into a database. The column names were 
generated by the web site. It's not used for anything real.

James





[sqlite] Why is this query much slower when I index the columns?

2016-10-01 Thread Heather, James (ELS-LON)
I have a sqlite database with two tables, each with 50,000 rows in, containing 
names of (fake) people. I've constructed a simple query to find out how many 
names there are (given name, middle initial, surname) that are common to both 
tables:


select count(*) from fakenames_uk inner join fakenames_usa on 
fakenames_uk.givenname=fakenames_usa.givenname and 
fakenames_uk.surname=fakenames_usa.surname and 
fakenames_uk.middleinitial=fakenames_usa.middleinitial;


When there are no indexes except on the primary keys (irrelevant to this 
query), it runs quickly:


[james@marlon Downloads] $ time sqlite3 generic_data_no_indexes.sqlite "select 
count(*) from fakenames_uk inner join fakenames_usa on 
fakenames_uk.givenname=fakenames_usa.givenname and 
fakenames_uk.surname=fakenames_usa.surname and 
fakenames_uk.middleinitial=fakenames_usa.middleinitial;"
131

real    0m0.115s
user    0m0.111s
sys     0m0.004s


But if I add indexes to the three columns on each table (six indexes in all):


CREATE INDEX `idx_uk_givenname` ON `fakenames_uk` (`givenname` )
//etc.


then it runs painfully slowly:


[james@marlon Downloads] $ time sqlite3 generic_data.sqlite "select count(*) 
from fakenames_uk inner join fakenames_usa on 
fakenames_uk.givenname=fakenames_usa.givenname and 
fakenames_uk.surname=fakenames_usa.surname and 
fakenames_uk.middleinitial=fakenames_usa.middleinitial;"
131

real    1m43.102s
user    0m52.397s
sys     0m50.696s


Is there any rhyme or reason to this? I'm running 3.13.0, but I've also tried 
with 3.11.0, and got the same problem.


Here's the query plan for the version with the indexes:


addr  opcode       p1  p2     p3  p4             p5  comment
----  -----------  --  -----  --  -------------  --  -------------
0     Init         0   26     0                  00  Start at 26
1     Null         0   1      1                  00  r[1..1]=NULL
2     OpenRead     0   2      0   6              00  root=2 iDb=0; fakenames_uk
3     OpenRead     1   5208   0   6              00  root=5208 iDb=0; fakenames_usa
4     OpenRead     2   11215  0   k(2,,)         02  root=11215 iDb=0; idx_us_middleinitial
5     Rewind       0   19     0                  00
6     Column       0   4      2                  00  r[2]=fakenames_uk.middleinitial
7     SeekGE       2   18     2   1              00  key=r[2]
8     IdxGT        2   18     2   1              00  key=r[2]
9     Seek         2   0      1                  00  Move 1 to 2.rowid
10    Column       0   3      3                  00  r[3]=fakenames_uk.givenname
11    Column       1   3      4                  00  r[4]=fakenames_usa.givenname
12    Ne           4   17     3   (BINARY)       51  if r[4]!=r[3] goto 17
13    Column       0   5      5                  00  r[5]=fakenames_uk.surname
14    Column       1   5      6                  00  r[6]=fakenames_usa.surname
15    Ne           6   17     5   (BINARY)       51  if r[6]!=r[5] goto 17
16    AggStep0     0   0      1   count(0)       00  accum=r[1] step(r[0])
17    Next         2   8      1                  00
18    Next         0   6      0                  01
19    Close        0   0      0                  00
20    Close        1   0      0                  00
21    Close        2   0      0                  00
22    AggFinal     1   0      0   count(0)       00  accum=r[1] N=0
23    Copy         1   7      0                  00  r[7]=r[1]
24    ResultRow    7   1      0                  00  output=r[7]
25    Halt         0   0      0                  00
26    Transaction  0   0      26  0              01  usesStmtJournal=0
27    TableLock    0   2      0   fakenames_uk   00  iDb=0 root=2 write=0
28    TableLock    0   5208   0   fakenames_usa  00  iDb=0 root=5208 write=0
29    Goto         0   1      0                  00


Thanks very much for your help!


The database is moderately large (45MB or so), but contains no private 
information, so I can provide it if it would help.


James






Re: [sqlite] SQLite 3.15.0 scheduled for 2016-10-14

2016-09-29 Thread James K. Lowden
On Fri, 23 Sep 2016 16:35:07 +
Quan Yong Zhai  wrote:

> Quote <<
> A "row value" is an ordered list of two or more scalar values. In
> other words, a "row value" is a vector.>>
> 
> A "row value" is a tuple, not a vector. When you're using a tuple, you
> know how many items are in it, and the type of each item of it.

That's correct, and addresses Dominique's point, albeit obliquely:
to use a row-value as a list-argument to IN would be to confuse rows
and columns.  A row-value has 1 or more columns, but only one row.

It would be nice to use row-values correctly in IN:

select * from T where (a,b) IN ( (1, 'a'), (2, 'b') )

Is that valid? 

--jkl


Re: [sqlite] SQLite 3.15.0 scheduled for 2016-10-14

2016-09-22 Thread James K. Lowden
On Thu, 22 Sep 2016 12:43:29 -0700
Darren Duncan  wrote:

> single-element row could be done with say a trailing comma; eg
> "(42,)" 

All hail the Python tuple!  

"Tuples of two or more items are formed by comma-separated
lists of expressions. A tuple of one item (a "singleton") can be formed
by affixing a comma to an expression (an expression by itself does not
create a tuple, since parentheses must be usable for grouping of
expressions). An empty tuple can be formed by an empty pair of
parentheses."

https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy

--jkl


  


Re: [sqlite] Complicated join

2016-09-19 Thread James K. Lowden
On Thu, 15 Sep 2016 15:53:10 + (UTC)
David Bicking  wrote:

> (1) The CombinedKeyFields must always match in each table
> (2) Match using the EvtNbr, but if no match, use the lowest M.EvtNbr
> that matches the CombinedKeyFields
> (3) Match using the TransDate but if no exact match, match on the
> M.TransDate that is less than the E.TransDate but greater than the
> prior E.TransDate

I think this is what you describe: 

select E.CombinedKeyFields, max(M.EvtNbr) as EvtNbr, max(M.TransDate) as TransDate
from E left join M
  on  E.CombinedKeyFields = M.CombinedKeyFields
  and E.EvtNbr >= M.EvtNbr
  and E.TransDate >= M.TransDate
group by E.CombinedKeyFields;

--jkl


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-19 Thread James K. Lowden
On Wed, 14 Sep 2016 18:29:36 + (UTC)
Alex Ward  wrote:

> Perhaps our schema needs a rework, would one table with a million
> rows be better than 500 tables with 2000 rows each? 

500 tables isn't right or wrong, but *counting* tables is.  Table count
is not a design-quality metric.  

There is no rule of thumb except Boyce-Codd Normal Form.  

Table design reflects the entities you choose to represent your domain
of discourse.  You should strive, as Einstein advised, to make your
model as simple as possible, but no simpler.  

Sometimes we see schemas on this list with sets of identical tables,
each set dedicated to a different client or somesuch.  Those folks buy
themselves trouble, because otherwise identical SQL has to vary by
tablename, adding nothing but complexity to the application layer.  

At the opposite end of the spectrum is the classic
entity-attribute-value design error.  One table conquers all, including
the programmers when they discover how slow things are when SQL is used
before it is learned.

Number of columns?  Many widely used and correctly normalized financial
market databases maintain thousands of rows on hundreds of columns.  

Complex models can easily have a hundred tables.  Before I would agree
a design has "too many" tables, I would want an affirmative answer to
one of two questions:

1.  Is there a process in place to dynamically extend the schema?  If
so, that indicates data -- some change over time -- has found its way
into the metadata (the table names).  

2.  Do you frequently find yourself looking in more than one place for
what is -- to you -- substantially the same information?  That would
indicate the tables do not reflect your mental model of the problem
domain.  

There are forces that drive intelligent table design other than the
logical model.  Things like performance and convenience inevitably
cause tables that would otherwise pass textbook muster to be split and
combined.  Those things count, too.  One thing that doesn't count is
the count.  

Regards, 

--jkl




Re: [sqlite] how is "pragma threads = 4" working

2016-09-19 Thread James K. Lowden
On Fri, 16 Sep 2016 07:29:28 -0400
Richard Hipp  wrote:

> The algorithm used for "ORDER BY ... LIMIT N" uses much less memory
> than a full-up "ORDER BY" because is only keeps track of the top N
> entries seen so far, discarding the rest.  But it also only uses a
> single thread.  

My immediate thought was that this is an optimization opportunity.  As
the OP alludes to, N is the sum of LIMIT and OFFSET. 

Would you have information on how these are typically used?  My guess
is that the LIMIT argument is typically small, less than 20, but that
OFFSET marches on, and grows to be a significant fraction of the
table.  

If LIMIT N is small and OFFSET is not used, a memory-efficient,
nonlocking parallel algorithm would reserve N slots for each thread,
and divide the table among the threads, each processing 1/threads
rows.  Then merge-sort their outputs.  

Humbly submitted, 

--jkl


Re: [sqlite] Slow Sqlite3 Select Query while Insert/Update/Delete has no issue

2016-09-19 Thread James K. Lowden
On Wed, 14 Sep 2016 16:27:37 +0530
SinhaK  wrote:

> strlen(MyString.str().c_str())

BTW, as a matter of style, 

MyString.str().size()

gets you to the same place sooner.   

> MyString<<"select TokenNo,Price ,sum(QTY) from 'Stream0' where 
> TokenNo=?1 and Side=66 group by Price order by Price desc limit 5";

You should group by TokenNo, Price.  You should not have 

 'Stream0' 

in single quotes; that makes it a string.  
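
Putting both changes together, the query would read something like:

    select TokenNo, Price, sum(QTY)
      from Stream0
     where TokenNo = ?1 and Side = 66
     group by TokenNo, Price
     order by Price desc limit 5;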

But I agree with Teg: SQLite is providing you with transactions you
don't need, and puts an interpreted language exactly where you don't
want it: in a performance-critical spot.  The C++ standard library has
all the bits you need, and is almost as convenient to use.  

You have only one table, and probably just a few simple queries.
std::set gives you lower_bound and upper_bound.  Hand those two
iterators to std::accumulate, and you have GROUP BY.  Call that for 5
prices.  Not very much code, and I bet 100x faster than SQL.  If more
than one thread is updating the table, obviously protect your set with
a mutex.  

--jkl




Re: [sqlite] WHERE col IN tab

2016-09-19 Thread James K. Lowden
On Fri, 16 Sep 2016 16:59:17 +0200
Dominique Devienne  wrote:

> Is that <> SQL standard?

No.  

The two most frequently used pointless words in SQL are "select *".
The SELECT clause (not statement) chooses columns; in relational
algebra terms, it's a project operator.  If "all columns" is what you
need, what you don't need is projection.  Requiring "select *" makes as
much sense as requiring "WHERE TRUE" if there is no restriction.  


The strict select-from-where construct in SQL is an artifact of its
roots in IBM's 1970s-era "4th generation" languages.  That's why the
language looks so much like Cobol and so little like math.  

But it is what it is.  In SQL, a tablename is a parameter for FROM (and
nowadays, JOIN).  Predicates -- IN, EXISTS -- take SELECT.  
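
Concretely -- illustrative names, with tab a single-column table -- the
SQLite shorthand and its standard spelling are:

    select * from T where col in tab;                   -- SQLite extension
    select * from T where col in (select c from tab);   -- standard SQL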

--jkl


Re: [sqlite] Bug in CREATE INDEX

2016-08-18 Thread James K. Lowden
On Mon, 8 Aug 2016 10:48:58 -0700
"Kevin O'Gorman"  wrote:

> Very cool.  But 4?  I will be running this on machines with 8 and 16
> cores.  Does going beyond 4 not help much?

Four doesn't seem like a bad starting point.  

I don't have any information specific to SQLite, but in general adding
cores to a sort helps only until I/O bandwidth is saturated (or RAM is
exhausted).  

And there's always the possibility the system may have other things to
do besides sorting.  

--jkl



Re: [sqlite] C API - Parameterized Atomic Transactions

2016-08-18 Thread James K. Lowden
On Tue, 9 Aug 2016 17:09:39 -0300
Paulo Roberto  wrote:

> I would like something like this:
> 
> "BEGIN EXCLUSIVE TRANSACTION;"
>  "SELECT counter FROM mytable WHERE counterid = ?;"
>  "UPDATE mytable SET counter=? WHERE counterid = ?;"
>   "COMMIT TRANSACTION;"

begin transaction;

-- claim the next value by incrementing the stored counter in place
UPDATE mytable SET counter = (
    select 1 + max(counter)
    from mytable where counterid = ? )
WHERE counterid = ?;

-- report the value just claimed (the pre-increment counter)
select counter - 1 as counter
from mytable where counterid = ?;

commit transaction;

Standard SQL.  Doesn't rely on BEGIN EXCLUSIVE.  Should be just as
fast.  

--jkl



Re: [sqlite] REFERENCES from TEMPORARY table to main database table?

2016-07-26 Thread James K. Lowden
On Sat, 23 Jul 2016 01:06:23 +
"Smith, Randall"  wrote:

> Using "REFERENCES main.Symbols" appears to be a syntax error.  

I think you got caught by a special case.  

Any kind of DRI would be hard to enforce across database boundaries.
In general, if we have two databases, there's no requirement
they be used together, no requirement they both be attached.  Although
the DBMS could conceivably implement a rule that says any constraint
referencing a nonexistent (unattached) database yields False, I haven't
used one that works that way.  To a one, they simply prohibit
schema-qualified names in DRI rules.  (I suppose that's one
justification for the use of triggers.)  

--jkl



Re: [sqlite] insert or replace on PK and UNIQUE INDEX

2016-07-15 Thread James K. Lowden
On Tue, 12 Jul 2016 15:35:20 +0200
Dominique Devienne  wrote:

> Now we know OR REPLACE is never what we want (in our use cases),

Besides being nonstandard, REPLACE is not atomic.  I haven't seen the
use case that benefits from those characteristics but, judging
from this list, you're not the first person bitten by them.

--jkl


Re: [sqlite] builtin functions and strings with embedded nul characters

2016-07-07 Thread James K. Lowden
On Mon, 4 Jul 2016 13:07:18 +0200
R Smith  wrote:

> I think you are missing an important bit in all of this - the strings
> in C is the problem, they think a Null character indicates
> termination. It has nothing to do with how SQL stores data - SQLite
> will store it with all bytes intact, but you typically retrieve or
> set it via some C calls using a C api.. and this is where the problem
> is. 

Dijkstra: On anthropomorphism in science
https://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD936.html

C doesn't have strings, and they don't think.  

C has some standard functions that by convention treat byte arrays as
strings.  The convention is to signify EOS with a NUL byte.  Using
those functions on arrays with non-terminating NULs will probably lead
to undesired results.  

IIUC, there are some string functions in SQLite (including in the SQL
itself) that behave unpredictably if presented with strings that
include embedded NULs.  That needs no defense: it is a defect.  The
DBMS keeps (as it should) explicit lengths for all data.  Treating NUL
specially is only problematic.  Saying their behavior is undefined in
that case at least gives the user fair warning; better would be to make it
defined.  
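
One illustration, as I understand the current behavior (the blob below
is 'ab', NUL, 'cde'):

    sqlite> select length(cast(x'616200636465' as text));
    2
    sqlite> select length(x'616200636465');
    6

The text interpretation stops counting at the embedded NUL; the blob
interpretation sees all six bytes.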

A few people in this thread mentioned something along the lines of
"SQLite data are encoded as UTF-8".  That's not true.  It does not
check that text is correctly or uniformly encoded, nor does it record
what encoding is in force for a given database.  It would be more
accurate to say that SQLite supports 4 Unicode encodings.  

The default encoding is "binary", which is to say unencoded
bytes-as-text.   If the comparison function is memcmp(3), NUL needs no
special treatment.  

--jkl


Re: [sqlite] UNIQUE constraint violation

2016-07-01 Thread James K. Lowden
On Tue, 28 Jun 2016 19:19:43 -0700
J Decker  wrote:

> Duplication can also result as part of the - in process - moving of
> rows. To change the order of [1,2,3,4] to
> [1,3,2,4]  there is(can be) a state that is [1,2,2,4] before the
> second part that sets three back into 2.

I'd just like to point out to the OP that, while SQLite does behave in
the way described above, it's a flaw.  SQL semantics are
per-statement, not per-row.  From an SQL perspective, any "state" that
might occur within a statement is meaningless.  

In SQLite, an update to a primary key K fails

update T set K = K + 1;

if, for any value v in K, there exists v + 1.  By the rules of SQL, it
should succeed and, in most other DBMSs, does.  
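
A common workaround in SQLite -- a sketch, assuming K is never negative
-- is to move the values out of their own way:

    begin;
    update T set K = -(K + 1);  -- all negative now; cannot collide
    update T set K = -K;        -- flip back to the final, unique values
    commit;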

--jkl




Re: [sqlite] sqlite3_column_origin_name for INSERT and UPDATE

2016-06-18 Thread James K. Lowden
On Fri, 17 Jun 2016 19:59:56 +0200
Rapin Patrick <rapin.patr...@gmail.com> wrote:

> 2016-06-17 18:24 GMT+02:00 James K. Lowden <jklow...@schemamania.org>:
> 
> > You are encoding type information in the name.  If you move the type
> > information into the data, SQLite can manage the unit dimension.
> > You could use a CHECK constraint to require that speed was in m/s,
> > or a trigger to make it so.
> 
> That would indeed be an option. I suppose that you mean to use a
> string like "2.34 m/s" or a BLOB of typically 9 bytes (a 'double' and
> a unit enum).

My first instinct is to add a Units column to any table with a
measurement.  The value in Units would be a foreign key to a Units
table.  Another table, Conversions, could hold conversion factors.  

That would support multiple units for a given column in a table or,
with a trigger, enforce (and coerce) all values in a column to use a
single unit.  It would also permit arbitrary conversion in SELECT,
simply by reference to the Conversions.  
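
A minimal sketch of that shape -- all names here are illustrative, not
from the original post:

    CREATE TABLE Units (Unit TEXT PRIMARY KEY);

    CREATE TABLE Conversions (
        FromUnit TEXT NOT NULL REFERENCES Units(Unit),
        ToUnit   TEXT NOT NULL REFERENCES Units(Unit),
        Factor   REAL NOT NULL,
        PRIMARY KEY (FromUnit, ToUnit)
    );

    CREATE TABLE Measurements (
        ID    INTEGER PRIMARY KEY,
        Value REAL NOT NULL,
        Unit  TEXT NOT NULL REFERENCES Units(Unit)
    );

    -- arbitrary conversion in SELECT, by reference to Conversions
    SELECT m.Value * c.Factor AS Value, c.ToUnit AS Unit
      FROM Measurements AS m
      JOIN Conversions AS c
        ON c.FromUnit = m.Unit AND c.ToUnit = 'm/s';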

A more sophisticated DBMS would have stored procedures for inserts,
user-defined types, and INFORMATION_SCHEMA.  With those, you could
define a type domain for each measurement. You could forgo the Units
column, and instead use just a Units parameter.  The stored procedure
would look up the units for the measurement's domain, and apply any
conversion or reject the insert.  That seems to be roughly what you're
doing with your C++ wrapper.  

> But this would slightly increase the database size, and slow down
> access due to formatting / parsing.

I doubt you would notice.  

> In addition, my approach had the big advantage to being backward
> compatible: older versions of our application would just ignore the
> column unit type, and read/write regular 'double'.

If I were working with you, I would object to the requirement that the
column names encode anything.  I would want the units expressed
explicitly in the database, however they might be applied or enforced.
From an application-programming perspective, I would prefer to treat
the column names as opaque identifiers.  Rather than parsing them for
"x_per_y", I'd prefer to look up the units by column name.  

--jkl


Re: [sqlite] Correct, best, or generally accepted database structure for groups of things

2016-06-17 Thread James K. Lowden
On Fri, 17 Jun 2016 07:37:16 +0100
Chris Locke  wrote:

> I fail to see what any of this has to do with sqlite.  I thought this
> was a mailing list for sqlite?  Seeing queries (no pun intended) on
> sql statements is very subjective, especially with the limited data
> provided by the original poster.

A query question frequently exposes design choices, either logical or
physical.  Both of those can have great effect on the utility and
performance of the system.  Answering SQL questions helps people use
SQLite more effectively, and to understand where it differs from other
DBMSs.  

Queries occasionally provoke changes in SQLite itself, either because
the output was wrong (or unexpected), or because it presented a case
for optimization.  I have to believe that real queries from users on
this list serve to inform the developers in how SQLite is used.  (I
have never seen a homework question on this list.)  

> it won't stop there, and as soon as the original poster has another
> query 

No, it won't, because it hasn't.  I've  been hanging out here for 18
months, and I remember only one annoying trivial-query participant.
Out of 13,693 messages, that doesn't amount to much.  

--jkl


Re: [sqlite] Correct, best, or generally accepted database structure for groups of things

2016-06-17 Thread James K. Lowden
On Thu, 16 Jun 2016 20:53:25 +
"Drago, William @ CSG - NARDA-MITEQ"  wrote:

> CREATE TABLE Apples (
> ID INTEGER PRIMARY KEY,
> Color TEXT COLLATE NOCASE, --Could be Red, Green, or Yellow

check (Color in ('Red', 'Green', 'Yellow')), -- FTFY

> Height REAL, --Measured in cm
> Width REAL --Measured in cm
> Weight REAL --Measured in grams
> );
> 
> And say I had a function that looks at the Apples table and finds
> groups of 4 apples that match in color, dimensions, and weight.

create view FourApples as
select max(ID) as ID
, Color, Height, Width, Weight
from Apples
group by Color, Height, Width, Weight
having count(*) = 4
;

Why bother with a table? 

--jkl



Re: [sqlite] Trouble coding conditional UNIQUE

2016-06-17 Thread James K. Lowden
On Thu, 16 Jun 2016 23:23:33 +0200
Dominique Devienne <ddevie...@gmail.com> wrote:

> > > On Behalf Of James K. Lowden
> > >
> > >   create view vParts as
> > >   select 1 as Matched, * from Parts
> > >   UNION
> > >   select 0, * from UnmatchedParts
> >
> 
> Why UNION instead of UNION ALL here?
> Thanks to the 1 and 0, there can't be any dups, so the dedup step of
> UNION wastes cycles, no?

Well spotted, yes!  (Maybe one of these days we'll have a relational
query language without the strange "select distinct / union all"
contradiction.)  

I think you're right that, by construction, these two sets are
distinct.  I was going to suggest that perhaps SQLite could detect
that case, and skip the de-duplication step.  But I think it's a bridge
too far.  To begin with, there's no way in SQL to *declare* two tables
represent distinct sets.  If we can't even do that, how can the DBMS be
expected to deduce that any two derived sets are distinct?  

--jkl


Re: [sqlite] sqlite3_column_origin_name for INSERT and UPDATE

2016-06-17 Thread James K. Lowden
On Fri, 17 Jun 2016 10:56:32 +0200
Rapin Patrick  wrote:

> And my C++ wrapper then knows that this column is a speed expressed
> in meters per second. So when making a SELECT on t1 table, the
> wrapper will output number objects with unit dimension of Speed
> expressed in m/s !

You are encoding type information in the name.  If you move the type
information into the data, SQLite can manage the unit dimension.  You
could use a CHECK constraint to require that speed was in m/s, or a
trigger to make it so.  
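
For instance -- a sketch only, with illustrative names -- either of
these pins the column to m/s:

    CREATE TABLE t1 (
        speed      REAL NOT NULL,                -- stored in m/s
        speed_unit TEXT NOT NULL DEFAULT 'm/s'
            CHECK (speed_unit = 'm/s')           -- option 1: require m/s outright
    );

    -- option 2: accept km/h but coerce it on the way in
    -- (drop the CHECK above if you use this)
    CREATE TRIGGER t1_to_ms AFTER INSERT ON t1
    WHEN NEW.speed_unit = 'km/h'
    BEGIN
        UPDATE t1
           SET speed = NEW.speed / 3.6, speed_unit = 'm/s'
         WHERE rowid = NEW.rowid;
    END;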

--jkl



Re: [sqlite] Trouble coding conditional UNIQUE

2016-06-14 Thread James K. Lowden
On Tue, 14 Jun 2016 16:27:29 +
"Drago, William @ CSG - NARDA-MITEQ"  wrote:

> Once the part has been grouped into a set (Matched=1) it receives a
> unique permanent serial number and the temporary serial number can be
> reused, so (Model, TemporarySerialNumber) doesn't have to be unique
> anymore. Information about what parts belong to what set is stored in
> a different table.
> 
> So is using a conditional constraint in this case okay, or is there a
> better way?

It's OK, Bill; it was OK before I posted.  I'm only suggesting an
alternative you might like better.  

You pasted your particulars into my generic description, and they fit
perfectly.  You have two sets: 

1.  Parts with a unique, permanent serial number. 
2.  Parts with a temporary, reusable serial number.  

You could remove the latter set to a new table, perhaps
"UnmatchedParts", having the characteristics you want, namely a primary
key or unique constraint on (Model, TemporarySerialNumber).  Then you
don't need the Matched boolean column in either table, solving your
original problem (a conditional constraint).  Then you have a view, 

create view vParts as 
select 1 as Matched, * from Parts
UNION
select 0, * from UnmatchedParts
; 

I think there's a natural tendency to put similar things in one table
when they share common properties, sometimes by adding a discriminator
column. I've seen lots of tables like that, and designed some myself
once upon a time.  SQL encourages it, because that way you can write
just one INSERT, etc., and just set the flag right.  

When you push hard on such a table, though, by trying to do things
right, you wind up with little conundrums (conundra?) like the one you
posted. They grow out of the fact that the things aren't the same.
They're just a smidgen different, so they need to be tweaked just so,
and before you know it you either have to lean on some oddball feature
of the DBMS, or punt.  If you separate them, then you might have more
query text, but each one will be simpler and easier to understand.  

HTH.  

--jkl




Re: [sqlite] Apple announces new File System with better ACID support

2016-06-14 Thread James K. Lowden
On Tue, 14 Jun 2016 10:49:05 +0900
??  wrote:

> > On 13 Jun 2016, at 10:13pm, Richard Hipp  wrote:
> >
> > The rename-is-atomic assumption is so wide-spread in the Linux
> > world, that the linux kernel was modified to make renames closer to
> > being atomic on common filesystems such as EXT4.
> 
> http://man7.org/linux/man-pages/man2/rename.2.html

rename(2) *is* atomic.  That doesn't mean it's synchronous with respect
to external storage.  It only means that no two processes will ever see
the file "in flight" in two places.  If process A calls rename(N,M), at
no point will process B have access to both N and M.  Once M is
available, N is extinguished.

That's a useful property for a process that succeeds, and for which the
OS successfully flushes the data to disk.  

When Richard says rename isn't atomic, he means that it's not
synchronous with respect to the disk.  It makes no guarantee that the
directory entries were updated on disk.  The rename happens in the
kernel's filesystem memory structures, which *eventually* are persisted
to disk.  I have heard that that time lag may be measured in seconds.  

> I am interested to know what it would take to make linux renames
> fully atomic. Reading it as is it feels like the action of rename
> would be the most important piece to making rename atomic.  The docs
> claim this is atomic.  What other aspects would be necessary?

To make Linux rename fully synchronous is technically infeasible and
politically impossible.  

On the political side, the preference in Linux is invariably for
performance, often at ever-finer divisions of responsibility.  As an
example, Unix fsync(2) traditionally updated both the file and its
metadata; Linux divided those into fsync and fdatasync, and added the
requirement to call fsync on the directory. What was once a single call
became 2 or 3.  

As a technical matter, it's really infeasible because there are too
many moving parts: kernel, filesystem driver, and hardware.  It is
possible for a human being to know what kind of disk is installed and
how configured, and to know the semantics of a given filesystem.  It is
not possible for the kernel to patrol all those things, and hence the
kernel cannot make any guarantees about them.  (To take an extreme
example: NFS.)  

By the way, every DBMS I know anything about (and SQLite is no
exception) tends to eschew OS services except at the most minimal
level.  The internals of a DBMS carry a lot of state information
unavailable to the kernel that the DBMS uses to prioritize how memory
is used and when and where I/O is required.  That's why every DBMS has
its own logging mechanism, and some bypass the filesystem altogether.

--jkl







Re: [sqlite] Trouble coding conditional UNIQUE

2016-06-14 Thread James K. Lowden
On Mon, 13 Jun 2016 19:11:29 +
"Drago, William @ CSG - NARDA-MITEQ"  wrote:

> I need UNIQUE(B, C) only when E=0. 

A conditional constraint is evidence that you have two kinds of things
represented in one table: those E=0 types that are identified by {B,C},
and the rest.  They're represented in a single table because they
seem to have the same columns, although the E=0 types don't need an E
column.  

A better solution might be to separate the two types into two
tables, each with its own constraints, and use a UNION to represent
them as one.  

--jkl


Re: [sqlite] Managing SQLite indices.

2016-06-14 Thread James K. Lowden
On Tue, 14 Jun 2016 01:04:27 +0100
Simon Slavin  wrote:

> When your application runs fast enough not to annoy you, you're
> done.  If you're not willing to do step (1), don't bother with
> anything else.

Simon's entire post is excellent advice.  To the OP: print it, and
frame it.  

I would only add to 

> Plan your schema properly.  

the "properly" is code for BCNF or at least 3NF.  It's the gift that
keeps on giving.  

--jkl


Re: [sqlite] UPDATE statement without FROM clause

2016-06-06 Thread James K. Lowden
On Sat, 4 Jun 2016 18:18:36 +0200
skywind mailing lists  wrote:

> At the moment I have to run something like:
> 
> UPDATE A SET item1=(SELECT B.item FROM B WHERE B.ID=A.ID),...
> itemN=... WHERE EXISTS (SELECT 1 FROM B WHERE B.ID=A.ID);
> 
> Using a FROM clause I just need one scan through B (at least in
> principle). Now, I need N+1 scans.

Nonsense.  SQL provides no instruction to the implementation on how to
organize or traverse the data.  SQLite is free to scan B once, twice,
or not at all.  

Syntax has nothing to do with performance.  A correlated subquery is an
expression of logic; it's not meant to be taken literally, and often
isn't.  This particular form "just" needs to be recognized by the
optimizer.  

--jkl


Re: [sqlite] SQL / SQLite for Beginners

2016-06-06 Thread James K. Lowden
On Thu, 26 May 2016 10:54:30 -0400
r.a.n...@gmail.com wrote:

> FWIW, since it's inception, S.Q.L has been pronounced allot like
> CICS. 

This may be more true than you know.  It's not too hard to find
old-timers who pronounce it "kicks".  

--jkl



[sqlite] Sqlite incompatibility with Postgres

2016-05-21 Thread James K. Lowden
On Fri, 20 May 2016 14:17:25 +1000
"dandl"  wrote:

> Every aggregation function is at least second order: a function that
> applies a function to the set. So for MIN the function is 'less
> than', for SUM() the function is 'plus' and so on. In Andl
> aggregation functions are provided by fold(), which takes a function
> as an argument.

I want you to know that you hijacked my Saturday.  I was bothered about
what "first order" and "second order" mean, suspecting that we meant
different things.  After an afternoon with the Oracle of All Knowledge,
I think we were talking about different things, and you had a better
handle on your end than I did on mine.  

I was concerned that we were treading in territory outside first-order
predicate logic.  On review, as Wikipedia explains, HOL deals in
another beast, namely the quantification of sets of sets.  

You were talking about something much simpler, second-order *functions*.
The input is still a value -- an individual member of a set -- plus
another function.  As you say, there are many such in SQL.  In keeping
with the language's purpose, the primitive components are not exposed,
so it's not possible to reconstruct min as FOLD(MIN,X). We can do
similar things with subqueries, e.g.

select sum(N) from (select count(*) as N from T group by a) as A

One can imagine that restated as 

select F(sum, count, t) from T

where F is defined as taking two functions and a value.  I guess that
would make F a third-order function. 

APL is instructive in this regard.  What we usually call
operators --  + - × ÷  -- are termed *functions* in APL, in keeping
with their mathematical definition.  A function that takes a function
is called an operator.  One such is "/", the reduction operator; SUM(t)
could be expressed as 

+/t

> > 2.  Limit, as currently implemented, lies outside the theory
> > because it doesn't operate on sets.
> 
> I'll put that one on hold pending a suitable reference or detailed
> mathematical treatment.

I think I can accept "first(N)" could be a set function, and if SQL
dealt in sets, LIMIT would be a deterministic function.  But SQL deals
in bags, and with a couple of odd exceptions -- random(), now() -- all
its functions are deterministic.  LIMIT is not a deterministic
function.  I'm not sure what happens to first order predicate logic in
the face of nondeterminism, but I'm sure it's not good.  

> Sorry. Your nth() is a second order function 

OK. 

> The (single-pass) implementation would maintain a temporary table of
> rows that are 'minimum so far seen', to a maximum of N. It would be an
> implementers decision what to do with a row equal to one in that
> table once N has been reached: add it or ignore it?

nth() acts on a single column; it keeps the set of N smallest values, as
you say.  The answer to your question is "ignore it" because a value
equal to one in the set is already a member.  Given the input


C {1, 1, 2, 2, 2, 3}

min(C) = 1
nth(C, 1) = {1}
nth(C, 2) = {1, 2}

I'm not claiming any deep insight, only that nth() would be handy and
can be defined mathematically (even if I can't do it).  

--jkl


[sqlite] Sqlite incompatibility with Postgres

2016-05-19 Thread James K. Lowden
On Thu, 19 May 2016 10:29:48 +1000
"dandl"  wrote:

> > Restriction is applied to the values of the tuple.  The number of
> > tuples is not a value of the tuple.
> 
> No, I can't agree. Restriction is a membership test, a function on
> members: should this tuple be included in the result set or not?
> Cardinality of a set is a second order function on the members of the
> set, obtainable simply by examining all the tuples in the set at the
> same time. There is no a priori reason not to use cardinality in a
> membership functions.

That's an interesting perspective.  If you're dealing with genuine
sets, and you define your language in terms of second-order operations,
then something like LIMIT could be included.  Would have to be, I
guess.  

But that's not what SQL is, or what LIMIT is. 

You were rather dismissive of my nth() function, but that approximates
what LIMIT does (approximation is all that's possible) with a 
first-order operation.  

BTW, I still think you're agreeing with me.  I'm insisting on using the
"values of the tuple", implicitly restricted to first-order
operations.  Cardinality, as you say, as a second order *function*,
hardly a "value".  But at least I understand your argument now.  

> To that you can successively add negation, recursion, higher order
> functions and fixpoint/while. Each of those allows operations that
> others do not, but there is disagreement about which should be
> considered 'relational'. 

OK, I see.  It's fitting that the debate is about the definition of the
set of relational operators.  

I'm conservative in that regard.  I'm wary of the complexity that
higher-order operations bring to the language.  Each higher level
brings (I suspect) more complexity than the prior, while solving fewer
new problems.  

I think recursion is a good extension, and a good example.  It permits
the expression of hierarchies.  It's indispensable ... for maybe 1% of
queries.  

I guess you could convince me it makes SQL Turing Complete, but that's
a very dense thicket.  Recursive structures are useful.  If they could
be manipulated without making the language Turing Complete, I'd
consider that a plus.  

> Thank you for the reference -- I didn't have that one. I'm familiar
> with the material.

You're welcome, and it shows.  

I think we've managed to hash out some agreement:

1.  Second order functions are "relational", or can be, depending on
one's definition.  We have support for them already in SQL.  

2.  Limit, as currently implemented, lies outside the theory because it
doesn't operate on sets.  

Regards, 

--jkl




[sqlite] foreign_key_check mystery

2016-05-18 Thread James K. Lowden
On Wed, 18 May 2016 19:06:30 +0200
R Smith  wrote:

> > I'm not convinced the requirement that the referenced columns be
> > unique is justified
> 
> How do you see a parent-child relationship possible where the parent
> is not Unique?

I think I can convince you that uniqueness is a good rule of thumb, but
that enforcing it ahead of RI is undesirable.  But the price of making
me think about it is reading a long answer.  

The referenced table represents the domain of the foreign key
relationship.  When we say 

foreign key R(A) references S(B)

we're saying every value in A appears in B.  We're not saying anything
about B: not its type, not its cardinality.  The statement describes A
only.  

We're also saying something logical, not enforced by the DBMS: that R
is a part of S.  S can exist without R, but R without S is
meaningless.  But that's a modelling question, and I can't think of
another SQLite feature that enforces any aspect of database design.
Why start here of all places?  Was that even the intention?  

So lets's look at my table T and its FK

   , foreign key (SegName, DbdName) references Segm(Name, DbdName)

as it happens, Segm was defined with

, Name    text not NULL primary key
, DbdName text not NULL

One day, Segm may have a 2-column PK: primary key (Name, DbdName).
But today, Name uniquely identifies it.  (Note that therefore {Name,
DbdName} also uniquely identifies it!)  T extends Segm, and requires
that its {SegName, DbdName} pair appear in Segm.  

Let's assert that's *correct*, even though Segm.Name is unique today.
What is *wrong* with saying the FK relationship refers to more columns
than are in the domain table's PK? After all, the above assertions are
still true: 

1.  T{SegName, DbdName} must be in Segm{Name, DbdName}
2.  T extends Segm

Even more -- though not required IMO -- Segm{Name, DbdName} is unique
(because Segm{Name} is unique). 

You could probably get me to agree that the relationship is anomalous.
I suppose if Segm.Name is unique, the FK should refer only to it.

In general, though, not every domain is manifested in a table.  One
might have these PKs: 

S {A, B}
R {B}

Now let me assert that R extends S: that is, for any S there could be
an R.  The rule: If an R exists for S, there is only one,
regardless of A.  

If that's logically incoherent, I don't understand why.  

Remember, there could be a missing domain table, say, T {B}, and the
real rule would be that for some T there must be an R.  But T is
missing because it has no non-key attributes, and S serves in its
stead.  

That's where "not sure justified" comes from.  foreign_key_check
nudges the user in the right direction most of the time, and as a
linter I have no argument with it.  However, as implemented, "foreign
key mismatch" prevents reporting of a genune error, namely "FOREIGN KEY
constraint failed".  By my lights that's putting a design
recommendation before a data error, definitely cart before horse.  

I hope that one day FK enforcement becomes a property of the database,
not of the connection.  If that comes to pass, this issue needs careful
consideration.  As things stand, I think it might be better if "foreign
key mismatch" were demoted to a warning.  

--jkl


[sqlite] Sqlite incompatibility with Postgres

2016-05-18 Thread James K. Lowden
On Wed, 18 May 2016 20:29:26 +1000
"dandl"  wrote:

> > 2.  Otherwise, if exactly the number of specified rows must be
> > returned without other restrictions, then the result is possibly
> > indeterminate.
> 
> I agree, with one tiny tweak. The SQL standard already notes that
> certain queries of this kind are "implementation-dependent". Here is
> an example.
> 
> "If the  does not contain an , or
> contains an  that [...]

Anything implementation-dependent depends on the implementation.  Ergo,
it is not mathematicaly defined.  It has no theoretical basis.  

Darren Duncan  wrote:

> The options with point 1 are not only deterministic but fully
> relational.

Darren, I agree one could define LIMIT to be deterministic.  But, as
you know, deterministic does not imply relational.  

The point I've made consistently is that the input to LIMIT is not a
value in the database.  As such, it couldn't be a relational operator,
and it's no surprise it appears nowhere in the literature.  Crowning it
"fully relational" is stuff and nonsense.  

I personally don't see any value in making it deterministic.  The best
use of LIMIT is to control command-line output while inspecting the
data.  Pretty much every other use invites error.  Deterministic error
doesn't look like much of an improvement to me.  

--jkl


[sqlite] Sqlite incompatibility with Postgres

2016-05-18 Thread James K. Lowden
On Wed, 18 May 2016 10:41:21 +1000
"dandl"  wrote:

> > You lost me at "subset S of N tuples".  Which relational operator
> > takes N as an argument?
> 
> Restriction determines whether a tuple should be included or not; you
> also need cardinality and less than (for comparing members).

Restriction is applied to the values of the tuple.  The number of
tuples is not a value of the tuple.  

Neither of us is stupid, David.  I've boiled this down to something
very simple.  If you look at it algebraically, I think you'll come to
the same conclusion I have.  

I wouldn't persist except that you're worth convincing.  Andl holds
promise, and seeks higher ground than SQL holds.  Insofar as possible,
if I can I want to help you get it right.  

> For this query: calculate the average of that set of numbers after
> excluding the 2 largest and 2 smallest values. Again, a pure set
> operation.
> 
> A reasonable solution would be to use two subqueries with ORDER BY
> ASC/DESC and LIMIT 2, followed by an aggregation. 

Sadly, no.  If we're talking about a "pure set operation", and the set
is {1, 1, 2}, the "two smallest" is {1, 2} but LIMIT 2 would 
yield {1, 1}.  

Here again, my putative nth() function *does* give the right answer,
simply because it's a function of the values, and not of the number of
values.  
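
In SQLite terms, with T holding those values in a single column C, the
difference is the one between these two queries:

    select C from T order by C limit 2;           -- two smallest rows: 1, 1
    select distinct C from T order by C limit 2;  -- two smallest values: 1, 2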

> > There's no debate about the relational operators.

By which I meant: there's no debate about what they do.  

> You might be surprised to learn that there is considerable academic
> uncertainty as to exactly which operators should be included. 

There's no uncertainty.  Some operators are defined in terms of
others.  No surprise: even under De Morgan you don't need OR if you have
NOT and AND.  The redundancy makes the language more expressive.  

LIMIT doesn't belong in this part of the discussion, btw, because it is
not defined relationally.  

> For example, is CTE RECURSIVE relational, or not? 

http://wiki.epfl.ch/provenance2011/documents/foundations%20of%20databases-abiteboul-1995.pdf

Cf. Chapter 14.  Adding recursion changes the language.  It adds power;
if memory serves, it permits answering second-order queries.

> What about LEAD and LAG in the windowing functions? What about string
> concatenation as an aggregation operator?

AFAIK there's no debate about those, either.  They can be expressed in
terms of simpler operations, and exist for convenience, such as it is.  

--jkl


[sqlite] foreign_key_check mystery

2016-05-18 Thread James K. Lowden
On Wed, 18 May 2016 08:32:24 +0200
Clemens Ladisch  wrote:

> You get "foreign key mismatch" if you do not have the required
> indexes, i.e., according to a comment in the source,
> 1) The named parent key columns do not exist, or
> 2) The named parent key columns do exist, but are not subject to a
>UNIQUE or PRIMARY KEY constraint, or

Thank you, Clemens.  The FK declaration was 

   , foreign key (SegName, DbdName) references Segm(Name, DbdName)

but the Segm PK was only Name.  
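
The fix, sketched here (the real Segm has more columns), is to declare
the referenced pair a key:

    create table Segm (
        Name     text not null,
        DbdName  text not null,
        -- ...
        primary key (Name, DbdName)  -- or: unique (Name, DbdName)
    );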

I'm not convinced the requirement that the referenced columns be unique
is justified, but at least now I understand the message.  In this case
it uncovered an error, so I got two for the price of one.  :-)

--jkl



[sqlite] foreign_key_check mystery

2016-05-17 Thread James K. Lowden
I seem to be getting a foreign key check anomaly.  I've checked the constraint 
mentioned in the error message (and the other one, just in case).  Am I 
overlooking something, or has this been fixed since 3.8.4.1?  

sqlite> pragma foreign_key_check;
Error: foreign key mismatch - "Field" referencing "Segm"

sqlite> pragma foreign_key_list(Field);
id  seq  table      from      to       on_update  on_delete  match
--  ---  ---------  --------  -------  ---------  ---------  -----
0   0    Segm       SegName   Name     NO ACTION  NO ACTION  NONE
0   1    Segm       DbdName   DbdName  NO ACTION  NO ACTION  NONE
1   0    Datatypes  Datatype  Name     NO ACTION  NO ACTION  NONE

sqlite> select count(*) from Field as f 
where not exists (
  select 1 from Segm 
  where Name = f.SegName 
  and DbdName = f.DbdName);
count(*)  
--
0 

In case it's interesting, the FK declaration in the CREATE TABLE for
Field is

   , foreign key (SegName, DbdName) references Segm(Name, DbdName)

--jkl


[sqlite] Sqlite incompatibility with Postgres

2016-05-17 Thread James K. Lowden
On Tue, 17 May 2016 11:09:53 +1000
"dandl"  wrote:

> Any disagreement so far?

Full agreement; your description is perfectly sound.  

I am quite certain nevertheless that LIMIT has no relational basis.
Nothing based on Order By could.  And I'll try to clear up what I meant
by a cursor.  

> So the "3" is a perfectly valid argument for a set-oriented theory:
> find a subset S of N tuples with the following test for set
> membership: that each member of S is greater than each member not in
> S when compared by certain attributes, for N = 3. Pure set logic with
> a membership function.

You lost me at "subset S of N tuples".  Which relational operator takes
N as an argument?  

You could be right vis a vis set theory.  But strictly within
relational theory, I'll cede your point when you demonstrate it
relationally.  The N in "N tuples" is not to be found in the relation's
extension. Even if we include aggregation, all relational functions
operate on the *values* of the set, or functions of the values of the
set (e.g. min() or avg()).  N is not among them.  

> > "Order by 1" is always valid.
> 
> By analogy, not because they're the same. In order to apply LIMIT 3
> the query parser should require a test of set membership that is fully
> determined for every member. It can do that by either requiring all
> select list columns to appear in the ORDER BY, or by applying other
> constraints such as a unique key. 

Unless your point is constrained to the LIMIT operator, you're making a
mistake here.  Normally the Select list is a superset of the Order By
list.  If they need to be equal for LIMIT to work, that's just one more
nail in LIMIT's coffin, a byproduct of its nonrelationality.  

Given a table T {A,B}, what's wrong with "select A from T
order by B" (even though that's not valid SQL)? The system has access
to the full table, can sort by B and project A. Nothing to do with
indexes or constraints.  The query is invalid not because it couldn't
be executed, but because it's undefined: there's no "sort" operator to
apply before "project".  Order By, not being a relational function, is
a filter applied to the relational product.  

( Yes, "filter"; filters don't always remove: 
$ echo hello | rev
olleh
)

All members of the Order By list must appear in the Select list because
the Select list is input to Order By, see next.  

> > > Order By just produces a cursor for convenient traversal of the
> > > results.
> 
> Not so. In standard SQL ORDER BY establishes a comparison function
> between tuples and is part of the DECLARE CURSOR syntax, but the
> cursor exists regardless.

Regarding Order By and cursors, I'm referencing CJ Date.  The reason
Order By cannot appear in a subquery is that its output is not a table,
but a cursor.  Whether Order By "establishes a comparison function
between tuples" is irrelevant; so too does Where.  It's the product
that's different.  

If you accept that Order By can appear only in the outermost query,
it's clear that it has no relational role at all.  It affects only the
order in which the rows are returned to the caller.  Other than syntax,
how is that different from a cursor?  

> The point of LIMIT is that it is a complete query; the rows can
> be returned in a single network round trip; the result set can be
> discarded.

So it's just a tiny communication optimization?  After all, compared to
a network round trip, sorting the result (in order to apply LIMIT
rationally) is usually far more expensive.  I bet no study has ever
shown LIMIT to improve performance measurably, not that that would
justify its existence.  

> > LIMIT causes harm.  Bad results come of bad math.
> 
> Disagree. The problem (if there is one) is that it is not
> well-defined.

Then I think you mean you agree!  Because LIMIT is nonrelational, it's
*undefined*.  We have a long thread here that might be titled "what
should LIMIT do?"  There's no debate about the relational operators.
It's only the ad hoc add-ons that present problems, precisely because
they lie outside the theory and provide (unwittingly) idiosyncratic
behavior.  Like NULL, LIMIT is unspecified by the theory.  Like NULL,
LIMIT is hard to get "right" because each implementation has to make
its own decision about what it means, instead of relying on the math.  

--jkl


[sqlite] Sqlite incompatibility with Postgres

2016-05-17 Thread James K. Lowden
On Tue, 17 May 2016 11:09:53 +1000
"dandl"  wrote:

> > I'll invent here and now to replace LIMIT:  nth().  
> 
> The issue is find the "top N". This does not solve the problem.

nth() does find "top N".  For any query, nth(c, N) returns N rows.  It
also exposes the arbitrariness of LIMIT.  To use nth() correctly for
Simon's data

select * from T as out where exists (
    select 1 from T
    where out.first = first and out.second = second
    group by second
    having nth(second, 2) <= out.second
);

produces 4 rows.  Deterministically.  Unambiguously.  

The "problem" is to produce 3 rows where, relationally, the only
answers have 2 or 4 rows.  There is no right answer to the problem
because there is no answer to the problem.  

--jkl


[sqlite] Sqlite incompatibility with Postgres

2016-05-16 Thread James K. Lowden
On Mon, 16 May 2016 16:17:35 +1000
"dandl"  wrote:

> > > All true.  But it brings up a question.  Suppose the following:
> > >
> > > first second
> > > - --
> > > Mark  Spark
> > > Emily Spark
> > > Mary  Soper
> > > Brian Soper
> > >
> > > SELECT first,second FROM members ORDER BY second LIMIT 3

First, hat tip to Simon for providing a motivating example.  :-)  

The question illustrates what I mean when I say Limit is not "rooted in
the data": in this case, "3" is not in the data, and is not a function
of the data.  Having introduced an extraneous arbitrary element,
ambiguity and contradiction are inevitable.  It's practically the
definition of a hack, right?  Does the job, albeit incorrectly.  

> I would say that this is an invalid query. As already applies for
> DISTINCT and GROUP BY, the query parser should require that every
> column in the column list should appear in the ORDER BY list. If it
> does not, then the result is indeterminate.

Order By does not require Group By, and the Select list is a *superset*
of the Order By list.  I'm not sure where you got the notion that
the Select and Order By sets are equal.  "Order by 1" is always valid.  

David, permit me to elaborate on my indictment of LIMIT.  You said
earlier: 

> You can't sort the relation, but you can certainly apply an order when
> performing a query. How else would MIN() work?

I'm not disputing that.  Window functions even require multiple sorts
in the same query.  

Whether or not "LIMIT is perfectly relational", we do know relational
algebra has no Sort operator, and that Order By is never part of an
input to a relational operation (because of course relational operands
have no order). Order By just produces a cursor for convenient traversal
of the results.  

I'd be perfectly fine with a function I'll invent here and now to
replace LIMIT:  nth().  It's a generalization of min(); the
construction nth(C, 1) is equivalent to min(C).   You use it this way: 

SELECT first,second 
FROM members
where second < nth(second, 2)

That query is based in the data.  It's unambiguous.  Given Simon's
input, it produces 2 rows; with "< 3" it produces 4 rows.  It can be
used without Order By (for the same reason min() can).  While it
*implies* a sort, it doesn't require one (because indexes), as LIMIT
does. And, like min() and unlike Order By, it can be used in a
subquery.  

LIMIT is a hack.  It's an "obvious" extension to SQL, so simple it
needn't even be part of it, because the program reading the rows from
the DBMS can always stop wherever it wants.  Simple things are always
implemented freely -- even if unnecessary or misbegotten, simply
because they're easy to do and understand -- and LIMIT was
no exception.   Ironically, though, seemingly simple things are very
hard, sometimes impossible, to explain mathematically.  In that way,
LIMIT shelters under the same roof as NULL and SQL's use of bags
instead of sets.  

While that's an abstract argument, it's at the root of very practical
problems.  LIMIT is a FAQ on this mailing list.  Given the number of
SQLite programmers, we can bet every day someone uses limit in a
subquery, getting or concealing a nondeterministic result.  Which is to
say: LIMIT causes harm.  Bad results come of bad math.  

--jkl


[sqlite] Any way to do inexpensive SQLite database/table versioning? (I.e. cheap snapshotting & jumping between)

2016-05-16 Thread James K. Lowden
On Sun, 15 May 2016 10:42:37 -0500
mikeegg1  wrote:

> I was once told of an idea (decades ago) of versioning data within a
> table where one column has a real/float value that is the version
> number. 

You can have a point-in-time database if: 

*  each transaction has an id
*  DELETE is redefined not to remove but simply to mark the row 
*  UPDATE is redefined as DELETE + INSERT
*  every table includes both a transaction id and a deleted flag

Every query would need "where transaction_id <= N and is_deleted = 'N'".

There's your structure.  Just a small matter of programming.  ;-)  
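
Sketched in DDL, with names of my own invention:

    create table member_v (
        txid        integer not null,   -- id of the writing transaction
        is_deleted  text not null default 'N'
                    check (is_deleted in ('Y', 'N')),
        first       text not null,
        second      text not null
    );

    -- the data as of transaction 42
    select first, second from member_v
    where txid <= 42 and is_deleted = 'N';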

--jkl


[sqlite] 64bit DLL vs 32bit

2016-05-14 Thread James K. Lowden
On Thu, 12 May 2016 00:36:31 +1000
"dandl"  wrote:

> But I think if you compile code for the x64 processor chip and call
> it from x86 or vice versa then either it doesn't work or you pay a
> high price for thunking from one to the other. I think that's
> unavoidable regardless of OS.

Right: doesn't work.  There's no performance penalty because there's no
32-64 bit thunking layer.  

https://blogs.msdn.microsoft.com/oldnewthing/20081020-00/?p=20523

--jkl



[sqlite] 64bit DLL vs 32bit

2016-05-14 Thread James K. Lowden
On Wed, 11 May 2016 11:30:34 +1000
"dandl"  wrote:

> > more about DLLs than it is about SQLite.
> 
> Actually, it's everyone using a language other than C/C++, plus a
> proportion of those too. I use C#, but if you want to call Sqlite
> from Java, Python, etc or even some generic C/C++ app that supports
> plug-ins, then at some point there is a DLL 

How does that follow?  Any higher-than-C language has its own binding
system, and SQLite is a module of some kind, where the C library is
wrapped in the module that exposes its own API.  If the module
statically links in libsqlite3.a -- as, arguably, it should -- then
there's no version ambiguity, no DLL, and no chance of conflict.  
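
For instance, a hypothetical wrapper module on Linux might be built
against the amalgamation like so (a sketch, not any particular
binding's actual build):

    cc -fPIC -c sqlite3.c
    cc -fPIC -shared mymodule.c sqlite3.o -o mymodule.so

Every SQLite call from mymodule.so then resolves to the copy inside it.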

--jkl



[sqlite] Sqlite incompatibility with Postgres

2016-05-14 Thread James K. Lowden
On Fri, 13 May 2016 15:13:01 +0100
Simon Slavin  wrote:

> On 13 May 2016, at 3:07pm, dandl  wrote:
> 
> > I have no deep knowledge of standard SQL.
> 
> I used to know SQL92 very well.  There's no facility for doing
> anything like LIMIT or OFFSET in it.  You had to use your programming
> language to work your way through all the results and skip the ones
> you didn't want.

I suggest the reason LIMIT hasn't been standardized is that it's
contrary to the fundamental idea that rows in a table have no
meaningful order.  SQL doesn't honor relational theory with complete
fidelity, but at least that horse is still in the barn.   

The problem with LIMIT is it's not based in the data.  Cutting off
results at some arbitrary N tells you *nothing* about the data other
than that N or more rows met the criteria.  Note that predicate logic
has constructs for "for all" and "there exists", but not "are some"!  

I have yet to see a query using LIMIT 1 posted on this list that cannot
be expressed -- better, IMO -- with min().  Queries that limit the
results to "top N" to support things like pagination inevitably include
assumptions about transactionality (or lack thereof) that are either
invalid or ill-considered.  Every one would be better served either by
just fetching the needed rows as required (and letting pending rows
pend), or by supplying the last "high" value as a minimum for the
WHERE clause instead of an OFFSET.  Were I a fan of conspiracies, I'd
suspect the LIMIT-OFFSET constructs were invented by antilogicians to
prevent learning and hobble performance.  
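
The last of those is less work than it sounds.  Sketched against a
hypothetical members(first, second) table, with the :last_* parameters
carried over from the previous fetch:

    select first, second from members
    where second > :last_second
       or (second = :last_second and first > :last_first)
    order by second, first
    limit 10;  -- page size; the WHERE clause defines the page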

By the way, I'm also a LIMIT club member, with limits.  I use it for
convenience on the command line while exploring data.  It's great for
that, in the absence of pager support.  Maybe keeping it a little
"weird" will help remind new developers to use it as a convenience
instead of a crutch.  

--jkl



[sqlite] Working with blob

2016-04-28 Thread James K. Lowden
On Thu, 28 Apr 2016 20:27:17 +0200
"deltagamma1 at gmx.net"  wrote:

> If I store the blob directly in the sqlite, is there a way to open the
> blob directly with the respective programm (e.g. irfanview or a pdf
> with acroread) ? 

I have heard of a FUSE filesystem implemented with SQLite.  I don't
know if it exposes tables as files, or anything like that.  

If you're going to use an external program like acroread to view the
blob, you're pretty much forced to provide it with a filename, hence
you'll have to write your blob to a file, and give the application that
filename.  A few such programs might be willing to read from standard
input, in which case you could provide a pipe.  Or you might be able to
use a named pipe (mkfifo) instead of a temporary file.  
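
The pipe variant might look like this, where "blobdump" stands in for
whatever program you have that writes the blob to standard output:

    $ mkfifo /tmp/doc.pdf
    $ blobdump mydb.db 1 > /tmp/doc.pdf &   # blocks until a reader opens
    $ acroread /tmp/doc.pdf
    $ rm /tmp/doc.pdf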

I don't know a way to write binary data to a file using the
command-line tool.  Maybe there's one out there someone else knows of.  

--jkl


[sqlite] No datasize field - why?

2016-04-25 Thread James K. Lowden
On Mon, 25 Apr 2016 02:31:25 +0100
Simon Slavin  wrote:

> > These are different concerns, and they don't really pose any
> > difficulty.  Given an encoding, a column of N characters can take
> > up to x * N bytes.  Back in the day, "x" was 1.  Now it's something
> > else.  No big deal.  
> 
> No.  Unicode uses different numbers of bytes to store different
> characters.  You cannot tell from the number of bytes in a string how
> many characters it encodes, and the programming required to work out
> the string length is complicated.  

"up to", I said.  You're right that you can't know the byte-offset for a
letter in a UTF-8 string.  What I'm saying is that given an encoding
and a string, you *do* know the maximum number of bytes required.
From the DBMS's point of view, a string of known size and encoding can
be managed with a fixed length buffer.  

> I would definitely be reading the documentation for the SQL engine I
> was using.

Well, yeah.  :-)  It's well to know how the software you're using
works, whether it's the DBMS or something else.  

Although I have to say I've never had to worry about the size of my
database as a function of string size.  When size matters, rows
dominate, and large numbers of rows never seem to come with big
strings.  

--jkl


[sqlite] No datasize field - why?

2016-04-24 Thread James K. Lowden
On Sun, 24 Apr 2016 08:51:09 -0400
Carlos  wrote:

> But, with very fast CPUs and RAM memory buffers for the directory 
> entries in the disks, the variable length records would probably
> result in gain for much less I/O for the data.




[sqlite] No datasize field - why?

2016-04-24 Thread James K. Lowden
On Sun, 24 Apr 2016 14:09:50 +0100
Simon Slavin  wrote:

> 
> On 24 Apr 2016, at 1:51pm, Carlos  wrote:
> 
> > But, with very fast CPUs and RAM memory buffers for the directory
> > entries in the disks, the variable length records would probably
> > result in gain for much less I/O for the data.
> 
> Agreed.  Which is one reason why fixed-length string columns are less
> important and less used now.  When the bottleneck is the speed of the
> backing store, storing fewer characters can mean the difference
> between having to write one sector or two.

You still have only two choices: compute or seek.  The physical
structure is either like an array, and you can compute the record's
location, or it's like a list, and you have to iterate.  

> Another reason is that we use Unicode not ASCII/SIXBIT/EBCDIC, and in
> Unicode different characters take different numbers of bytes.  So
> even if you're storing a fixed number of bytes the convenience of
> always knowing exactly how many characters to display no longer
> exists.

These are different concerns, and they don't really pose any
difficulty.  Given an encoding, a column of N characters can take up to 
x * N bytes.  Back in the day, "x" was 1.  Now it's something else.  No
big deal.  

Note that SQL still defines lengths in terms of characters.  It's up
the DBMS how to store them (regardless of the agreed-on encoding).  

--jkl


[sqlite] No datasize field - why?

2016-04-24 Thread James K. Lowden
On Sat, 23 Apr 2016 19:22:04 -0600
Scott Robison  wrote:

> So if you could make your table up of integers, floats, and text
> with character limits on them you could get fixed-length rows, which
> might reduce your access time by 60% or more.  Such a decrease in
> access time could mean the difference between being able to update a
> database live or being able to update only during an overnight run.

As I tried to make clear in my reply to Keith, efficiency concerns are
beside the point.  The theory underpinning SQL rests on predicate logic
and set theory.  Freeing the user from concerns of physical storage and
addressing were objectives Codd cited in his first paper.  

In point of fact, unbounded-length records have been supported for
decades.  Sybase called them TEXT and IMAGE types.  You could not
search them, though, only retrieve them once the row had been located
by other means.  Why?  Just as you posit: for efficiency.  The
physical row kept a "pointer" to the TEXT data and, yes, every
physical row had the same length, for efficiency reasons.  

--jkl



[sqlite] No datasize field - why?

2016-04-24 Thread James K. Lowden
On Sat, 23 Apr 2016 14:50:45 -0400
"Keith Medcalf"  wrote:
> > On Sat, 23 Apr 2016 08:56:14 -0400
> > "Keith Medcalf"  wrote:
> > 
> > > Those things that those other DBMSes do are holdovers to maintain
> > > backwards compatibility with the good old days when dinosaurs
> > > ruled the earth
>  
> > As amusing as your rant is, it's not accurate.  Treating columns as
> > types is a "holdover" from mathematics and logic.  It has nothing
> > to do with maintaining backwards compatibility, or the cost of
> > RAM.  
> 
> The specification of "Length/precision" as well as the use of
> "varchar", "varbinary" and the like are holdovers from the days when
> files had to have fixed record lengths so BDAM would work.  

They might have originated in that way on System R.  Ingres was
developed contemporaneously on Unix, which then and now had no
record-based file types.  As I pointed out, 

> > Many new DBMS engines have been written in recent years
> > (notably columnar stores) and, of those that support SQL, none
> > abandoned strict column types.

> (contrary to common mis-belief, varchar(100) does not mean a
> "variable length character field with a length up to 100 characters",

SQL does *not* define implementation.  It defines semantics: user
provides X, system replies Y.  It makes no statement about how a column
is stored.  You know that, surely.  Why paint yourself into a corner
with an argument you know is lost before it begins?  

I guess I should remind you that length-limited character strings have
lots of utilty irrespective of storage concerns.  Some strings *are*
fixed length, for example cusip, ssn, drivers licence, employee id,
phone number.  Length-checking is a simple aspect of validation.  
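
In SQLite, where the declared length is not enforced, the same
validation is a one-line constraint.  A sketch:

    create table employee (
        ssn  char(9) not null check (length(ssn) = 9)
    );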

There are also more banal concerns about external representation.   An
address might be limited to 60 characters so that it fits in the
billing envelope window.  Maybe 60 characters is arbitrary, but we both
know that 6000 characters will be too many.  Length limits help keep
the ruby on the rails.  

> In order for Codd and Date to conceive of a world wherein duck-typing
> existed, a duck-typed language would have to exist first.  Since such
> was inconceivable before its invention, it is no wonder that it never
> occurred to anyone that a database column could be ducky too.

You are seriously underestimating them.  You're misinterpreting the
meaning and purpose of column types.  Your rant about VSAM is, as 
physicists sometimes say, not even wrong.  

If the theory seems abstruse, you also utterly ignore observed
detrimental effects of the lack of type enforcement, namely the
complexity that arises at SELECT time, when the application has to cope
with whatever detritus got parked in the database.  On a SQLite scale,
many times that's not a problem because writers are highly
constrained.  But in general it's a central concern, and was one of the
motivations for the invention of the relational model.  

--jkl


[sqlite] No datasize field - why?

2016-04-23 Thread James K. Lowden
On Sat, 23 Apr 2016 08:56:14 -0400
"Keith Medcalf"  wrote:

> Those things that those other DBMSes do are holdovers to maintain
> backwards compatibility with the good old days when dinosaurs ruled
> the earth

As amusing as your rant is, it's not accurate.  Treating columns as
types is a "holdover" from mathematics and logic.  It has nothing to do
with maintaining backwards compatibility, or the cost of RAM.  Many new
DBMS engines have been written in recent years (notably columnar
stores) and, of those that support SQL, none abandoned strict column
types.  

Granted, duck-typing is/was a design choice.  The good news is it makes
simple things simple.  But it's also the source of quite a bit of
confusion and perplexity, not to mention errors and complexity:
*inserting* is easy, but selection is made more complicated.  The
application has to be prepared to cope with inconsistencies (and
errors) in the data that are prevented by type-checking.  In SQLite,
the "dba" has to exercise care, in a very redundant way, to effect
type-enforcement of a kind he gets "for free" with a standard DBMS.  

--jkl


[sqlite] No datasize field - why?

2016-04-23 Thread James K. Lowden
On Fri, 22 Apr 2016 19:11:46 -0700
Darren Duncan  wrote:

> The general case of a data type definition is an arbitrarily complex
> predicate expression whose parts vary on the base type and other
> factors.  Given this, if component details of type definitions were
> split out into their own table_info() columns, you'd have a large
> number of columns where most are inapplicable in any given case, eg
> some only apply to strings, some only to numbers, etc.  And so, just
> expressing the type definition as a SQL fragment like table_info()
> currently does provides a compact generic representation with all the
> details, same as in CREATE TABLE.

INFORMATION_SCHEMA.  

It might not be "a compact generic representation", but it is something
CREATE TABLE is not: a tabular representation.  Something that can be
queried instead of parsed.  
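
For example, on systems that implement it (SQLite does not), the
columns and types of a table are just a query away:

    select column_name, data_type, character_maximum_length
    from information_schema.columns
    where table_name = 'T'
    order by ordinal_position;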

--jkl


[sqlite] Caveat entry

2016-04-17 Thread James K. Lowden
On Fri, 15 Apr 2016 22:53:57 +0100
Simon Slavin  wrote:
> On 15 Apr 2016, at 10:05pm, Cecil Westerhof 
> wrote:
> 
> * SQLite datatypes and how SQLite decides which datatype you want
> * SQLite uses affinities not column types
> * thinking you should index each column instead of indexes for queries
> * consequences of various PRAGMAs which might appear to speed SQLite
> up
> * consequences of the two different journal types (including file
> size)

That's a nice list.  I might add using LIMIT and ORDER where MIN would
serve better.  

> The most common mistake which leads to posts to this list is -- still
> the winnah after five years ! -- premature optimization.  I can't
> believe how much coding time and software complication is wasted by
> people who spend hours on multi-thread, queueing and partitioning
> code instead of redesigning their schema or thinking through how
> indexes work.

Yes, except it's not *premature* optimization, is it?  It's misdirected
optimization, worrying about the wrong thing because not measuring.  

Knuth's message is often misunderstood and oversimplified.  Perhaps he
could be better paraphrased as "don't sweat the small stuff", because
he was most certainly not saying, "just make it work and worry about
design choices later".  He did not mean we should ignore O(1) versus 
O(n^2) choices when those choices matter.  His advice doesn't imply
that orders of magnitude are unworthy of our concern.  

The best database performance advice I ever got still applies: think
about how the DBMS will traverse the data, and minimize the work it
will do.  It's all about I/O.  No number of additional threads will
bring the data into memory sooner.  

--jkl





[sqlite] Avoid duplicate sets with join table

2016-04-17 Thread James K. Lowden
On Sat, 16 Apr 2016 01:20:55 +0200
Ketil Froyn  wrote:

> I have two tables and a join table, in principle like this:
> 
> CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
> CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE);
> CREATE TABLE records_features (id_r INTEGER, id_f INTEGER, ord
> INTEGER);

(As a style suggestion, consider not reduplicating plurals.  It's a
one-man band, a 3-star general, and a 5-finger discount.  What you have
 is a set of record_features.)  

> Later, if I come across a new record that has exactly features
> "feature1", "feature2" and "feature3" (IDs 20,21 and 22), I don't want
> to add a new record for this. So given a new set of feature IDs, how
> can I best check if there's another record that has these exact
> features before I insert it?
> 
> To be clear, a record with features "feature1", "feature2", "feature4"
> would be ok. So would "feature1", "feature2". Subsets are ok, but not
> exact duplicates.

This is an application of relational division: you want to know if two
sets are equal.  Well, almost.  You really want to know if the "new" set
is a subset of an existing one.  

You're also a little stuck for lack of syntax.  There are two sets in
question: the extant one and the one to be inserted.  You can insert a
set in SQLite, 

insert into T values (a), (b), (c);

but because that set of rows (a, b, and c) doesn't have a name, you
can't refer to them again in the same query.  So you have to dance a
little jig, something like:

begin transaction;
insert into record_features ...
-- back the new rows out if some other record already has all
-- the new record's features
delete from record_features
where id_r = :new_id
and exists (
    select 1
    from record_features as rf
    join record_features as nf
      on  nf.id_f = rf.id_f      -- rf: candidate record, nf: the new rows
      and nf.id_r = :new_id
    where rf.id_r <> :new_id
    group by rf.id_r
    having count(*) = (select count(*) from record_features
                       where id_r = :new_id)
);
commit transaction;

That puts the rows into the database -- where they can be examined --
and deletes them if they turn out to be a duplicate.  You could also
apply the same logic in an insert & update trigger, and probably should
if the rule applies to the *data*, and not just the application's
expectations.  

--jkl




[sqlite] Working with booleans

2016-04-15 Thread James K. Lowden
On Thu, 14 Apr 2016 20:10:08 -0400
"Keith Medcalf"  wrote:

> select  from  where isActive;

vs. 

> select  from  where isActive = 'T';

AFAIK, the SQL standard requires the second form.  You can't simply say
"WHERE variable"; you must say "WHERE expression".  

OP: I personally usually use the T/F or Y/N versions over 1/0 because my
interactive query results are just a little easier to interpret.  Use
what you like, because size and performance differences are likely to
be negligible.  
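
Whichever representation you pick, a CHECK constraint keeps it honest.
A sketch:

    create table task (
        name      text not null,
        isActive  text not null default 'F'
                  check (isActive in ('T', 'F'))
    );

    select name from task where isActive = 'T';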

--jkl


[sqlite] Primary key values can be NULL

2016-04-15 Thread James K. Lowden
On Fri, 15 Apr 2016 14:13:12 +0200
Cecil Westerhof  wrote:

> 2016-04-15 1:19 GMT+02:00 J Decker :
> 
> > I would total expect any column I created without NOT NULL (double
> > negative) to allow NULL whether INDEX or UNIQUE or PRIMARY is
> > applied additionally... what database does otherwise?  MSSQL?
> >
> 
> "Every database that is not SQLite, because it is the standard."

Last I checked, Microsoft SQL Server raised an error if you declared a
table to have a primary key with a NULL column.  Whether NULL or NOT
NULL was the default depended on database and connection options, but
if the effect was to use a NULL column in a primary key, it failed.  

I stopped worrying about the default a long time ago.  I figure every
time I type NOT NULL, an angel gets its wings.  

--jkl


[sqlite] FOREIGN KEY constraint failed

2016-04-07 Thread James K. Lowden
On Tue, 5 Apr 2016 23:56:53 +0200
R Smith  wrote:

> On 2016/04/05 11:15 PM, Keith Medcalf wrote:
> > Are we confusing immediate constraints (checked per statement) with
> > DEFERRED constraints (checked at COMMIT time) again?

In SQLite some constraints are checked per row, not per statement.  

> We might be - though I assume the OP implicated only deferred 
> constraints - since immediate constraints will fail on contact, and
> as such, no mystery surrounds their origins.

One table may have more than one constraint.  Primary key violations
are reported as such, but CHECK constraint and FK constraint messages
don't mention the column involved.  

OP: I investigated the problem a year ago or more and concluded it's not
easy to remedy.  Today violations are simply counted.  To report them
in detail would require carrying much more state, and externalizing
constraint conditions in human-readable form.  Unless you have a patch
that does all that, I doubt PRAGMA DEBUG_MODE will be realized.  

--jkl


[sqlite] regression in 3.12.0 vs. 3.11.0, column type information in PRAGMA missing

2016-04-06 Thread James K. Lowden
On Wed, 6 Apr 2016 06:13:01 +
Hick Gunter  wrote:

> You are hopefully aware of the fact that SQLite associates type with
> the actual values and not the containers(columns) used to hold these
> values? This means that a data object of any type may be
> held/returned in a column, irrespective of the declared type (which,
> for expressions, is NULL).

Yes, but some of us are careful to include CHECK constraints to enforce
type-checking.  Even when not that careful, many follow the practice of
restricting each column to a single type.  

> What would your date routine do with the string 12.17.9.17.15?

It would never see it.  Problem solved by prevention: 

sqlite> create table T(t TEXT not NULL
   ...>     check (strftime('%m', t) is not NULL));
sqlite> insert into T values ('2016-04-06');
sqlite> insert into T values ('2016-14-06');
Error: CHECK constraint failed: T
sqlite> insert into T values ('12.17.9.17.15');
Error: CHECK constraint failed: T

--jkl


[sqlite] regression in 3.12.0 vs. 3.11.0, column type information in PRAGMA missing

2016-04-06 Thread James K. Lowden
On Tue, 5 Apr 2016 13:19:50 -0400
Richard Hipp  wrote:

> CREATE TABLE t2(w SHORT INT, x DECIMAL, y BIGINT, z REAL);
> CREATE VIEW v3 AS SELECT w+x+y+z FROM t2;
> 
> What should "PRAGMA table_info('v3')" report as the column type?

It should report it as for a table, with values consistent with how
SQLite will treat the column.  

A fundamental rule for views is that -- for purposes of SELECT -- they
*are* tables.  The fact that tables have physical representation in the
database and views are derived is, er, immaterial to the relational
algebra implemented by the DBMS.  

--jkl


[sqlite] C API reference manpages

2016-04-03 Thread James K. Lowden
On Thu, 31 Mar 2016 10:21:53 -0400
Richard Hipp  wrote:

> On 3/31/16, Kristaps Dzonsons  wrote:
> >
> > Is there any interest in integrating this tool to have manpages in
> > the doc distribution without downstream bits?
> >
> 
> I think that would be cool.  Integrating your tool into the source
> tree would mean that as the comment formats evolve, your tool would
> evolve in lock-step.  

+1.  I'm another who's wanted SQLite man pages for a long time.  I hope
you can see your way to integrating Kristaps's tool.  

--jkl



[sqlite] Help needed for COPY Command.

2016-03-25 Thread James K. Lowden
On Fri, 25 Mar 2016 17:18:16 +0100
"Domingo Alvarez Duarte"  wrote:

> Why not have direct command ".export table_name" and internally it
> does all commands you mention in one go, simple and intuitively.  

Importing has unique requirements.  

.import filename tablename

is quite limited, but necessary: SQL has no "read from file" feature.   

You could have 

.export filename tablename

but SQL already has a "read from table" feature, and shell already has
a "write to file" feature.  So an export command would be a redundant
subset of what's already available.  
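
For instance, the composition is short -- something like:

    sqlite> .mode csv
    sqlite> .once out.csv
    sqlite> select * from tablename;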

--jkl


[sqlite] sqlite fixed data loading extension

2016-03-25 Thread James K. Lowden
On Fri, 25 Mar 2016 06:49:22 -0500
Don V Nielsen  wrote:

> I have a need for something that can parse and load into sqlite tables
> fixed length data. 

Insert your own separators.  

$ cat input
12345678910111213141516171819202122232425

Print two 5-byte ranges separated by ", ".  

$ awk '{ OFS=", "; print substr($0, 1, 5), substr($0, 6, 5); }'  \
input 
12345, 67891

--jkl



[sqlite] Minor documentation improvement

2016-03-25 Thread James K. Lowden
On Fri, 25 Mar 2016 15:06:46 +
Simon Slavin  wrote:

> Could something be added to indicate that it pays attention to
> '.separator' ?  If you don't already know, you might think it is
> fixed to one file format.  It could be as simple as
> 
> ".import FILE TABLE Import data from FILE into TABLE.
> See .separator ."

The most confusing part of .import for me was figuring out that the
FILE needs quotes.  

--jkl



[sqlite] Article about pointer abuse in SQLite

2016-03-22 Thread James K. Lowden
On Tue, 22 Mar 2016 11:00:24 -0500
"Marc L. Allen"  wrote:

> I don't think compilers "run" your code.  

Provided we're talking about a C compiler, you're right.  Optimizers
don't run the code, they reason about it.  

> The fact that the code never actually allows that path to occur is
> beyond the scope of most compilers, isn't it?

Yes and no.  If the compiler can prove a particular branch can never be
taken, it can remove it because the logic of the program will not be
affected.  If it cannot prove that, the code will remain.  For example,
given

int foo = 0;
if (foo)
    exit(0);

the compiler can delete lines 2 & 3.  If there's no other reference to
foo, it can delete line 1, too.  However, 

extern int foo;
if (foo)
    exit(0);

and

int foo = 0;
extern int *pfoo;
pfoo = &foo;
if (foo)
    exit(0);

both leave most optimizers flat-footed.  The potential for another
module to affect the value of foo means the code could run, and thus
must remain.  

--jkl


[sqlite] Article about pointer abuse in SQLite

2016-03-22 Thread James K. Lowden
On Tue, 22 Mar 2016 09:58:52 -0400
Adam Devita  wrote:

> I don't know the reasoning, but it seems that VS6 often
> initialized things to 0xcd in debug mode and (usually) had memory
> uninitialized to 0x00 when complied in Release (perhaps 0x00 just
> happens to be what was on the stack or heap).  

I would be talking out of school here if you're talking about C#.  For
C and C++, the 0xcd initialization helps make (mis)use of uninitialized
objects more obvious.  If the allocated buffer happens to be
zero-initialized, things like printf will make them appear empty when
they're actually invalid.  

This link has a nice discussion: 


http://stackoverflow.com/questions/2769247/controling-crt-memory-initialization

and includes a link to the documented behavior: 

https://msdn.microsoft.com/en-us/library/Aa270812

--jkl


[sqlite] Article about pointer abuse in SQLite

2016-03-22 Thread James K. Lowden
On Tue, 22 Mar 2016 09:56:57 +0100
"Cezary H. Noweta"  wrote:

> On 2016-03-22 00:35, James K. Lowden wrote:
> >[...]  An example from Clang's discussion is
> >
> > int i = 10 << 31;
> 
> Could you provide a link for that discussion? (Or google's phrase to 
> retrieve such link?)

I'm sorry, no.  Not for the first time I wish my browser had a feature
like "find links in history with documents matching regex".  

I didn't read it on the Clang mailing list.  I think I saw it by
reference in Regehr's discussion of "friendly C".  It specifically
mentioned  10 << 31  as an example of an "integer" requiring 35 bits,
something gcc assigns silently and clang diagnoses with a warning.  

If you haven't seen it, 

http://blog.regehr.org/archives/1180

is a good starting point.  It mentions "Towards Optimization-Safe
Systems: Analyzing the Impact of Undefined Behavior" 
(http://pdos.csail.mit.edu/papers/stack:sosp13.pdf), which is where I
learned that sharp-edged optimization is not a brand-new phenomenon.  

DJB provides a properly justified grumpy, frustrated view, 


https://groups.google.com/forum/m/#!msg/boring-crypto/48qa1kWignU/o8GGp2K1DAAJ

wherein he mentions one of the defenses for the status quo, 

"that a boring C compiler can't possibly support the desired 
 system _performance_. Even if this were true (which I very much 
 doubt), why would it be more important than system _correctness_?"

That should be the only argument needed.  DJB is concerned about
security.  DRH is concerned about correctness.  The serious C
programmer doesn't breathe who prizes performance over correctness, yet
that is the license the compiler writers have granted themselves.  

--jkl






[sqlite] Reserved column names

2016-03-21 Thread James K. Lowden
On Mon, 21 Mar 2016 11:32:28 +0100
Dominique Devienne  wrote:

> > Explicitly documented by SQLite:
> >
> 
> And? That's still non-SQL standard.
> 
> SQLite tries to be compatible with non-standard extensions from
> various popular RDBMS', but when a standard alternative exists, it
> should be preferred IMHO. 

Roger's APSW is SQLite specific.  It's pretty easy to imagine, isn't
it, that 

char sql[] = "select [col] from [foo]";

is easier for him to use than

char sql[] = "select \"col\" from \"foo\"";

even if he's not using C?  

I would certainly advise (and often do) anyone using SQL to learn to
distinguish between standard SQL and any given product's deviations
from it.  Favoring standard constructs helps avoid weird corners and
style.  

But machine-generated code inside a driver specifically for SQLite?
Hard to see who benefits, one way or the other.  

--jkl


[sqlite] Article about pointer abuse in SQLite

2016-03-21 Thread James K. Lowden
On Mon, 21 Mar 2016 13:48:06 -0700
Scott Perry  wrote:

> Compilers allow you to choose your standard; --std=c11 means
> something very specific (and unchanging) 

They do.  And that covers what the standard covers.  The standard also
has limits.  It includes constructs that are syntactically permitted
but whose behavior is left undefined, known by the scarred as "UB" for
"undefined behavior". An example from Clang's discussion is

int i = 10 << 31;

The standard says << is a shift operator.  It places no limit on the
number of bits to be shifted.  If that number is so large that the
product cannot be represented by the assigned variable, that is *not*
an error.  The standard allows the compiler to do anything or nothing
with it.  As you may imagine, the varieties of anything and nothing are
many.  

Compiler writers are well aware that "nothing" is faster done than
"something".  Over time, they have gotten more aggressive in simply
deleting UB code.  As a consequence, programmers who thought they wrote
standards-conforming code get burned when they upgrade/change
compilers.  Mysterious and sometimes subtle errors are introduced by
the compiler for the user's benefit.  

Your googlefu will turn up lots of discussion.  One I liked that wasn't
on Page 1:

http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer

--jkl


[sqlite] Article about pointer abuse in SQLite

2016-03-19 Thread James K. Lowden
On Sat, 19 Mar 2016 02:04:35 -0600
Scott Robison  wrote:

> > As he says, there's not real choice between fast and
> > correct
> 
> Except that testing can verify something is correct for a given
> environment.  

That's actually not true, on a couple of levels.  

"[T]esting can be used very effectively to show the presence of
bugs but never to show their absence."
-- EWD303

I think that should be called Dijkstra's Dictum.  It's not just quip;
it's a concise insight into limits of testing versus proving
correctness.  

Second, you can't test the future.  If the correctness of the code is
subject to change by the compiler's interpretation of the language, how
is the programmer to prevent it?  

> > finally drive gcc & friends in the direction of working
> > with their users for a change.  Or make them irrelevant.
> 
> I think they'd continue to be popular with people looking to eek out
> as much performance as possible.

You may be right.  As a consultant I've often felt I was hired to
sprinkle magic pixie performance dust on the system.  People want to
believe that performance is found in tools.  How come there's no -O5?  

In truth, every performance problem I've encountered was a design
problem, often obvious, always unnecessary.  "Use a better compiler"
has never been the solution.  Unloading mounds of unnecessary
processing with a pitchfork is.  

Doubtless there are some well tested, highly stable applications run at
scale, for which 5% is a measurable and meaningful gain.  IMO they're
actually the ones driving UB treatment by compiler writers.  The
other 99% stand to gain from a compiler that emphasizes correctness and
predictable behavior.  

--jkl



[sqlite] Article about pointer abuse in SQLite

2016-03-19 Thread James K. Lowden
On Fri, 18 Mar 2016 16:33:56 -0600
Scott Robison  wrote:

> I'd rather have code that might use some "undefined behavior" and
> generates the right answer than code that always conformed to defined
> behavior yet was logically flawed. 

Code that falls under undefined behavior *is* logically flawed, by
definition.  Whether or not it works, it's not specified to.  The
compiler may have generated perfectly correct machine code, but another
compiler or some future version of your present compiler may not.  

You might share my beef with the compiler writers, though: lots of things
that are left undefined shouldn't be.  Because hardware architecture
varies, some practices that do work and have worked and are expected to
work on a wide variety of machines are UB.  A recent thread on using
void* for a function pointer is an example: dlsym(2) returns a function
pointer defined as void*, but the C standard says void* can only refer
to data, not functions!  

Machines exist for which the size of a function pointer is not 
sizeof(void*).  Source code that assumes they are the same size is not
portable to those architectures.  Fine.  But a particular compiler
generates code for a particular architecture.  On x86 hardware, all
pointers have always been and will always be the same size.  All
Linux/Posix code relies on that, too, along with a host of other
assumptions. If that ever changed, a boat load of code would have to be
changed.  Why does the compiler writer feel it's in his interest or
mine to warn me about that not-happening eventuality?  For the machine
I'm compilng for, the code is *not* in error.  For some future machine,
maybe it will be; let's leave that until then.  

I was looking at John Regehr's blog the other day.  I think it was
there that I learned that the practice of dropping UB code on the floor
has been going on longer than I'd realized; it's just that gcc has been
more aggressive in recent years.  I think it was there I saw this
construction:

if (p < p + n)
    error();

where p is a pointer.  On lots of architectures, for large n, p + n can
be negative.  The test works.  Or did.  The C standard says that's
UB, though. It doesn't promise the pointer will go negative.  It doesn't
promise it won't.  It doesn't promise not to tell your mother about
it.  And, in one recent version, it doesn't compile it.  Warning?  No.
Error? No.  Machine code?  No!  It's UB, so no code is generated (ergo,
no error handling)!  Even though the hardware instructions that would
be -- that used to be -- generated work as implied by the code.

Postel's Law is to be liberal in what you accept and conservative in
what you emit.  The compilers have been practicing the opposite,
thwarting common longstanding practice just because they "can".  

Dan Bernstein is calling for a new C compiler that is 100%
deterministic: no UB.  All UB per the standard would be defined by the
compiler.  And maybe a few goodies, like zero-initialized automatic
(stack) variables.  

Such a compiler would enjoy great popularity, even if it imposed, say,
a 5% performance penalty, because C programmers would have greater
confidence in their code working as expected. They'd have some
assurance that the compiler wouldn't cut them off at the knees in its
next release.  As he says, there's no real choice between fast and
correct.  If the "always defined behavior" compiler got off the ground,
maybe it would finally drive gcc & friends in the direction of working
with their users for a change.  Or make them irrelevant.  

--jkl




[sqlite] CAST STRING => INTEGER

2016-03-17 Thread James K. Lowden
On Wed, 16 Mar 2016 01:53:59 -0600
Scott Robison  wrote:

> > For example, even the operation "select cast(pow(2,65) as integer)"
> > and "select cast(-pow(2,65) as integer)" should return NULL rather
> > than MAXINT and MININT respectively.
> 
> The $64 bit question ;) is how much existing code might break if such
> changes were made. One can argue that the existing implementation is
> broken, but a lot of software has been written to use it as it is.
> What happens to them if such an improvement is made?

Doubtless there is code out there that relies on division by zero
producing NULL.  I personally would like to see a 

pragma MATH=ON

to change that.  

Many of the outputs the OP has enumerated are considered undefined
behavior.  Any computation relying on them would produce incorrect
results.  Changing that to NULL, as Keith recommends, would arguably
make them "correct", and wouldn't violate the promise not to change
*defined* behavior.  IMO the user would be better served by raising an
error, so that it can be dealt with at its source.  

--jkl



[sqlite] CAST STRING => INTEGER

2016-03-17 Thread James K. Lowden
On Tue, 15 Mar 2016 19:33:32 -0600
"Keith Medcalf"  wrote:

> > Yes, if the string cannot be represented as an integer, CAST should
> > raise a range error.  That spares the application from applying the
> > same test in an ad hoc and inconsistent way.
> 
> Since there is no way to "trap" such errors, out-of-bounds
> conversions should return NULL.

No way to trap an error condition for a SELECT statement?  Is that what
you mean?  

To my way of thinking, a CAST range error is a constraint violation.
Why can sqlite3_step not return SQLITE_CONSTRAINT_CHECK?  

> Just as attempting to divide by zero returns NULL.

In SQLite, yeah.  Other DBMSs don't make that mistake.  It's at the
top of my list for why SQLite is unsuitable for quantitative work.  

--jkl



[sqlite] SQLite Pronunciation

2016-03-17 Thread James K. Lowden
On Wed, 16 Mar 2016 14:09:08 -0500
Jay Kreibich  wrote:

> although if you trace SQL back to the IBM days of SEQUEL, there is a
> strong argument that the term "sequel" makes more sense.

IBM insisted "SQL" be pronounced as three letters for exactly that
reason: to distinguish it from its forerunner, SEQUEL.  IIRC, SEQUEL
was trademarked by another firm, and pronouncing SQL to sound like
"sequel" risked trademark infringement.  

> "an ess-cue-ell lite database"

Yup, that's what I write, too.  But I've given up my Pedant Pin in
conversation because "sequelite" trips off this English-speaker's
tongue.  Practically saves a second every time I say it, must have
added up to hours by now.  

--jkl



[sqlite] CAST STRING => INTEGER

2016-03-15 Thread James K. Lowden
On Tue, 15 Mar 2016 01:02:17 +0100
"Cezary H. Noweta"  wrote:

> 2nd row: why REALs can have trailing spaces, while INTEGERs cannot?
> 3rd row: why REALs can have trailing trash, while INTEGERs cannot?

I think we know now that string->integer conversion is pathologically
broken for inputs that cannot be represented as a 64-bit integer.
Appending '0' to any such string produces a different, illogical
result, too.  

> While fixing, I spotted a problem mentioned by you:
> 
> > 2)  sqlite> select cast('100000000000000000000' as int);
...
> > 7766279631452241920
> 
> It would be:
> 
> 1. left as is, i.e. CAST(manydigitstext AS INTEGER) == 
> MAX(TEXT2INT(manydigitstext)%2^64,LARGEST_INT64) --- Clemens Ladisch' 
> opinion;

That is not acceptable IMO.  Conversions should be reversible,
including string->integer->string.  Conversions should also be
equivalent, and no one is claiming '100000000000000000000' ==
7766279631452241920.  

> 2. CAST(manydigitstext AS INTEGER) == {SMALLEST_INT64,LARGEST_INT64}
> --- your opinion as I understood you well;

Yes, if the string cannot be represented as an integer, CAST should
raise a range error.  That spares the application from applying the
same test in an ad hoc and inconsistent way.  

To my way of thinking, SQLite's handling of giant integers per se
is an edge case.  Because such huge numbers don't normally arise, the
non-error path (inputs in bounds) almost always produces correct
results. The reason to have CAST raise a range error when the output
would be invalid is to guard against erroneous inputs creating spurious
outputs that, when used in further computation, produce inexplicable
results. It's better for the user and programmer to detect the error
early -- nearest the cause, where it can be diagnosed -- than to
produce a bogus answer and be forced into a manual recapitulation of the
processing.  

--jkl




[sqlite] CAST STRING => INTEGER

2016-03-14 Thread James K. Lowden
On Mon, 14 Mar 2016 13:25:09 +0100
Clemens Ladisch  wrote:

> > that ``SELECT CAST(col AS INTEGER);'' should return (not so) random
> > result set, and receiving any INTEGER should mean that a source
> > string could have trillion or more possible values?
> 
> The documentation does not specify how the textual representation of
> an out-of-range integer is to be converted, so anything SQLite does
> can be considered correct.

While I'm sympathetic to the GIGO notion that invalid input (a domain
error, in this case) is bound to produce unpredictable output, you have
to admit there's some inconsistency here.  I would argue the
inconsistency is harmful, and would be better thought of as a
constraint that needs to be enforced.  Consider:

1)  sqlite> select cast('100000000000000000000' as float);
cast('100000000000000000000' as float)
---------------------------------------
1.0e+20

2)  sqlite> select cast('100000000000000000000' as int);
cast('100000000000000000000' as int)
------------------------------------
7766279631452241920

3)  sqlite> select 100000000000000000000;
100000000000000000000
---------------------
1.0e+20

4)  sqlite> select 10000000000 * 10000000000;
10000000000 * 10000000000
-------------------------
1.0e+20

5)  sqlite> select cast(10000000000 * 10000000000 as int);
cast(10000000000 * 10000000000 as int)
---------------------------------------
9223372036854775807

6)  sqlite> select cast(100000000000000000000 as text);
cast(100000000000000000000 as text)
-----------------------------------
1.0e+20

7)  sqlite> select cast(cast(100000000000000000000 as text) as int);
cast(cast(100000000000000000000 as text) as int)
------------------------------------------------
1

Before you say, "don't do that", keep in mind that sum() could produce
the same integer-overflow result, as could string concatenation.  If
SQLite produces erroneous results, and places the burden of error
detection -- errors that orginate in the data -- on the application.  

It's hard to justify any of the above.  Integer arithmetic produces
floating point if the result can't be represented as an integer?
That's dubious enough.  When forced to be an integer, it's 2^63 - 1.  The
application could reasonably expect that integer arithmetic results in
an integer (as it normally does); when the double is returned instead,
it will still call sqlite3_column_int with its implicit cast.  No
error, just MAX_INT.  

In 7, we convert an integer to text and back to int, and get 1.  
I understand why, mechanically.  But I would say any form of 

T -> TEXT -> T

conversion should be lossless, for any type T, given sufficient width
of TEXT.  

The problems could be avoided by observing simple but strict rules:

1.  Integer arithmetic produces integer result.  Error on overflow. 
2.  CAST raises a range error if the value would be outside the range
of the target type.  
3.  Implicit conversion by e.g. sqlite3_column_* adhere to the same
rule as CAST.  

--jkl





[sqlite] Creating system table

2016-03-13 Thread James K. Lowden
On Sat, 12 Mar 2016 13:07:01 -0500
Igor Korot  wrote:

> My question is: what should I do if I want to create a system table?

Change the source code?  

A system table differs from a user table in how it's created.  User
tables are defined with CREATE TABLE of course, but system tables are
representations of the DBMS's data.  The underlying structure of a
system table need not be -- and normally isn't -- the same as that of a
user table.  Different system tables may have different underlying
structures.  

An analog in Linux is the /proc filesystem.  

System tables are a very old idea.  They're mentioned in Codd's 12 rules
for defining a relational system.  *All* data in the system, including
metadata, are to be represented as tables and manipulable using the
same language as ordinary tables.  One language, one datatype.  

For example, you might do us the service of creating
sqlite_constraints, with columns tablename, colname, and constraint.
When the user selects from that table, you could parse the database's
DDL and return those columns.  

A more pressing problem from this perspective is pragmas.  To discover
the value of a pragma in SQLite, one uses the pragma statement.  Better
would be a table sqlite_pragmas listing all the current values.  

--jkl



[sqlite] "Circular" order by

2016-03-10 Thread James K. Lowden
On Thu, 10 Mar 2016 21:16:28 +0200
R Smith  wrote:

> > Hmm, does this work any better?
> >
> > SELECT id FROM t
> > ORDER BY id < 'pen' desc, id;
> 
> It works, but not better. I think it was Igor who proposed similar
> (if not, apologies) which of course produces the correct result, but
> cannot take advantage of the index on id so it becomes a result-set
> walk causing longer ordering of values - exactly what the OP tried to
> avoid.

Hmm, I don't know about "cannot", but I'm not surprised by "does not",
because it's a tough inference.  

If you stand back a minute, you can see that

id < 'pen'

is a monotonic function of "id" if "id" is sorted.  The query processor
*could* include that logic, and could choose to process the rows, in
index order, starting with the first row where  id >= 'pen'  , to the
end, and wrapping back to the beginning.  

The big boys do that kind of thing.  In general "order by f(x)" will
use an index on x if f(x) has the same order.  For hard problems, they
support computed columns -- a little like a view attached to a table --
and indexes on them.

In SQLite, I guess the OP's only solution is to make an index of the
kind he needs.  

create table idx as select id < 'pen' as 'LT_pen', id from t;
select id from idx order by LT_pen;

Of course, that presupposes 'pen' is a constant.  Whether or not that's
true wasn't mentioned in the original post.  

--jkl



[sqlite] "Circular" order by

2016-03-10 Thread James K. Lowden
On Thu, 10 Mar 2016 10:17:57 +0100
Alberto Wu  wrote:

> On 03/09/16 23:30, James K. Lowden wrote:
> >> SELECT P.id FROM (
> >>   SELECT 0 AS sect, id FROM t WHERE id >= 'pen'
> >>   UNION ALL
> >> SELECT 1, id FROM t WHERE id < 'pen'
> >> ) AS P
> >> ORDER BY P.sect, P.id
> >> ;
> > 
> > This is the correct answer.  
> 
> Hi,
> 
> unfortunately the correct answer comes with an extra scan and a temp
> b-tree so I'd rather keep the two queries split and handle the case
> programmatically.

Hmm, does this work any better?  

SELECT id FROM t
ORDER BY id < 'pen', id;
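
With made-up data, that gives the wrap-around order:

create table t (id text primary key);
insert into t values ('axe'), ('cat'), ('pen'), ('rag');
select id from t order by id < 'pen', id;  -- pen, rag, axe, cat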

--jkl


[sqlite] "Circular" order by

2016-03-09 Thread James K. Lowden
On Wed, 9 Mar 2016 20:43:14 +0200
R Smith  wrote:

> SELECT P.id FROM (
>   SELECT 0 AS sect, id FROM t WHERE id >= 'pen'
>   UNION ALL
> SELECT 1, id FROM t WHERE id < 'pen'
> ) AS P
> ORDER BY P.sect, P.id
> ;

This is the correct answer.  

I'm not sure what you meant by "axiom" in your earlier post.  To the
best of my knowledge, UNION ALL makes no promise about order (and
standard SQL proscribes ORDER BY in a subquery).  If the implementation
finds it more efficient, for instance, to send alternating rows from
each element in the union, it's free to do so.  

--jkl


[sqlite] Correlated subquery refers to wrong rowid - bug?

2016-03-09 Thread James K. Lowden
On Wed, 9 Mar 2016 10:13:28 -0500
Richard Hipp  wrote:

> > which outputs one result (2), although the expected result would be
> > empty.

Sorry for my "what bug?" post.  I forgot that the output was wrong!  

--jkl



[sqlite] Correlated subquery refers to wrong rowid - bug?

2016-03-09 Thread James K. Lowden
On Wed, 09 Mar 2016 15:32:01 +0100
Jean-Christophe Deschamps  wrote:

> >   select id from a where id not in (select a.id from b);

> As I understand it, there is no more an a.id column in table b. It 
> looks like SQLite is trying to get clever ignoring the "a." qualifier.

It's not ignoring the qualifier.  It's processing an odd correlated
subquery.  This would produce the same:

select id from a where id not in (select a.id);

For each row in a, select the id that is not the id in the row.  
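
Worked out with toy data, to show what the query should return (the
bug aside):

create table a (id int);
insert into a values (1), (2);
select id from a where id not in (select a.id);
-- empty: 1 is in (1), and 2 is in (2)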

What might be a little surprising is that the columns named in the
SELECT need not come from the table in the FROM clause (if there is
one).  For example, I'm sure you'll find this valid: 

select id from a where id not in (
select 2 * (a.id / 2) from b
where a.id > b.id
);

Granted, that's a long walk for "odd ids in A greater than the smallest
id in B."  But it's not SQL's job to force succinct expression.  

--jkl


[sqlite] Correlated subquery refers to wrong rowid - bug?

2016-03-09 Thread James K. Lowden
On Wed, 9 Mar 2016 10:13:28 -0500
Richard Hipp  wrote:

> >   select id from a where id not in (select a.id from b);
> >
> > which outputs one result (2), although the expected result would be
> > empty.
> >
> 
> Thanks for the bug report.

What bug?  The query is valid SQL, and produces the correct results.
The programmer would have been better off writing a useful query, but
we've all been there.  

--jkl


[sqlite] Fastest way to find whether at least one row has a certain column value

2016-03-06 Thread James K. Lowden
On Sun, 6 Mar 2016 11:39:38 +
Paul Sanderson  wrote:

> I understand this - but, there's always a but, I still would like to do
> something. Applying the limit anyway and then telling them the query
> has been limited might be a solution.
> 
> Time is usually not an issue but as the results are loaded into a grid
> for display, memory can be the issue.

So execute the query, and start displaying the rows.  When you reach
"too many", ask the user if he wants to continue.  Or light up the
"next page" button, or whatever.  

(That doesn't exempt you, btw, from providing for the user to cancel
long-running queries.  But you know that.)

ISTM what you really want (other than a crystal ball) is tabular access
to the query optimizer's estimations, so you could do something like

select "estimated rows" from (
estimate query select ... 
) as A;

I've never seen a system that provides queryable optimizer metadata.  I
don't remember ever having read a paper on the idea, either.  But
that's not too surprising: the purpose of externalized query optimizer
output is normally to help the user understand what the optimizer will
do, not let the program decide whether or not to do it.  

I still think you're barking up the wrong tree.  You suppose that you
can vet the queries to prevent accidental cross joins not by finding
cross joins directly, but indirectly, by finding inordinately large
outputs.  I suggest that will lead to a lot of false positives,
because mistakes like that are actually pretty rare.  You and your user
are better off if you don't second-guess him.  Just execute the query
and show him the output in a controlled way.  He'll deal with
long-running queries or supervoluminous output the way we all have since
DBMSs were invented: cancel the query, and start looking into what
happened.  

--jkl


[sqlite] How does your sqlite script binding handle aggregate UDFs?

2016-03-06 Thread James K. Lowden
On Sat, 5 Mar 2016 21:22:23 +0100
Stephan Beal  wrote:

> i'm not aware of any aggregates which (in normal use) take no
> arguments

Nondeterministic functions need not take any arguments.  Built-in
examples include NOW().  Your UDF could implement the Dilbert RNG:

http://dilbert.com/strip/2001-10-25
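
Usage, once such a thing is registered (the function name is made
up):

SELECT dilbert_rng();  -- nine.  You can never be sure.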

--jkl





[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?

2016-03-06 Thread James K. Lowden
On Sat, 05 Mar 2016 17:16:52 -0700
"Keith Medcalf"  wrote:

> > Sometimes it's faster to recompute something than to cache it for
> > later re-use. That's rare where I/O is involved, and vanishingly rare
> > where SQL is involved.
> 
> The only thing worse is retrieving the entire result set and caching
> the whole thing in the gooey structures.  Those wheels fall off the
> bus much sooner.

You make me feel like the 8th grade teacher who complains she teaches
the same stuff every year, but the kids never get any smarter.  

I truthfully don't come across the classic client-side join of two
"select * from T" queries anymore.  I'm sure it happens, though.  Very
few shops include a team of SQL experts (or expert) whose mandate
is to design tables and queries in the service of the applications.
Costs of that shortsightedness are easy to cite if you know where to
look, but are not to be found in the salary and delivery schedules that
management usually pays attention to.  

--jkl



[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?

2016-03-05 Thread James K. Lowden
On Fri, 04 Mar 2016 00:35:47 -0800
Darren Duncan  wrote:

> > How exactly is the first way "easiest"?
> 
> If these are pages displayed to the user, they may want to scroll
> backwards at some point; 

They might, and if you say it's easier to go back to the database than
to keep track of previously fetched data for re-display then, thanks,
at least I understand your point of view.  

For myself I can't imagine such a design.  After I've gone to the work
of preparing the query fetching the results, and placing them in
whatever construct is needed for display to the user, I'd certainly
hang onto my display structures until the user was done with the
data.   If the user wants to see it again, the last thing I'd want to
do is repeat all that.  

Sometimes it's faster to recompute something than to cache it for later
re-use. That's rare where I/O is involved, and vanishingly rare where SQL
is involved.  

--jkl


[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?

2016-03-03 Thread James K. Lowden
On Thu, 3 Mar 2016 10:43:26 +0800 (CST)
??  wrote:

> > Can anyone describe a situation for which this style of LIMIT &
> > OFFSET is advisable from the application's point of view?  (The
> > DBMS costs are obvious enough.)  
> 
> For me this is the easiest way to implement a scrolling cursor.
> Otherwise I will have to record for each table the condition I use
> to  scroll it. So if there is an easier way to implement the
> scrolling cursor please let me know.

You say, "record ... the condition ... to  scroll [the table]".  I'm
sure I don't know what "condition" you mean.  

You appear to be doing something like:

offset = 0
do
    sqlite3_prepare(limit, offset)
    do
        sqlite3_step
    until SQLITE_DONE
    offset += limit
    ... application stuff ...
until last row processed

The obvious, better alternative is

sqlite3_prepare
do
    do
        sqlite3_step
    until LIMIT
    ... application stuff ...
until last row processed

because in the second example you're using SQLite's cursor to "record
the condition of the table".  There's nothing for you to keep between
"pages"; you just start from where you stopped.  

How exactly is the first way "easiest"?  

--jkl



[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?

2016-03-02 Thread James K. Lowden
On Wed, 2 Mar 2016 14:12:04 +0100
Clemens Ladisch  wrote:

> > https://www.sqlite.org/lang_select.html talks about LIMIT & OFFSET,
> > without mentioning that is a bad idea.
> 
> Neither does it mention that it is a good idea.
> 
> > can I do that or not (will it become sluggish if I do that) ?
> 
> When you use large OFFSET values, the database must compute all these
> rows before throwing them away.

Can anyone describe a situation for which this style of LIMIT & OFFSET
is advisable from the application's point of view?  (The DBMS costs are
obvious enough.)  

From first principles, if the application has OFFSET, it might as well
keep the connection handle instead, and just step through the next
LIMIT rows.  Why put the DBMS through the work of re-executing the
query, when sqlite3_step is at the ready?  

My guess is the answer is going to have something to do with a web
framework, and the cost of maintaining open connections, and the fact
that OFFSET can be stored in a cookie but a handle cannot.  I wonder
about that trade-off in general, because some fraction of discarded
cursors will be reconstituted as new (inefficient) queries.  And for
SQLite in particular, it's hard to imagine so many clients that
discarding processes is better than letting them consume virtual memory
until they time out.  

But rather than speculate, I'd be interested to hear of real
motivations.  

thanks.  

--jkl



[sqlite] How to check if connection to main database is still open or closed before new query ?

2016-03-02 Thread James K. Lowden
On Tue, 1 Mar 2016 17:13:29 +
a a  wrote:

> I want to check after a while if the connection is already closed or
> not for the simple reason not to reopen the database but if is open
> to run a query or if it is closed to reopen the database and then run
> the query.

I don't blame you for thinking that's what you want to do, but you
really don't.  

Consider that the situation you describe has many analogs.  You cannot
check if a file is closed, if a socket is closed, if free(3) has been
called on a pointer.  

More important, suppose you *could* check those things.  Could the OS
assure you that the file you opened, the socket, or the memory pointed
to is the resource you intended?  Could it say anything about the state
of that resource?  On both counts: No.  

The solution is to organize your program such that there's no question
about the state of its resources.  If you must have a flag then, as
Keith suggests, the most common way is to set the handle to NULL after
you close it.  

But simplest is best.  It's often feasible to open the database just
once at the beginning and never close it until exit.  Unless you have
hundreds of databases open concurrently or are working in an extremely
constrained environment, the overhead of carrying around a largely
unused database handle is negligible.  

HTH. 

--jkl


[sqlite] Random-access sequences

2016-03-01 Thread James K. Lowden
On Tue, 1 Mar 2016 08:15:25 -0500
Richard Damon  wrote:

> > The theoretical maximum number of rows in a table is 2^64
> > (18446744073709551616 or about 1.8e+19). This limit is unreachable
> > since the maximum database size of 140 terabytes will be reached
> > first. A 140 terabytes database can hold no more than approximately
> > 1e+13 rows, and then only if there are no indices and if each row
> > contains very little data.
> >
> You can hit 2^63 insertions well before hitting the size limit of the 
> database if you have also been doing deletions.

Yes.  If you manage 1,000,000 insertions/second, that's 3.15576 *
10^13/year.  You would run out of integers in 584,542 years.  

To get around that, add an "epoch" column, also integer.
Initially it is always zero.  Whenever "position" exceeds 2^63,
increment "epoch" and reset "position" to zero.  

That will give you at least twice as many years.  
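
A minimal sketch of that scheme, with made-up names:

create table seq (
    epoch    integer not null default 0,
    position integer not null,
    primary key (epoch, position)
);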

--jkl


[sqlite] Encrypt the SQL query

2016-02-26 Thread James K. Lowden
On Thu, 25 Feb 2016 14:01:31 +0800
 wrote:

> Does SQLite provide a good way to encrypt the SQL query strings while
> does not affect the performance when executing the queries?

If you're worried about the user examining your program image
statically, you could encrypt your SQL by whatever means, and decrypt
it "just in time":

sqlite3_prepare(db, decrypt(sql, key), ... );

Of course, if the key is also present in the image, you're only
discouraging the uninterested.  (Something I suppose Tim Cook knows a
thing or two about...)

If you're worried about the user examining the running program -- for
example, with ltrace(1) --  then your question is moot, because at some
point the encrypted SQL will have to be decrypted before SQLite
interprets it.  

--jkl


[sqlite] Correlated subquery throwing an error

2016-02-16 Thread James K. Lowden
On Mon, 15 Feb 2016 14:19:12 -0700
Scott Robison  wrote:

> Each job will take some amount of time to process. The order doesn't
> matter as long as all jobs are eventually processed and you have a
> single process running the jobs. Limit 1 is a reasonable way to grab
> a single job.

Reasonable, perhaps, but not logical.  The logical approach is to use a
feature in the data to select the "single job".  One obvious way in
your example would be to use min(jobid) or somesuch.  
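
For example -- a sketch, with assumed table and column names:

select * from jobs
where jobid = (select min(jobid) from jobs);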

--jkl


[sqlite] Correlated subquery throwing an error

2016-02-16 Thread James K. Lowden
On Mon, 15 Feb 2016 14:55:34 -0700
"Keith Medcalf"  wrote:

> Pretty sure you meant:
> 
> select * from (select min(t) as t from T) as T;

Yes, thanks.  :-)

--jkl


[sqlite] Performance comparison between SQLite and SQL Server?

2016-02-15 Thread James K. Lowden
On Mon, 15 Feb 2016 11:21:06 +0800
 wrote:

> I am just curious whether there is a performance comparison between
> SQLite and SQL Server? 

Odds are you will never see such a comparison published.  If you read
your SQL Server EULA, you'll see it specifically prohibits publishing
benchmark results.  

The restriction is not purely evil.  It could be argued that the flaws
embedded in any testing regimen render the conclusion meaningless.  

Performance of any kind is notoriously difficult to measure.  That's
especially true of DBMSs because of their complexity. Optimal DBMS
performance is a function of hardware, OS, and DBMS configuration.
That's a lot of expertise to bring to bear on making *one* DBMS perform
at its best; doing it for N is at least N times harder, especially if
you're being scrupulous.  

Back when SQL was young, people still read magazines about new
technology, and benchmarks were a popular feature.  I remember poring
over them to learn whether Oracle was faster than Sybase, etc.  But the
more I learned, the better I understood the futility of the question.
There were too many moving parts, and the journalists had nowhere near
the resources needed to do the job properly.  Any decision they
rendered was flawed, and in any case became obsolete as the technology
evolved.  

I think it was in answer to such testing that the vendors began
including restrictions on published performance measurements in their
EULAs.  

In answer to your question, I would say SQLite performs very well
provided you live with its restrictions.  In particular:

1.  Very little write contention.  
2.  All processes running on the same machine, using a local
filesystem.  
3.  Queries of limited complexity, or data of limited size.  

That is, you're limited to one writer at a time, you can't use a
network filesystem, and query optimization will hurt with large tables
if, for example, subqueries can't be flattened.  

SQL Server has none of those restrictions, and probably keeps pace with
SQLite even on its home turf.  But the administration of SQL Server is
nontrivial.  For that reason alone, I would never use it in situations
where SQLite would do.  

--jkl


[sqlite] Correlated subquery throwing an error

2016-02-15 Thread James K. Lowden
On Mon, 15 Feb 2016 08:56:35 +0100
Clemens Ladisch  wrote:

> I don't know why correlated subqueries cannot use values from the
> outer query in the ORDER BY or LIMIT clauses; 

ORDER BY is not part of SELECT!  It's not a relational operator.
Per the SQL standard -- ORDER BY cannot appear in a subquery. It can
appear in only one place: as a kind of post-processor that determines
the order in which the rows are delivered to the client.  

I suspect that's why Postgres ORDER BY doesn't recognize column names
in the enclosing scope.  If the day ever comes when SQLite can remove
language features, internal ORDER BY (and LIMIT) would be at the top of
my list, along with double-quotes for strings. 

--jkl


[sqlite] Correlated subquery throwing an error

2016-02-15 Thread James K. Lowden
On Mon, 15 Feb 2016 10:39:31 +0100
Clemens Ladisch  wrote:

> > you need to explicitly limit a subquery that is a field and must
> > only ever return 1 result if the where clause is ambiguous about it
> 
> Not in SQLite.  (It ignores superfluous rows, and returns NULL if
> there are no rows.)

Yes, but the right way to do it is to use logic instead of brute force.
Instead of 

select * from (select t from T order by t limit 1) as T;

use

select * from (select min(t) as t from T group by t) as T;

The latter has the benefit that it can be easily modified to add 
COUNT(*) to the subquery, and check for perhaps erroneous cases where
COUNT(*) > 1.  
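
For example, a sketch of that check (the WHERE clause is a made-up
stand-in for whatever qualifies the row):

select * from (select min(t) as t, count(*) as n
               from T where t > 0) as T;
-- if exactly one row was expected to qualify, n > 1 flags trouble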

--jkl


[sqlite] IS a SQLite db of small size as good as reliable cache?

2016-02-02 Thread James K. Lowden
On Mon, 01 Feb 2016 06:39:05 -0700
"Keith Medcalf"  wrote:

> OS/2 had IBM cache technology in it which worked properly.  

I remember OS/2.  I remember that, like VMS, you could back up the
whole OS to ... well, floppies, I suppose, and later restore them to
brand new drive, with nothing else required.  I couldn't believe my
eyes when the equivalent "system restore" process on NT started with 

1.  Install the OS.

Twenty-five years on, that's still SOP on NT, except that nowadays no
one expects to restore a backup anyway!  

But IIRC, NTFS was a Microsoft invention, not connected to OS/2, and
borrowed quite heaviily from VMS, because David Cutler.  So I'm
inclined toward your low-wattage theory, because hardly anything Cutler
originally provided remains unsmudged.  

--jkl


[sqlite] Is the first column of a composite primary key, special?

2016-02-02 Thread James K. Lowden
On Tue, 2 Feb 2016 16:19:07 +0100
Yannick Duchêne  wrote:

> There are also representations. Sometimes there is not really a
> value, just an identity which is the only thing offering
> sense/meaning, and what may be erroneously seen as a value is rather
> a representation. 

Representation is all.  

The database -- the whole thing, front to back -- is just a model, an
approximation of the real world.  As designer, you decide how to model
that world, the "enterprise of interest".  You decide what "entities"
there are, what properties they have, how they are known (identified).

The DBMS knows nothing of what is being modelled.  It's "just" a logic
system. That logic operates on values. It doesn't distinguish between
kinds of values, between natural and surrogate keys.  That sort of thing
contributes to the understandability (and hence utility) of the model,
but they're extralogical: outside the theory.  

> All of this, using ID in place of *some* values -- however at the cost
> of more verbose queries

Back in the real world -- the computing world, where there's an actual
system implementing the math -- yes, some things are faster than
others, and some allowances have to be made for the limitations of the
implementation and the computer.  :-)  

--jkl

