Re: [sqlite] Rowid alias and description of query result: suspected tiny bug

2019-03-17 Thread Yannick Duchêne
On Sun, 17 Mar 2019 14:27:19 -0600
"Keith Medcalf"  wrote:

> 
> specifies the name you want to appear for the first column.  The other names 
> for the other four columns are free to be whatever they want ... that is, 
> without an AS clause, the column names are not guaranteed to be anything 
> other than a random basket of characters though SQLite will attempt to return 
> the underlying table and column name (from the table declaration).

Good to know, thanks for that point. For SQL query text that is to be saved and 
kept for a long time, I will always use AS.

> >Aside, what also surprised me just a moment ago and I never noticed
> >before, is that although I can refer to ROWID (not aliased) anywhere
> >in a query, it seems not to work properly in `using(rowid)`, if the
> >ROWID is automatically generated in one of the tables (e.g. an ordinary
> >table without “without rowid”) and it is not in the other table (e.g.
> >a column named ROWID in a VIEW).  
> 
> You will have to provide an example.
> 
> t1 JOIN t2 USING (ROWID) works perfectly fine for me when ROWID is an 
> explicitly defined rowid (integer primary key in a rowid table).  Are you 
> trying to perform a join using the ephemeral ROWID?
> 

Here is an example:

create table test (a int, b int);

insert into test (a, b) values (1, 2);
insert into test (a, b) values (3, 4);
insert into test (a, b) values (5, 6);
insert into test (a, b) values (7, 8);

create view testview (rowid, sum) as select rowid, a + b from test;

select rowid from test; -- OK, the column exists.

select rowid from testview; -- OK, the column exists.

select testview.sum from testview, test using (rowid) where test.a <= 3; -- Fails.

-- SQLite complains: “cannot join using column rowid - column not present in both tables”.
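
For what it's worth, here is a sketch of a workaround, under the assumption that an 
explicit rowid alias is acceptable in the table: once both sides expose the rowid 
under a declared column name, USING works. The column name `id` and the Python 
wrapper are my own, not from the original report.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table test (id integer primary key, a int, b int);  -- id aliases the rowid
insert into test (a, b) values (1, 2), (3, 4), (5, 6), (7, 8);
create view testview (id, sum) as select id, a + b from test;
""")
# USING (id) succeeds because `id` is a declared column in both the table and the view.
for row in conn.execute(
        "select testview.sum from testview join test using (id) where test.a <= 3"):
    print(row)  # prints (3,) then (7,)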


[sqlite] Rowid alias and description of query result: suspected tiny bug

2019-03-17 Thread Yannick Duchêne
Hi people, it's been a long time since I last came here.

I'm currently creating and using an SQLite DB with the Python binding APSW. For 
each row returned, I always retrieve the description with 
`Cursor.getdescription()`. Surprisingly, during a test, I noticed that the name of 
a column as returned in the description is not the same as in the query. That 
column is an alias of the ROWID.

Here is an example to test:

-- `id` is `integer primary key`, so that it is an alias of rowid,
-- which is necessary to use it in `using (…)` clauses.

create table test (id integer primary key, a int, b int);

insert into test (a, b) values (1, 2);
insert into test (a, b) values (3, 4);
insert into test (a, b) values (5, 6);
insert into test (a, b) values (7, 8);

select rowid, id, a, b from test;

The description of the SELECT query result returns these column names: ID, ID, 
A, B, instead of ROWID, ID, A, B.
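
The same behaviour can be reproduced with Python's standard sqlite3 module (a 
sketch; the original report uses APSW, whose `getdescription()` plays the role of 
`cursor.description` here):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table test (id integer primary key, a int, b int);
insert into test (a, b) values (1, 2);
""")
cur = conn.execute("select rowid, id, a, b from test")
# Each entry of cursor.description is a 7-tuple whose first item is the column name.
print([d[0] for d in cur.description])  # observed: ['id', 'id', 'a', 'b']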

That's not annoying to me, since this was just to check that ID is indeed an alias 
of ROWID (it's been a long time since I used SQLite), but maybe it's a bug, 
although not a nasty one, so I wanted to tell. Unless I'm wrong and that's the 
expected result?

Aside, what also surprised me just a moment ago and I never noticed before, is 
that although I can refer to ROWID (not aliased) anywhere in a query, it seems 
not to work properly in `using(rowid)`, if the ROWID is automatically generated 
in one of the tables (e.g. an ordinary table without “without rowid”) and it is 
not in the other table (e.g. a column named ROWID in a VIEW).


Re: [sqlite] Suspected bug: parse error depending on platform

2017-12-24 Thread Yannick Duchêne
Finally, unlike with Ubuntu, for Windows there is not even a need to build
the library; it's already available from https://sqlite.org/download.html






Re: [sqlite] Suspected bug: parse error depending on platform

2017-12-22 Thread Yannick Duchêne
Ok, I see.

So finally, using `sqlite3.sqlite_version`, it says version 3.8.11 for the
Windows version and 3.11.0 for the Ubuntu version.
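
For reference, a minimal check that distinguishes the two versions (a sketch; the 
module version is about the Python binding, not about the SQL dialect):

import sqlite3

# sqlite3.version is the version of the Python sqlite3 module itself (e.g. "2.6.0"),
# while sqlite3.sqlite_version is the version of the linked SQLite library, which
# decides which SQL syntax is accepted; for instance, CREATE VIEW with an explicit
# column list needs SQLite 3.9.0 or later.
print("module :", sqlite3.version)
print("library:", sqlite3.sqlite_version)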

Maybe I can solve it by changing the sqlite3.dll which is in the isolated
Python environment. I will try to build it in the virtual machine.

Thanks for having pointed out my error…





[sqlite] Suspected bug: parse error depending on platform

2017-12-22 Thread Yannick Duchêne
Hello,

I'm facing an issue developing a Python + SQLite3 application on Ubuntu, to
be shipped on Windows 7.

The simple example below works as expected on Ubuntu, but fails on Windows
7. Both use Python 3.5 and SQLite 2.6.0, so that's not a version issue.

import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("create view v(c) as values(1);")

No error on Ubuntu but fails on Windows 7 with this message:

sqlite3.OperationalError: near "(": syntax error

If that matters for anything, I'm testing the application on Windows 7 32
bits running in Virtualbox, using the image provided by Microsoft. The
Python 3.5 I use on Ubuntu is in a virtualenv, and the one for Windows is from
a bundle archive provided by the Python community website. This archive
contains an isolated Python environment.

Someone else encountered a similar issue one year ago, in the context of a
NodeJS application, here: https://github.com/brianc/node-sql/issues/314

For the personal story: the real view (the above is a sample for testing the issue)
that SQLite doesn't want to create is to be used by a trigger running
some checks, so I have to drop it. That does not prevent the application from
running, but it will be less reliable without this trigger.





Re: [sqlite] Kind of function out of common table expression

2017-12-20 Thread Yannick Duchêne
Indeed, a preprocessor may be an option, since all the queries are stored in
text files (which are loaded by an application or used directly with
copy/paste).

Whether or not that's better than a function added per connection is a matter
of taste. I will have to decide.





[sqlite] Kind of function out of common table expression

2017-12-20 Thread Yannick Duchêne
Hi there,

I wonder if there is a way, with SQLite3, to reuse an often-used and
moderately long common table expression without the need to copy/paste it into
every query text it is used in.

I know there is a way to add a function to SQLite connections using DB APIs,
like that of Python. Still, as much as I can, I prefer database definitions
to be independent of the environment in which they are used (e.g. the function
would not be available when running the SQLite command-line shell).
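
As an illustration of that per-connection approach, here is a sketch with
Python's built-in sqlite3 module (APSW has a similar facility; the function
name is made up): the function exists only for this connection's lifetime and
would indeed be unknown to the SQLite command-line shell.

import sqlite3

conn = sqlite3.connect(":memory:")
# Register a scalar SQL function for this connection only; it will not exist
# in other connections nor in the SQLite shell.
conn.create_function("double_it", 1, lambda x: 2 * x)
print(conn.execute("select double_it(21)").fetchone())  # (42,)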

I tried to define a trigger which would write the result of the computation to
a dedicated table, with the hope of just having to read it back, which would be
a much shorter statement at each place of use.

Unfortunately, I was reminded (after I rediscovered it) that common table
expressions are not supported in triggers.

That's not a big issue: if there is no way, I will go with a function added
for each connection's lifetime. It's just that if I can avoid it, it would be
cleaner to me.

Merry Christmas to all by the way :-p






[sqlite] Unexpected echo when setting locking_mode

2017-11-18 Thread Yannick Duchêne
Hello there,

I believe I found a tiny bug, nothing bad, but it may pollute some output.

Using SQLite version 3.20.1, if I do this:

PRAGMA locking_mode=exclusive;

It writes an unexpected reply:

exclusive

It does as if I were requesting the actual value, as in this:

PRAGMA locking_mode;

If I do this with another pragma, for example:

PRAGMA auto_vacuum=none;

It works as expected; there is no unexpected output.
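
The behaviour is easy to reproduce programmatically as well (a sketch; whether it
is a bug or this pragma's intended behaviour is left to the thread):

import sqlite3

conn = sqlite3.connect(":memory:")
# Setting locking_mode yields a result row echoing the new mode...
print(conn.execute("PRAGMA locking_mode=exclusive").fetchall())  # [('exclusive',)]
# ...while setting auto_vacuum yields no row.
print(conn.execute("PRAGMA auto_vacuum=0").fetchall())           # []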

-- 
Yannick Duchêne


[sqlite] json_group_array

2016-02-05 Thread Yannick Duchêne
On Fri, 5 Feb 2016 07:51:06 +
Hick Gunter  wrote:

> That is because count(a) and count(*) means two different things. The first 
> counts values, the second counts rows.

What if all columns of a row are NULL?
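
A minimal illustration of the distinction, including a row that is entirely NULL
(a sketch): count(*) still counts such a row, while count(a) does not.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("create table t (a int); insert into t values (1), (NULL);")
# count(*) counts rows, even the all-NULL one; count(a) counts non-NULL values of a.
print(conn.execute("select count(*), count(a) from t").fetchone())  # (2, 1)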

-- 
Yannick Duchêne


[sqlite] Efficient relational SELECT

2016-02-04 Thread Yannick Duchêne
On Thu, 4 Feb 2016 20:40:56 +
Simon Slavin  wrote:

> 
> On 4 Feb 2016, at 7:16pm, Luuk  wrote:
> 
> > Will this simple solution be too slow?:
> > 
> > SELECT id, room, date, time
> > FROM rooms
> > LEFT JOIN bookings ON rooms.id=bookings.room
> > GROUP BY rooms.id, bookings.date
> > HAVING bookings.date=MAX(bookings.date) OR bookings.date IS NULL
> 
> That's the solution (a phrasing of it) that I thought of.  However, in my 
> head it's inefficient because of the test for MAX.  I was hoping for 
> something that looked better.  But it is plenty fast enough for my purposes 
> so any objection is purely for elegance and not for logic.
> 
> Thanks to you, John McKown and Dr Hipp.  I'll have a field day testing these 
> out tomorrow.
> 
> Simon.

So what about a trigger storing the last booking date/time in a dedicated table?
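
One shape the suggested trigger might take (a sketch, assuming bookings are
inserted in chronological order; the table and column names are improvised from
the quoted query):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table bookings (room integer, date text, time text);
create table last_booking (room integer primary key, date text, time text);
-- Keep the latest booking per room, refreshed on every insert.
create trigger bookings_last after insert on bookings
begin
  insert or replace into last_booking values (new.room, new.date, new.time);
end;
""")
conn.execute("insert into bookings values (1, '2016-02-04', '20:40')")
print(conn.execute("select * from last_booking").fetchone())  # (1, '2016-02-04', '20:40')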

-- 
Yannick Duchêne


[sqlite] a sqlite database error

2016-02-04 Thread Yannick Duchêne
Hello, and welcome, Frank,

You just forgot to tell what the error was :-P .

By the way, what returned the error? Was this SQLite3 itself or a wrapper 
around it?


On Wed, 3 Feb 2016 14:22:40 +0800
"" <1072579296 at qq.com> wrote:

> Hello, my name is Frank. Two days ago, I operated an sqlite database and got an 
> error. I don't know what the error means, so I hope if you see the email, 
> please respond to me. Thank you very much!
> 
> iPhone

-- 
Yannick Duchêne


[sqlite] IS a SQLite db of small size as good as reliable cache?

2016-02-03 Thread Yannick Duchêne
On Wed, 3 Feb 2016 23:25:33 +0100
GB  wrote:

> So where have you got this wisdom from? It's just plain Bullshit!
> 
> Just as most cache managers do, Windows cache manager uses some sort of 
> LRU caching scheme. So all data once read from file is kept in memory 
> until either some memory pressure occurs or it is simply pushed out by 
> newer data. Actually, from what I've experienced, I'd recommend to size 
> sqlite's memory to fit for your largest transactions and leave most of 
> the caching to the OS.
> 
> And yes, unlike in many *NIX implementations, Windows' 
> FlushFileBuffers() call DOES issue according write commands to the 
> hardware, so setting PRAGMA synchronous ON/OFF makes a big difference in 
> write performance on Windows systems.
> 

I agree there are too many incorrect statements about Windows's behaviour; 
however, on a Linux system, I could check (just for testing it) that `PRAGMA 
synchronous ON/OFF` makes a difference with at least small transactions, while 
it's less perceptible with big transactions (“small” and “big” are vague; that's 
just to keep the wording simple). I guess on Windows too, when transactions are 
big, using synchronous mode or not will make less difference than with small 
transactions, understandably.


-- 
Yannick Duchêne


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-03 Thread Yannick Duchêne
On Wed, 03 Feb 2016 06:30:13 -0700
"Keith Medcalf"  wrote:

> 
> Is this on windows?  Any errors in the Eventlogs to the tune "Oooopsie -- 
> accidentally threw away your data instead of writing it to disk"?  Windows 
> does this quite commonly under some circumstances.  MicroSoft created the bug 
> in NT 4 and has been unable to locate or fix it since -- though there is no 
> problem producing an error message about 50% of the time it happens.

No, that's Ubuntu Trusty 32 bits on Ext4. For an anecdote, I personally never 
had file system issues with Windows; the only few cases I had were with Ubuntu 
(especially with Btrfs, which I left for this reason).

> […]
> > 
> > Coincidence. I just had a funny incident; maybe it's related.
> > 
> > I just modified a program so that it creates four triggers in a database.
> > I ran the program, then got an error from APSW (the program uses Python)
> > complaining about an I/O or disk error. This frightened me a bit, but I
> > thought this might be due to SQLiteBrowser being opened on the same DB,
> > which I forgot to close and closed just after I started the program, which
> > was subject to the I/O issue. I opened the DB and could see only two of the
> > four triggers, two missing. So I deleted the DB and regenerated it three
> > times, without error messages, but still two triggers missing. Only at the
> > fourth repetition were the four triggers all there.
> > 
> > The SMART data indicates zero reallocated sectors.
> > 
> > In the SMART utility, I noticed there is a hardware cache, which I
> > disabled, just in case, thinking about the “lying devices” I remembered
> > from this message.
> > 
> > This is frightening to me, as I got an error message the first time only,
> > but not the other times, while something wrong seems to have happened too.
> > Also, I could notice something was missing in the DB (even when generated
> > without an error notification), because it was about something I was
> > precisely looking at, at that moment; if it had been about some rows
> > missing in any table, I would not have noticed it.
> > 
> > Hardware failure? OS failure? Software failure? Can't tell for sure…

-- 
Yannick Duchêne


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-02 Thread Yannick Duchêne
On Thu, 28 Jan 2016 14:55:28 +
Simon Slavin  wrote:

> 
> On 28 Jan 2016, at 1:38pm, Bernard McNeill  wrote:
> 
> > ===
> > Like the user reading “saving OK” and throwing away the
> > Post-It with the original information
> > ===
> > 
> > This is exactly my concern.
> > The user throwing away the Post-It is entirely reasonable if he sees a
> > message like that.
> > 
> > Do you happen to know if Linux/Debian (which I think uses a journalling
> > filesystem) carries this risk?
> 
> The problem is not at the software level.  Various operating systems and file 
> systems are correctly programmed with regard to waiting for write commands to 
> complete.  I don't know specifically about Debian but Linux has a good 
> reputation for such things, and anyone who bothers to write a journalling 
> file system would understand how to do things properly.
> 
> The problem is at the hardware level.  Standard disk drives (including their 
> motherboard if they have one, and their firmware) are designed for speed, not 
> integrity.  The assumption is that you will be using them to play games or 
> write your CV in Word, not to keep vital data.  So they are set up, using 
> their default jumper positions, to lie.  In order to keep their computer 
> running as fast as possible, instead of
> 
> 1) receive write command
> 2) perform write command
> 3) read that bit of disk to confirm the change
> 4) if not, bring SMART system into play and try writing it somewhere else
> 5) if succeed, tell the computer "I wrote that and it worked."
> 6) otherwise tell the computer "I wrote that and it failed."
> 
> they do this
> 
> 1) receive write command
> 2) tell the computer "I wrote that and it worked."
> 3) perform write command
> 4) read that bit of disk to confirm the change
> 5) if not, bring SMART system into play and try writing it somewhere else

Coincidence. I just had a funny incident; maybe it's related.

I just modified a program so that it creates four triggers in a database. I ran 
the program, then got an error from APSW (the program uses Python) complaining 
about an I/O or disk error. This frightened me a bit, but I thought this might 
be due to SQLiteBrowser being opened on the same DB, which I forgot to close and 
closed just after I started the program, which was subject to the I/O issue. I 
opened the DB and could see only two of the four triggers, two missing. So I 
deleted the DB and regenerated it three times, without error messages, but still 
two triggers missing. Only at the fourth repetition were the four triggers all 
there.

The SMART data indicates zero reallocated sectors.

In the SMART utility, I noticed there is a hardware cache, which I disabled, 
just in case, thinking about the “lying devices” I remembered from this message.

This is frightening to me, as I got an error message the first time only, but 
not the other times, while something wrong seems to have happened too. Also, I 
could notice something was missing in the DB (even when generated without an 
error notification), because it was about something I was precisely looking at, 
at that moment; if it had been about some rows missing in any table, I would 
not have noticed it.

Hardware failure? OS failure? Software failure? Can't tell for sure…



-- 
Yannick Duchêne


[sqlite] Is the first column of a composite primary key, special?

2016-02-02 Thread Yannick Duchêne
On Sun, 31 Jan 2016 18:15:48 -0500
"James K. Lowden"  wrote:

> 
> The relational model, as you well know, doesn't describe
> implementation.  It's math.  It says what relations are, and how
> they're manipulated.  One data type, and an algebra closed over that
> domain.  Math doesn't have pointers or duplication or corruption; those
> are computer concepts, and as such are entirely outside the model.  

I feel better reading this, as I had a doubt about the assertion. There is even 
another way to comment, bringing the pointer interpretation into the model: an 
identity, which is fine in a relational model, is a value of some type, to which 
equality applies; a pointer is an identity, although an identity is not always 
a pointer, as a pointer is additionally a kind of array index which implicitly 
implies a relation to one. So “pointing to” is a reasonable interpretation of 
some relations, when it reads from many to one (which may be for both ways, one 
way only, or not at all). An identity may still sometimes be interpreted as a 
pointer, with only the dereferencing method differing: either relation or array 
index.

That said, I tried to meditate on whether or not I should use IDs in place of 
values, while testing it in practice.

Here is how I see it (to not say all)… identities and general values are not 
the same things: while same identity implies same value, same value does not 
always imply same identity. The latter, to be assumed, needs to be asserted. 
There are also representations. Sometimes there is not really a value, just an 
identity which is the only thing offering sense/meaning, and what may be 
erroneously seen as a value is rather a representation. For the latter case, 
it is better to deal with the identity rather than with the representation 
interpreted as a “value”, because the representation may vary, so it's cleaner 
to use the identity rather than the “value”, to make sense. To come back to 
values, when the only thing which locally matters with a value is equality, it 
is valid to locally assert the value is the identity; this may be handy with 
composite values (saves the complexity of folding a complex value) or often 
repeated values (*may* save storage). But this comes with an issue: at least one 
SQL construct prevents referring to the value corresponding to an identity 
(when not just equality is needed): within indexes. E.g. table T has column "a" 
(an ID) and "b" (an associated value), table U has a column "c", where "c" 
references column "a" from T; then there is no (or I know no) way to create an 
index on U(c) ordered by the associated value "b" from T.


That's abstract, but the practical result is there: the query grouping on 
what was a second column is now *exactly* as fast as the query grouping on 
what was the first column (there are no more technical differences between the 
two queries), and the DB size is two thirds of its size before this refactoring. 
All of this, using IDs in place of *some* values… however at the cost of more 
verbose queries (more verbose, but not untraceable; this is just tedious).


-- 
Yannick Duchêne


[sqlite] Pragma automatic_index and ROWID: which one is really responsible for automatic indexes?

2016-02-01 Thread Yannick Duchêne
On Thu, 28 Jan 2016 22:08:02 +0200
R Smith  wrote:

> I think you are misunderstanding the Pragma and the idea of automatic 
> indices. An automatic Index might be created on a table that doesn't 
> have an adequately assigned primary key. It might also be created during 
> a query (mostly SELECT or sub-SELECT) for which there is no useful Index 
> created by the table designer and the Query planner figures it will be 
> quicker to make an Index than to do table scans through the query. This 
> habit of creating indices during select queries can be forced to not 
> happen by setting the "PRAGMA automatic_index=0;", but this needs to 
> happen when you open the DB connection, or at a minimum, before you try 
> any query - not after the cursor is created,  by that time the index 
> might already be made. (This is why you are not seeing any speed 
> improvement).

Here it is, from the documentation: https://www.sqlite.org/optoverview.html
At the very bottom, it is said:
> Do not confuse automatic indexes with the internal indexes (having names like
> "sqlite_autoindex_table_N") that are sometimes created to implement a PRIMARY
> KEY constraint or UNIQUE constraint. The automatic indexes described here
> exist only for the duration of a single query, are never persisted to disk,
> and are only visible to a single database connection. Internal indexes are
> part of the implementation of PRIMARY KEY and UNIQUE constraints, are
> long-lasting and persisted to disk, and are visible to all database
> connections. The term "autoindex" appears in the names of internal indexes
> for legacy reasons and does not indicate that internal indexes and automatic
> indexes are related.

It's this legacy naming which causes the confusion.


-- 
Yannick Duchêne


[sqlite] Is the first column of a composite primary key, special?

2016-01-31 Thread Yannick Duchêne
On Sun, 31 Jan 2016 17:22:37 +
Simon Slavin  wrote:

> 
> Ignore all the above.  There are rare situations where they're useful but the 
> situation you're in is helped far more by using the phonebook analogy earlier 
> posters used than by trying to use the above.
> 
> Think about pure SQL, and about making one ideal index for each SELECT 
> command, and you'll get very good results.  Work out what you want the SELECT 
> (or UPDATE, etc.) to do.  Then work out the command.  Then work out the index 
> which would be ideal to help the command do its thing.  That's the best way 
> to get fast SQL.
> 
> Optimizing for SQLite peculiarities (some of which no longer apply because 
> inner workings of SQLite have changed since the article was written) is 
> useful only very rarely.
> 

I agree. I'm drifting much too far from one of my concerns, which is to stick to 
just SQL (and standard SQL).

In the meantime, I was looking at what SQLite does with the queries, to see if 
it's intuitive, if it matches closely enough what one would devise without a DB 
engine. I mean, I sometimes think about SQL as a data-structure modelling 
language, and I may try to re-implement it in a classic procedural language, for 
an experiment. I'm also aware this point of view (SQL as a data-structure 
modelling language) is less meaningful with complex queries, as with these, a DB 
engine adds value outweighing the hopes one may get from a procedural 
implementation; the same applies if the set of queries is not fixed.

That said, thanks for the reminder.

-- 
Yannick Duchêne


[sqlite] Is the first column of a composite primary key, special?

2016-01-31 Thread Yannick Duchêne
On Sun, 31 Jan 2016 10:45:59 -0700
"Keith Medcalf"  wrote:

> create table PhoneDirectory
> (
>  surname_id integer not null references Surnames,
>  given_id integer not null references GivenNames,
>  address_id integer not null references Addresses,
>  PhoneNumber text collate nocase primary key
> );
> create index PD_Surnameid on PhoneDirectory (surname_id);
> create index PD_Givenid on PhoneDirectory (given_id);
> create index PD_addressid on PhoneDirectory (address_id);
> 
> create view v_PhoneDirectory
> as
>   select Surname, Givenname, Street_No, Suffix, Streetname, PhoneNumber
> from PhoneDirectory
> natural join Addresses
> natural join Streets
> natural join GivenNames
> natural join Surnames;

Now I understand why `NATURAL JOIN` is qualified so, and why it has an implicit 
`USING` clause. That's to help with this.

> This database would be in BCNF normal form.  (Although the streetno and 
> suffix ought to be moved out to a separate table(s) if you need 4th or 5th 
> normal).  The model gets very much more complicated if you also have to 
> handle municipal names, city names, etc.

The highest-level normal forms are to be weighed anyway.


-- 
Yannick Duchêne


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-01-31 Thread Yannick Duchêne
On Sun, 31 Jan 2016 09:42:28 -0700
"Keith Medcalf"  wrote:

> 
> And I thought the "Object Oriented" jihad blew up when it was discovered to 
> be counter to productivity and performance in the 1990's and that it did not 
> provide a single one of the "advantages" claimed by its mujahedeen warriors.
> 
> Of course, it could be that there is not really very much of anything at all 
> (if there is anything at all) that implements an "object oriented" 
> architecture.  It is mostly just glossing appearances and wrappers around 
> inherently non-object oriented things.  But then again, that just goes to 
> show that OO is inherently flawed.
> 
> OO is a dead horse.  It is about time it was finally shot in the head and put 
> out of its misery.

I'm off-topic, but this is going a bit too far in the opposite direction.

I won't be so affirmative about OO being entirely negative; there are cases 
where it matches well. These cases (as I experienced them) are when 1) there are 
not that many objects (I mean 100, as an example, not 10,000 or 1,000,000) and 
2) they each have a strong personality, that is, they differ enough from each 
other instance, while having enough in common (both). This is typically the 
case with user interfaces (although state machines are nice too in this area), 
as the human being is a complex thing :-D with many peculiarities, which are 
well matched by OO with its deep inheritance and attributes everywhere 
(prototype-based OO is nice too for this purpose).

A fact is that a user interface as in this example works at a tempo which is the 
same as that of a human: a quarter of a second or something around that; it 
normally consumes not that many resources (objects are not responsible for heavy 
graphics). It does not do computation, it behaves. That's the word, I believe: 
OO is good at expressing peculiar behaviours, more than at expressing 
computation (if one does not confuse records with OO's objects).

The overall criticism I would still have against OO is that in its typical 
implementations, it confuses interface inheritance and implementation 
inheritance, which in my opinion should be separate (some OO languages have 
private inheritance at least; unfortunately, most famous languages don't). 
Another issue (however more cleanly solved by Eiffel, unlike others) is name 
conflicts in inheritance. The implementations and the model are still two 
different things.


-- 
Yannick Duchêne


[sqlite] Is the first column of a composite primary key, special?

2016-01-31 Thread Yannick Duchêne
On Sun, 31 Jan 2016 16:14:45 +0200
R Smith  wrote:

I'm replying to Igor too with this message, as both of you had a similar answer.

> 
> First understand what an Index is and how it works.
> 
> Imagine you are asked to find all the people whose surnames are 
> "Duch?ne" from the telephone directory. You would be able to do this 
> quite fast, because the phonebook is indexed first by Surname, then 
> name, then address. Perhaps a Telephone directory schema might look like 
> this:
>CREATE TABLE PhoneBook (Surname TEXT, Name TEXT, Address TEXT, 
> PhoneNo TEXT, PRIMARY KEY (Surname, Name, Address) );
> 
> Your query might look like this:
>SELECT * FROM PhoneBook WHERE Surname='Duchêne';
> 
> Imagine now that you are asked to find all people named "Yannick" in the 
> phone directory, like so:
>SELECT * FROM PhoneBook WHERE Name='Yannick';
> 
> Immediately that would go very slow because you have to look at each 
> Surname and see if there are any Yannicks in there, and the same problem 
> arise if you are asked to look for a specific address.


That makes sense, I agree, and I had this in mind for a short time, until it 
was shadowed by a bad bet I made for three reasons. If it's worth the words, 
let me tell:

I saw a page (I can't retrieve the URL) suggesting to order table columns by 
names. It was strange to me, as I had the idea of hierarchical access for 
tables. But I thought “there must be a good reason for them to say this”. 
Then in an SQLite page [1], there was a suggestion to avoid an index containing 
the same data as a wider index. So after these two things, I tried to imagine 
ways of setting up an index so that this makes sense: I thought a multi-column 
key could be accessed by any column, using fragments whose contents are ordered.

Precisely with the case of your example, I thought the "name" column would be 
partitioned into individually sorted parts. It was also contradicted by the 
fact that adding an index on a single column of a multi-column primary key 
could help grouping (although later again, there was another surprise 
contradicting this too).

[1]: https://www.sqlite.org/queryplanner.html
Which says
> Note that Idx3 contains all the same information as the original Idx1. And so 
> if we have Idx3, we do not really need Idx1 any more.
While reading it again, I see I overlooked what came next:
> your database schema should never contain two indices where one index is a 
> prefix of the other.
My bad.


> You will have a bright idea right the first time you are asked to do 
> this - you will make a list of names in alphabetical order followed by 
> surnames and keep it separate, so if ever you are asked again to find 
> someone by name, you can reference this second list to quickly see the 
> name and surname, and perhaps use that info to find them in the 
> PhoneBook and get the rest of the info. This second list is what is 
> called an Index - but it is not the PRIMARY index.
> 
> If you wish for all those searches to go fast, you need 3 Indices, not 
> simply a 3-section primary Index.
> 
> Perhaps this SCHEMA would better suit your needs:
> 
>CREATE TABLE t (a TEXT, b TEXT, c INTEGER, PRIMARY KEY (a, b, c))
>CREATE INDEX t_1 ON t (b);
>CREATE INDEX t_2 ON t (c);
> 
> Be careful though, every next Index will require a full datalist plus some 
> overhead worth of space, and will make INSERTs slower because it has to 
> insert more times and re-organize the B-Tree of every index a bit.

I guessed, and that's why I have no swapped index (although (a, b) is a 
foreign key to another table with only (a, b), which helps for some queries, but 
there is no table for (b, a)) and prefer an attempt to tweak the query on the 
table as-is (for now, as I will have a third refactoring).

> Best is to decide which searches you will do, make all Indices you think will 
> be needed, then try the queries (using explain query plan), see which Indices 
> are used and that the speed is good, then remove those who are not used.

That's what I did, and it showed me that on a query with a single `GROUP BY`, an 
index on "b" helped and was indeed used. Then later, it showed that a variant I 
had the idea to test, with two nested `GROUP BY`s, runs faster (an efficiency 
not far from that of the same query on the first column) while not using this 
column index at all (which I finally removed). That's how I ended up with these 
tests and the question.



Maybe that's the opportunity for another question I have: given a foreign key 
(a, b) where "a" and "b" are more than a few bytes (not small) or are of 
variable size (still hopefully limited), are the values for "a" and "b" 
duplicated, or does the foreign key create a kind of reference? (maybe with a 
hash or even a short ID for the bigger value) If they are duplicated, then I 
will use integer keys instead. A bit long ago, I questioned the habit of 
mechanically using integer PKs (and also FKs), feeling using the literal values 
is more readable and s

[sqlite] Is the first column of a composite primary key, special?

2016-01-31 Thread Yannick Duchêne
Hi all,

Another mystery to me. Given this test table:

CREATE TABLE t (a TEXT, b TEXT, c INTEGER, PRIMARY KEY (a, b, c))

…this query:

SELECT Sum(c) FROM t GROUP BY a

…executes faster than either of these two:

SELECT Sum(c) FROM t GROUP BY b
SELECT Sum(c) FROM t GROUP BY c

…which execute in about the same time as each other, proportionally to the 
number of returned rows. With `GROUP BY a`, execution times seem to be about 
half of those of the other two. Adding or not adding a `WITHOUT ROWID` gives the 
same. I give the number of rows to show that if the first one is faster than the 
second one, it's not because it would return fewer rows; on the opposite, it 
returns a bit more than with grouping by `b`:

 * Grouping by `a` results in 1360 rows in about 40ms +/-3;
 * Grouping by `b` results in 1170 rows in about 65ms +/-5;
 * Grouping by `c` results in 3154 rows in about 90ms +/-4.

If the primary key declaration is removed, the timing when grouping by `b` or 
`c` does not change, while the timing when grouping by `a` becomes the same as 
that of the other two.

I witness this with both the SQLite3 CLI and SQLiteBrowser (a detail I must 
mention after another thread).

Is there anything special about the first column of a composite primary key? 
From an implementation point of view, this may make sense, but I still prefer 
to ask.
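
One way to look under the hood (a sketch): EXPLAIN QUERY PLAN shows whether a
grouping can reuse the primary-key index, whose entries are already sorted by
`a`, or needs a temporary B-tree.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c INTEGER, PRIMARY KEY (a, b, c))")
for col in ("a", "b", "c"):
    # Each returned row describes one step of the chosen query plan.
    plan = conn.execute(f"EXPLAIN QUERY PLAN SELECT Sum(c) FROM t GROUP BY {col}")
    print(col, plan.fetchall())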

I first noticed this another way. This test was just a check on a simpler case.

Initially, I indirectly noticed this with something similar to this:

SELECT b, Sum(c) AS c 
  FROM
   (SELECT b, Sum(c) AS c 
FROM t 
GROUP BY a, b) 
  GROUP BY b -- 60 ms on average

…being faster than this second, simpler alternative, something I noticed with 
the test table too, just that the difference is smaller:

SELECT b, Sum(c) AS c
  FROM t 
  GROUP BY b -- 65 to 70 ms on average

…although the first one seems to run more operations, and it's still the same 
if I add an index on `b` for the second alternative so that it does not use a 
temporary B-tree for grouping.


I also noticed some other cases where queries execute faster on the first 
column of a composite key (with or without indexes), but I won't expose all the 
cases, as I'm already too lengthy.


Enough testing for now; I will resume the investigation of these unexpected 
results later.


-- 
Yannick Duchêne


[sqlite] Find out how many times does SQLite hit the disk?

2016-01-31 Thread Yannick Duchêne
On Sat, 30 Jan 2016 20:36:55 -0800
J Decker  wrote:

> On Sat, Jan 30, 2016 at 8:09 PM, J Decker  wrote:
> > could use a tool like ProcMon and filter to disk activity on a
> > specified file to see...
> > https://technet.microsoft.com/en-us/sysinternals/processmonitor.aspx?f=255&MSPPError=-2147217396
> 
> Might not actually be useful though; if the file is memory mapped (WAL
> Journal?) then you won't see those accesses...
> 
> then you'd end up having to hook into VFS stuff...

If it's memory mapped, it's less of an efficiency issue, which was the original 
concern if I remember well. If he wants to know about physical accesses, this 
may be what he needs.

-- 
Yannick Duchêne


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-01-31 Thread Yannick Duchêne
On Sat, 30 Jan 2016 23:03:29 +
Simon Slavin  wrote:

> 
> On 30 Jan 2016, at 8:13pm, Yannick Duchêne  
> wrote:
> 
> > In my opinion (which some others share), OO is a bag of miscellaneous 
> > things which are better tools and better understood when approached 
> > individually. Just trying to define what OO is shows it: is this about 
> > late binding? (if it is, then there are subprogram references, first-class 
> > functions, or even static polymorphism and signature overloading) About 
> > encapsulation? (if it is, then there is already modularity and scopes) 
> > About grouping logically related entities? (if it is, there is already 
> > modularity, and sometimes physical grouping is bad physical design).
> 
> There are a number of problems in using a relational database for 
> object-oriented purposes.  One is that to provide access to stored objects 
> you need to access the database in very inefficient ways which are slow and 
> are not helped by caching. [?]

This matches what I had in mind with the short sentence “And especially about 
OO and DBs, I know there are complaints OO hides too much the data model of 
DBs”. I heard about an example with a video game engineer explaining that OO 
typically impedes efficiency for their use cases, in a way which can be related 
to a DB and its tables. Say an object has an attribute `x` among others. Say a 
big set of objects of this type is often iterated, but it's not really the 
objects which are iterated, it's their `x` attribute alone, and no other 
attributes are accessed during this frequent long operation. Using OO, it is 
suggested `x` should be stored in the object instances (typically a kind of 
C struct), so there is a large set of unused data coming with each `x` in an 
iteration, which prevents efficiency (poor caching, more memory accesses). In 
such a case, `x` should not be stored in the object, but rather in a dedicated 
list holding only the `x` attributes. That's about the same as with a DB when 
some columns are moved from a table to a specific table, while maintaining a 
relation between both.

Where `x` should belong, it's not the conceptual picture of the attributes 
alone which says it, it's the typical operations referring to the attribute 
which do, and this matters more and more as the number of object instances 
grows. It's for a similar reason that a DB may need to be refactored when new 
queries come, while from an OO point of view, this would just suggest adding a 
new method to an object, not moving `x` from the object to something else 
(something OO will never suggest).

That's what makes me agree a relational DB is not well suited to store objects 
(the models differ too much), unless the model of that object is broken down 
according to the requirements of its individual operations, or the object 
instances are just stored as monolithic blobs.

There are cases where the OO view matches well, and there are cases where it 
does not. There is no need to force one view to be the other.

-- 
Yannick Duchêne


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-01-31 Thread Yannick Duchêne
On Sat, 30 Jan 2016 20:50:17 -0500
Jim Callahan  wrote:

> I am not interested in a complete ORM; what I am interested is when the
> object-oriented language supports a SQL-R-like object. In R, the object is
> called a data.frame and the package "Pandas" supplies a similar data frame
> object to Python.
> https://pypi.python.org/pypi/pandas/0.10.0/

The page says:
> Additionally, it has the broader goal of becoming the most powerful
> and flexible open source data analysis / manipulation tool available
> in any language.

There have been so many announcements like this in many areas… (open-source 
or not)

The page mentions HDF5, and Pandas' description seems similar to that of HDF5 
in its intents. If Pandas exists, it may be that its authors believe the HDF5 
library is not good enough. But HDF5 (which is itself subject to criticism) 
made the same promises as Pandas seems to do. Just to say that this may be 
seeing something as universal while it is not, for everyone or every use, so 
the urgency may not be that high.

Is the concern mainly about hierarchical data? (SQL has a reputation for not 
being well suited to that) About heterogeneous data? (i.e. not easily matching 
a type or a pattern)

> I am not interested in a complete ORM; what I am interested is when the
> object-oriented language supports [?]

What is object-oriented in this context? What properties of the object model 
stand out when you think about it in typical use cases?

> R as I have mentioned has fantastic interfaces to SQL databases that allow
> one to pass a query and have the result populate a data frame. The data
> frame in R or Python can be fed to a machine learning algorithm (scikit
> learn on Python) or to a plotting package such as ggplot or bokeh.
> https://pypi.python.org/pypi/ggplot/0.6.8
> http://bokeh.pydata.org/en/latest/
> http://scikit-learn.org/stable/
> 
> What I want to do is to demonstrate short scripts in R and Python to
> accomplish the same task.  I don't want the Python scripts to be longer and
> more complicated because Python has a lower-level interface to SQLite. […]

Why should this be SQLite's responsibility? What prevents a Python (or R) 
library from implementing the desired interface? (providing a used library does 
not count toward a script's length). Is this with the hope of getting greater 
efficiency? (i.e. timing and consumed resources)

With my apologies for the naive questions…

-- 
Yannick Duchêne


[sqlite] Wish List for 2016: High Level API for Object Oriented Interactive Languages

2016-01-30 Thread Yannick Duchêne
On Sat, 30 Jan 2016 14:56:15 -0500
"James K. Lowden"  wrote:

> On Thu, 28 Jan 2016 16:47:40 -0500
> Jim Callahan  wrote:
> 
> > I am hopeful this new JDBC based interface will provide as
> > satisfactory high level channel between SQLite3 and Python.
> 
> As someone who's written a couple of OO DBMS libraries and uses the
> Python SQLIte module, I wonder what you're hoping to see.  What example
> do you have in mind for OO functionality not already there?
> 

I have not read the thread, and just react to this: I agree, there is not that 
much to hope for from OO… in the general case. In my opinion (which some others 
share), OO is a bag of miscellaneous things which are better tools and better 
understood when approached individually. Just trying to define what OO is shows 
it: is this about late binding? (if it is, then there are subprogram references, 
first-class functions, or even static polymorphism and signature overloading) 
About encapsulation? (if it is, then there is already modularity and scopes) 
About grouping logically related entities? (if it is, there is already 
modularity, and sometimes physical grouping is bad physical design). This may 
be compared to AJAX, which was nothing new, simply a brand on a set of things 
known since long; understanding AJAX is not possible, there is nothing to 
understand about it, while there are things to understand about what it 
branded.

And especially about OO and DBs, I know there are complaints that OO hides too 
much the data model of DBs (which I feel I see).


-- 
Yannick Duchêne


[sqlite] `explain query plan`: does it have any side effects?

2016-01-30 Thread Yannick Duchêne
On Sat, 30 Jan 2016 19:07:21 +
Simon Slavin  wrote:

> 
> On 30 Jan 2016, at 6:56pm, Yannick Duchêne  
> wrote:
> > 
> > That's strange, or may be it's SQLiteBrowser specific (I'm using it to test 
> > queries and get timings).
> 
> Oh, you're not using your own code.
> 
> Download the SQLite command line tool and try the same queries.  The shell 
> tool does no caching or optimisation of its own, it just calls the SQLite API 
> directly.

Ouch: using `.timer ON` (after a quick web search), I get 90ms instead of 200 
or 120, and indeed there is no more difference whether or not I did an 
`EXPLAIN` before.

Thanks.

I was not suspecting there were big things in the background with 
SQLiteBrowser. From now on, I will always use the command-line version for 
timings.
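
For timings independent of any GUI, besides the shell's `.timer ON`, a plain
script does the same job (a sketch; the database file and view name below are
placeholders):

import sqlite3
import time

conn = sqlite3.connect("mydata.db")  # placeholder file name
start = time.perf_counter()
rows = conn.execute("select * from myview").fetchall()  # placeholder view name
print(len(rows), "rows in", round((time.perf_counter() - start) * 1000, 1), "ms")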

-- 
Yannick Duchêne


[sqlite] `explain query plan`: does it have any side effects?

2016-01-30 Thread Yannick Duchêne
On Sat, 30 Jan 2016 19:03:12 +0100
Yannick Duchêne  wrote:

> > On 30 Jan 2016, at 4:55pm, Yannick Duchêne  
> > wrote:
> > 
> > > Tweaking a query, I noticed that a query executed after I did an `explain query 
> > > plan `, executes faster than it did before.
> > 
> > Computer reads data from disk when you do the "EXPLAIN QUERY PLAN".  Data 
> > still in cache when you run the real SELECT.
> > 
> > To see something similar run the same SELECT twice.
> 
> I forgot to tell: I thought about it, and tried this. Running the query 
> multiple times does not change the timing that much (just minus 15 to 20ms).

That's strange, or maybe it's SQLiteBrowser-specific (I'm using it to test 
queries and get timings).

If I copy the query of the view (the query is for a view) and execute it 
as-is, I get the same 200ms, although I do this after `explain query plan 
select * from ` and `select * from ` runs in about 120ms. Then, if 
I do the same with this copy, that is, one time adding `explain query plan` 
before the copied query, the same happens: later runs are in 120ms instead of 
200.

That's not query caching, as both queries are the same.

-- 
Yannick Duchêne


[sqlite] `explain query plan`: does it have any side effects?

2016-01-30 Thread Yannick Duchêne
On Sat, 30 Jan 2016 17:41:49 +
Simon Slavin  wrote:

> 
> On 30 Jan 2016, at 4:55pm, Yannick Duchêne  
> wrote:
> 
> > Tweaking a query, I noticed that a query executed after I did an `explain query 
> > plan `, executes faster than it did before.
> 
> Computer reads data from disk when you do the "EXPLAIN QUERY PLAN".  Data 
> still in cache when you run the real SELECT.
> 
> To see something similar run the same SELECT twice.

I forgot to tell: I thought about it, and tried this. Running the query multiple 
times does not change the timing that much (just minus 15 to 20ms).

> > I could go from 300ms to 200ms (still too much to me) using an appropriate 
> > index
> 
> You may be able to improve it further using a better index, better COLLATEs, 
> or a better schema.  We're happy to help you.

I don't want to abuse :-P . Half-joking: if I'm stuck, I may.

-- 
Yannick Duchêne


[sqlite] Best way to store only date

2016-01-30 Thread Yannick Duchêne
On Sat, 30 Jan 2016 14:22:06 +0100
"E.Pasma"  wrote:

> With respect to Igor's suggestion, yyyymmdd (as integer), why not leave out
> the century? I prefer the old-fashioned yymmdd.

And what about a planned Y3K bug? :-D (teasing)

> The diagram got broken in my email and here is another try:
> 
>  Needs to be light | Needs to be| Needs to do  |
>  (small footprint) | Human-Readable | calculations |
>  - | ---|  |
>  YES   | YES| NO   | Integer as
>||  | Igor's suggestion
>||  |
>  YES   | NO | YES  | Float/Int
>||  | Julianday
>||  |
>  NO| YES| YES  | Datetime/Numeric
>||  | ISO Standard
> 

Why “NO” for “Needs to do calculations” with Igor's suggestion? Well, it depends 
on what is doing the calculation, maybe.
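
Indeed, the catch with the integer form is mostly that date arithmetic needs a
conversion first; comparison and sorting work as-is (a sketch):

import sqlite3

conn = sqlite3.connect(":memory:")
d = 20160130  # yyyymmdd as an integer: comparisons and sorting work as-is
iso = f"{d // 10000:04d}-{d // 100 % 100:02d}-{d % 100:02d}"
# SQLite's date functions want ISO text (or julian day numbers) for arithmetic.
print(conn.execute("select date(?, '+1 day')", (iso,)).fetchone())  # ('2016-01-31',)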

-- 
Yannick Duchêne


[sqlite] `explain query plan`: does it have any side effects?

2016-01-30 Thread Yannick Duchêne
Funny real thing or illusion?

Tweaking a query, I noticed that a query executed after I did an `explain query plan 
`, executes faster than it did before.

I checked multiple times, in case it could be due to something else in the 
environment or to how long the SQLite connection had been open; I can just see 
that it's really always after an `explain query plan` that it executes faster: 
120ms compared to 200ms (averages).

I could go from 300ms to 200ms (still too much to me) using an appropriate 
index, and am curious to know what this side effect is which I seem to witness, 
in order to make use of it explicitly.

Or is this an illusion due to something else? At least, I really feel I see an 
unexplainable correlation…

-- 
Yannick Duchêne


[sqlite] Pragma automatic_index and ROWID: which one is really responsible for automatic indexes?

2016-01-29 Thread Yannick Duchêne
On Thu, 28 Jan 2016 19:18:06 -0500
"Keith Medcalf"  wrote:

> 
> When you create a table thusly:
> 
> create table x (x primary key, y, z);
> 
> you are creating a rowid table with columns x, y, an z.  You are also saying 
> that you want x to be the primary key.  Therefore, you will get a table 
> called x containing (ROWID, X, Y, Z).  This is a btree where the index is the 
> rowid.  In order to implement you PRIMARY KEY, an index must be constructed 
> containing the column X and ROWID in the table where the value of X is found. 
>  This index will be called sqlite_autoindex_x_1 because it is the first index 
> created automatically on table x.

Close to what I was suspecting. Thanks for these points.

> You could achieve the same result as follows:
> 
> create table x (x, y, z);
> create unique index pk_x_x on x(x);
> 
> except that now you have given the index on x an explicit name and one does 
> not have to be manufactured for you.
> 

I checked that these indexes get created even if I manually define similar 
indexes. But I don't mind. After some experiments, this seems more related to 
ROWID, and you seem to confirm a relation, above.


-- 
Yannick Duchêne


[sqlite] Pragma automatic_index and ROWID: which one is really responsible for automatic indexes?

2016-01-29 Thread Yannick Duchêne
On Fri, 29 Jan 2016 01:34:01 +0200
R Smith  wrote:

> 
> I think the dictionary would be faster for this use-case (mostly cause 
> it runs in-memory and we do not require all the DB data guarantees for 
> the initial sort). However, the fastest way in SQL would be to use a 
> temporary table, do you have any problems with that? It would mean you 
> would do the inserts in an insert loop and then simply calculate the 
> incidence values and insert to the real table - so it would require a 2 
> - step process.
> 
> In fact, let me write the SQL quick here. The table create is just like 
> before, and then:
> 
> BEGIN TRANSACTION;
> 
>-- Make Temporary table
> CREATE TEMPORARY TABLE eav (
>e TEXT COLLATE NOCASE,
>a TEXT COLLATE NOCASE,
>v TEXT COLLATE NOCASE
> );
> 
>-- The insert loop:
> INSERT INTO eav VALUES (:element, :attribute, :value);
>-- end of Insert loop
> 
> 
> CREATE INDEX Idx_eav ON eav(e,a,v);
> 
> INSERT OR IGNORE INTO element_attribute_values (element,attribute,value,cnt)
>SELECT e,a,v,COUNT(*)
>  FROM EAV
> WHERE 1
> GROUP BY e,a,v;
> 
>-- Cleanup
> DROP TABLE eav;
> 
> COMMIT;
> 
> 
> If that is not a LOT faster then I will be very surprised...

I tried this, using a single transaction for the whole thing, and I measured 15 
seconds instead of 22. Given that the program takes 6 seconds without 
insertions, that's good.

I finally used another way: I use a Python counter dictionary (the `Counter` 
class in the `collections` module) to compute the counts on a per-file basis; 
then there is an insert-or-ignore-then-update request for each triplet, which 
adds to "cnt" instead of incrementing it by just one. In fewer words, there is 
only one insert/update per triplet for a given file. Instead of 103K requests 
to give 15K rows, there are now 23K requests to give the same 15K rows. Using 
this with a transaction per file (as the program always did), I measured 17 
seconds. So I devised to group transactions transparently: a transaction is 
really committed only every N times (taking care of any remainder), and using 
25 for N, I measure 11 seconds. I believe that's good enough (I guess it may be 
good to avoid big transactions).
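
A minimal sketch of that scheme (the table is the one from earlier in the
thread; file handling and the every-N transaction grouping are omitted):

import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("""create table element_attribute_values (
    element text, attribute text, value text, cnt integer not null default 0,
    primary key (element, attribute, value))""")

def flush(counts):
    # One insert-or-ignore-then-update per distinct triplet, adding the whole
    # per-file count in one go instead of incrementing by one each time.
    with conn:  # one transaction per flushed batch
        for (e, a, v), n in counts.items():
            conn.execute("insert or ignore into element_attribute_values "
                         "values (?, ?, ?, 0)", (e, a, v))
            conn.execute("update element_attribute_values set cnt = cnt + ? "
                         "where element = ? and attribute = ? and value = ?",
                         (n, e, a, v))

counts = Counter()                  # per-file counts
counts[("a", "href", "")] += 1      # example triplet
flush(counts)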


-- 
Yannick Duchêne


[sqlite] Pragma automatic_index and ROWID: which one is really responsible for automatic indexes?

2016-01-28 Thread Yannick Duchêne
On Thu, 28 Jan 2016 22:08:02 +0200
R Smith  wrote:

> I think you are misunderstanding the Pragma and the idea of automatic 
> indices. An automatic Index might be created on a table that doesn't 
> have an adequately assigned primary key. It might also be created during 
> a query (mostly SELECT or sub-SELECT) for which there is no useful Index 
> created by the table designer and the Query planner figures it will be 
> quicker to make an Index than to do table scans through the query. This 
> habit of creating indices during select queries can be forced to not 
> happen by setting the "PRAGMA automatic_index=0;", but this needs to 
> happen when you open the DB connection, or at a minimum, before you try 
> any query - not after the cursor is created,  by that time the index 
> might already be made. (This is why you are not seeing any speed 
> improvement).
> 
> I only mention all the above so you understand what the automatic 
> indexing is about, but it has almost certainly nothing to do with your 
> query slowness, and even if you switch it off at a more opportune time, 
> I would be surprised if it changes the query speed.

Indeed, I was misunderstanding. Are these indexes visible? Is there any 
relation with the ones I see from sqlitebrowser?
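
They are visible (a sketch): the internal PRIMARY KEY/UNIQUE indexes show up in
sqlite_master under their sqlite_autoindex names, with a NULL sql column, which
is presumably what SQLiteBrowser lists.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table x (x primary key, y, z)")
# Internal indexes appear in sqlite_master with no CREATE INDEX text (sql is NULL).
print(conn.execute(
    "select name, sql from sqlite_master where type = 'index'").fetchall())
# observed: [('sqlite_autoindex_x_1', None)]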

> 103k insertions to produce 15k rows... that is ~15% efficiency - the 
> opposite of good design. Perhaps we can help you find better SQL to 
> solve your problem. Let's see...

(red-face)

> 
> I do not know the shape of your data (it matters), but I'm guessing 
> "element" represents standard HTML tags with "attribute" and "value" 
> giving basic expansion of the attributes list. it'd probably be safer to 
> use non case-sensitive values and use standard equation tests in Selects.
> 

The shape will change; it's far from final. The tables at that step depend on 
the next steps in the overall procedure, which is not entirely fixed for now.

> Could you try these in your Query loop perhaps:
> 
> 
> -- Table: Renamed field count to cnt because "count" is an SQL reserved 
> word (though it will work, just a better habit)
> CREATE TABLE element_attribute_values (
>element TEXT COLLATE NOCASE NOT NULL,
>attribute TEXT COLLATE NOCASE NOT NULL,
>value TEXT COLLATE NOCASE NOT NULL CHECK ((attribute <> '') OR (value 
> = '')),
>cnt INTEGER NOT NULL DEFAULT 0,
>PRIMARY KEY (element, attribute, value)
> );
> 
> -- Insert Loop start:
> -- This will simply fail if the PK already exists, else start the line 
> with 0 count.
> INSERT OR IGNORE INTO element_attribute_values VALUES (:element, 
> :attribute, :value, 0);
> 
> -- This might be faster since it uses only one lookup loop, but it might 
> also not be.
> -- If you share a list of example data to be inserted, we can find a 
> faster way. Try it and let us know...
> WITH EAV(id,icnt) AS (
>SELECT rowid,cnt+1
>  FROM element_attribute_values
> WHERE (element = :element) AND (attribute = :attribute) AND (value = 
> :value)
> LIMIT 1
> )
> UPDATE element_attribute_values SET cnt = (SELECT icnt FROM EAV)
>   WHERE rowid = (SELECT id FROM EAV)
> ;
> -- Insert Loop end.
> 

The default may even be omitted, and maybe the `LIMIT 1` too, as each triplet 
is unique.

I tried it, with the variant (no `LIMIT 1`), and it's about the same, even a bit 
slower (not much, between one and two seconds more). While speed is not the 
only concern, perhaps I should not care that much about the DB file size (the 
other concern).

It's close to what I had at the beginning, which gave similar timings (just 
changed to match your recommendation about "cnt"'s name):

INSERT OR IGNORE INTO element_attribute_values
  VALUES (:element, :attribute, :value, 0);
UPDATE element_attribute_values 
  SET cnt = cnt + 1
  WHERE (element = :element)
AND (attribute = :attribute)
AND (value = :value);

> -- If you share a list of example data to be inserted, we can find a 
> faster way. Try it and let us know...

I don't mind, I can upload an archive somewhere. I guess you mean the table's 
content?

I'm aware this use case may be a bit pathological, as I could use a Python 
dictionary. However, I decided not to, for three reasons:

 * I wanted to see what it's like to use an SQLite DB as an application data 
container (file or memory);
 * Using a persistent DB is better for an incremental process (it may stop and 
resume later);
 * Persistent data is welcome for human review (I think about defining views in 
sqlitebrowser to dissect the results).

For persistence, I first tried CSV files, but this shows to be a inadequate. An 
SQL DB and a DB browser, looks better than CSV for this use?case.

-- 
Yannick Duchêne


[sqlite] Pragma automatic_index and ROWID: which one is really responsible for automatic indexes?

2016-01-28 Thread Yannick Duchêne
On Thu, 28 Jan 2016 19:59:02 +0100
Dominique Devienne  wrote:

> Not sure to understand what you expect this pragma to do, but inserts
> typically don't involve automatic indexes, which are used only in queries
> (selects).

I thought it was responsible for the `sqlite_autoindex_"table_name"_N` indexes 
which get generated and increase the DB size. It's their name and the 
documentation which made me believe this. I can see these indexes in 
sqlitebrowser, and in the file size.
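
For reference, these entries can be reproduced with any constrained table (a 
throwaway example, arbitrary name):

    CREATE TABLE t (a TEXT, b TEXT, PRIMARY KEY (a, b));
    -- The two-column PK cannot be the rowid, so SQLite creates an
    -- internal index to enforce it:
    SELECT name FROM sqlite_master WHERE type = 'index';
    -- -> sqlite_autoindex_t_1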

> Your insert does have a query, but it fits the PK, so no automatic index is
> needed. (and you can likely use AUTOINCREMENT to avoid that query in the
> first place).

"count" is not a primary key, and AUTOINCREMENT may be used only on primary key.

It indeed does not need a primary key; a unique constraint is enough. Whether 
I define a PK or a unique constraint, it ends in the same speed and DB size. I 
just wanted to test whether it differs when a PK is defined (it does not).


> The 2.4x size difference is kinda expected, since your PK includes your
> biggest column I suspect (i.e. value), so both the table and its PK index
> must store it, doubling at least the memory, and with page waste, clearly
> can account for the difference.
> 
> And the time difference can be in-part at least attributed to the increased
> IO.

That was my conclusion too, and that's why I would like to prevent the creation 
of this extraneous index. I can achieve it using `WITHOUT ROWID`, but not with 
the pragma, and that's what I don't understand. So, I can achieve it, just in a 
way I don't understand, which gives me the feeling I'm using a trick, and I 
don't like it. Unless ROWID is indeed what's responsible for the creation 
of these extraneous indexes…
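
For the record: in a rowid table, the real key is the hidden rowid, so a 
declared multi-column PRIMARY KEY has to be enforced by a separate internal 
index; in a WITHOUT ROWID table the declared PK is the table's own storage, so 
no extra index is needed. A minimal sketch (arbitrary names):

    CREATE TABLE t1 (a TEXT, b TEXT, PRIMARY KEY (a, b));
    -- t1 stores each (a, b) pair twice: once in the table itself,
    -- once in the internal sqlite_autoindex_t1_1 index.

    CREATE TABLE t2 (a TEXT, b TEXT, PRIMARY KEY (a, b)) WITHOUT ROWID;
    -- t2 stores each pair once: the PK is the table's storage order,
    -- so no sqlite_autoindex entry is created.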

> It seems unusual to use the HTML text in the PK. Are you sure you need it?

It's short texts (one to ten characters on average): just element name, 
attribute name and attribute value, as unique triplets with a count of their 
occurrences. Only a few attribute values may be a bit long, the href of <a> 
elements. I must say I simplified my explanations: it's not really the index, 
it's a prior step which computes some statistics, later used to decide which 
elements and attributes to focus on when indexing the contents. What the final 
index stores is reasonably short, title-like text.


-- 
Yannick Duchêne


[sqlite] Is there an index of conformances and deviations against standard SQL?

2016-01-28 Thread Yannick Duchêne
On Wed, 27 Jan 2016 20:10:01 +
Simon Slavin  wrote:

> There are later standards for SQL, and SQLite and other SQL implementations 
> include some of the features in them.  But at this point different 
> implementations start to diverge from one another far more.

With this, you fully answered the question :-)

> Given what you wrote you should definitely read SQL1992.  It is the core 
> standard for SQL and you will find some ways in which SQLite is not SQL1992.  
> However most SQL implementations implement most things in SQL1992.

Given what you wrote, I definitely should, indeed (I will just have to print 
it, as it is a text file).

By the way, I learned about another divergence: in ROWID tables, SQLite allows 
NULL in PRIMARY KEY columns.
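
A minimal sketch of that one (arbitrary names), which the standard disallows:

    CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT);
    -- In a rowid table, a non-INTEGER PRIMARY KEY column still accepts
    -- NULL (a long-standing quirk kept for backward compatibility):
    INSERT INTO t VALUES (NULL, 'x');   -- succeeds
    -- The same table declared WITHOUT ROWID would reject it.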

I wonder if this is the same with SQLite4 (I don't know, as Python does not 
support it).


-- 
Yannick Duchêne


[sqlite] Pragma automatic_index and ROWID: which one is really responsible for automatic indexes?

2016-01-28 Thread Yannick Duchêne
Hi people,

I thought automatic indexing was controlled by the pragma `automatic_index`, 
which seems to do nothing for me (unless I'm doing it the wrong way), while 
`WITHOUT ROWID` on table creation seems to effectively prevent automatic 
indexes, with or without `PRAGMA automatic_index=0`.

Is this expected or is this me doing something wrong?

I'm concerned about it mainly for file size, more than for speed, though that 
matters too, as the program is far slower than expected. For a program I'm 
running I got this:

  * With ROWID: execution time is 22 seconds and the DB file size is 1.9 MB;
  * Without ROWID: execution time is 19 seconds and the DB file size is 788 KB 
(2.4 times smaller);
  * With insertion disabled: execution time is 6 seconds, so most of the 
execution time is spent inserting rows.

The program indexes HTML files (*); the test runs on only 52 average-sized 
HTML files, so the difference will be even more important with a bigger file set.

I'm feeling lost about this: the pragma seems to do nothing, and it's `WITHOUT 
ROWID` on table creation which seems to do it instead. The pragma is executed 
right after the cursor creation. I'm using APSW (**). Pragmas do seem to work 
in general, as I tested with `PRAGMA synchronous=0` (just as a test, I really 
don't want to use this).


I don't know if it's OK to ask this kind of question here. If it's not, just 
tell me, I won't bother.


(*) There are about 103 000 insertion queries ending in 15 600 rows, using 
this:

CREATE TABLE element_attribute_values
  (element TEXT NOT NULL,
   attribute TEXT NOT NULL,
   value TEXT NOT NULL
 CHECK ((attribute IS NOT '') OR (value IS '')),
   count INTEGER NOT NULL DEFAULT 1,
   PRIMARY KEY (element, attribute, value));

-- For a new triplet, the subquery yields NULL; the REPLACE conflict
-- resolution then substitutes the declared DEFAULT (1) for the
-- NOT NULL "count" column. For an existing triplet, the old row is
-- replaced by one holding count + 1.
INSERT OR REPLACE INTO element_attribute_values
  (element, attribute, value, count)
  VALUES (:element, :attribute, :value,
(SELECT count + 1 FROM element_attribute_values
  WHERE (element IS :element)
AND (attribute IS :attribute)
AND (value IS :value)));

(**) The standard Python SQLite3 binding seems perceptibly slow. With APSW, the 
program is 17 times faster than with the standard Python SQLite3 binding: 
execution time is around 370 seconds in the latter case, however I manage 
transactions.


-- 
Yannick Duchêne


[sqlite] `UNIQUE` constraint allows multiple NULL: is this expected?

2016-01-27 Thread Yannick Duchêne
On Wed, 27 Jan 2016 14:38:41 -0500
Richard Hipp  wrote:

> 
> See https://www.sqlite.org/nulls.html
> 


The most important part to me is this one:
> The fact that NULLs are distinct for UNIQUE columns but are indistinct for 
> SELECT DISTINCT and UNION continues to be puzzling. It seems that NULLs 
> should be either distinct everywhere or nowhere. And the SQL standards 
> documents suggest that NULLs should be distinct everywhere. Yet as of 
> this writing, no SQL engine tested treats NULLs as distinct in a SELECT 
> DISTINCT statement or in a UNION.

SQLite is pragmatic, that's fair enough, and this paragraph shows the decision 
was made with awareness of the semantic issue. Hard to do better…
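
The asymmetry is easy to reproduce (a throwaway table):

    CREATE TABLE t (v TEXT UNIQUE);
    INSERT INTO t VALUES (NULL);
    INSERT INTO t VALUES (NULL);   -- accepted: NULLs are distinct for UNIQUE
    SELECT DISTINCT v FROM t;      -- one row: NULLs are indistinct for DISTINCT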

Thanks to both of you for your insight.


-- 
Yannick Duchêne


[sqlite] `UNIQUE` constraint allows multiple NULL: is this expected?

2016-01-27 Thread Yannick Duchêne
On Wed, 27 Jan 2016 14:17:49 -0500
Igor Tandetnik  wrote:

> On 1/27/2016 2:10 PM, Yannick Duchêne wrote:
> > `UNIQUE` constraint allows multiple NULL: is this expected?
> 
> http://www.sqlite.org/lang_createtable.html
> "For the purposes of UNIQUE constraints, NULL values are considered 
> distinct from all other values, including other NULLs."
> 
> I'm reasonably, but not 100%, sure this is standard-conforming.

Thanks Igor. I overlooked it (and now I seem to remember I used to know it, and 
was as surprised then as I am now).

Searching the web, it seems most DBs don't allow multiple `NULL`s in a column 
with a `UNIQUE` constraint. If the standard says any `NULL` differs from any 
other `NULL`, so be it, and these DBs are wrong. It's just that now I wonder why.
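
For what it's worth, the single-NULL behaviour of those databases can be 
emulated in SQLite with a unique partial index on an expression (supported 
since SQLite 3.9.0); a sketch, on a throwaway table:

    CREATE TABLE t (value TEXT UNIQUE);
    -- Every NULL row is indexed under the same key, (value IS NULL) = 1,
    -- so the unique partial index allows at most one NULL:
    CREATE UNIQUE INDEX t_one_null ON t ((value IS NULL)) WHERE value IS NULL;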

Is this related to something I don't know about, the `UNKNOWN` truth value in 
SQL? If it's related, then I may understand.

-- 
Yannick Duchêne


[sqlite] Is there an index of conformances and deviations against standard SQL?

2016-01-27 Thread Yannick Duchêne
I like SQLite because it's simple, and also because I believe it sticks to the 
standard; therefore I got into the habit of using SQLite to learn standard SQL.

Today, I came across a page in the documentation about a construct which is 
supported for compatibility with MySQL, and elsewhere I learned (if I'm not 
wrong) that `INSERT OR REPLACE` is not standard SQL; the standard instead 
defines `MERGE`, which is close but different.
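
The difference is more than syntax: `INSERT OR REPLACE` deletes the conflicting 
row and inserts a fresh one, so columns left out of the statement fall back to 
their defaults rather than keeping their old values, whereas a standard `MERGE` 
would update the row in place. A minimal sketch (arbitrary names):

    CREATE TABLE t (k TEXT PRIMARY KEY, a TEXT, b TEXT);
    INSERT INTO t VALUES ('k1', 'a1', 'b1');
    -- The conflicting row is deleted, then a new one inserted; "b" is
    -- not specified, so it becomes NULL instead of keeping 'b1':
    INSERT OR REPLACE INTO t (k, a) VALUES ('k1', 'a2');
    SELECT * FROM t;   -- k1 | a2 | NULL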

So I feel maybe I should not see SQLite as a strict standard SQL 
implementation, and should learn where it conforms to the standard (I care 
less about partial support) and where it deviates.


[sqlite] `UNIQUE` constraint allows multiple NULL: is this expected?

2016-01-27 Thread Yannick Duchêne
Hi people out there,

I initially subscribed for another question (which I will post later), but 
then came across something which surprised me.

Given this:

CREATE TABLE test (value TEXT UNIQUE)

On this:

INSERT INTO test (value) VALUES('A');
INSERT INTO test (value) VALUES('A')

… the second statement triggers an error due to a `UNIQUE` constraint 
violation, as expected.

However, this can be repeated many times:

INSERT INTO test (value) VALUES(NULL)
INSERT INTO test (value) VALUES(NULL)
INSERT INTO test (value) VALUES(NULL)
-- And so on…

… it does not trigger any error.

Is this expected? Well, one may understand that as “is this expected for 
SQLite?”, so I prefer two questions instead:

 * Is this expected for SQLite?
 * Is this expected for standard SQL?

(the latter also links to another question to be posted)


-- 
Yannick Duchêne