Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2014-01-23 Thread Elefterios Stamatogiannakis

On 23/1/2014 7:12 μμ, Drake Wilson wrote:

Quoth Eleytherios Stamatogiannakis , on 2014-01-23 14:37:23 
+0200:

Let me describe a use case where a not unique key and without rowid
are most welcome. We have a distributed big data system here which
uses SQLite for the partitions. To be able to efficiently execute
join queries on splited partitions, we need to assemble the
partitions of one side of the query to create an index on them.


Do you really need bag rather than set semantics?  That is, can there
be a case where rows that are identical in _all_ columns need to be
treated as separate and (e.g.) have both copies show up in queries?



As we need to emulate (non table backed) indexes, yes. In an index you 
can have the same key with multiple "covering values" accompanying it. 
Consider the case where you want a covering index that "covers" the 
whole table. And you know that you'll only ever hit the index (e.g. for 
joins), and not the table that backs the index. In that case, the only 
way to store the data only once is using something like what i've 
described in my previous email.


Also using the whole row as a primary key isn't a viable solution. There 
are many kinds of data that may have duplicate rows in the index. Like 
pre-graph data (co-occurency lists), on which, for example, you want to 
calculate the frequency of the links before you group by them.


Our data is mainly scientific data (from digital libraries). In which we 
do article text mining (finding citations, funders, classification, 
protein references, ...). We also deal with graph data (graph 
isomorphisms, graph mining...).


All of the above processes are done using madIS [*], which is 
essentially SQLite + extensions (multivalued row and aggregate 
functions, virtual table composition, ...).


l.

[*] http://madis.googlecode.com


Most of the time, the way data is represented in relational databases,
this winds up requiring an arbitrary identity key anyway to be
practical (so one can manipulate a specific instance of an otherwise
identical row), or else it's equivalent to adding a count column to
turn {(x, y, z), (x, y, z)} into {(x, y, z, 2)}, though the latter has
a similar slight complexity hitch in the merge case to what you were
doing.

If you do require the above, I'm curious what data is being handled
here, since it's a rare case (but I understand if you don't wish to
say).  If not, then you may actually have a primary key of the whole
row, in which case I'm not sure why inventing a rowid is needed.

---> Drake Wilson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2014-01-23 Thread Drake Wilson
Quoth Eleytherios Stamatogiannakis , on 2014-01-23 14:37:23 
+0200:
> Let me describe a use case where a not unique key and without rowid
> are most welcome. We have a distributed big data system here which
> uses SQLite for the partitions. To be able to efficiently execute
> join queries on splited partitions, we need to assemble the
> partitions of one side of the query to create an index on them.

Do you really need bag rather than set semantics?  That is, can there
be a case where rows that are identical in _all_ columns need to be
treated as separate and (e.g.) have both copies show up in queries?

Most of the time, the way data is represented in relational databases,
this winds up requiring an arbitrary identity key anyway to be
practical (so one can manipulate a specific instance of an otherwise
identical row), or else it's equivalent to adding a count column to
turn {(x, y, z), (x, y, z)} into {(x, y, z, 2)}, though the latter has
a similar slight complexity hitch in the merge case to what you were
doing.

If you do require the above, I'm curious what data is being handled
here, since it's a rare case (but I understand if you don't wish to
say).  If not, then you may actually have a primary key of the whole
row, in which case I'm not sure why inventing a rowid is needed.

   ---> Drake Wilson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2014-01-23 Thread Eleytherios Stamatogiannakis
First of all please excuse me for bringing forth again a conversation 
that is months old. I've just discovered the "without rowid" ability of 
SQLite and i have to say many many thanks for it.


Let me describe a use case where a not unique key and without rowid are 
most welcome. We have a distributed big data system here which uses 
SQLite for the partitions. To be able to efficiently execute join 
queries on splited partitions, we need to assemble the partitions of one 
side of the query to create an index on them.


So in the query:

select y,z from A, B where A.x = B.x;

We assemble the pieces of B using this query:

create table B as select * from ALL("b1.db", "b2.db", "b3.db");

and then we create an index on them:

create index idx_b_x on B(x,y);

Above essentially creates 2 copies of table B's data (because all our 
indexes are always covering indexes).


So to avoid these 2 copies, what we would like to have are indexes 
*without backing tables*. These are essentially tables "without rowid" 
having plain keys (as opposed to primary keys).


We could work around the primary key limitation of "without rowid" 
tables by creating a dummy rowid column and putting it at the right of 
the table's key that we care about:


CREATE TABLE B(
x, y, rowid, PRIMARY KEY(x, rowid)
) WITHOUT ROWID;

and then insert into it the data:

insert into B select * from ALLWITHDUMMYROWID("b1.db", "b2.db", "b3.db");

but having this capability inside SQLITE would be a lot better.

Nevertheless above comments, thank you again for "without rowid". 
They'll be very useful for our use case.


Regards,

l.

On 19/11/13 21:53, Richard Hipp wrote:

On Mon, Nov 18, 2013 at 5:45 PM, Nico Williams wrote:


Obviously a B-Tree-based table will need *some* key, but it won't need
that key to be UNIQUE.



Yeah it does.  Every storage system has a unique key, be it an address in
memory or a filename and offset on disk.  There must be some way to
distinguish between two record.  Two records with the same key are
indistinguishable from one another and are hence the same record.

The question is whether or not the unique identifier is exposed to the
application.

SQLite allows the unique identifier to be a PRIMARY KEY ... WITHOUT ROWID.
Or it allows it to be the rowid.  You get to choose.  But every table has
to have one or the other.

You can argue that it is theoretically possible to create a table where the
key is something other than PRIMARY KEY or rowid (perhaps it is the
filename+offset suggested above) that is simply not exposed to the
application layer.  Yes, that is theoretically possible.  But SQLite does
not do that and I do not see any reason to add the capability.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-19 Thread Richard Hipp
On Mon, Nov 18, 2013 at 5:45 PM, Nico Williams wrote:

> Obviously a B-Tree-based table will need *some* key, but it won't need
> that key to be UNIQUE.
>

Yeah it does.  Every storage system has a unique key, be it an address in
memory or a filename and offset on disk.  There must be some way to
distinguish between two record.  Two records with the same key are
indistinguishable from one another and are hence the same record.

The question is whether or not the unique identifier is exposed to the
application.

SQLite allows the unique identifier to be a PRIMARY KEY ... WITHOUT ROWID.
Or it allows it to be the rowid.  You get to choose.  But every table has
to have one or the other.

You can argue that it is theoretically possible to create a table where the
key is something other than PRIMARY KEY or rowid (perhaps it is the
filename+offset suggested above) that is simply not exposed to the
application layer.  Yes, that is theoretically possible.  But SQLite does
not do that and I do not see any reason to add the capability.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-19 Thread Nico Williams
On Tue, Nov 19, 2013 at 11:50 AM, RSmith  wrote:
> Hi Nico, thanks for the discussion - although this thread is dangerously
> close to becoming a stale debate rather than a helpful forum discussion.
> In this light, I will try to be brief and then fight the urge to reply
> again.

Thanks for your opinion.

I'd like to hear from the SQLite3 team as to whether the overloading
of PRIMARY KEY was intentional or accidental, whether there's any
interest in adopting the MySQL INDEX USING syntax, whether there's any
interest in deferred UNIQUE constraint enforcement, and so on.

They may choose to not answer any of those questions; my asking does
not -and is not intended to- oblige them to answer; I have no
expectations either way as to whether those questions will be
answered.  But those are the questions I'm interested in.  You're
having an argument with me that doesn't directly address those
questions; I agree that we should not pursue that sub-thread, but feel
free to help us discover more direct answers to my questions.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-19 Thread RSmith
Hi Nico, thanks for the discussion - although this thread is dangerously close to becoming a stale debate rather than a helpful 
forum discussion.

In this light, I will try to be brief and then fight the urge to reply again.

You still seem to try and sell the usability of non-unique tables to me, I'm with you on that, no need to keep pushing it. What I'm 
saying is even though you can conjur up valid use-cases, I am not convinced it merrits the changeover for SQLite.


A B-Tree quite clearly imposes no need for uniqueness; the uniqueness requirement in SQLite3's WITHOUT ROWID almost certainly 
derives from PRIMARY KEY being a convenient syntactic source of columns for the primary index. There are alternatives to 
overloading existing syntax; MySQL has a suitable syntax, and even without it there are alternatives. Overloading syntax is not 
always free... Nico -- 


MySQL use internal row indexers for both INNO and ISAM table types (there are others of which I do not know the workings, though), 
the only difference is that A - they are not BTrees, which SQLite chose to use, and B - there is no way to access the internal row 
pointers in MySQL through SQL statements and the only reason it's accessible in SQLite is because it mimics a normal PKey - for the 
sole reason that it essentially is one.


Again, I might be wrong about this and rowids may have been designed in lieu of internal pointers specifically to be used by people 
for some reason or the other.  A Dev with more insight weighing in on this might be a more trustworthy source.


Either way, for whatever reason(s) rowids were used, now the PKey is chosen to replace rowid since for the most part, tables that 
can perform well sans said rowid are typically those that already have PKeys and if we drop the rowid, SQLite can STILL directly 
dereference any row inside a table because it still has a unique Key to go on. I very much doubt your contention that it was merely 
a "convenient source of columns" - in fact, I would like to postulate that it would be impossible, as it stands now, for SQLite to 
directly access table rows missing both a Primary key and rowid - not merely inconvenient but I could be wrong.


Cheers!
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-19 Thread Nico Williams
On Tue, Nov 19, 2013 at 6:35 AM, RSmith  wrote:
>  I also do not think (or imagine) that unique constraints "ought" to be
> required - I am saying that in the case of SQLite or any mainstream RDBMS it
> becomes necessary to have some unique reference to single out a row should
> you wish to adjust it somehow (often, adjustment are not required, such as

Data which is apparently non-sensical to you can nonetheless be used
in SQL queries and data modification statements that are meaningful to
me.  In all the cases I care about such data exist only ephemerally or
not at all, but the cases where they exist ephemerally are important
to me.

In the ephemeral dups case I only need to be able to count duplicate
rows, which is quite obviously trivial to do.  This case is where I'm
implementing application-defined uniqueness constraints that are
substantially different from standard SQL uniqueness (more on this
some other time) or where I'm implementing deferred uniqueness
constraints.  Deferred constraint checking is a valuable feature
(evidence: SQLite3 added deferred FK enforcement).

> for logs), and that (because of that) it is good that SQLite decided to have
> rowids, if a little archaic.  It's even better with the new optimisation
> being possible, and certainly to keep all admin things working as it should,
> I do not think it should drop the requirement.

The case where the application needs either deferred uniqueness
constraints handling or application-defined uniqueness constraints
that are not [easily] expressible in SQL, but where the application
uses SQL under the covers, is very much a real-world case.  In that
case I really need to have the RDBMS impose no unique constraints.

A B-Tree quite clearly imposes no need for uniqueness; the uniqueness
requirement in SQLite3's WITHOUT ROWID almost certainly derives from
PRIMARY KEY being a convenient syntactic source of columns for the
primary index.  There are alternatives to overloading existing syntax;
MySQL has a suitable syntax, and even without it there are
alternatives.  Overloading syntax is not always free...

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-19 Thread RSmith

On 2013/11/19 08:37, Nico Williams wrote:
More generally however, it's clear that a unique constraint is not necessary to make rowid-less tables work (see the MySQL 
example), SQL doesn't require unique constraints, and it's not clear that just because you (or I) lack imagination that unique 
constraints ought to be required. On the contrary, SQLite3 has never required a unique constraint -- the rowid is an 
implementation detail, one that turns out to not be necessary. Nico -- 


I think you misunderstand me - I am definitely sure that unique-key-less datasets can be useful and minimising the cpu cycles to 
deal with them is a good idea. I do think these are fringe-cases though. You however argued that SQLite should drop the unique 
constraint requirement - which led me to lament the virtues of unique constraints - as opposed to declaring them essential for all 
datasets.


 I also do not think (or imagine) that unique constraints "ought" to be required - I am saying that in the case of SQLite or any 
mainstream RDBMS it becomes necessary to have some unique reference to single out a row should you wish to adjust it somehow (often, 
adjustment are not required, such as for logs), and that (because of that) it is good that SQLite decided to have rowids, if a 
little archaic.  It's even better with the new optimisation being possible, and certainly to keep all admin things working as it 
should, I do not think it should drop the requirement.


This is very different to your suggestion that you (or I) may lack the imagination to fathom a unique-constraint-less arena - I'm 
sure anyone can name a couple use cases off-hand - but I would like SQLite to not drop the requirement altogether.


This is a statement of considered outcomes though, not a lecture of dogma.  If the efficiency gains can be shown to be significant, 
then I will change my mind very quickly.


(Let me add: ... and very much appreciate the efforts of whomever took the time 
to study and show such.)

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-18 Thread Nico Williams
Tomorrow I may describe two real-world cases where logical uniqueness
constraints are difficult or extremely difficult to enforce naturally using
standard SQL uniqueness constraints.  Both are or can be generalizations of
the id,attr,val model, one of them using a timeseries approach to permit
point-in-time queries and in-the-future updates.   Once more these are
cop-outs in that uniqueness on some level is desired, just not the SQL
level, but since that's what we're talking about... they seem like fair
examples.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-18 Thread Nico Williams
On Mon, Nov 18, 2013 at 8:57 PM, RSmith  wrote:
> Can you honestly use tables without a single unique reference to them?

Plain piles of data abound in quantities so voluminous that there's no
time to index them (of course, one would not apply SQLite3 to such a
dataset, I'm only pointing out that they exist).  In the sciences
there are many cases where enormous datasets are produced and consumed
(analyzed) on the spot, with no chance of persistent storage of the
raw data (much less indexing).  There's bound to be datasets where
unique keys are not useful or necessary.

> Consider your suggestion in carnate here in the following Table on which
> some indices may exist but none are Unique:

In one case where I've used this column-oriented approach there's no
schema for me to enforce: the data is already clean and there will
only be multiple attribute value pairs (triples) with the same values
where that makes sense (to some consumer, possibly not me) or where
collisions are harmless (think of a "comments" attribute).  And in
this case, because I have no use for such things I can safely accept
the UNIQUE constraint and use INSERT OR REPLACE|IGNORE or equivalent
constructions such as WHERE NOT EXISTS (...).

In another case there's a schema to enforce and I have no use for
"sequence multiplicity", only "set multiplicity", therefore a UNIQUE
constraint would be fine, but I need to defer enforcement of UNIQUE
constraints to the end of each transaction, something that SQLite3
can't do, so I resort to using non-UNIQUE indexes and checking that a
transaction adds no collisions as the last step.  Even if SQLite3 had
deferred UNIQUE constraint enforcement I can think of data where I
might like to allow for sequence multiplicity (e.g., as an indirect
way of counting things without first having to total them up, though
obviously only for things with small cardinality as normally
aggregation is to be preferred at all costs; it might be faster to
append to a dataset than to update it randomly).

> The column to the far left exists so that I can point out rows to you, but
> it does not exist in the table, only the ones with headings do. There are no
> unique indices nor any primary key.  How would you construct a query to fix
> the second monkey reference in row 5 (say you want it to be another animal)
> without affecting the other one in row 4?  Or just delete the second one and
> [...]

That seems like the wrong questions to ask.  If you chose to allow
this dataset then when updating (9, 'monkey') you'd want to update
both those rows.  You could count them, so you know there's two, and
you could delete them and re-add only one (or three) if you like, and
this would be true even without any kind of index as long as you could
scan the table.

> Unless a Table is just a list of values for which you never want to adjust
> them, I cannot imagine how you would ever manage a unique-key-less table.

See above.  The deferred UNIQUE constraint checking case is a bit of a
cop-out: there's still a unique constraint, logically, but implemented
at the application layer, with conflicts allowed to exist in
uncommitted (but not committed) transactions (but an application
transaction might well consist of more than one SQLite3 transaction).
The downstream-of-app-that-allows-conflicts case is also a cop-out.

More generally however, it's clear that a unique constraint is not
necessary to make rowid-less tables work (see the MySQL example), SQL
doesn't require unique constraints, and it's not clear that just
because you (or I) lack imagination that unique constraints ought to
be required.  On the contrary, SQLite3 has never required a unique
constraint -- the rowid is an implementation detail, one that turns
out to not be necessary.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-18 Thread RSmith

On 2013/11/19 00:45, Nico Williams wrote:

Thinking about how SQLite3 would implement WITHOUT ROWID, it seems
logical that a set of columns to be used as the primary indexed is
required, but that they be unique shouldn't be.

..and...  The implied UNIQUE constraint is just an unfortunate side-effect.

..and...
I can live with this for now, but I can imagine that this will become
a real problem for someone eventually.  I can envision SQLite3
requiring an INDEX be declared before any INSERTs on a WITHOUT ROWID
table,  etc.


Forgetting for a moment that WITHOUT ROWID is a very optional optimisation - Can you honestly use tables without a single unique 
reference to them?


Consider your suggestion in carnate here in the following Table on which some 
indices may exist but none are Unique:
---|  id  | attr_name   | attr_val  |
---|--|-|-|
1 |  4| ape  | Gray   |
2 |  7| ape  | Silver  |
3 |  12  | Lemur | Black  |
4 |  9| monkey   | Brown|
5 |  9| monkey   | Brown|
6 |  4| ape  | Gray   |
7 |  7| Lemur | White |


The column to the far left exists so that I can point out rows to you, but it does not exist in the table, only the ones with 
headings do. There are no unique indices nor any primary key.  How would you construct a query to fix the second monkey reference in 
row 5 (say you want it to be another animal) without affecting the other one in row 4?  Or just delete the second one and not the 
other? Or say make the first monkey white and the second black? Or detect a human error in the work of the researchers who collected 
the data?


Unless a Table is just a list of values for which you never want to adjust them, I cannot imagine how you would ever manage a 
unique-key-less table.  Especially when you make admin software that is supposed to edit tables made by other people and you run 
into a gem like the above. What do you tell the user? "Sorry mate, I have no clue how to change the 5th line for you.. either change 
both or none... sorry."  These rows are still proximitely close and a bit of skullduggery with limits etc. might get you changing 
only one of them (albeit indeterminate which one), but can you imagine editing a large table and some row somewhere else changes out 
of the view of the user?


It will be a mess.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-18 Thread Nico Williams
I can also imagine a day when

PRAGMA without_rowid_by_default = 1;

will cause tables created subsequently to lack implicit rowid columns.
 It'd be easier to enable that if the selection of columns for the
b-tree index were less consequential than requiring a PRIMARY KEY with
its attendant implied unique constraint.

To make that feasible would require a decent default column selection
when there isn't a primary key.  My proposal would be to take the
first of the following at first INSERT time:

 - first index declared on the table
 - first UNIQUE constraint declared on the table
 - first column declared UNIQUE with a column constraint

The theory being that without syntax for defining a non-unique index
at table creation time, the schema author's best chance to implicitly
specify the primary b-tree index is with the first index created
before the first insert.  Whereas if the schema author has any unique
constraints then the ones where they get to pick the declaration order
should be most preferred for this purpose (the schema author may not
get to pick column declaration order).

And perhaps SQLite3 should copy MySQL's syntax for declaring
non-unique primary b-tree index:

CREATE TABLE ... (..., INDEX USING BTREE ());

This also might set the stage for adding hash tables as b-trees where
the key is derived by calling a UDF:

CREATE TABLE ... (..., INDEX USING BTREE (udf());

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?

2013-11-18 Thread Nico Williams
I should add that WITHOUT ROWID ended up making a very small
difference in bulk-load performance of my application, but a bigger
and noticeable difference in incremental operation (where it's most
welcomed and desired anyways) performance.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users