Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
On 23/1/2014 7:12 μμ, Drake Wilson wrote: Quoth Eleytherios Stamatogiannakis, on 2014-01-23 14:37:23 +0200: Let me describe a use case where a not unique key and without rowid are most welcome. We have a distributed big data system here which uses SQLite for the partitions. To be able to efficiently execute join queries on splited partitions, we need to assemble the partitions of one side of the query to create an index on them. Do you really need bag rather than set semantics? That is, can there be a case where rows that are identical in _all_ columns need to be treated as separate and (e.g.) have both copies show up in queries? As we need to emulate (non table backed) indexes, yes. In an index you can have the same key with multiple "covering values" accompanying it. Consider the case where you want a covering index that "covers" the whole table. And you know that you'll only ever hit the index (e.g. for joins), and not the table that backs the index. In that case, the only way to store the data only once is using something like what i've described in my previous email. Also using the whole row as a primary key isn't a viable solution. There are many kinds of data that may have duplicate rows in the index. Like pre-graph data (co-occurency lists), on which, for example, you want to calculate the frequency of the links before you group by them. Our data is mainly scientific data (from digital libraries). In which we do article text mining (finding citations, funders, classification, protein references, ...). We also deal with graph data (graph isomorphisms, graph mining...). All of the above processes are done using madIS [*], which is essentially SQLite + extensions (multivalued row and aggregate functions, virtual table composition, ...). l. [*] http://madis.googlecode.com Most of the time, the way data is represented in relational databases, this winds up requiring an arbitrary identity key anyway to be practical (so one can manipulate a specific instance of an otherwise identical row), or else it's equivalent to adding a count column to turn {(x, y, z), (x, y, z)} into {(x, y, z, 2)}, though the latter has a similar slight complexity hitch in the merge case to what you were doing. If you do require the above, I'm curious what data is being handled here, since it's a rare case (but I understand if you don't wish to say). If not, then you may actually have a primary key of the whole row, in which case I'm not sure why inventing a rowid is needed. ---> Drake Wilson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
Quoth Eleytherios Stamatogiannakis, on 2014-01-23 14:37:23 +0200: > Let me describe a use case where a not unique key and without rowid > are most welcome. We have a distributed big data system here which > uses SQLite for the partitions. To be able to efficiently execute > join queries on splited partitions, we need to assemble the > partitions of one side of the query to create an index on them. Do you really need bag rather than set semantics? That is, can there be a case where rows that are identical in _all_ columns need to be treated as separate and (e.g.) have both copies show up in queries? Most of the time, the way data is represented in relational databases, this winds up requiring an arbitrary identity key anyway to be practical (so one can manipulate a specific instance of an otherwise identical row), or else it's equivalent to adding a count column to turn {(x, y, z), (x, y, z)} into {(x, y, z, 2)}, though the latter has a similar slight complexity hitch in the merge case to what you were doing. If you do require the above, I'm curious what data is being handled here, since it's a rare case (but I understand if you don't wish to say). If not, then you may actually have a primary key of the whole row, in which case I'm not sure why inventing a rowid is needed. ---> Drake Wilson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
First of all please excuse me for bringing forth again a conversation that is months old. I've just discovered the "without rowid" ability of SQLite and i have to say many many thanks for it. Let me describe a use case where a not unique key and without rowid are most welcome. We have a distributed big data system here which uses SQLite for the partitions. To be able to efficiently execute join queries on splited partitions, we need to assemble the partitions of one side of the query to create an index on them. So in the query: select y,z from A, B where A.x = B.x; We assemble the pieces of B using this query: create table B as select * from ALL("b1.db", "b2.db", "b3.db"); and then we create an index on them: create index idx_b_x on B(x,y); Above essentially creates 2 copies of table B's data (because all our indexes are always covering indexes). So to avoid these 2 copies, what we would like to have are indexes *without backing tables*. These are essentially tables "without rowid" having plain keys (as opposed to primary keys). We could work around the primary key limitation of "without rowid" tables by creating a dummy rowid column and putting it at the right of the table's key that we care about: CREATE TABLE B( x, y, rowid, PRIMARY KEY(x, rowid) ) WITHOUT ROWID; and then insert into it the data: insert into B select * from ALLWITHDUMMYROWID("b1.db", "b2.db", "b3.db"); but having this capability inside SQLITE would be a lot better. Nevertheless above comments, thank you again for "without rowid". They'll be very useful for our use case. Regards, l. On 19/11/13 21:53, Richard Hipp wrote: On Mon, Nov 18, 2013 at 5:45 PM, Nico Williamswrote: Obviously a B-Tree-based table will need *some* key, but it won't need that key to be UNIQUE. Yeah it does. Every storage system has a unique key, be it an address in memory or a filename and offset on disk. There must be some way to distinguish between two record. Two records with the same key are indistinguishable from one another and are hence the same record. The question is whether or not the unique identifier is exposed to the application. SQLite allows the unique identifier to be a PRIMARY KEY ... WITHOUT ROWID. Or it allows it to be the rowid. You get to choose. But every table has to have one or the other. You can argue that it is theoretically possible to create a table where the key is something other than PRIMARY KEY or rowid (perhaps it is the filename+offset suggested above) that is simply not exposed to the application layer. Yes, that is theoretically possible. But SQLite does not do that and I do not see any reason to add the capability. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
On Mon, Nov 18, 2013 at 5:45 PM, Nico Williamswrote: > Obviously a B-Tree-based table will need *some* key, but it won't need > that key to be UNIQUE. > Yeah it does. Every storage system has a unique key, be it an address in memory or a filename and offset on disk. There must be some way to distinguish between two record. Two records with the same key are indistinguishable from one another and are hence the same record. The question is whether or not the unique identifier is exposed to the application. SQLite allows the unique identifier to be a PRIMARY KEY ... WITHOUT ROWID. Or it allows it to be the rowid. You get to choose. But every table has to have one or the other. You can argue that it is theoretically possible to create a table where the key is something other than PRIMARY KEY or rowid (perhaps it is the filename+offset suggested above) that is simply not exposed to the application layer. Yes, that is theoretically possible. But SQLite does not do that and I do not see any reason to add the capability. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
On Tue, Nov 19, 2013 at 11:50 AM, RSmithwrote: > Hi Nico, thanks for the discussion - although this thread is dangerously > close to becoming a stale debate rather than a helpful forum discussion. > In this light, I will try to be brief and then fight the urge to reply > again. Thanks for your opinion. I'd like to hear from the SQLite3 team as to whether the overloading of PRIMARY KEY was intentional or accidental, whether there's any interest in adopting the MySQL INDEX USING syntax, whether there's any interest in deferred UNIQUE constraint enforcement, and so on. They may choose to not answer any of those questions; my asking does not -and is not intended to- oblige them to answer; I have no expectations either way as to whether those questions will be answered. But those are the questions I'm interested in. You're having an argument with me that doesn't directly address those questions; I agree that we should not pursue that sub-thread, but feel free to help us discover more direct answers to my questions. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
Hi Nico, thanks for the discussion - although this thread is dangerously close to becoming a stale debate rather than a helpful forum discussion. In this light, I will try to be brief and then fight the urge to reply again. You still seem to try and sell the usability of non-unique tables to me, I'm with you on that, no need to keep pushing it. What I'm saying is even though you can conjur up valid use-cases, I am not convinced it merrits the changeover for SQLite. A B-Tree quite clearly imposes no need for uniqueness; the uniqueness requirement in SQLite3's WITHOUT ROWID almost certainly derives from PRIMARY KEY being a convenient syntactic source of columns for the primary index. There are alternatives to overloading existing syntax; MySQL has a suitable syntax, and even without it there are alternatives. Overloading syntax is not always free... Nico -- MySQL use internal row indexers for both INNO and ISAM table types (there are others of which I do not know the workings, though), the only difference is that A - they are not BTrees, which SQLite chose to use, and B - there is no way to access the internal row pointers in MySQL through SQL statements and the only reason it's accessible in SQLite is because it mimics a normal PKey - for the sole reason that it essentially is one. Again, I might be wrong about this and rowids may have been designed in lieu of internal pointers specifically to be used by people for some reason or the other. A Dev with more insight weighing in on this might be a more trustworthy source. Either way, for whatever reason(s) rowids were used, now the PKey is chosen to replace rowid since for the most part, tables that can perform well sans said rowid are typically those that already have PKeys and if we drop the rowid, SQLite can STILL directly dereference any row inside a table because it still has a unique Key to go on. I very much doubt your contention that it was merely a "convenient source of columns" - in fact, I would like to postulate that it would be impossible, as it stands now, for SQLite to directly access table rows missing both a Primary key and rowid - not merely inconvenient but I could be wrong. Cheers! Ryan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
On Tue, Nov 19, 2013 at 6:35 AM, RSmithwrote: > I also do not think (or imagine) that unique constraints "ought" to be > required - I am saying that in the case of SQLite or any mainstream RDBMS it > becomes necessary to have some unique reference to single out a row should > you wish to adjust it somehow (often, adjustment are not required, such as Data which is apparently non-sensical to you can nonetheless be used in SQL queries and data modification statements that are meaningful to me. In all the cases I care about such data exist only ephemerally or not at all, but the cases where they exist ephemerally are important to me. In the ephemeral dups case I only need to be able to count duplicate rows, which is quite obviously trivial to do. This case is where I'm implementing application-defined uniqueness constraints that are substantially different from standard SQL uniqueness (more on this some other time) or where I'm implementing deferred uniqueness constraints. Deferred constraint checking is a valuable feature (evidence: SQLite3 added deferred FK enforcement). > for logs), and that (because of that) it is good that SQLite decided to have > rowids, if a little archaic. It's even better with the new optimisation > being possible, and certainly to keep all admin things working as it should, > I do not think it should drop the requirement. The case where the application needs either deferred uniqueness constraints handling or application-defined uniqueness constraints that are not [easily] expressible in SQL, but where the application uses SQL under the covers, is very much a real-world case. In that case I really need to have the RDBMS impose no unique constraints. A B-Tree quite clearly imposes no need for uniqueness; the uniqueness requirement in SQLite3's WITHOUT ROWID almost certainly derives from PRIMARY KEY being a convenient syntactic source of columns for the primary index. There are alternatives to overloading existing syntax; MySQL has a suitable syntax, and even without it there are alternatives. Overloading syntax is not always free... Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
On 2013/11/19 08:37, Nico Williams wrote: More generally however, it's clear that a unique constraint is not necessary to make rowid-less tables work (see the MySQL example), SQL doesn't require unique constraints, and it's not clear that just because you (or I) lack imagination that unique constraints ought to be required. On the contrary, SQLite3 has never required a unique constraint -- the rowid is an implementation detail, one that turns out to not be necessary. Nico -- I think you misunderstand me - I am definitely sure that unique-key-less datasets can be useful and minimising the cpu cycles to deal with them is a good idea. I do think these are fringe-cases though. You however argued that SQLite should drop the unique constraint requirement - which led me to lament the virtues of unique constraints - as opposed to declaring them essential for all datasets. I also do not think (or imagine) that unique constraints "ought" to be required - I am saying that in the case of SQLite or any mainstream RDBMS it becomes necessary to have some unique reference to single out a row should you wish to adjust it somehow (often, adjustment are not required, such as for logs), and that (because of that) it is good that SQLite decided to have rowids, if a little archaic. It's even better with the new optimisation being possible, and certainly to keep all admin things working as it should, I do not think it should drop the requirement. This is very different to your suggestion that you (or I) may lack the imagination to fathom a unique-constraint-less arena - I'm sure anyone can name a couple use cases off-hand - but I would like SQLite to not drop the requirement altogether. This is a statement of considered outcomes though, not a lecture of dogma. If the efficiency gains can be shown to be significant, then I will change my mind very quickly. (Let me add: ... and very much appreciate the efforts of whomever took the time to study and show such.) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
Tomorrow I may describe two real-world cases where logical uniqueness constraints are difficult or extremely difficult to enforce naturally using standard SQL uniqueness constraints. Both are or can be generalizations of the id,attr,val model, one of them using a timeseries approach to permit point-in-time queries and in-the-future updates. Once more these are cop-outs in that uniqueness on some level is desired, just not the SQL level, but since that's what we're talking about... they seem like fair examples. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
On Mon, Nov 18, 2013 at 8:57 PM, RSmithwrote: > Can you honestly use tables without a single unique reference to them? Plain piles of data abound in quantities so voluminous that there's no time to index them (of course, one would not apply SQLite3 to such a dataset, I'm only pointing out that they exist). In the sciences there are many cases where enormous datasets are produced and consumed (analyzed) on the spot, with no chance of persistent storage of the raw data (much less indexing). There's bound to be datasets where unique keys are not useful or necessary. > Consider your suggestion in carnate here in the following Table on which > some indices may exist but none are Unique: In one case where I've used this column-oriented approach there's no schema for me to enforce: the data is already clean and there will only be multiple attribute value pairs (triples) with the same values where that makes sense (to some consumer, possibly not me) or where collisions are harmless (think of a "comments" attribute). And in this case, because I have no use for such things I can safely accept the UNIQUE constraint and use INSERT OR REPLACE|IGNORE or equivalent constructions such as WHERE NOT EXISTS (...). In another case there's a schema to enforce and I have no use for "sequence multiplicity", only "set multiplicity", therefore a UNIQUE constraint would be fine, but I need to defer enforcement of UNIQUE constraints to the end of each transaction, something that SQLite3 can't do, so I resort to using non-UNIQUE indexes and checking that a transaction adds no collisions as the last step. Even if SQLite3 had deferred UNIQUE constraint enforcement I can think of data where I might like to allow for sequence multiplicity (e.g., as an indirect way of counting things without first having to total them up, though obviously only for things with small cardinality as normally aggregation is to be preferred at all costs; it might be faster to append to a dataset than to update it randomly). > The column to the far left exists so that I can point out rows to you, but > it does not exist in the table, only the ones with headings do. There are no > unique indices nor any primary key. How would you construct a query to fix > the second monkey reference in row 5 (say you want it to be another animal) > without affecting the other one in row 4? Or just delete the second one and > [...] That seems like the wrong questions to ask. If you chose to allow this dataset then when updating (9, 'monkey') you'd want to update both those rows. You could count them, so you know there's two, and you could delete them and re-add only one (or three) if you like, and this would be true even without any kind of index as long as you could scan the table. > Unless a Table is just a list of values for which you never want to adjust > them, I cannot imagine how you would ever manage a unique-key-less table. See above. The deferred UNIQUE constraint checking case is a bit of a cop-out: there's still a unique constraint, logically, but implemented at the application layer, with conflicts allowed to exist in uncommitted (but not committed) transactions (but an application transaction might well consist of more than one SQLite3 transaction). The downstream-of-app-that-allows-conflicts case is also a cop-out. More generally however, it's clear that a unique constraint is not necessary to make rowid-less tables work (see the MySQL example), SQL doesn't require unique constraints, and it's not clear that just because you (or I) lack imagination that unique constraints ought to be required. On the contrary, SQLite3 has never required a unique constraint -- the rowid is an implementation detail, one that turns out to not be necessary. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
On 2013/11/19 00:45, Nico Williams wrote: Thinking about how SQLite3 would implement WITHOUT ROWID, it seems logical that a set of columns to be used as the primary indexed is required, but that they be unique shouldn't be. ..and... The implied UNIQUE constraint is just an unfortunate side-effect. ..and... I can live with this for now, but I can imagine that this will become a real problem for someone eventually. I can envision SQLite3 requiring an INDEX be declared before any INSERTs on a WITHOUT ROWID table, etc. Forgetting for a moment that WITHOUT ROWID is a very optional optimisation - Can you honestly use tables without a single unique reference to them? Consider your suggestion in carnate here in the following Table on which some indices may exist but none are Unique: ---| id | attr_name | attr_val | ---|--|-|-| 1 | 4| ape | Gray | 2 | 7| ape | Silver | 3 | 12 | Lemur | Black | 4 | 9| monkey | Brown| 5 | 9| monkey | Brown| 6 | 4| ape | Gray | 7 | 7| Lemur | White | The column to the far left exists so that I can point out rows to you, but it does not exist in the table, only the ones with headings do. There are no unique indices nor any primary key. How would you construct a query to fix the second monkey reference in row 5 (say you want it to be another animal) without affecting the other one in row 4? Or just delete the second one and not the other? Or say make the first monkey white and the second black? Or detect a human error in the work of the researchers who collected the data? Unless a Table is just a list of values for which you never want to adjust them, I cannot imagine how you would ever manage a unique-key-less table. Especially when you make admin software that is supposed to edit tables made by other people and you run into a gem like the above. What do you tell the user? "Sorry mate, I have no clue how to change the 5th line for you.. either change both or none... sorry." These rows are still proximitely close and a bit of skullduggery with limits etc. might get you changing only one of them (albeit indeterminate which one), but can you imagine editing a large table and some row somewhere else changes out of the view of the user? It will be a mess. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
I can also imagine a day when PRAGMA without_rowid_by_default = 1; will cause tables created subsequently to lack implicit rowid columns. It'd be easier to enable that if the selection of columns for the b-tree index were less consequential than requiring a PRIMARY KEY with its attendant implied unique constraint. To make that feasible would require a decent default column selection when there isn't a primary key. My proposal would be to take the first of the following at first INSERT time: - first index declared on the table - first UNIQUE constraint declared on the table - first column declared UNIQUE with a column constraint The theory being that without syntax for defining a non-unique index at table creation time, the schema author's best chance to implicitly specify the primary b-tree index is with the first index created before the first insert. Whereas if the schema author has any unique constraints then the ones where they get to pick the declaration order should be most preferred for this purpose (the schema author may not get to pick column declaration order). And perhaps SQLite3 should copy MySQL's syntax for declaring non-unique primary b-tree index: CREATE TABLE ... (..., INDEX USING BTREE ()); This also might set the stage for adding hash tables as b-trees where the key is derived by calling a UDF: CREATE TABLE ... (..., INDEX USING BTREE (udf()); Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
I should add that WITHOUT ROWID ended up making a very small difference in bulk-load performance of my application, but a bigger and noticeable difference in incremental operation (where it's most welcomed and desired anyways) performance. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users