Re: [sqlite] I keep getting seg faults building my database using python sqlite3
On Fri, 18 Nov 2016 08:55:11 -0800 "Kevin O'Gorman" wrote:

> All of the python code is a single thread. The closest I come is a
> few times where I use subprocess.Popen to create what amounts to a
> pipeline, and one place where I start a number of copies of a C
> program in parallel, but each is a separate process with its own
> input and output files. These C programs have been in use for a
> number of months for earlier stages of this project, and I regard
> them as quite reliable. None of them uses threads, and they are
> mostly very simple filters.

As you know, a process started with Popen cannot corrupt the Python process's memory. If you're not doing anything to defeat the GIL, a segfault inside the Python interpreter would be considered a bug. But is it happening in the interpreter, or in SQLite for that matter? ISTM that's what you need to know.

To know that, you're going to need to run a debug version of the interpreter under gdb. When it faults, a backtrace will tell you where. That's not definitive proof; memory corruption is often detected far from where it was caused. But if the fault is at a consistent place in SQLite code, for example, you can use a hardware watchpoint to discover what's writing to it.

I don't know what more to suggest. I would be surprised if you find a fault in Python, in the Python standard library, or in SQLite. I'm sure it won't be in anything on the other side of a popen call. Are there non-standard libraries or Python modules in use that you haven't mentioned?

The most likely culprit in my mind is RAM. You're exercising new memory pretty hard, running a bunch of processes at it at full tilt. Any defect in the chips or DMA could explain what you're seeing. An easy test, not necessarily cheap, would be to replace the RAM (or, if possible, run with some removed). I have two war stories related to rotten I/O hardware, where the device appeared to work for all intents and purposes, but was actually a high-speed bit munger.
Those were both over 20 years ago. It will be interesting to hear if that turns out to be your issue.

HTH.

--jkl

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Changing ID's to UUID
On Wed, 16 Nov 2016 08:59:03 -0600 Paul Egli wrote:

> > Using the systemid sequence and the recordid sequence directly
> > however, has a 0% probability of collision, so any rational person
> > would use that directly and forgo entirely the introduction of
> > uncertainty and bugs using "UUID" type crappola will cause.
>
> As Dominique said, the issue here is decentralization...

Decentralization, you say, but not *no* centralization. If the data on those disconnected devices never came together, their keys would never conflict.

We've handled this before, more than once. Ethernet cards have unique addresses. The Domain Name System supports a certain amount of wackiness, but doesn't rely on randomness. Of course, those systems were designed by competent engineers.

> and i would add, particularly in a disconnected environment and/or
> one with no central authority. The method you describe does not
> handle device rollbacks or cloning.

I don't see how the method you describe solves anything. If two devices represent the same data -- or data belonging to the same entity, or whatever -- they need a shared identifier to represent that ownership. If that identifier is chosen randomly or taken from the devices, it will not join the information. The owner of the information will at some point have to assert their commonality: will have to apply a known, common identifier to both sets of information.

--jkl
Re: [sqlite] sqlite3 crashes mysteriously on 3.6.20-1
On Mon, 14 Nov 2016 20:30:57 -0500 "pisymbol ." wrote:

> One last thing: This is during initialization and I access the
> database through that query several times before hitting this crash.
>
> I thought it was memory corruption but it always the same line.

Trying to be helpful, even if it doesn't sound like it: That doesn't exonerate your code!

Presumably you do something with the results of those several queries. Probably what you do is highly deterministic, maybe identical, run upon run. Likely you're just corrupting memory in the same way each time. Not corrupting as in "writing to random memory", but as in "writing in a deterministic way to memory you don't mean to". That the error is repeatable suggests it's not related to a race condition, but where threading is concerned that's never a culprit to be dismissed.

I would run your code under valgrind first. If that doesn't find anything, trap the segfault in gdb and find the basis for it, even if it's deep in the parser. Some offset/pointer is wrong. Find out what it is, and set a watchpoint on it. If it's as deterministic as you say, I'll lay odds gdb will stop at a *very* surprising place, in your code.

HTH, really.

--jkl
Re: [sqlite] Autoincrement sequence not updated by UPDATE
On Tue, 1 Nov 2016 11:01:24 + Simon Slavin wrote:

> attempts to change a value in that column using UPDATE always
> generate an error. I didn't know that. I looked it up. Apparently
> Microsoft's SQLSERVER blocks it

Blocks but does not prevent.

https://msdn.microsoft.com/en-us/library/ms188059.aspx

--jkl
Re: [sqlite] Default ordering of SELECT query
> > > > Any DB update or insert or vacuum or analyze might alter the
> > > > row order produced where the ordering isn't explicit.
>
> I am interested only in the same ordering between two subsequent
> query executions. The things which can happen in between are:
> * checkpointing
> * close/open with different journal mode: DELETE->WAL / WAL->DELETE

You can't assume that it'll give you the same ordering even if nothing at all happens in between. You can't assume even that two executions on exactly the same database, under exactly the same conditions, will return in the same order. There's no contractual requirement for the implementation to be deterministic. If you write code and it works today, it might not work tomorrow.

Imagine: perhaps the internals of sqlite will be rewritten tomorrow; perhaps the query will be executed on two threads, and the results of the two threads will be merged; perhaps they'll be merged arbitrarily if there's no reason to return the results in a specified order. This won't be a breaking change as far as the sqlite API is concerned; but it'll break your code.

The only safe way to write the code is to go by the strict semantics of the statements you're executing; i.e., trust it to fulfil its contract, but nothing else.

James

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.
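The point is easy to demonstrate from Python's sqlite3 module (a minimal sketch; the table and its contents are invented): only an explicit ORDER BY makes the row order part of the statement's contract.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, name TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(3, "c"), (1, "a"), (2, "b")])

# Without ORDER BY, the row order is an implementation detail and may
# change across versions, VACUUMs, or even identical runs.
unordered = con.execute("SELECT id, name FROM t").fetchall()

# With ORDER BY, the order is contractual.
ordered = con.execute("SELECT id, name FROM t ORDER BY id").fetchall()
```

The two queries return the same rows; only the second promises anything about their sequence.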
Re: [sqlite] Why is this query much slower when I index the columns?
On Sat, 2016-10-01 at 13:57 +0200, Clemens Ladisch wrote:

> Heather, James (ELS-LON) wrote:
> > I have ... asked this question elsewhere:
> > http://dba.stackexchange.com/questions/150858/why-is-this-sqlite-query-much-slower-when-i-index-the-columns

Yes, I didn't mean this to be subterfuge. It seemed an appropriate question for both fora.

James
Re: [sqlite] Why is this query much slower when I index the columns?
> Without any indexes, SQLite is probably creating its own transient
> index to use for the join. And when SQLite creates an index, it will
> be creating a single index on all three columns being joined. But if
> you have your own (inferior, single-column) indexes, then SQLite
> figures you know what you are doing and so it tries to use your
> indexes rather than create its own. But the single-column indexes are
> far less useful at solving the problem. Rather than creating 6 new
> indexes, I suggest creating just one index:
>
> CREATE INDEX gofast ON fakenames_uk (givenname, middleinitial, surname);
>
> If that doesn't work, please run ANALYZE, then send in the output of
> ".fullschema" for your database and I'll look again.

Thanks very much for this. This is odd. I had rather assumed it would be possible to speed up the searching using those individual indexes... in fact I know it's possible, because MySQL performs the same query very much faster when I add the individual indexes. How MySQL differs from sqlite in terms of its optimiser, I've no idea; but am I right that MySQL stands as an existence proof that the individual indexes can be used to solve the problem efficiently?

> Unrelated stylistic notes: (1) grave-accent quoting (`givenname`) is
> a MySQL-ism. SQLite understands this for compatibility, but it is not
> standard SQL. No systems other than MySQL and SQLite understand it.
> Better to use double-quotes: "givenname".

Right. In fact I didn't use any quoting when I created the indexes. When I grabbed the index creation statements for pasting into the email, I got sqlite-browser to give me the statements. I'm surprised it used backticks. I don't know whether that's something sqlite-browser has done for itself or not. (I did try all the queries from the command line, though, so I know that the main issue I've raised isn't a sqlite-browser one.)

> (2) There exist many people in the world whose names do not parse
> neatly into "givenname", "middleinitial", "surname". And those people
> tend to become grumpy when presented with a form that wants a name in
> that format. :-)

Also true, but for my application, it didn't matter. I just wanted to demonstrate to some people in a tutorial that the queries would run much faster if the columns were indexed, so I grabbed some fake names etc. from a web site that generates them, and stuffed them into a database. The column names were generated by the web site. It's not used for anything real.

James
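The composite-index suggestion is easy to check from Python's sqlite3 module (a minimal sketch with invented sample rows): EXPLAIN QUERY PLAN shows the planner searching the single three-column index for the join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
for t in ("fakenames_uk", "fakenames_usa"):
    con.execute("CREATE TABLE %s (givenname TEXT, middleinitial TEXT, surname TEXT)" % t)
con.executemany("INSERT INTO fakenames_uk VALUES (?, ?, ?)",
                [("Ann", "B", "Smith"), ("Joe", "C", "Jones")])
con.execute("INSERT INTO fakenames_usa VALUES ('Ann', 'B', 'Smith')")

# One index covering all three join columns, as suggested:
con.execute("CREATE INDEX gofast ON fakenames_usa (givenname, middleinitial, surname)")

query = """SELECT count(*) FROM fakenames_uk JOIN fakenames_usa
           ON  fakenames_uk.givenname     = fakenames_usa.givenname
           AND fakenames_uk.middleinitial = fakenames_usa.middleinitial
           AND fakenames_uk.surname       = fakenames_usa.surname"""

# The plan's detail text names the index it chose for the inner loop.
plan = " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + query))
(matches,) = con.execute(query).fetchone()
```

With the composite index in place, the plan reads along the lines of "SEARCH fakenames_usa USING COVERING INDEX gofast"; with six single-column indexes it cannot.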
[sqlite] Why is this query much slower when I index the columns?
I have a sqlite database with two tables, each with 50,000 rows in, containing names of (fake) people. I've constructed a simple query to find out how many names there are (given name, middle initial, surname) that are common to both tables:

select count(*) from fakenames_uk inner join fakenames_usa
on fakenames_uk.givenname=fakenames_usa.givenname
and fakenames_uk.surname=fakenames_usa.surname
and fakenames_uk.middleinitial=fakenames_usa.middleinitial;

When there are no indexes except on the primary keys (irrelevant to this query), it runs quickly:

[james@marlon Downloads] $ time sqlite3 generic_data_no_indexes.sqlite "select count(*) from fakenames_uk inner join fakenames_usa on fakenames_uk.givenname=fakenames_usa.givenname and fakenames_uk.surname=fakenames_usa.surname and fakenames_uk.middleinitial=fakenames_usa.middleinitial;"
131

real    0m0.115s
user    0m0.111s
sys     0m0.004s

But if I add indexes to the three columns on each table (six indexes in all):

CREATE INDEX `idx_uk_givenname` ON `fakenames_uk` (`givenname`)
//etc.

then it runs painfully slowly:

[james@marlon Downloads] $ time sqlite3 generic_data.sqlite "select count(*) from fakenames_uk inner join fakenames_usa on fakenames_uk.givenname=fakenames_usa.givenname and fakenames_uk.surname=fakenames_usa.surname and fakenames_uk.middleinitial=fakenames_usa.middleinitial;"
131

real    1m43.102s
user    0m52.397s
sys     0m50.696s

Is there any rhyme or reason to this? I'm running 3.13.0, but I've also tried with 3.11.0, and got the same problem.
Here's the query plan for the version with the indexes:

addr  opcode       p1  p2     p3  p4             p5  comment
----  -----------  --  -----  --  -------------  --  -------------
0     Init         0   26     0                  00  Start at 26
1     Null         0   1      1                  00  r[1..1]=NULL
2     OpenRead     0   2      0   6              00  root=2 iDb=0; fakenames_uk
3     OpenRead     1   5208   0   6              00  root=5208 iDb=0; fakenames_usa
4     OpenRead     2   11215  0   k(2,,)         02  root=11215 iDb=0; idx_us_middleinitial
5     Rewind       0   19     0                  00
6     Column       0   4      2                  00  r[2]=fakenames_uk.middleinitial
7     SeekGE       2   18     2   1              00  key=r[2]
8     IdxGT        2   18     2   1              00  key=r[2]
9     Seek         2   0      1                  00  Move 1 to 2.rowid
10    Column       0   3      3                  00  r[3]=fakenames_uk.givenname
11    Column       1   3      4                  00  r[4]=fakenames_usa.givenname
12    Ne           4   17     3   (BINARY)       51  if r[4]!=r[3] goto 17
13    Column       0   5      5                  00  r[5]=fakenames_uk.surname
14    Column       1   5      6                  00  r[6]=fakenames_usa.surname
15    Ne           6   17     5   (BINARY)       51  if r[6]!=r[5] goto 17
16    AggStep0     0   0      1   count(0)       00  accum=r[1] step(r[0])
17    Next         2   8      1                  00
18    Next         0   6      0                  01
19    Close        0   0      0                  00
20    Close        1   0      0                  00
21    Close        2   0      0                  00
22    AggFinal     1   0      0   count(0)       00  accum=r[1] N=0
23    Copy         1   7      0                  00  r[7]=r[1]
24    ResultRow    7   1      0                  00  output=r[7]
25    Halt         0   0      0                  00
26    Transaction  0   0      26  0              01  usesStmtJournal=0
27    TableLock    0   2      0   fakenames_uk   00  iDb=0 root=2 write=0
28    TableLock    0   5208   0   fakenames_usa  00  iDb=0 root=5208 write=0
29    Goto         0   1      0                  00

Thanks very much for your help! The database is moderately large (45MB or so), but contains no private information, so I can provide it if it would help.

James
Re: [sqlite] SQLite 3.15.0 scheduled for 2016-10-14
On Fri, 23 Sep 2016 16:35:07 + Quan Yong Zhai wrote:

> Quote <<
> A "row value" is an ordered list of two or more scalar values. In
> other words, a "row value" is a vector. >>
>
> A "row value" is a tuple, not a vector. When you're using a tuple,
> you know how many items are in it, and the type of each item.

That's correct, and addresses Dominique's point, albeit obliquely: to use a row-value as a list-argument to IN would be to confuse rows and columns. A row-value has 1 or more columns, but only one row.

It would be nice to use row-values correctly in IN:

    select * from T where (a,b) IN ( (1, 'a'), (2, 'b') )

Is that valid?

--jkl
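For what it's worth, SQLite's row values (which arrived with 3.15) accept a row value on the left of IN when the right-hand side is a subquery with a matching column count, so the VALUES form works; a quick check from Python's sqlite3 module (table and rows invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (a INTEGER, b TEXT)")
con.executemany("INSERT INTO T VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# The RHS must be a subquery with a matching column count; VALUES qualifies.
rows = con.execute(
    "SELECT a, b FROM T WHERE (a, b) IN (VALUES (1, 'a'), (2, 'b')) ORDER BY a"
).fetchall()
```

This assumes the Python build links a SQLite of at least 3.15, which has been the norm for years.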
Re: [sqlite] SQLite 3.15.0 scheduled for 2016-10-14
On Thu, 22 Sep 2016 12:43:29 -0700 Darren Duncan wrote:

> single-element row could be done with say a trailing comma; eg
> "(42,)"

All hail the Python tuple!

"Tuples of two or more items are formed by comma-separated lists of expressions. A tuple of one item (a 'singleton') can be formed by affixing a comma to an expression (an expression by itself does not create a tuple, since parentheses must be usable for grouping of expressions). An empty tuple can be formed by an empty pair of parentheses."

https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy

--jkl
Re: [sqlite] Complicated join
On Thu, 15 Sep 2016 15:53:10 + (UTC) David Bicking wrote:

> (1) The CombinedKeyFields must always match in each table
>
> (2) Match using the EvtNbr, but if no match, use the lowest M.EvtNbr
> that matches the CombinedKeyFields
>
> (3) Match using the TransDate but if no exact match, match on the
> M.TransDate that is less than the E.TransDate but greater than the
> prior E.TransDate

I think this is what you describe:

select E.CombinedKeyFields,
       max(M.EvtNbr) as EvtNbr,
       max(M.TransDate) as TransDate
from E
left join M
  on  E.CombinedKeyFields = M.CombinedKeyFields
  and E.EvtNbr    >= M.EvtNbr
  and E.TransDate >= M.TransDate

--jkl
Re: [sqlite] When is data committed on one connection seen on another?
On Wed, 14 Sep 2016 18:29:36 + (UTC) Alex Ward wrote:

> Perhaps our schema needs a rework, would one table with a million
> rows be better than 500 tables with 2000 rows each?

500 tables isn't right or wrong, but *counting* tables is. Table count is not a design-quality metric. There is no rule of thumb except Boyce-Codd Normal Form. Table design reflects the entities you choose to represent your domain of discourse. You should strive, as Einstein advised, to make your model as simple as possible, but no simpler.

Sometimes we see schemas on this list with sets of identical tables, each set dedicated to a different client or somesuch. Those folks buy themselves trouble, because otherwise identical SQL has to vary by tablename, adding nothing but complexity to the application layer. At the opposite end of the spectrum is the classic entity-attribute-value design error. One table conquers all, including the programmers when they discover how slow things are when SQL is used before it is learned.

Number of columns? Many widely used and correctly normalized financial market databases maintain thousands of rows on hundreds of columns. Complex models can easily have a hundred tables.

Before I would agree a design has "too many" tables, I would want an affirmative answer to one of two questions:

1. Is there a process in place to dynamically extend the schema? If so, that indicates data -- something that changes over time -- has found its way into the metadata (the table names).

2. Do you frequently find yourself looking in more than one place for what is -- to you -- substantially the same information? That would indicate the tables do not reflect your mental model of the problem domain.

There are forces that drive intelligent table design other than the logical model. Things like performance and convenience inevitably cause tables that would otherwise pass textbook muster to be split and combined. Those things count, too. One thing that doesn't count is the count.

Regards,

--jkl
Re: [sqlite] how is "pragma threads = 4" working
On Fri, 16 Sep 2016 07:29:28 -0400 Richard Hipp wrote:

> The algorithm used for "ORDER BY ... LIMIT N" uses much less memory
> than a full-up "ORDER BY" because it only keeps track of the top N
> entries seen so far, discarding the rest. But it also only uses a
> single thread.

My immediate thought was that this is an optimization opportunity. As the OP alludes to, N is the sum of LIMIT and OFFSET. Would you have information on how these are typically used? My guess is that the LIMIT argument is typically small, less than 20, but that OFFSET marches on, and grows to be a significant fraction of the table.

If LIMIT N is small and OFFSET is not used, a memory-efficient, nonlocking parallel algorithm would reserve N slots for each thread, and divide the table among the threads, each processing 1/threads of the rows. Then merge-sort their outputs.

Humbly submitted,

--jkl
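The per-thread top-N idea can be sketched in a few lines of Python (names invented; this illustrates the proposed algorithm, not SQLite's implementation): each worker keeps at most N candidates, and a final merge-sort of the per-worker results yields the overall top N.

```python
import heapq

def top_n_parallel(chunks, n):
    # Each "thread" keeps only its own top-n slots, discarding the rest...
    per_chunk = [heapq.nsmallest(n, chunk) for chunk in chunks]
    # ...then the already-sorted per-thread results are merge-sorted
    # and trimmed to the overall top n.
    return heapq.nsmallest(n, heapq.merge(*per_chunk))

chunks = [[9, 4, 7, 1], [3, 8, 2], [6, 0, 5]]  # the table, split among threads
result = top_n_parallel(chunks, 3)
```

Memory stays bounded at roughly threads * N slots, which is why the scheme only pays off when OFFSET doesn't inflate N.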
Re: [sqlite] Slow Sqlite3 Select Query while Insert/Update/Delete has no issue
On Wed, 14 Sep 2016 16:27:37 +0530 SinhaK wrote:

> strlen(MyString.str().c_str())

BTW, as a matter of style, MyString.str().size() gets you to the same place sooner.

> MyString << "select TokenNo,Price ,sum(QTY) from 'Stream0' where
> TokenNo=?1 and Side=66 group by Price order by Price desc limit 5";

You should group by TokenNo, Price. You should not have 'Stream0' in single quotes; that makes it a string.

But I agree with Teg: SQLite is providing you with transactions you don't need, and puts an interpreted language exactly where you don't want it: in a performance-critical spot. The C++ standard library has all the bits you need, and is almost as convenient to use.

You have only one table, and probably just a few simple queries. std::set gives you lower_bound and upper_bound. Hand those two iterators to std::accumulate, and you have GROUP BY. Call that for 5 prices. Not very much code, and I bet 100x faster than SQL. If more than one thread is updating the table, obviously protect your set with a mutex.

--jkl
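The same lower_bound/upper_bound-plus-accumulate idea, sketched here in Python with the bisect module (sample data invented; in C++ it would be std::set, std::lower_bound/std::upper_bound, and std::accumulate, as described above):

```python
import bisect

# A sorted container of (price, qty) rows for one token.
book = sorted([(101.5, 10), (101.5, 5), (102.0, 7), (103.0, 2), (103.0, 1)])
prices = [p for p, _ in book]  # parallel key list for the binary searches

def qty_at(price):
    lo = bisect.bisect_left(prices, price)   # lower_bound
    hi = bisect.bisect_right(prices, price)  # upper_bound
    # Summing over [lo, hi) is the accumulate step: GROUP BY price, sum(qty).
    return sum(q for _, q in book[lo:hi])
```

Two binary searches and a sum replace the interpreted GROUP BY for each of the five prices.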
Re: [sqlite] WHERE col IN tab
On Fri, 16 Sep 2016 16:59:17 +0200 Dominique Devienne wrote:

> Is that <> SQL standard?

No. The two most frequently used pointless words in SQL are "select *". The SELECT clause (not statement) chooses columns; in relational algebra terms, it's a project operator. If "all columns" is what you need, what you don't need is projection. Requiring "select *" makes as much sense as requiring "WHERE TRUE" if there is no restriction.

The strict select-from-where construct in SQL is an artifact of its roots in IBM's 1970s-era "4th generation" languages. That's why the language looks so much like Cobol and so little like math. But it is what it is. In SQL, a tablename is a parameter for FROM (and nowadays, JOIN). Predicates -- IN, EXISTS -- take SELECT.

--jkl
Re: [sqlite] Bug in CREATE INDEX
On Mon, 8 Aug 2016 10:48:58 -0700 "Kevin O'Gorman" wrote:

> Very cool. But 4? I will be running this on machines with 8 and 16
> cores. Does going beyond 4 not help much?

Four doesn't seem like a bad starting point. I don't have any information specific to SQLite, but in general adding cores to a sort helps only until I/O bandwidth is saturated (or RAM is exhausted). And there's always the possibility the system may have other things to do besides sorting.

--jkl
Re: [sqlite] C API - Parameterized Atomic Transactions
On Tue, 9 Aug 2016 17:09:39 -0300 Paulo Roberto wrote:

> I would like something like this:
>
> "BEGIN EXCLUSIVE TRANSACTION;"
> "SELECT counter FROM mytable WHERE counterid = ?;"
> "UPDATE mytable SET counter=? WHERE counterid = ?;"
> "COMMIT TRANSACTION;"

begin transaction;
UPDATE mytable
SET counter = ( select 1 + max(counter)
                from mytable
                where counterid = ? )
WHERE counterid = ?;
select counter - 1 as counter
from mytable
where counterid = ?;
commit transaction;

Standard SQL. Doesn't rely on BEGIN EXCLUSIVE. Should be just as fast.

--jkl
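A sketch of the same pattern from Python's sqlite3 module (table contents invented, and simplified to counter + 1 since counterid identifies a single row here): one UPDATE does the read-modify-write, so no exclusive lock is needed up front.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (counterid INTEGER PRIMARY KEY, counter INTEGER)")
con.execute("INSERT INTO mytable VALUES (1, 41)")

def next_counter(con, counterid):
    with con:  # one transaction wraps both statements
        con.execute("UPDATE mytable SET counter = counter + 1 "
                    "WHERE counterid = ?", (counterid,))
        (value,) = con.execute("SELECT counter - 1 FROM mytable "
                               "WHERE counterid = ?", (counterid,)).fetchone()
    return value  # the pre-increment value, as in the SQL above

first = next_counter(con, 1)
second = next_counter(con, 1)
```

Each call hands out one value and advances the counter atomically.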
Re: [sqlite] REFERENCES from TEMPORARY table to main database table?
On Sat, 23 Jul 2016 01:06:23 + "Smith, Randall" wrote:

> Using "REFERENCES main.Symbols" appears to be a syntax error.

I think you got caught by a special case. Any kind of DRI would be hard to enforce across database boundaries. In general, if we have two databases, there's no requirement they be used together, no requirement they both be attached.

Although the DBMS could conceivably implement a rule that says any constraint referencing a nonexistent (unattached) database yields False, I haven't used one that works that way. To a one, they simply prohibit schema-qualified names in DRI rules. (I suppose that's one justification for the use of triggers.)

--jkl
Re: [sqlite] insert or replace on PK and UNIQUE INDEX
On Tue, 12 Jul 2016 15:35:20 +0200 Dominique Devienne wrote:

> Now we know OR REPLACE is never what we want (in our use cases),

Besides being nonstandard, REPLACE is not atomic. I haven't seen the use case that benefits from those characteristics but, judging from this list, you're not the first person bitten by them.

--jkl
Re: [sqlite] builtin functions and strings with embedded nul characters
On Mon, 4 Jul 2016 13:07:18 +0200 R Smith wrote:

> I think you are missing an important bit in all of this - the strings
> in C is the problem, they think a Null character indicates
> termination. It has nothing to do with how SQL stores data - SQLite
> will store it with all bytes intact, but you typically retrieve or
> set it via some C calls using a C api.. and this is where the problem
> is.

Dijkstra: On anthropomorphism in science
https://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD936.html

C doesn't have strings, and they don't think. C has some standard functions that by convention treat byte arrays as strings. The convention is to signify EOS with a NUL byte. Using those functions on arrays with non-terminating NULs will probably lead to undesired results.

IIUC, there are some string functions in SQLite (including in the SQL itself) that behave unpredictably if presented with strings that include embedded NULs. That needs no defense: it is a defect. The DBMS keeps (as it should) explicit lengths for all data. Treating NUL specially is only problematic. Saying their behavior is undefined in that case at least gives the user fair warning; better would be to make it defined.

A few people in this thread mentioned something along the lines of "SQLite data are encoded as UTF-8". That's not true. It does not check that text is correctly or uniformly encoded, nor does it record what encoding is in force for a given database. It would be more accurate to say that SQLite supports 4 Unicode encodings. The default encoding is "binary", which is to say unencoded bytes-as-text. If the comparison function is memcmp(3), NUL needs no special treatment.

--jkl
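Both halves of that point are observable from Python's sqlite3 module (a minimal sketch): the stored value keeps its full length, while SQL string functions such as length() count only characters before the first NUL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (s TEXT)")
con.execute("INSERT INTO t VALUES (?)", ("a\0b",))

# The value round-trips intact: SQLite stores an explicit length...
(stored,) = con.execute("SELECT s FROM t").fetchone()

# ...but the SQL length() function stops at the first embedded NUL.
(n,) = con.execute("SELECT length(s) FROM t").fetchone()
```

So the storage layer honors the explicit length; it's the string functions that treat NUL as a terminator.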
Re: [sqlite] UNIQUE constraint violation
On Tue, 28 Jun 2016 19:19:43 -0700 J Decker wrote:

> Duplication can also result as part of the - in process - moving of
> rows. To change the order of [1,2,3,4] to [1,3,2,4] there is (can be)
> a state that is [1,2,2,4] before the second part that sets three back
> into 2.

I'd just like to point out to the OP that, while SQLite does behave in the way described above, it's a flaw. SQL semantics are per-statement, not per-row. From an SQL perspective, any "state" that might occur within a statement is meaningless.

In SQLite, an update to a primary key K fails

    update T set K = K + 1;

if, for any value v in K, there exists v + 1. By the rules of SQL, it should succeed and, in most other DBMSs, does.

--jkl
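The failure is easy to reproduce from Python's sqlite3 module (a minimal sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# SQLite checks uniqueness row by row, so the intermediate state
# (2, 2, 3) already violates the constraint and the statement fails,
# even though the end state (2, 3, 4) would be perfectly legal:
try:
    con.execute("UPDATE t SET k = k + 1")
    failed = False
except sqlite3.IntegrityError:
    failed = True
```

A DBMS with statement-level constraint checking runs the same UPDATE without complaint.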
Re: [sqlite] sqlite3_column_origin_name for INSERT and UPDATE
On Fri, 17 Jun 2016 19:59:56 +0200 Rapin Patrick <rapin.patr...@gmail.com> wrote:

> 2016-06-17 18:24 GMT+02:00 James K. Lowden <jklow...@schemamania.org>:
>
> > You are encoding type information in the name. If you move the type
> > information into the data, SQLite can manage the unit dimension.
> > You could use a CHECK constraint to require that speed was in m/s,
> > or a trigger to make it so.
>
> That would indeed be an option. I suppose that you mean to use a
> string like "2.34 m/s" or a BLOB of typically 9 bytes (a 'double' and
> a unit enum).

My first instinct is to add a Units column to any table with a measurement. The value in Units would be a foreign key to a Units table. Another table, Conversions, could hold conversion factors. That would support multiple units for a given column in a table or, with a trigger, enforce (and coerce) all values in a column to use a single unit. It would also permit arbitrary conversion in SELECT, simply by reference to the Conversions table.

A more sophisticated DBMS would have stored procedures for inserts, user-defined types, and INFORMATION_SCHEMA. With those, you could define a type domain for each measurement. You could forgo the Units column, and instead use just a Units parameter. The stored procedure would look up the units for the measurement's domain, and apply any conversion or reject the insert. That seems to be roughly what you're doing with your C++ wrapper.

> But this would slightly increase the database size, and slow down
> access due to formatting / parsing.

I doubt you would notice.

> In addition, my approach had the big advantage of being backward
> compatible: older versions of our application would just ignore the
> column unit type, and read/write regular 'double'.

If I were working with you, I would object to the requirement that the column names encode anything. I would want the units expressed explicitly in the database, however they might be applied or enforced. From an application-programming perspective, I would prefer to treat the column names as opaque identifiers. Rather than parsing them for "x_per_y", I'd prefer to look up the units by column name.

--jkl
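A sketch of the Units/Conversions idea in SQL, run from Python's sqlite3 module (all table names, columns, and the m/s-to-km/h row are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Units (unit TEXT PRIMARY KEY);
    CREATE TABLE Conversions (
        from_unit TEXT REFERENCES Units,
        to_unit   TEXT REFERENCES Units,
        factor    REAL,
        PRIMARY KEY (from_unit, to_unit));
    CREATE TABLE Measurements (
        name  TEXT,
        value REAL,
        unit  TEXT REFERENCES Units);
    INSERT INTO Units VALUES ('m/s'), ('km/h');
    INSERT INTO Conversions VALUES ('m/s', 'km/h', 3.6);
    INSERT INTO Measurements VALUES ('speed', 2.0, 'm/s');
""")

# Arbitrary conversion in SELECT, simply by joining against Conversions:
(kmh,) = con.execute("""
    SELECT m.value * c.factor
      FROM Measurements m
      JOIN Conversions c ON c.from_unit = m.unit
     WHERE m.name = 'speed' AND c.to_unit = 'km/h'""").fetchone()
```

The units live in the data, so no column name needs parsing and new units are an INSERT, not a schema change.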
Re: [sqlite] Correct, best, or generally accepted database structure for groups of things
On Fri, 17 Jun 2016 07:37:16 +0100 Chris Locke wrote:

> I fail to see what any of this has to do with sqlite. I thought this
> was a mailing list for sqlite? Seeing queries (no pun intended) on
> sql statements is very subjective, especially with the limited data
> provided by the original poster.

A query question frequently exposes design choices, either logical or physical. Both of those can have great effect on the utility and performance of the system. Answering SQL questions helps people use SQLite more effectively, and to understand where it differs from other DBMSs.

Queries occasionally provoke changes in SQLite itself, either because the output was wrong (or unexpected), or because it presented a case for optimization. I have to believe that real queries from users on this list serve to inform the developers in how SQLite is used. (I have never seen a homework question on this list.)

> it won't stop there, and as soon as the original poster has another
> query

No, it won't, because it hasn't. I've been hanging out here for 18 months, and I remember only one annoying trivial-query participant. Out of 13,693 messages, that doesn't amount to much.

--jkl
Re: [sqlite] Correct, best, or generally accepted database structure for groups of things
On Thu, 16 Jun 2016 20:53:25 + "Drago, William @ CSG - NARDA-MITEQ" wrote:

> CREATE TABLE Apples (
>     ID INTEGER PRIMARY KEY,
>     Color TEXT COLLATE NOCASE, --Could be Red, Green, or Yellow

      check Color in ( 'Red', 'Green', 'Yellow' ), -- FTFY

>     Height REAL, --Measured in cm
>     Width REAL,  --Measured in cm
>     Weight REAL  --Measured in grams
> );
>
> And say I had a function that looks at the Apples table and finds
> groups of 4 apples that match in color, dimensions, and weight.

create view FourApples as
select max(ID) as ID, Color, Height, Width, Weight
from Apples
group by Color, Height, Width, Weight
having count(*) = 4
;

Why bother with a table?

--jkl
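A quick check of that view from Python's sqlite3 module (sample rows invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Apples (
    ID INTEGER PRIMARY KEY, Color TEXT COLLATE NOCASE,
    Height REAL, Width REAL, Weight REAL)""")
con.executemany(
    "INSERT INTO Apples (Color, Height, Width, Weight) VALUES (?, ?, ?, ?)",
    [("Red", 7.0, 7.5, 150.0)] * 4 + [("Green", 6.0, 6.5, 120.0)] * 3)

con.execute("""CREATE VIEW FourApples AS
    SELECT max(ID) AS ID, Color, Height, Width, Weight
      FROM Apples
     GROUP BY Color, Height, Width, Weight
    HAVING count(*) = 4""")

# Only the group with exactly four matching apples appears.
matches = con.execute("SELECT Color, ID FROM FourApples").fetchall()
```

The three green apples fall short of four and are excluded; the view stays current as rows come and go, with no extra table to maintain.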
Re: [sqlite] Trouble coding conditional UNIQUE
On Thu, 16 Jun 2016 23:23:33 +0200 Dominique Devienne <ddevie...@gmail.com> wrote:

> > > On Behalf Of James K. Lowden
> > >
> > > create view vParts as
> > > select 1 as Matched, * from Parts
> > > UNION
> > > select 0, * from UnmatchedParts
> >
> > Why UNION instead of UNION ALL here?
>
> Thanks to the 1 and 0, there can't be any dups, so the dedup step of
> UNION wastes cycles, no?

Well spotted, yes! (Maybe one of these days we'll have a relational query language without the strange "select distinct / union all" contradiction.)

I think you're right that, by construction, these two sets are distinct. I was going to suggest that perhaps SQLite could detect that case and skip the de-duplication step, but I think it's a bridge too far. To begin with, there's no way in SQL to *declare* that two tables represent distinct sets. If we can't even do that, how can the DBMS be expected to deduce that any two derived sets are distinct?

--jkl
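[Editor's note: the point that the 1/0 tag makes the two branches distinct by construction is easy to demonstrate; a sketch with invented table contents, deliberately placing the same tuple in both tables.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Parts (Model TEXT, Serial TEXT);
CREATE TABLE UnmatchedParts (Model TEXT, Serial TEXT);
INSERT INTO Parts VALUES ('M1', 'S1');
INSERT INTO UnmatchedParts VALUES ('M1', 'S1');  -- same tuple in both tables
""")
union = con.execute(
    "SELECT 1 AS Matched, * FROM Parts "
    "UNION SELECT 0, * FROM UnmatchedParts").fetchall()
union_all = con.execute(
    "SELECT 1 AS Matched, * FROM Parts "
    "UNION ALL SELECT 0, * FROM UnmatchedParts").fetchall()
# the 1/0 tag keeps every row distinct, so UNION's dedup removes nothing
```

Both queries return the same two rows; UNION ALL just skips the (here pointless) sort-and-dedup work.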
Re: [sqlite] sqlite3_column_origin_name for INSERT and UPDATE
On Fri, 17 Jun 2016 10:56:32 +0200 Rapin Patrick wrote:

> And my C++ wrapper then knows that this column is a speed expressed
> in meters per second. So when making a SELECT on t1 table, the
> wrapper will output number objects with unit dimension of Speed
> expressed in m/s !

You are encoding type information in the name. If you move the type information into the data, SQLite can manage the unit dimension. You could use a CHECK constraint to require that the speed is in m/s, or a trigger to make it so.

--jkl
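[Editor's note: a sketch of the CHECK-constraint approach suggested above. The table and column names are invented; the idea is simply that the unit lives in a column of its own and the constraint pins it down.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE t1 (
    speed      REAL,
    speed_unit TEXT NOT NULL DEFAULT 'm/s' CHECK (speed_unit = 'm/s')
)""")
con.execute("INSERT INTO t1 (speed) VALUES (12.5)")   # default unit applies
try:
    # any other unit is rejected by the constraint
    con.execute("INSERT INTO t1 (speed, speed_unit) VALUES (30.0, 'km/h')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
ok_rows = con.execute("SELECT speed, speed_unit FROM t1").fetchall()
```

A trigger could instead convert incoming values to m/s rather than rejecting them; either way the unit is data, not a naming convention.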
Re: [sqlite] Trouble coding conditional UNIQUE
On Tue, 14 Jun 2016 16:27:29 + "Drago, William @ CSG - NARDA-MITEQ" wrote:

> Once the part has been grouped into a set (Matched=1) it receives a
> unique permanent serial number and the temporary serial number can be
> reused, so (Model, TemporarySerialNumber) doesn't have to be unique
> anymore. Information about what parts belong to what set is stored in
> a different table.
>
> So is using a conditional constraint in this case okay, or is there a
> better way?

It's OK, Bill; it was OK before I posted. I'm only suggesting an alternative you might like better.

You pasted your particulars into my generic description, and they fit perfectly. You have two sets:

1. Parts with a unique, permanent serial number.
2. Parts with a temporary, reusable serial number.

You could remove the latter set to a new table, perhaps "UnmatchedParts", having the characteristics you want, namely a primary key or unique constraint on (Model, TemporarySerialNumber). Then you don't need the Matched boolean column in either table, solving your original problem (a conditional constraint). Then you have a view,

create view vParts as
select 1 as Matched, * from Parts
UNION
select 0, * from UnmatchedParts
;

I think there's a natural tendency to put similar things in one table when they share common properties, sometimes by adding a discriminator column. I've seen lots of tables like that, and designed some myself once upon a time. SQL encourages it, because that way you can write just one INSERT, etc., and just set the flag right.

When you push hard on such a table, though, by trying to do things right, you wind up with little conundrums (conundra?) like the one you posted. They grow out of the fact that the things aren't the same. They're just a smidgen different, so they need to be tweaked just so, and before you know it you either have to lean on some oddball feature of the DBMS, or punt. If you separate them, then you might have more query text, but each one will be simpler and easier to understand.
HTH.

--jkl
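[Editor's note: the two-table design above can be sketched end to end. Table and column names follow the post where given; the sample rows are invented.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Parts (                 -- permanent, globally unique serials
    Model        TEXT NOT NULL,
    SerialNumber TEXT NOT NULL UNIQUE
);
CREATE TABLE UnmatchedParts (        -- temporary serials, unique per model
    Model                 TEXT NOT NULL,
    TemporarySerialNumber TEXT NOT NULL,
    UNIQUE (Model, TemporarySerialNumber)
);
CREATE VIEW vParts AS
    SELECT 1 AS Matched, Model, SerialNumber FROM Parts
    UNION
    SELECT 0, Model, TemporarySerialNumber FROM UnmatchedParts;
""")
con.execute("INSERT INTO Parts VALUES ('M1', 'P-0001')")
con.execute("INSERT INTO UnmatchedParts VALUES ('M1', 'T-17')")
try:
    # the constraint that was conditional before is unconditional here
    con.execute("INSERT INTO UnmatchedParts VALUES ('M1', 'T-17')")
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False
rows = con.execute("SELECT * FROM vParts ORDER BY Matched DESC").fetchall()
```

Each table carries exactly the constraint appropriate to its kind of part, and the view presents them as one, Matched flag included.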
Re: [sqlite] Apple announces new File System with better ACID support
On Tue, 14 Jun 2016 10:49:05 +0900 ?? wrote:

> > On 13 Jun 2016, at 10:13pm, Richard Hipp wrote:
> >
> > The rename-is-atomic assumption is so wide-spread in the Linux
> > world, that the linux kernel was modified to make renames closer to
> > being atomic on common filesystems such as EXT4.
>
> http://man7.org/linux/man-pages/man2/rename.2.html

rename(2) *is* atomic. That doesn't mean it's synchronous with respect to external storage. It only means that no two processes will ever see the file "in flight" in two places. If process A calls rename(N,M), at no point will process B have access to both N and M. Once M is available, N is extinguished.

That's a useful property for a process that succeeds, and for which the OS successfully flushes the data to disk. When Richard says rename isn't atomic, he means that it's not synchronous with respect to the disk. It makes no guarantee that the directory entries were updated on disk. The rename happens in the kernel's filesystem memory structures, which *eventually* are persisted to disk. I have heard that that time lag may be measured in seconds.

> I am interested to know what it would take to make linux renames
> fully atomic. Reading it as is it feels like the action of rename
> would be the most important piece to making rename atomic. The docs
> claim this is atomic. What other aspects would be necessary?

To make Linux rename fully synchronous is technically infeasible and politically impossible.

On the political side, the preference in Linux is invariably for performance, often at ever-finer divisions of responsibility. As an example, Unix fsync(2) traditionally updated both the file and its metadata; Linux divided those into fsync and fdatasync, and added the requirement to call fsync on the directory. What was once a single call became 2 or 3.

As a technical matter, it's really infeasible because there are too many moving parts: kernel, filesystem driver, and hardware.
It is possible for a human being to know what kind of disk is installed and how it is configured, and to know the semantics of a given filesystem. It is not possible for the kernel to patrol all those things, and hence the kernel cannot make any guarantees about them. (To take an extreme example: NFS.)

By the way, every DBMS I know anything about (and SQLite is no exception) tends to eschew OS services except at the most minimal level. The internals of a DBMS carry a lot of state information unavailable to the kernel that the DBMS uses to prioritize how memory is used and when and where I/O is required. That's why every DBMS has its own logging mechanism, and some bypass the filesystem altogether.

--jkl
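[Editor's note: the atomic-but-not-durable distinction above translates into the well-known "write temp, fsync, rename, fsync the directory" sequence. A minimal POSIX sketch, assuming Linux semantics where the directory entry must be fsynced separately; filenames are invented.]

```python
import os
import tempfile

dirpath = tempfile.mkdtemp()
tmp = os.path.join(dirpath, "data.tmp")
final = os.path.join(dirpath, "data")

with open(tmp, "wb") as f:
    f.write(b"hello")
    f.flush()
    os.fsync(f.fileno())     # push the file's contents to stable storage

os.rename(tmp, final)        # atomic with respect to other processes,
                             # but says nothing about the disk

dfd = os.open(dirpath, os.O_RDONLY)
try:
    os.fsync(dfd)            # persist the directory entry itself
finally:
    os.close(dfd)

durable = os.path.exists(final) and not os.path.exists(tmp)
```

Without the final fsync on the directory, a crash after rename() can leave the on-disk directory still pointing at the old name, which is exactly the gap described above.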
Re: [sqlite] Trouble coding conditional UNIQUE
On Mon, 13 Jun 2016 19:11:29 + "Drago, William @ CSG - NARDA-MITEQ" wrote:

> I need UNIQUE(B, C) only when E=0.

A conditional constraint is evidence that you have two kinds of things represented in one table: those E=0 types that are identified by {B,C}, and the rest. They're represented in a single table because they seem to have the same columns, although the E=0 types don't need an E column.

A better solution might be to separate the two types into two tables, each with its own constraints, and use a UNION to represent them as one.

--jkl
Re: [sqlite] Managing SQLite indices.
On Tue, 14 Jun 2016 01:04:27 +0100 Simon Slavin wrote:

> When your application runs fast enough not to annoy you, you're
> done. If you're not willing to do step (1), don't bother with
> anything else.

Simon's entire post is excellent advice. To the OP: print it, and frame it.

I would only add to

> Plan your schema properly.

that "properly" is code for BCNF or at least 3NF. It's the gift that keeps on giving.

--jkl
Re: [sqlite] UPDATE statement without FROM clause
On Sat, 4 Jun 2016 18:18:36 +0200 skywind mailing lists wrote:

> At the moment I have to run something like:
>
> UPDATE A SET item1=(SELECT B.item FROM B WHERE B.ID=A.ID),...
> itemN=... WHERE EXISTS (SELECT 1 FROM B WHERE B.ID=A.ID);
>
> Using a FROM clause I just need one scan through B (at least in
> principle). Now, I need N+1 scans.

Nonsense. SQL provides no instruction to the implementation on how to organize or traverse the data. SQLite is free to scan B once, twice, or not at all. Syntax has nothing to do with performance.

A correlated subquery is an expression of logic; it's not meant to be taken literally, and often isn't. This particular form "just" needs to be recognized by the optimizer.

--jkl
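[Editor's note: the correlated-subquery UPDATE under discussion, reduced to one column, behaves like this; the sample tables and values are invented.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY, item1 TEXT);
CREATE TABLE B (ID INTEGER PRIMARY KEY, item TEXT);
INSERT INTO A VALUES (1, 'old'), (2, 'old'), (3, 'old');
INSERT INTO B VALUES (1, 'new1'), (3, 'new3');  -- no row for A.ID = 2
""")
# the correlated form: the EXISTS clause keeps unmatched rows from
# being nulled out by the scalar subquery
con.execute("""
UPDATE A SET item1 = (SELECT B.item FROM B WHERE B.ID = A.ID)
WHERE EXISTS (SELECT 1 FROM B WHERE B.ID = A.ID)
""")
rows = con.execute("SELECT ID, item1 FROM A ORDER BY ID").fetchall()
# only the matched rows changed; row 2 kept its old value
```

However the statement is written, how many times B is actually scanned is the planner's business, not the syntax's.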
Re: [sqlite] SQL / SQLite for Beginners
On Thu, 26 May 2016 10:54:30 -0400 r.a.n...@gmail.com wrote:

> FWIW, since it's inception, S.Q.L has been pronounced allot like
> CICS.

This may be more true than you know. It's not too hard to find old-timers who pronounce it "kicks".

--jkl
[sqlite] Sqlite incompatibility with Postgres
On Fri, 20 May 2016 14:17:25 +1000 "dandl" wrote:

> Every aggregation function is at least second order: a function that
> applies a function to the set. So for MIN the function is 'less
> than', for SUM() the function is 'plus' and so on. In Andl
> aggregation functions are provided by fold(), which takes a function
> as an argument.

I want you to know that you hijacked my Saturday. I was bothered about what "first order" and "second order" mean, suspecting that we meant different things. After an afternoon with the Oracle of All Knowledge, I think we were talking about different things, and you had a better handle on your end than I did on mine.

I was concerned that we were treading in territory outside first-order predicate logic. On review, as Wikipedia explains, HOL deals in another beast, namely the quantification of sets of sets. You were talking about something much simpler, second-order *functions*. The input is still a value -- an individual member of a set -- plus another function. As you say, there are many such in SQL.

In keeping with the language's purpose, the primitive components are not exposed, so it's not possible to reconstruct min as FOLD(MIN,X). We can do similar things with subqueries, e.g.

select sum(N) from (select count(*) as N from T group by a) as A

One can imagine that restated as

select F(sum, count, t) from T

where F is defined as taking two functions and a value. I guess that would make F a third-order function.

APL is instructive in this regard. What we usually call operators -- + - × ÷ -- are termed *functions* in APL, in keeping with their mathematical definition. A function that takes a function is called an operator. One such is "/", the reduction operator; SUM(t) could be expressed as

+/t

> > 2. Limit, as currently implemented, lies outside the theory
> > because it doesn't operate on sets.
>
> I'll put that one on hold pending a suitable reference or detailed
> mathematical treatment.
I think I can accept that "first(N)" could be a set function, and if SQL dealt in sets, LIMIT would be a deterministic function. But SQL deals in bags, and with a couple of odd exceptions -- random(), now() -- all its functions are deterministic. LIMIT is not a deterministic function. I'm not sure what happens to first order predicate logic in the face of nondeterminism, but I'm sure it's not good.

> Sorry. Your nth() is a second order function

OK.

> The (single-pass) implementation would maintain a temporary table of
> rows that are 'minimum so far seen', to a maximum of N. It would be an
> implementers decision what to do with a row equal to one in that
> table once N has been reached: add it or ignore it?

nth() acts on a single column; it keeps the set of N smallest values, as you say. The answer to your question is "ignore it" because a value equal to one in the set is already a member. Given the input C {1, 1, 2, 2, 2, 3}

min(C) = 1
nth(C, 1) = {1}
nth(C, 2) = {1, 2}

I'm not claiming any deep insight, only that nth() would be handy and can be defined mathematically (even if I can't do it).

--jkl
[sqlite] Sqlite incompatibility with Postgres
On Thu, 19 May 2016 10:29:48 +1000 "dandl" wrote:

> > Restriction is applied to the values of the tuple. The number of
> > tuples is not a value of the tuple.
>
> No, I can't agree. Restriction is a membership test, a function on
> members: should this tuple be included in the result set or not?
> Cardinality of a set is a second order function on the members of the
> set, obtainable simply by examining all the tuples in the set at the
> same time. There is no a priori reason not to use cardinality in a
> membership functions.

That's an interesting perspective. If you're dealing with genuine sets, and you define your language in terms of second-order operations, then something like LIMIT could be included. Would have to be, I guess. But that's not what SQL is, or what LIMIT is. You were rather dismissive of my nth() function, but that approximates what LIMIT does (approximation is all that's possible) with a first-order operation.

BTW, I still think you're agreeing with me. I'm insisting on using the "values of the tuple", implicitly restricted to first-order operations. Cardinality, as you say, is a second order *function*, hardly a "value". But at least I understand your argument now.

> To that you can successively add negation, recursion, higher order
> functions and fixpoint/while. Each of those allows operations that
> others do not, but there is disagreement about which should be
> considered 'relational'.

OK, I see. It's fitting that the debate is about the definition of the set of relational operators. I'm conservative in that regard. I'm wary of the complexity that higher-order operations bring to the language. Each higher level brings (I suspect) more complexity than the prior, while solving fewer new problems.

I think recursion is a good extension, and a good example. It permits the expression of hierarchies. It's indispensable ... for maybe 1% of queries. I guess you could convince me it makes SQL Turing Complete, but that's a very dense thicket.
Recursive structures are useful. If they could be manipulated without making the language Turing Complete, I'd consider that a plus.

> Thank you for the reference -- I didn't have that one. I'm familiar
> with the material.

You're welcome, and it shows.

I think we've managed to hash out some agreement:

1. Second order functions are "relational", or can be, depending on one's definition. We have support for them already in SQL.
2. Limit, as currently implemented, lies outside the theory because it doesn't operate on sets.

Regards,

--jkl
[sqlite] foreign_key_check mystery
On Wed, 18 May 2016 19:06:30 +0200 R Smith wrote:

> > I'm not convinced the requirement that the referenced columns be
> > unique is justified
>
> How do you see a parent-child relationship possible where the parent
> is not Unique?

I think I can convince you that uniqueness is a good rule of thumb, but that enforcing it ahead of RI is undesirable. But the price of making me think about it is reading a long answer.

The referenced table represents the domain of the foreign key relationship. When we say

foreign key R(A) references S(B)

we're saying every value in A appears in B. We're not saying anything about B: not its type, not its cardinality. The statement describes A only.

We're also saying something logical, not enforced by the DBMS: that R is a part of S. S can exist without R, but R without S is meaningless. But that's a modelling question, and I can't think of another SQLite feature that enforces any aspect of database design. Why start here of all places? Was that even the intention?

So let's look at my table T and its FK,

, foreign key (SegName, DbdName) references Segm(Name, DbdName)

as it happens, Segm was defined with

, Name text not NULL primary key
, DbdName text not NULL

One day, Segm may have a 2-column PK: primary key (Name, DbdName). But today, Name uniquely identifies it. (Note that therefore {Name, DbdName} also uniquely identifies it!)

T extends Segm, and requires that its {SegName, DbdName} pair appear in Segm. Let's assert that's *correct*, even though Segm.Name is unique today. What is *wrong* with saying the FK relationship refers to more columns than are in the domain table's PK? After all, the above assertions are still true:

1. T{SegName, DbdName} must be in Segm{Name, DbdName}
2. T extends Segm

Even more -- though not required IMO -- Segm{Name, DbdName} is unique (because Segm{Name} is unique).

You could probably get me to agree that the relationship is anomalous. I suppose if Segm.Name is unique, the FK should refer only to it.
In general, though, not every domain is manifested in a table. One might have these PKs:

S {A, B}
R {B}

Now let me assert that R extends S: that is, for any S there could be an R. The rule: if an R exists for S, there is only one, regardless of A. If that's logically incoherent, I don't understand why. Remember, there could be a missing domain table, say, T {B}, and the real rule would be that for some T there must be an R. But T is missing because it has no non-key attributes, and S serves in its stead.

That's where "not sure justified" comes from. foreign_key_check nudges the user in the right direction most of the time, and as a linter I have no argument with it. However, as implemented, "foreign key mismatch" prevents reporting of a genuine error, namely "FOREIGN KEY constraint failed". By my lights that's putting a design recommendation before a data error, definitely cart before horse.

I hope that one day FK enforcement becomes a property of the database, not of the connection. If that comes to pass, this issue needs careful consideration. As things stand, I think it might be better if "foreign key mismatch" were demoted to a warning.

--jkl
[sqlite] Sqlite incompatibility with Postgres
On Wed, 18 May 2016 20:29:26 +1000 "dandl" wrote:

> > 2. Otherwise, if exactly the number of specified rows must be
> > returned without other restrictions, then the result is possibly
> > indeterminate.
>
> I agree, with one tiny tweak. The SQL standard already notes that
> certain queries of this kind are "implementation-dependent". Here is
> an example.
>
> "If the does not contain an , or
> contains an that [...]

Anything implementation-dependent depends on the implementation. Ergo, it is not mathematically defined. It has no theoretical basis.

Darren Duncan wrote:

> The options with point 1 are not only deterministic but fully
> relational.

Darren, I agree one could define LIMIT to be deterministic. But, as you know, deterministic does not imply relational. The point I've made consistently is that the input to LIMIT is not a value in the database. As such, it couldn't be a relational operator, and it's no surprise it appears nowhere in the literature. Crowning it "fully relational" is stuff and nonsense.

I personally don't see any value in making it deterministic. The best use of LIMIT is to control command-line output while inspecting the data. Pretty much every other use invites error. Deterministic error doesn't look like much of an improvement to me.

--jkl
[sqlite] Sqlite incompatibility with Postgres
On Wed, 18 May 2016 10:41:21 +1000 "dandl" wrote:

> > You lost me at "subset S of N tuples". Which relational operator
> > takes N as an argument?
>
> Restriction determines whether a tuple should be included or not; you
> also need cardinality and less than (for comparing members).

Restriction is applied to the values of the tuple. The number of tuples is not a value of the tuple.

Neither of us is stupid, David. I've boiled this down to something very simple. If you look at it algebraically, I think you'll come to the same conclusion I have. I wouldn't persist except that you're worth convincing. Andl holds promise, and seeks higher ground than SQL holds. Insofar as possible, if I can I want to help you get it right.

> For this query: calculate the average of that set of numbers after
> excluding the 2 largest and 2 smallest values. Again, a pure set
> operation.
>
> A reasonable solution would be to use two subqueries with ORDER BY
> ASC/DESC and LIMIT 2, followed by an aggregation.

Sadly, no. If we're talking about a "pure set operation", and the set is {1, 1, 2}, the "two smallest" is {1, 2} but LIMIT 2 would yield {1, 1}. Here again, my putative nth() function *does* give the right answer, simply because it's a function of the values, and not of the number of values.

> > There's no debate about the relational operators.

By which I meant: there's no debate about what they do.

> You might be surprised to learn that there is considerable academic
> uncertainty as to exactly which operators should be included.

There's no uncertainty. Some operators are defined in terms of others. No surprise: even under De Morgan you don't need OR if you have NOT and AND. The redundancy makes the language more expressive. LIMIT doesn't belong in this part of the discussion, btw, because it is not defined relationally.

> For example, is CTE RECURSIVE relational, or not?

http://wiki.epfl.ch/provenance2011/documents/foundations%20of%20databases-abiteboul-1995.pdf

Cf. Chapter 14.
Adding recursion changes the language. It adds power; if memory serves, it permits answering second-order queries.

> What about LEAD and LAG in the windowing functions? What about string
> concatenation as an aggregation operator?

AFAIK there's no debate about those, either. They can be expressed in terms of simpler operations, and exist for convenience, such as it is.

--jkl
[sqlite] foreign_key_check mystery
On Wed, 18 May 2016 08:32:24 +0200 Clemens Ladisch wrote:

> You get "foreign key mismatch" if you do not have the required
> indexes, i.e., according to a comment in the source,
> 1) The named parent key columns do not exist, or
> 2) The named parent key columns do exist, but are not subject to a
>    UNIQUE or PRIMARY KEY constraint, or

Thank you, Clemens. The FK declaration was

, foreign key (SegName, DbdName) references Segm(Name, DbdName)

but the Segm PK was only Name.

I'm not convinced the requirement that the referenced columns be unique is justified, but at least now I understand the message. In this case it uncovered an error, so I got two for the price of one. :-)

--jkl
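[Editor's note: the condition Clemens describes is easy to reproduce. A minimal sketch of the schema from this thread, reduced to the two tables involved; SQLite reports the mismatch when the child table is first touched with foreign-key enforcement on.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Segm (
    Name    TEXT NOT NULL PRIMARY KEY,   -- PK is Name alone
    DbdName TEXT NOT NULL
);
CREATE TABLE Field (
    SegName TEXT,
    DbdName TEXT,
    -- references a column pair with no UNIQUE or PRIMARY KEY constraint
    FOREIGN KEY (SegName, DbdName) REFERENCES Segm (Name, DbdName)
);
INSERT INTO Segm VALUES ('S1', 'D1');
""")
try:
    # the values match, but SQLite refuses to even check them
    con.execute("INSERT INTO Field VALUES ('S1', 'D1')")
    mismatch = None
except sqlite3.Error as e:
    mismatch = str(e)
```

Declaring `UNIQUE (Name, DbdName)` on Segm (or referencing Name alone) makes the error go away.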
[sqlite] foreign_key_check mystery
I seem to be getting a foreign key check anomaly. I've checked the constraint mentioned in the error message (and the other one, just in case). Am I overlooking something, or has this been fixed since 3.8.4.1?

sqlite> pragma foreign_key_check;
Error: foreign key mismatch - "Field" referencing "Segm"
sqlite> pragma foreign_key_list(Field);
id  seq  table      from      to       on_update  on_delete  match
--  ---  ---------  --------  -------  ---------  ---------  -----
0   0    Segm       SegName   Name     NO ACTION  NO ACTION   NONE
0   1    Segm       DbdName   DbdName  NO ACTION  NO ACTION   NONE
1   0    Datatypes  Datatype  Name     NO ACTION  NO ACTION   NONE
sqlite> select count(*) from Field as f where not exists ( select 1 from Segm where Name = f.SegName and DbdName = f.DbdName);
count(*)
--------
0

In case it's interesting, the FK declaration in the CREATE TABLE for Field is

, foreign key (SegName, DbdName) references Segm(Name, DbdName)

--jkl
[sqlite] Sqlite incompatibility with Postgres
On Tue, 17 May 2016 11:09:53 +1000 "dandl" wrote:

> Any disagreement so far?

Full agreement; your description is perfectly sound. I am quite certain nevertheless that LIMIT has no relational basis. Nothing based on Order By could. And I'll try to clear up what I meant by a cursor.

> So the "3" is a perfectly valid argument for a set-oriented theory:
> find a subset S of N tuples with the following test for set
> membership: that each member of S is greater than each member not in
> S when compared by certain attributes, for N = 3. Pure set logic with
> a membership function.

You lost me at "subset S of N tuples". Which relational operator takes N as an argument?

You could be right vis a vis set theory. But strictly within relational theory, I'll cede your point when you demonstrate it relationally. The N in "N tuples" is not to be found in the relation's extension. Even if we include aggregation, all relational functions operate on the *values* of the set, or functions of the values of the set (e.g. min() or avg()). N is not among them.

> > "Order by 1" is always valid.
>
> By analogy, not because they're the same. In order to apply LIMIT 3
> the query parser should require a test of set membership that is fully
> determined for every member. It can do that by either requiring all
> select list columns to appear in the ORDER BY, or by applying other
> constraints such as a unique key.

Unless your point is constrained to the LIMIT operator, you're making a mistake here. Normally the Select list is a superset of the Order By list. If they need to be equal for LIMIT to work, that's just one more nail in LIMIT's coffin, a byproduct of its nonrelationality.

Given a table T {A,B}, what's wrong with "select A from T order by B" (even though that's not valid SQL)? The system has access to the full table, can sort by B and project A. Nothing to do with indexes or constraints.
The query is invalid not because it couldn't be executed, but because it's undefined: there's no "sort" operator to apply before "project". Order By, not being a relational function, is a filter applied to the relational product. (Yes, "filter"; filters don't always remove:

$ echo hello | rev
olleh

) All members of the Order By list must appear in the Select list because the Select list is input to Order By, see next.

> > Order By just produces a cursor for convenient traversal of the
> > results.
>
> Not so. In standard SQL ORDER BY establishes a comparison function
> between tuples and is part of the DECLARE CURSOR syntax, but the
> cursor exists regardless.

Regarding Order By and cursors, I'm referencing CJ Date. The reason Order By cannot appear in a subquery is that its output is not a table, but a cursor. Whether Order By "establishes a comparison function between tuples" is irrelevant; so too does Where. It's the product that's different. If you accept that Order By can appear only in the outermost query, it's clear that it has no relational role at all. It affects only the order in which the rows are returned to the caller. Other than syntax, how is that different from a cursor?

> The point of LIMIT is that it is a complete query; the rows can
> be returned in a single network round trip; the result set can be
> discarded.

So it's just a tiny communication optimization? After all, compared to a network round trip, sorting the result (in order to apply LIMIT rationally) is usually far more expensive. I bet no study has ever shown LIMIT to improve performance measurably, not that that would justify its existence.

> > LIMIT causes harm. Bad results come of bad math.
>
> Disagree. The problem (if there is one) is that it is not
> well-defined.

Then I think you mean you agree! Because LIMIT is nonrelational, it's *undefined*. We have a long thread here that might be titled "what should LIMIT do?" There's no debate about the relational operators.
It's only the ad hoc add-ons that present problems, precisely because they lie outside the theory and provide (unwittingly) idiosyncratic behavior. Like NULL, LIMIT is unspecified by the theory. Like NULL, LIMIT is hard to get "right" because each implementation has to make its own decision about what it means, instead of relying on the math. --jkl
[sqlite] Sqlite incompatibility with Postgres
On Tue, 17 May 2016 11:09:53 +1000 "dandl" wrote:

> > I'll invent here and now to replace LIMIT: nth().
>
> The issue is find the "top N". This does not solve the problem.

nth() does find "top N". For any query, nth(c, N) returns N rows. It also exposes the arbitrariness of LIMIT. To use nth() correctly for Simon's data

select * from T as out
where exists (
    select 1 from T
    where out.first = first and out.second = second
    group by second
    having nth(second, 2) <= out.second
);

produces 4 rows. Deterministically. Unambiguously.

The "problem" is to produce 3 rows where, relationally, the only answers have 2 or 4 rows. There is no right answer to the problem because there is no answer to the problem.

--jkl
[sqlite] Sqlite incompatibility with Postgres
On Mon, 16 May 2016 16:17:35 +1000 "dandl" wrote:

> > > All true. But it brings up a question. Suppose the following:
> > >
> > > first  second
> > > -----  ------
> > > Mark   Spark
> > > Emily  Spark
> > > Mary   Soper
> > > Brian  Soper
> > >
> > > SELECT first,second FROM members ORDER BY second LIMIT 3

First, hat tip to Simon for providing a motivating example. :-)

The question illustrates what I mean when I say Limit is not "rooted in the data": in this case, "3" is not in the data, and is not a function of the data. Having introduced an extraneous arbitrary element, ambiguity and contradiction are inevitable. It's practically the definition of a hack, right? Does the job, albeit incorrectly.

> I would say that this is an invalid query. As already applies for
> DISTINCT and GROUP BY, the query parser should require that every
> column in the column list should appear in the ORDER BY list. If it
> does not, then the result is indeterminate.

Order By does not require Group By, and the Select list is a *superset* of the Order By list. I'm not sure where you got the notion that the Select and Order By sets are equal. "Order by 1" is always valid.

David, permit me to elaborate on my indictment of LIMIT. You said earlier:

> You can't sort the relation, but you can certainly apply an order when
> performing a query. How else would MIN() work?

I'm not disputing that. Window functions even require multiple sorts in the same query. Whether or not "LIMIT is perfectly relational", we do know relational algebra has no Sort operator, and that Order By is never part of an input to a relational operation (because of course relational operands have no order). Order By just produces a cursor for convenient traversal of the results.

I'd be perfectly fine with a function I'll invent here and now to replace LIMIT: nth(). It's a generalization of min(); the construction nth(C, 1) is equivalent to min(C).
You use it this way:

SELECT first, second FROM members where second < nth(second, 2)

That query is based in the data. It's unambiguous. Given Simon's input, it produces 2 rows; with "< 3" it produces 4 rows. It can be used without Order By (for the same reason min() can). While it *implies* a sort, it doesn't require one (because indexes), as LIMIT does. And, like min() and unlike Order By, it can be used in a subquery.

LIMIT is a hack. It's an "obvious" extension to SQL, so simple it needn't even be part of it, because the program reading the rows from the DBMS can always stop wherever it wants. Simple things are always implemented freely -- even if unnecessary or misbegotten, simply because they're easy to do and understand -- and LIMIT was no exception. Ironically, though, seemingly simple things are very hard, sometimes impossible, to explain mathematically. In that way, LIMIT shelters under the same roof as NULL and SQL's use of bags instead of sets.

While that's an abstract argument, it's at the root of very practical problems. LIMIT is a FAQ on this mailing list. Given the number of SQLite programmers, we can bet every day someone uses limit in a subquery, getting or concealing a nondeterministic result. Which is to say: LIMIT causes harm. Bad results come of bad math.

--jkl
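[Editor's note: nth() doesn't exist in SQLite, but its value-based semantics can be emulated with plain aggregates on Simon's data. The sketch below contrasts LIMIT's arbitrary cutoff with a cutoff at the second-smallest distinct value, which is what nth(second, 2) would denote.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE members (first TEXT, second TEXT);
INSERT INTO members VALUES
    ('Mark', 'Spark'), ('Emily', 'Spark'),
    ('Mary', 'Soper'), ('Brian', 'Soper');
""")
limited = con.execute(
    "SELECT first, second FROM members ORDER BY second LIMIT 3").fetchall()
# exactly 3 rows, but which Spark survives the cut is arbitrary:
# 3 is not a function of the data

# value-based cutoff: the second-smallest distinct value of `second`
value_based = con.execute("""
    SELECT first, second FROM members
    WHERE second <= (SELECT min(second) FROM members
                     WHERE second > (SELECT min(second) FROM members))
""").fetchall()
# all rows whose value makes the cut are returned -- here, all 4
```

The value-based query answers a question about the data; LIMIT 3 answers a question about the row count.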
[sqlite] Any way to do inexpensive SQLite database/table versioning? (I.e. cheap snapshotting & jumping between)
On Sun, 15 May 2016 10:42:37 -0500 mikeegg1 wrote:

> I was once told of an idea (decades ago) of versioning data within a
> table where one column has a real/float value that is the version
> number.

You can have a point-in-time database if:

* each transaction has an id
* DELETE is redefined not to remove the row but simply to mark it
* UPDATE is redefined as DELETE + INSERT
* every table includes both a transaction id and a deleted flag

Every query would need "where transaction_id <= N and is_deleted = 'N'". There's your structure. Just a small matter of programming. ;-)

--jkl
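[Editor's note: a minimal sketch of that structure in Python's sqlite3. Table and column names are illustrative; a full point-in-time scheme would also record which transaction marked each row deleted.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE members (
        name       TEXT,
        txn_id     INTEGER,           -- id of the inserting transaction
        is_deleted TEXT DEFAULT 'N'   -- DELETE just flips this flag
    )
""")
conn.execute("INSERT INTO members (name, txn_id) VALUES ('Mark', 1)")
conn.execute("INSERT INTO members (name, txn_id) VALUES ('Emily', 2)")
# "DELETE" Emily: mark the row, don't remove it.
conn.execute("UPDATE members SET is_deleted = 'Y' WHERE name = 'Emily'")

def as_of(n):
    """Rows visible at transaction n, per the query in the post."""
    return conn.execute(
        "SELECT name FROM members WHERE txn_id <= ? AND is_deleted = 'N'",
        (n,)).fetchall()
```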
[sqlite] 64bit DLL vs 32bit
On Thu, 12 May 2016 00:36:31 +1000 "dandl" wrote: > But I think if you compile code for the x64 processor chip and call > it from x86 or vice versa then either it doesn't work or you pay a > high price for thunking from one to the other. I think that's > unavoidable regardless of OS. Right: doesn't work. There's no performance penalty because there's no 32-64 bit thunking layer. https://blogs.msdn.microsoft.com/oldnewthing/20081020-00/?p=20523 --jkl
[sqlite] 64bit DLL vs 32bit
On Wed, 11 May 2016 11:30:34 +1000 "dandl" wrote:

> > more about DLLs than it is about SQLite.
>
> Actually, it's everyone using a language other than C/C++, plus a
> proportion of those too. I use C#, but if you want to call Sqlite
> from Java, Python, etc or even some generic C/C++ app that supports
> plug-ins, then at some point there is a DLL

How does that follow? Any higher-than-C language has its own binding system, and SQLite support comes as a module of some kind: the C library wrapped in a module that exposes its own API. If the module statically links in libsqlite3.a -- as, arguably, it should -- then there's no version ambiguity, no DLL, and no chance of conflict.

--jkl
[sqlite] Sqlite incompatibility with Postgres
On Fri, 13 May 2016 15:13:01 +0100 Simon Slavin wrote:

> On 13 May 2016, at 3:07pm, dandl wrote:
>
> > I have no deep knowledge of standard SQL.
>
> I used to know SQL92 very well. There's no facility for doing
> anything like LIMIT or OFFSET in it. You had to use your programming
> language to work your way through all the results and skip the ones
> you didn't want.

I suggest the reason LIMIT hasn't been standardized is that it's contrary to the fundamental idea that rows in a table have no meaningful order. SQL doesn't honor relational theory with complete fidelity, but at least that horse is still in the barn.

The problem with LIMIT is it's not based in the data. Cutting off results at some arbitrary N tells you *nothing* about the data other than that N or more rows met the criteria. Note that predicate logic has constructs for "for all" and "there exists", but not "are some"!

I have yet to see a query using LIMIT 1 posted on this list that cannot be expressed -- better, IMO -- with min(). Queries that limit the results to "top N" to support things like pagination inevitably include assumptions about transactionality (or lack thereof) that are either invalid or ill-considered. Every one would be better served either by just fetching the needed rows as required (and letting pending rows pend), or by supplying the last "high" value as a minimum for the WHERE clause instead of an OFFSET.

Were I a fan of conspiracies, I'd suspect the LIMIT-OFFSET constructs were invented by antilogicians to prevent learning and hobble performance.

By the way, I'm also a LIMIT club member, with limits. I use it for convenience on the command line while exploring data. It's great for that, in the absence of pager support. Maybe keeping it a little "weird" will help remind new developers to use it as a convenience instead of a crutch.

--jkl
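[Editor's note: the "last high value" alternative to OFFSET in concrete form. A sketch in Python's sqlite3 with an invented table; the next page starts where the previous one ended, and pending rows pend.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO log (msg) VALUES (?)",
                 [("event %d" % i,) for i in range(1, 8)])

def page_after(last_id, size):
    """Next page: WHERE id > last-seen-id, no OFFSET to recount."""
    cur = conn.execute("SELECT id, msg FROM log WHERE id > ? ORDER BY id",
                       (last_id,))
    return cur.fetchmany(size)       # fetch only the rows needed

page1 = page_after(0, 3)             # ids 1..3
page2 = page_after(page1[-1][0], 3)  # ids 4..6, keyed off the last id seen
```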
[sqlite] Working with blob
On Thu, 28 Apr 2016 20:27:17 +0200 "deltagamma1 at gmx.net" wrote: > If I store the blob directly in the sqlite, is there a way to open the > blob directly with the respective programm (e.g. irfanview or a pdf > with acroread) ? I have heard of a FUSE filesystem implemented with SQLite. I don't know if it exposes tables as files, or anything like that. If you're going to use an external program like acroread to view the blob, you're pretty much forced to provide it with a filename, hence you'll have to write your blob to a file, and give the application that filename. A few such programs might be willing to read from standard input, in which case you could provide a pipe. Or you might be able to use a named pipe (mkfifo) instead of a temporary file. I don't know a way to write binary data to a file using the command-line tool. Maybe there's one out there someone else knows of. --jkl
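[Editor's note: the write-to-a-file approach, sketched in Python's sqlite3. The schema and the stand-in bytes are invented; the viewer invocation is left commented out, with "acroread" only as the example from the question.]

```python
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT, body BLOB)")
pdf_bytes = b"%PDF-1.4 ... stand-in content ..."
conn.execute("INSERT INTO docs VALUES (?, ?)", ("manual.pdf", pdf_bytes))

# Fetch the blob and give it a filename an external program can open.
(body,) = conn.execute("SELECT body FROM docs WHERE name = ?",
                       ("manual.pdf",)).fetchone()
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as f:
    f.write(body)
    path = f.name
# subprocess.run(["acroread", path])   # hand the temp file to the viewer
```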
[sqlite] No datasize field - why?
On Mon, 25 Apr 2016 02:31:25 +0100 Simon Slavin wrote:

> > These are different concerns, and they don't really pose any
> > difficulty. Given an encoding, a column of N characters can take
> > up to x * N bytes. Back in the day, "x" was 1. Now it's something
> > else. No big deal.
>
> No. Unicode uses different numbers of bytes to store different
> characters. You cannot tell from the number of bytes in a string how
> many characters it encodes, and the programming required to work out
> the string length is complicated.

"up to", I said. You're right that you can't know the byte-offset for a letter in a UTF-8 string. What I'm saying is that given an encoding and a string, you *do* know the maximum number of bytes required. From the DBMS's point of view, a string of known size and encoding can be managed with a fixed-length buffer.

> I would definitely be reading the documentation for the SQL engine I
> was using.

Well, yeah. :-) It's well to know how the software you're using works, whether it's the DBMS or something else. Although I have to say I've never had to worry about the size of my database as a function of string size. When size matters, rows dominate, and large numbers of rows never seem to come with big strings.

--jkl
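[Editor's note: the "up to x * N bytes" bound is easy to check; for UTF-8, "x" is 4. A quick illustration in Python with arbitrary sample strings.]

```python
# UTF-8 uses a varying number of bytes per character, but never more
# than 4, so a string of N characters fits in at most 4 * N bytes.
samples = ["plain", "na\u00efve", "\u00fcml\u00e4ut", "\U0001F600"]

for s in samples:
    nbytes = len(s.encode("utf-8"))
    # at least one byte per character, at most four
    assert len(s) <= nbytes <= 4 * len(s)

# The emoji hits the 4-byte ceiling.
widest = max(len(s.encode("utf-8")) // len(s) for s in samples)
```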
[sqlite] No datasize field - why?
On Sun, 24 Apr 2016 14:09:50 +0100 Simon Slavin wrote:

> On 24 Apr 2016, at 1:51pm, Carlos wrote:
>
> > But, with very fast CPUs and RAM memory buffers for the directory
> > entries in the disks, the variable length records would probably
> > result in gain for much less I/O for the data.
>
> Agreed. Which is one reason why fixed-length string columns are less
> important and less used now. When the bottleneck is the speed of the
> backing store, storing fewer characters can mean the difference
> between having to write one sector or two.

You still have only two choices: compute or seek. The physical structure is either like an array, and you can compute the record's location, or it's like a list, and you have to iterate.

> Another reason is that we use Unicode not ASCII/SIXBIT/EBCDIC, and in
> Unicode different characters take different numbers of bytes. So
> even if you're storing a fixed number of bytes the convenience of
> always knowing exactly how many characters to display no longer
> exists.

These are different concerns, and they don't really pose any difficulty. Given an encoding, a column of N characters can take up to x * N bytes. Back in the day, "x" was 1. Now it's something else. No big deal.

Note that SQL still defines lengths in terms of characters. It's up to the DBMS how to store them (regardless of the agreed-on encoding).

--jkl
[sqlite] No datasize field - why?
On Sat, 23 Apr 2016 19:22:04 -0600 Scott Robison wrote:

> So if you could make your table up of integers, floats, and text
> with character limits on them you could get fixed-length rows, which
> might reduce your access time by 60% or more. Such a decrease in
> access time could mean the difference between being able to update a
> database live or being able to update only during an overnight run.

As I tried to make clear in my reply to Keith, efficiency concerns are beside the point. The theory underpinning SQL rests on predicate logic and set theory. Freeing the user from concerns of physical storage and addressing was among the objectives Codd cited in his first paper.

In point of fact, unbounded-length records have been supported for decades. Sybase called them TEXT and IMAGE types. You could not search them, though, only retrieve them once the row had been located by other means. Why? Just as you posit: for efficiency. The physical row kept a "pointer" to the TEXT data and, yes, every physical row had the same length, for efficiency reasons.

--jkl
[sqlite] No datasize field - why?
On Sat, 23 Apr 2016 14:50:45 -0400 "Keith Medcalf" wrote:

> > On Sat, 23 Apr 2016 08:56:14 -0400
> > "Keith Medcalf" wrote:
> >
> > > Those things that those other DBMSes do are holdovers to maintain
> > > backwards compatibility with the good old days when dinosaurs
> > > ruled the earth
> >
> > As amusing as your rant is, it's not accurate. Treating columns as
> > types is a "holdover" from mathematics and logic. It has nothing
> > to do with maintaining backwards compatibility, or the cost of
> > RAM.
>
> The specification of "Length/precision" as well as the use of
> "varchar", "varbinary" and the like are holdovers from the days when
> files had to have fixed record lengths so BDAM would work.

They might have originated in that way on System R. Ingres was developed contemporaneously on Unix, which then and now had no record-based file types. As I pointed out,

> > Many new DBMS engines have been written in recent years
> > (notably columnar stores) and, of those that support SQL, none
> > abandoned strict column types.

> (contrary to common mis-belief, varchar(100) does not mean a
> "variable length character field with a length up to 100 characters",

SQL does *not* define implementation. It defines semantics: user provides X, system replies Y. It makes no statement about how a column is stored. You know that, surely. Why paint yourself into a corner with an argument you know is lost before it begins?

I guess I should remind you that length-limited character strings have lots of utility irrespective of storage concerns. Some strings *are* fixed length, for example CUSIP, SSN, driver's license, employee id, phone number. Length-checking is a simple aspect of validation. There are also more banal concerns about external representation. An address might be limited to 60 characters so that it fits in the billing envelope window. Maybe 60 characters is arbitrary, but we both know that 6000 characters will be too many. Length limits help keep the ruby on the rails.
> In order for Codd and Date to conceive of a world wherein duck-typing
> existed, a duck-typed language would have to exist first. Since such
> was inconceivable before its invention, it is no wonder that it never
> occurred to anyone that a database column could be ducky too.

You are seriously underestimating them. You're misinterpreting the meaning and purpose of column types. Your rant about VSAM is, as physicists sometimes say, not even wrong.

Theory aside, you also utterly ignore the observed detrimental effects of the lack of type enforcement, namely the complexity that arises at SELECT time, when the application has to cope with whatever detritus got parked in the database. On a SQLite scale, many times that's not a problem because writers are highly constrained. But in general it's a central concern, and was one of the motivations for the invention of the relational model.

--jkl
[sqlite] No datasize field - why?
On Sat, 23 Apr 2016 08:56:14 -0400 "Keith Medcalf" wrote:

> Those things that those other DBMSes do are holdovers to maintain
> backwards compatibility with the good old days when dinosaurs ruled
> the earth

As amusing as your rant is, it's not accurate. Treating columns as types is a "holdover" from mathematics and logic. It has nothing to do with maintaining backwards compatibility, or the cost of RAM. Many new DBMS engines have been written in recent years (notably columnar stores) and, of those that support SQL, none abandoned strict column types.

Granted, duck-typing is/was a design choice. The good news is it makes simple things simple. But it's also the source of quite a bit of confusion and perplexity, not to mention errors and complexity: *inserting* is easy, but selection is made more complicated. The application has to be prepared to cope with inconsistencies (and errors) in the data that are prevented by type-checking. In SQLite, the "dba" has to exercise care, in a very redundant way, to effect type-enforcement of a kind he gets "for free" with a standard DBMS.

--jkl
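[Editor's note: the select-side complexity is easy to demonstrate. SQLite happily stores a string in a column declared INTEGER, so every reader must cope with a mix of types. A small illustration in Python's sqlite3 with invented names.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")   # declared INTEGER...

# ...but values keep their own types: inserting is easy.
conn.executemany("INSERT INTO t VALUES (?)", [(1,), ("two",), (3.5,)])

# Now one column hands back int, str, and float, and the
# application has to cope.
values = [row[0] for row in conn.execute("SELECT n FROM t")]
```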
[sqlite] No datasize field - why?
On Fri, 22 Apr 2016 19:11:46 -0700 Darren Duncan wrote: > The general case of a data type definition is an arbitrarily complex > predicate expression whose parts vary on the base type and other > factors. Given this, if component details of type definitions were > split out into their own table_info() columns, you'd have a large > number of columns where most are inapplicable in any given case, eg > some only apply to strings, some only to numbers, etc. And so, just > expressing the type definition as a SQL fragment like table_info() > currently does provides a compact generic representation with all the > details, same as in CREATE TABLE. INFORMATION_SCHEMA. It might not be "a compact generic representation", but it is something CREATE TABLE is not: a tabular representation. Something that can be queried instead of parsed. --jkl
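[Editor's note: SQLite's nearest thing to INFORMATION_SCHEMA is PRAGMA table_info, which is already tabular and queryable; only the type column carries an unparsed SQL fragment. A quick look from Python's sqlite3, with an invented table.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE members (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL)""")

# Each row is (cid, name, type, notnull, dflt_value, pk):
# a table to query, not a CREATE TABLE string to parse.
info = conn.execute("PRAGMA table_info(members)").fetchall()
names = [row[1] for row in info]
declared = [row[2] for row in info]
```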
[sqlite] Caveat entry
On Fri, 15 Apr 2016 22:53:57 +0100 Simon Slavin wrote:

> On 15 Apr 2016, at 10:05pm, Cecil Westerhof wrote:
>
> * SQLite datatypes and how SQLite decides which datatype you want
> * SQLite uses affinities not column types
> * thinking you should index each column instead of indexes for queries
> * consequences of various PRAGMAs which might appear to speed SQLite
> up
> * consequences of the two different journal types (including file
> size)

That's a nice list. I might add using LIMIT and ORDER BY where MIN would serve better.

> The most common mistake which leads to posts to this list is -- still
> the winnah after five years! -- premature optimization. I can't
> believe how much coding time and software complication is wasted by
> people who spend hours on multi-thread, queueing and partitioning
> code instead of redesigning their schema or thinking through how
> indexes work.

Yes, except it's not *premature* optimization, is it? It's misdirected optimization, worrying about the wrong thing because not measuring. Knuth's message is often misunderstood and oversimplified. Perhaps he could be better paraphrased as "don't sweat the small stuff", because he was most certainly not saying, "just make it work and worry about design choices later". He did not mean we should ignore O(1) versus O(n^2) choices when those choices matter. His advice doesn't imply that orders of magnitude are unworthy of our concern.

The best database performance advice I ever got still applies: think about how the DBMS will traverse the data, and minimize the work it will do. It's all about I/O. No number of additional threads will bring the data into memory sooner.

--jkl
[sqlite] Avoid duplicate sets with join table
On Sat, 16 Apr 2016 01:20:55 +0200 Ketil Froyn wrote:

> I have two tables and a join table, in principle like this:
>
> CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
> CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE);
> CREATE TABLE records_features (id_r INTEGER, id_f INTEGER, ord
> INTEGER);

(As a style suggestion, consider not reduplicating plurals. It's a one-man band, a 3-star general, and a 5-finger discount. What you have is a set of record_features.)

> Later, if I come across a new record that has exactly features
> "feature1", "feature2" and "feature3" (IDs 20,21 and 22), I don't want
> to add a new record for this. So given a new set of feature IDs, how
> can I best check if there's another record that has these exact
> features before I insert it?
>
> To be clear, a record with features "feature1", "feature2", "feature4"
> would be ok. So would "feature1", "feature2". Subsets are ok, but not
> exact duplicates.

This is an application of relational division: you want to know if two sets are equal. Well, almost. You really want to know if the "new" set is a subset of an existing one.

You're also a little stuck for lack of syntax. There are two sets in question: the extant one and the one to be inserted. You can insert a set in SQLite,

    insert into T values (a), (b), (c);

but because that set of rows (a, b, and c) doesn't have a name, you can't refer to them again in the same query. So you have to dance a little jig, something like:

    begin transaction;
    insert into record_features ...;
    delete from record_features
    where id_r = new_id
    and exists (
        select 1 from record_features as rf
        where rf.id_r <> new_id
        and rf.id_f in (select id_f from record_features
                        where id_r = new_id)
        group by rf.id_r
        having count(*) = (select count(*) from record_features
                           where id_r = new_id)
    );
    commit transaction;

That puts the rows into the database -- where they can be examined -- and deletes them if they turn out to be a duplicate.
You could also apply the same logic in an insert & update trigger, and probably should if the rule applies to the *data*, and not just the application's expectations. --jkl
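[Editor's note: the same jig, driven from application code. A sketch in Python's sqlite3; the equality test (matched count = new set's count = existing record's count) is this editor's rendering of the relational-division idea, not the exact SQL from the post.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE);
    CREATE TABLE record_features (id_r INTEGER, id_f INTEGER, ord INTEGER);
""")

def add_record(rec_id, feature_ids):
    """Insert the feature set, then delete it again if another record
    already has exactly the same set.  Returns True if kept."""
    with conn:  # one transaction
        conn.executemany("INSERT INTO record_features VALUES (?,?,?)",
                         [(rec_id, f, i)
                          for i, f in enumerate(feature_ids, 1)])
        # An exact duplicate matches every new feature (first HAVING)
        # and has no extra features of its own (second HAVING).
        dup = conn.execute("""
            SELECT 1 FROM record_features rf
            WHERE rf.id_r <> ?
              AND rf.id_f IN (SELECT id_f FROM record_features
                              WHERE id_r = ?)
            GROUP BY rf.id_r
            HAVING COUNT(*) = (SELECT COUNT(*) FROM record_features
                               WHERE id_r = ?)
               AND COUNT(*) = (SELECT COUNT(*) FROM record_features r2
                               WHERE r2.id_r = rf.id_r)
        """, (rec_id, rec_id, rec_id)).fetchone()
        if dup:
            conn.execute("DELETE FROM record_features WHERE id_r = ?",
                         (rec_id,))
            return False
        return True
```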
[sqlite] Working with booleans
On Thu, 14 Apr 2016 20:10:08 -0400 "Keith Medcalf" wrote: > select from where isActive; vs. > select from where isActive = 'T'; AFAIK, the SQL standard requires the second form. You can't simply say "WHERE variable"; you must say "WHERE expression". OP: I personally usually use the T/F or Y/N versions over 1/0 because my interactive query results are just a little easier to interpret. Use what you like, because size and performance differences are likely to be negligible. --jkl
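[Editor's note: one SQLite wrinkle in favor of the explicit comparison. A bare WHERE isActive coerces the value to a number, so the text 'T' counts as false. A quick demonstration in Python's sqlite3 with an invented table.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (name TEXT, isActive TEXT)")
conn.executemany("INSERT INTO task VALUES (?, ?)",
                 [("a", "T"), ("b", "F")])

# The comparison form works as intended...
active = conn.execute(
    "SELECT name FROM task WHERE isActive = 'T'").fetchall()

# ...but the bare-column form coerces 'T' to 0, i.e. false,
# and matches nothing.
bare = conn.execute("SELECT name FROM task WHERE isActive").fetchall()
```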
[sqlite] Primary key values can be NULL
On Fri, 15 Apr 2016 14:13:12 +0200 Cecil Westerhof wrote:

> 2016-04-15 1:19 GMT+02:00 J Decker :
>
> > I would total expect any column I created without NOT NULL (double
> > negative) to allow NULL whether INDEX or UNIQUE or PRIMARY is
> > applied additionally ... what database does otherwise? MSSQL?
>
> Every database that is not SQLite, because it is the standard.

Last I checked, Microsoft SQL Server raised an error if you declared a table to have a primary key with a NULL column. Whether NULL or NOT NULL was the default depended on database and connection options, but if the effect was to use a NULL column in a primary key, it failed.

I stopped worrying about the default a long time ago. I figure every time I type NOT NULL, an angel gets its wings.

--jkl
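[Editor's note: SQLite's side of this is easy to confirm. A declared PRIMARY KEY without NOT NULL accepts a NULL key (a long-documented quirk kept for backward compatibility; INTEGER PRIMARY KEY is the exception, since it aliases the rowid). Spelling out NOT NULL closes the hole.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A TEXT primary key without NOT NULL: SQLite accepts a NULL key.
conn.execute("CREATE TABLE a (k TEXT PRIMARY KEY)")
conn.execute("INSERT INTO a VALUES (NULL)")          # succeeds
null_keys = conn.execute(
    "SELECT COUNT(*) FROM a WHERE k IS NULL").fetchone()[0]

# Typing NOT NULL gets the standard behavior.
conn.execute("CREATE TABLE b (k TEXT PRIMARY KEY NOT NULL)")
try:
    conn.execute("INSERT INTO b VALUES (NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```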
[sqlite] FOREIGN KEY constraint failed
On Tue, 5 Apr 2016 23:56:53 +0200 R Smith wrote: > On 2016/04/05 11:15 PM, Keith Medcalf wrote: > > Are we confusing immediate constraints (checked per statement) with > > DEFERRED constraints (checked at COMMIT time) again? In SQLite some constraints are checked per row, not per statement. > We might be - though I assume the OP implicated only deferred > constraints - since immediate constraints will fail on contact, and > as such, no mystery surrounds their origins. One table may have more than one constraint. Primary key violations are reported as such, but CHECK constraint and FK constraint messages don't mention the column involved. OP: I investigated the problem a year ago or more and concluded it's not easy to remedy. Today violations are simply counted. To report them in detail would require carrying much more state, and externalizing constraint conditions in human-readable form. Unless you have a patch that does all that, I doubt PRAGMA DEBUG_MODE will be realized. --jkl
[sqlite] regression in 3.12.0 vs. 3.11.0, column type information in PRAGMA missing
On Wed, 6 Apr 2016 06:13:01 + Hick Gunter wrote:

> You are hopefully aware of the fact that SQLite associates type with
> the actual values and not the containers(columns) used to hold these
> values? This means that a data object of any type may be
> held/returned in a column, irrespective of the declared type (which,
> for expressions, is NULL).

Yes, but some of us are careful to include CHECK constraints to enforce type-checking. Even when not that careful, many follow the practice of restricting each column to a single type.

> What would your date routine do with the string 12.17.9.17.15?

It would never see it. Problem solved by prevention:

    sqlite> create table T(t TEXT not NULL check (strftime('%m', t) is not NULL));
    sqlite> insert into T values ('2016-04-06');
    sqlite> insert into T values ('2016-14-06');
    Error: CHECK constraint failed: T
    sqlite> insert into T values ('12.17.9.17.15');
    Error: CHECK constraint failed: T

--jkl
[sqlite] regression in 3.12.0 vs. 3.11.0, column type information in PRAGMA missing
On Tue, 5 Apr 2016 13:19:50 -0400 Richard Hipp wrote: > CREATE TABLE t2(w SHORT INT, x DECIMAL, y BIGINT, z REAL); > CREATE VIEW v3 AS SELECT w+x+y+z FROM t2; > > What should "PRAGMA table_info('v3')" report as the column type? It should report it as for a table, with values consistent with how SQLite will treat the column. A fundamental rule for views is that -- for purposes of SELECT -- they *are* tables. The fact that tables have physical representation in the database and views are derived is, er, immaterial to the relational algebra implemented by the DBMS. --jkl
[sqlite] C API reference manpages
On Thu, 31 Mar 2016 10:21:53 -0400 Richard Hipp wrote: > On 3/31/16, Kristaps Dzonsons wrote: > > > > Is there any interest in integrating this tool to have manpages in > > the doc distribution without downstream bits? > > > > I think that would be cool. Integrating your tool into the source > tree would mean that as the comment formats evolve, your tool would > evolve in lock-step. +1. I'm another who's wanted SQLite man pages for a long time. I hope you can see your way to integrating Kristaps's tool. --jkl
[sqlite] Help needed for COPY Command.
On Fri, 25 Mar 2016 17:18:16 +0100 "Domingo Alvarez Duarte" wrote: > Why not have direct command ".export table_name" and internally it > does all commands you mention in one go, simple and intuitively. Importing has unique requirements. .import filename tablename is quite limited, but necessary: SQL has no "read from file" feature. You could have .export filename tablename but SQL already has a "read from table" feature, and shell already has a "write to file" feature. So an export command would be a redundant subset of what's already available. --jkl
[sqlite] sqlite fixed data loading extension
On Fri, 25 Mar 2016 06:49:22 -0500 Don V Nielsen wrote:

> I have a need for something that can parse and load into sqlite tables
> fixed length data.

Insert your own separators.

    $ cat input
    12345678910111213141516171819202122232425

Print two 5-byte ranges separated by ", ".

    $ awk '{ OFS=", "; print substr($0, 1, 5), substr($0, 6, 5); }' \
          input
    12345, 67891

--jkl
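[Editor's note: if the goal is to land the fields straight in a table, the same slicing takes a few lines of Python; the widths and table name here just mirror the awk example.]

```python
import sqlite3

line = "12345678910111213141516171819202122232425"
widths = [5, 5]                 # two 5-byte fields, as in the awk example

fields, pos = [], 0
for w in widths:
    fields.append(line[pos:pos + w])
    pos += w

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fixed (a TEXT, b TEXT)")
conn.execute("INSERT INTO fixed VALUES (?, ?)", fields)
row = conn.execute("SELECT a, b FROM fixed").fetchone()
```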
[sqlite] Minor documentation improvement
On Fri, 25 Mar 2016 15:06:46 + Simon Slavin wrote: > Could something be added to indicate that it pays attention to > '.separator' ? If you don't already know, you might think it is > fixed to one file format. It could be as simple as > > ".import FILE TABLE Import data from FILE into TABLE. > See .separator ." The most confusing part of .import for me was figuring out that the FILE needs quotes. --jkl
[sqlite] Article about pointer abuse in SQLite
On Tue, 22 Mar 2016 11:00:24 -0500 "Marc L. Allen" wrote:

> I don't think compilers "run" your code.

Provided we're talking about a C compiler, you're right. Optimizers don't run the code, they reason about it.

> The fact that the code never actually allows that path to occur is
> beyond the scope of most compilers, isn't it?

Yes and no. If the compiler can prove a particular branch can never be taken, it can remove it because the logic of the program will not be affected. If it cannot prove that, the code will remain. For example, given

    int foo = 0;
    if (foo)
        exit(0);

the compiler can delete lines 2 & 3. If there's no other reference to foo, it can delete line 1, too. However,

    extern int foo;
    if (foo)
        exit(0);

and

    int foo = 0;
    extern int *pfoo;
    pfoo = &foo;
    if (foo)
        exit(0);

both leave most optimizers flat-footed. The potential for another module to affect the value of foo means the code could run, and thus must remain.

--jkl
[sqlite] Article about pointer abuse in SQLite
On Tue, 22 Mar 2016 09:58:52 -0400 Adam Devita wrote:

> I don't know the reasoning, but it seems that VS6 often
> initialized things to 0xcd in debug mode and (usually) had memory
> uninitialized to 0x00 when complied in Release (perhaps 0x00 just
> happens to be what was on the stack or heap).

I would be talking out of school here if you're talking about C#. For C and C++, the 0xcd initialization helps make (mis)use of uninitialized objects more obvious. If the allocated buffer happens to be zero-initialized, things like printf will make them appear empty when they're actually invalid.

This link has a nice discussion:

http://stackoverflow.com/questions/2769247/controling-crt-memory-initialization

and includes a link to the documented behavior:

https://msdn.microsoft.com/en-us/library/Aa270812

--jkl
[sqlite] Article about pointer abuse in SQLite
On Tue, 22 Mar 2016 09:56:57 +0100 "Cezary H. Noweta" wrote:

> On 2016-03-22 00:35, James K. Lowden wrote:
> > [...] An example from Clang's discussion is
> >
> >     int i = 10 << 31;
>
> Could you provide a link for that discussion? (Or google's phrase to
> retrieve such link?)

I'm sorry, no. Not for the first time I wish my browser had a feature like "find links in history with documents matching regex". I didn't read it on the Clang mailing list. I think I saw it by reference in Regehr's discussion of "friendly C". It specifically mentioned 10 << 31 as an example of an "integer" requiring 35 bits, something gcc assigns silently and clang diagnoses with a warning.

If you haven't seen it, http://blog.regehr.org/archives/1180 is a good starting point. It mentions "Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior" (http://pdos.csail.mit.edu/papers/stack:sosp13.pdf), which is where I learned that sharp-edged optimization is not a brand-new phenomenon.

DJB provides a properly justified grumpy, frustrated view,

https://groups.google.com/forum/m/#!msg/boring-crypto/48qa1kWignU/o8GGp2K1DAAJ

wherein he mentions one of the defenses for the status quo, "that a boring C compiler can't possibly support the desired system _performance_. Even if this were true (which I very much doubt), why would it be more important than system _correctness_?"

That should be the only argument needed. DJB is concerned about security. DRH is concerned about correctness. No serious C programmer breathes who prizes performance over correctness, yet that is the license the compiler writers have granted themselves.

--jkl
[sqlite] Reserved column names
On Mon, 21 Mar 2016 11:32:28 +0100 Dominique Devienne wrote: > > Explicitly documented by SQLite: > > > > And? That's still non-SQL standard. > > SQLite tries to be compatible with non-standard extensions from > various popular RDBMS', but when a standard alternative exists, it > should be preferred IMHO. Roger's APSW is SQLIte specific. It's pretty easy to imagine, isn't it, that char sql[] = "select [col] from [foo]"; is easier for him to use than char sql[] = "select \"col\" from \"foo\""; even if he's not using C? I would certainly advise (and often do) anyone using SQL to learn to distinguish between standard SQL and any given product's deviations from it. Favoring standard constructs helps avoid weird corners and style. But machine-generated code inside a driver specifically for SQLite? Hard to see who benefits, one way or the other. --jkl
[sqlite] Article about pointer abuse in SQLite
On Mon, 21 Mar 2016 13:48:06 -0700 Scott Perry wrote: > Compilers allow you to choose your standard; --std=c11 means > something very specific (and unchanging) They do. And that covers what the standard covers. The standard also has limits. It includes constructs that are syntactically permitted but whose behavior is left undefined, known by the scarred as "UB" for "undefined behavior". An example from Clang's discussion is int i = 10 << 31; The standard says << is a shift operator. It places no limit on the number of bits to be shifted. If that number is so large that the product cannot be represented by the assigned variable, that is *not* an error. The standard allows the compiler to do anything or nothing with it. As you may imagine, the varieties of anything and nothing are many. Compiler writers are well aware that "nothing" is faster done than "something". Over time, they have gotten more aggressive in simply deleting UB code. As a consequence, programmers who thought they wrote standards-conforming code get burned when they upgrade/change compilers. Mysterious and sometimes subtle errors are introduced by the compiler for the user's benefit. Your googlefu will turn up lots of discussion. One I liked that wasn't on Page 1: http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer --jkl
[sqlite] Article about pointer abuse in SQLite
On Sat, 19 Mar 2016 02:04:35 -0600 Scott Robison wrote:

> > As he says, there's not real choice between fast and
> > correct
>
> Except that testing can verify something is correct for a given
> environment.

That's actually not true, on a couple of levels.

"[T]esting can be used very effectively to show the presence of bugs but never to show their absence." -- EWD303

I think that should be called Dijkstra's Dictum. It's not just a quip; it's a concise insight into the limits of testing versus proving correctness.

Second, you can't test the future. If the correctness of the code is subject to change by the compiler's interpretation of the language, how is the programmer to prevent it?

> > finally drive gcc & friends in the direction of working
> > with their users for a change. Or make them irrelevant.
>
> I think they'd continue to be popular with people looking to eek out
> as much performance as possible.

You may be right. As a consultant I've often felt I was hired to sprinkle magic pixie performance dust on the system. People want to believe that performance is found in tools. How come there's no -O5?

In truth, every performance problem I've encountered was a design problem, often obvious, always unnecessary. "Use a better compiler" has never been the solution. Unloading mounds of unnecessary processing with a pitchfork is.

Doubtless there are some well tested, highly stable applications run at scale, for which 5% is a measurable and meaningful gain. IMO they're actually the ones driving UB treatment by compiler writers. The other 99% stand to gain from a compiler that emphasizes correctness and predictable behavior.

--jkl
[sqlite] Article about pointer abuse in SQLite
On Fri, 18 Mar 2016 16:33:56 -0600 Scott Robison wrote:

> I'd rather have code that might use some "undefined behavior" and
> generates the right answer than code that always conformed to defined
> behavior yet was logically flawed.

Code that falls under undefined behavior *is* logically flawed, by definition. Whether or not it works, it's not specified to. The compiler may have generated perfectly correct machine code, but another compiler or some future version of your present compiler may not.

You might share my beef with the compiler writers, though: lots of things that are left undefined shouldn't be. Because hardware architecture varies, some practices that do work and have worked and are expected to work on a wide variety of machines are UB. A recent thread on using void* for a function pointer is an example: dlsym(2) returns a function pointer defined as void*, but the C standard says void* can only refer to data, not functions! Machines exist for which the size of a function pointer is not sizeof(void*). Source code that assumes they are the same size is not portable to those architectures.

Fine. But a particular compiler generates code for a particular architecture. On x86 hardware, all pointers have always been and will always be the same size. All Linux/Posix code relies on that, too, along with a host of other assumptions. If that ever changed, a boat load of code would have to be changed. Why does the compiler writer feel it's in his interest or mine to warn me about that not-happening eventuality? For the machine I'm compiling for, the code is *not* in error. For some future machine, maybe it will be; let's leave that until then.

I was looking at John Regehr's blog the other day. I think it was there that I learned that the practice of dropping UB code on the floor has been going on longer than I'd realized; it's just that gcc has been more aggressive in recent years.
I think it was there I saw this construction: if( p < p + n) error where p is a pointer. On lots of architectures, for large n, p + n can be negative. The test works. Or did. The C standard says that's UB, though. It doesn't promise the pointer will go negative. It doesn't promise it won't. It doesn't promise not to tell your mother about it. And, in one recent version, it doesn't compile it. Warning? No. Error? No. Machine code? No! It's UB, so no code is generated (ergo, no error handling)! Even though the hardware instructions that would be -- that used to be -- generated work as implied by the code. Postel's Law is to be liberal in what you accept and conservative in what you emit. The compilers have been practicing the opposite, thwarting common longstanding practice just because they "can". Dan Bernstein is calling for a new C compiler that is 100% deterministic: no UB. All UB per the standard would be defined by the compiler. And maybe a few goodies, like zero-initialized automatic (stack) variables. Such a compiler would enjoy great popularity, even if it imposed, say, a 5% performance penalty, because C programmers would have greater confidence in their code working as expected. They'd have some assurance that the compiler wouldn't cut them off at the knees in its next release. As he says, there's no real choice between fast and correct. If the "always defined behavior" compiler got off the ground, maybe it would finally drive gcc & friends in the direction of working with their users for a change. Or make them irrelevant. --jkl
[sqlite] CAST STRING => INTEGER
On Wed, 16 Mar 2016 01:53:59 -0600 Scott Robison wrote: > > For example, even the operation "select cast(pow(2,65) as integer)" > > and > "select cast(-pow(2,65) as integer)" should return NULL rather than > MAXINT and MININT respectively. > > The $64 bit question ;) is how much existing code might break if such > changes were made. One can argue that the existing implementation is > broken, but a lot of software has been written to use it as it is. > What happens to them if such an improvement is made? Doubtless there is code out there that relies on division by zero producing NULL. I personally would like to see a pragma MATH=ON to change that. Many of the outputs the OP has enumerated are considered undefined behavior. Any computation relying on them would produce incorrect results. Changing that to NULL, as Keith recommends, would arguably make them "correct", and wouldn't violate the promise not to change *defined* behavior. IMO the user would be better served by raising an error, so that it can be dealt with at its source. --jkl
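The division-by-zero behavior discussed above is easy to demonstrate from Python's sqlite3 module (a minimal sketch; the in-memory database is illustrative):

```python
import sqlite3

# SQLite returns NULL (Python's None) for division by zero,
# rather than raising an error as most DBMSs do.
con = sqlite3.connect(":memory:")
result = con.execute("SELECT 1 / 0").fetchone()[0]
con.close()
```

Any computation built on that NULL silently propagates it, which is exactly the objection raised in the message.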
[sqlite] CAST STRING => INTEGER
On Tue, 15 Mar 2016 19:33:32 -0600 "Keith Medcalf" wrote: > > Yes, if the string cannot be represented as an integer, CAST should > > raise a range error. That spares the application from applying the > > same test in an ad hoc and inconsistent way. > > Since there is no way to "trap" such errors, out-of-bounds > conversions should return NULL. J No way to trap an error condition for a SELECT statement? Is that what you mean? To my way of thinking, a CAST range error is a constraint violation. Why can sqlite3_step not return SQLITE_CONSTRAINT_CHECK? > ust as attempting to divide by zero returns NULL. In SQLite, yeah. Other DBMSs don't make that mistake. It's at the top of my list for why SQLite is unsuitable for quantitative work. --jkl
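A runtime error inside a SELECT is in fact trappable from the API; a sketch using one of the few arithmetic cases SQLite documents as raising an error, abs() of the most negative 64-bit integer:

```python
import sqlite3

# sqlite3_step reports runtime errors, which Python's sqlite3 module
# surfaces as exceptions. Per the SQLite docs, abs(-9223372036854775808)
# throws an "integer overflow" error because the positive value has no
# 64-bit two's-complement representation.
con = sqlite3.connect(":memory:")
caught = False
try:
    con.execute("SELECT abs(-9223372036854775808)").fetchone()
except sqlite3.OperationalError:
    caught = True
con.close()
```

So the machinery for a trappable CAST range error already exists in the API; the question is only whether CAST uses it.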
[sqlite] SQLite Pronunciation
On Wed, 16 Mar 2016 14:09:08 -0500 Jay Kreibich wrote: > although if you trace SQL back to the IBM days of SEQUEL, there is a > strong argument that the term 'sequel' makes more sense. IBM insisted "SQL" be pronounced as three letters for exactly that reason: to distinguish it from its forerunner, SEQUEL. IIRC, SEQUEL was trademarked by another firm, and pronouncing SQL to sound like "sequel" risked trademark infringement. > 'an ess-cue-ell lite database' Yup, that's what I write, too. But I've given up my Pedant Pin in conversation because "sequelite" trips off this English-speaker's tongue. Practically saves a second every time I say it, must have added up to hours by now. --jkl
[sqlite] CAST STRING => INTEGER
On Tue, 15 Mar 2016 01:02:17 +0100 "Cezary H. Noweta" wrote: > 2nd row: why REALs can have trailing spaces, while INTEGERs cannot? > 3rd row: why REALs can have trailing trash, while INTEGERs cannot? I think we know now that string->integer conversion is pathologically broken for inputs that cannot be represented as a 64-bit integer. Appending '0' to any such string produces a different, illogical result, too. > While fixing, I spotted a problem mentioned by you: > > > 2) sqlite> select cast('100000000000000000000' as int); ... > > 7766279631452241920 > > It would be: > > 1. left as is, i.e. CAST(manydigitstext AS INTEGER) == > MAX(TEXT2INT(manydigitstext)%2^64,LARGEST_INT64) --- Clemens Ladisch' > opinion; That is not acceptable IMO. Conversions should be reversible, including string->integer->string. Conversions should also be equivalent, and no one is claiming '1' == 2^64. > 2. CAST(manydigitstext AS INTEGER) == {SMALLEST_INT64,LARGEST_INT64} > --- your opinion as I understood you well; Yes, if the string cannot be represented as an integer, CAST should raise a range error. That spares the application from applying the same test in an ad hoc and inconsistent way. To my way of thinking, SQLite's handling of giant integers per se is an edge case. Because such huge numbers don't normally arise, the non-error path (inputs in bounds) almost always produces correct results. The reason to have CAST raise a range error when the output would be invalid is to guard against erroneous inputs creating spurious outputs that, when used in further computation, produce inexplicable results. It's better for the user and programmer to detect the error early -- nearest the cause, where it can be diagnosed -- than to produce a bogus answer and be forced into a manual recapitulation of the processing. --jkl
[sqlite] CAST STRING => INTEGER
On Mon, 14 Mar 2016 13:25:09 +0100 Clemens Ladisch wrote: > > that ``SELECT CAST(col AS INTEGER);'' should return (not so) random > > result set, and receiving any INTEGER should mean that a source > > string could have trillion or more possible values? > > The documentation does not specify how the textual representation of > an out-of-range integer is to be converted, so anything SQLite does > can be considered correct. While I'm sympathetic to the GIGO notion that invalid input (a domain error, in this case) is bound to produce unpredictable output, you have to admit there's some inconsistency here. I would argue the inconsistency is harmful, and would be better thought of as a constraint that needs to be enforced. Consider: 1) sqlite> select cast('100000000000000000000' as float); -- 1.0e+20 2) sqlite> select cast('100000000000000000000' as int); -- 7766279631452241920 3) sqlite> select 100000000000000000000; -- 1.0e+20 4) sqlite> select 10000000000 * 10000000000; -- 1.0e+20 5) sqlite> select cast(10000000000 * 10000000000 as int); -- 9223372036854775807 6) sqlite> select cast(100000000000000000000 as text); -- 1.0e+20 7) sqlite> select cast(cast(100000000000000000000 as text) as int); -- 1 Before you say, "don't do that", keep in mind that sum() could produce the same integer-overflow result, as could string concatenation. SQLite produces erroneous results and places the burden of detecting errors -- errors that originate in the data -- on the application. It's hard to justify any of the above. Integer arithmetic produces floating point if the result can't be represented as an integer? That's dubious enough. When forced to be an integer, it's 2^63 - 1. The application could reasonably expect that integer arithmetic results in an integer (as it normally does); when the double is returned instead, it will still call sqlite3_column_int with its implicit cast. No error, just MAX_INT. In 7, we convert an integer to text and back to int, and get 1. I understand why, mechanically. 
But I would say any form of T -> TEXT -> T conversion should be lossless, for any type T, given sufficient width of TEXT. The problems could be avoided by observing simple but strict rules: 1. Integer arithmetic produces an integer result. Error on overflow. 2. CAST raises a range error if the value would be outside the range of the target type. 3. Implicit conversion by e.g. sqlite3_column_* adheres to the same rules as CAST. --jkl
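The proposed CAST rule can at least be enforced application-side today; a sketch (checked_int_cast is a hypothetical helper name, not part of any library):

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def checked_int_cast(text):
    """Convert text to a 64-bit integer, raising on any loss.

    Unlike SQLite's CAST, this refuses out-of-range values and
    trailing trash instead of silently producing a wrong answer.
    """
    value = int(text.strip())          # raises ValueError on non-digits
    if not (INT64_MIN <= value <= INT64_MAX):
        raise OverflowError(f"{text!r} out of int64 range")
    return value

ok = checked_int_cast("42")
try:
    checked_int_cast("100000000000000000000")   # 10^20: out of range
    overflowed = False
except OverflowError:
    overflowed = True
```

The error surfaces nearest its cause, which is the whole point of rule 2.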
[sqlite] Creating system table
On Sat, 12 Mar 2016 13:07:01 -0500 Igor Korot wrote: > My question is: what should I do if I want to create a system table? Change the source code? A system table differs from a user table in how it's created. User tables are defined with CREATE TABLE of course, but system tables are representations of the DBMS's data. The underlying structure of a system table need not be -- and normally isn't -- the same as that of a user table. Different system tables may have different underlying structures. An analog in Linux is the /proc filesystem. System tables are a very old idea. They're mentioned in Codd's 12 rules for defining a relational system. *All* data in the system, including metadata, are to be represented as tables and manipulable using the same language as ordinary tables. One language, one datatype. For example, you might do us the service of creating sqlite_constraints, with columns tablename, colname, and constraint. When the user selects from that table, you could parse the database's DDL and return those columns. A more pressing problem from this perspective is pragmas. To discover the value of a pragma in SQLite, one uses the pragma statement. Better would be a table sqlite_pragmas listing all the current values. --jkl
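SQLite has since moved partway in this direction: versions 3.16.0 and later expose many pragmas as table-valued functions, queryable with ordinary SELECT. A sketch, assuming such a build (Python's bundled SQLite normally qualifies):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
# pragma_table_info is PRAGMA table_info exposed as a queryable table,
# so it composes with WHERE, JOIN, ORDER BY like any other relation.
cols = [row[0] for row in
        con.execute("SELECT name FROM pragma_table_info('t')")]
con.close()
```

One language, one datatype, as the message puts it.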
[sqlite] "Circular" order by
On Thu, 10 Mar 2016 21:16:28 +0200 R Smith wrote: > > Hmm, does this work any better? > > > > SELECT id FROM t > > ORDER BY id < 'pen' desc, id; > > It works, but not better. I think it was Igor who proposed similar > (if not, apologies) which of course produces the correct result, but > cannot take advantage of the index on id so it becomes a result-set > walk causing longer ordering of values - exactly what the OP tried to > avoid. Hmm, I don't know about "cannot", but I'm not surprised by "does not", because it's a tough inference. If you stand back a minute, you can see that id < 'pen' is a monotonic function of "id" if "id" is sorted. The query processor *could* include that logic, and could choose to process the rows, in index order, starting with the first row where id >= 'pen', to the end, and wrapping back to the beginning. The big boys do that kind of thing. In general "order by f(x)" will use an index on x if f(x) has the same order. For hard problems, they support computed columns -- a little like a view attached to a table -- and indexes on them. In SQLite, I guess the OP's only solution is to make an index of the kind he needs. create table idx as select id < 'pen' as 'LT_pen', id from t; select id from idx order by LT_pen, id; Of course, that presupposes 'pen' is a constant. Whether or not that's true wasn't mentioned in the original post. --jkl
[sqlite] "Circular" order by
On Thu, 10 Mar 2016 10:17:57 +0100 Alberto Wu wrote: > On 03/09/16 23:30, James K. Lowden wrote: > >> SELECT P.id FROM ( > >> SELECT 0 AS sect, id FROM t WHERE id >= 'pen' > >> UNION ALL > >> SELECT 1, id FROM t WHERE id < 'pen' > >> ) AS P > >> ORDER BY P.sect, P.id > >> ; > > > > This is the correct answer. > > Hi, > > unfortunately the correct answer comes with an extra scan and a temp > b-tree so I'd rather keep the two queries split and handle the case > programmatically. Hmm, does this work any better? SELECT id FROM t ORDER BY id < 'pen' desc, id; --jkl
[sqlite] "Circular" order by
On Wed, 9 Mar 2016 20:43:14 +0200 R Smith wrote: > SELECT P.id FROM ( > SELECT 0 AS sect, id FROM t WHERE id >= 'pen' > UNION ALL > SELECT 1, id FROM t WHERE id < 'pen' > ) AS P > ORDER BY P.sect, P.id > ; This is the correct answer. I'm not sure what you meant by "axiom" in your earlier post. To the best of my knowledge, UNION ALL makes no promise about order (and standard SQL proscribes ORDER BY in a subquery). If the implementation finds it more efficient, for instance, to send alternating rows from each element in the union, it's free to do so. --jkl
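The quoted query behaves as advertised; a quick check with Python's sqlite3 module (table name and values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id TEXT PRIMARY KEY)")
con.executemany("INSERT INTO t VALUES (?)",
                [("apple",), ("mango",), ("pen",), ("zebra",)])
# Rows from 'pen' to the end come first (sect 0), then the wrap-around
# from the beginning (sect 1); ORDER BY makes the order explicit.
rows = [r[0] for r in con.execute("""
    SELECT P.id FROM (
        SELECT 0 AS sect, id FROM t WHERE id >= 'pen'
        UNION ALL
        SELECT 1, id FROM t WHERE id < 'pen'
    ) AS P
    ORDER BY P.sect, P.id
""")]
con.close()
```

The outer ORDER BY is what makes this correct; the UNION ALL alone promises nothing about order.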
[sqlite] Correlated subquery refers to wrong rowid - bug?
On Wed, 9 Mar 2016 10:13:28 -0500 Richard Hipp wrote: > > which outputs one result (2), although the expected result would be > > empty. Sorry for my "what bug?" post. I forgot that the output was wrong! --jkl
[sqlite] Correlated subquery refers to wrong rowid - bug?
On Wed, 09 Mar 2016 15:32:01 +0100 Jean-Christophe Deschamps wrote: > > select id from a where id not in (select a.id from b); > As I understand it, there is no more an a.id column in table b. It > looks like SQLite is trying to get clever ignoring the "a." qualifier. It's not ignoring the qualifier. It's processing an odd correlated subquery. This would produce the same: select id from a where id not in (select a.id); For each row in a, select the id that is not the id in the row. What might be a little surprising is that the columns named in the SELECT need not come from the table in the FROM clause (if there is one). For example, I'm sure you'll find this valid: select id from a where id not in ( select 2 * (a.id / 2) from b where a.id > b.id ); Granted, that's a long walk for "odd ids in A greater than the smallest id in B." But it's not SQL's job to force succinct expression. --jkl
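The stripped-down form makes the semantics plain; a sketch in Python's sqlite3 (table contents illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (id INTEGER)")
con.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
# For each row of a, the correlated subquery yields that row's own id,
# so "id NOT IN (own id)" is never true: the result is empty.
rows = con.execute(
    "SELECT id FROM a WHERE id NOT IN (SELECT a.id)").fetchall()
con.close()
```

Valid SQL, empty result, no qualifier ignored.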
[sqlite] Correlated subquery refers to wrong rowid - bug?
On Wed, 9 Mar 2016 10:13:28 -0500 Richard Hipp wrote: > > select id from a where id not in (select a.id from b); > > > > which outputs one result (2), although the expected result would be > > empty. > > > > Thanks for the bug report. What bug? The query is valid SQL, and produces the correct results. The programmer would have been better off writing a useful query, but we've all been there. --jkl
[sqlite] Fastest way to find whether at least one row has a certain column value
On Sun, 6 Mar 2016 11:39:38 + Paul Sanderson wrote: > I understand this - but, there always a but, I still would like to do > something. Applying the limit anyway and then telling them the query > has been limited might be a solution. > > Time is usually not an issue but as the results are loaded into a grid > for display memory can be the issue. So execute the query, and start displaying the rows. When you reach "too many", ask the user if he wants to continue. Or light up the "next page" button, or whatever. (That doesn't exempt you, btw, from providing for the user to cancel long-running queries. But you know that.) ISTM what you really want (other than a crystal ball) is tabular access to the query optimizer's estimations, so you could do something like select "estimated rows" from ( estimate query select ... ) as A; I've never seen a system that provides queryable optimizer metadata. I don't remember ever having read a paper on the idea, either. But that's not too surprising: the purpose of externalized query optimizer output is normally to help the user understand what the optimizer will do, not let the program decide whether or not to do it. I still think you're barking up the wrong tree. You suppose that you can vet the queries to prevent accidental cross joins not by finding cross joins directly, but indirectly, by finding inordinately large outputs. I suggest that will lead to a lot of false positives, because mistakes like that are actually pretty rare. You and your user are better off if you don't second-guess him. Just execute the query and show him the output in a controlled way. He'll deal with long-running queries or supervoluminous output the way we all have since DBMSs were invented: cancel the query, and start looking into what happened. --jkl
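SQLite does expose a coarse form of optimizer output through EXPLAIN QUERY PLAN, though estimated row counts are not part of it; a sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
# EXPLAIN QUERY PLAN returns rows describing the chosen strategy
# (e.g. scan vs. index search) -- useful for a human reading it,
# but it carries no cardinality estimate a program could act on.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT v FROM t WHERE id = 1").fetchall()
con.close()
```

Which rather confirms the point: the output exists to explain the plan, not to let the application vet it.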
[sqlite] How does your sqlite script binding handle aggregate UDFs?
On Sat, 5 Mar 2016 21:22:23 +0100 Stephan Beal wrote: > i'm not aware of any aggregates which (in normal use) take no > arguments Nondeterministic functions need not take any arguments. Built-in examples include NOW(). Your UDF could implement the Dilbert RNG: http://dilbert.com/strip/2001-10-25 --jkl
[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?
On Sat, 05 Mar 2016 17:16:52 -0700 "Keith Medcalf" wrote: > > Sometimes it's faster to recompute something than to cache it for > > later re-use. That's rare where I/O is involved, and vanishing rare > > where SQL is involved. > > The only thing worse is retrieving the entire result set and cacheing > the whole think in the gooey structures. Those wheels fall off the > bus much sooner. You make me feel like the 8th grade teacher who complains she teaches the same stuff every year, but the kids never get any smarter. I truthfully don't come across the classic client-side join of two "select * from T" queries anymore. I'm sure it happens, though. Very few shops include a team of SQL experts (or expert) whose mandate is to design tables and queries in the service of the applications. Costs of that shortsightedness are easy to cite if you know where to look, but are not to be found in the salary and delivery schedules that management usually pays attention to. --jkl
[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?
On Fri, 04 Mar 2016 00:35:47 -0800 Darren Duncan wrote: > > How exactly is the first way "easiest"? > > If these are pages displayed to the user, they may want to scroll > backwards at some point; They might, and if you say it's easier to go back to the database than to keep track of previously fetched data for re-display then, thanks, at least I understand your point of view. For myself I can't imagine such a design. After I've gone to the work of preparing the query fetching the results, and placing them in whatever construct is needed for display to the user, I'd certainly hang onto my display structures until the user was done with the data. If the user wants to see it again, the last thing I'd want to do is repeat all that. Sometimes it's faster to recompute something than to cache it for later re-use. That's rare where I/O is involved, and vanishingly rare where SQL is involved. --jkl
[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?
On Thu, 3 Mar 2016 10:43:26 +0800 (CST) ?? wrote: > > Can anyone describe a situation for which this style of LIMIT & > > OFFSET is advisable from the application's point of view? (The > > DBMS costs are obvious enough.) > > For me this is the easiest way to implement a scrolling cursor. > Otherwise I will have to record for each table the condition I use > to scroll it. So if there is an easier way to implement the > scrolling cursor please let me know. You say, "record ... the condition ... to scroll [the table]". I'm sure I don't know what "condition" you mean. You appear to be doing something like: offset = 0 do sqlite3_prepare (limit, offset) do sqlite3_step until SQLITE_DONE offset += limit ... application stuff ... until last row processed The obvious, better alternative is sqlite3_prepare do do sqlite3_step until LIMIT ... application stuff ... until last row processed because in the second example you're using SQLite's cursor to "record the condition of the table". There's nothing for you to keep between "pages"; you just start from where you stopped. How exactly is the first way "easiest"? --jkl
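In Python's sqlite3 the second pattern is just one cursor consumed in chunks; nothing is re-prepared or re-executed between pages (a sketch; table and page size illustrative):

```python
import sqlite3

PAGE = 3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(8)])

cur = con.execute("SELECT id FROM t ORDER BY id")  # prepared once
pages = []
while True:
    page = cur.fetchmany(PAGE)   # "step until LIMIT"
    if not page:
        break
    pages.append([r[0] for r in page])
con.close()
```

The cursor itself records where you stopped; there is no OFFSET to track, and no rows are computed twice.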
[sqlite] Can I implement a scrolling window using LIMIT and OFFSET ?
On Wed, 2 Mar 2016 14:12:04 +0100 Clemens Ladisch wrote: > > https://www.sqlite.org/lang_select.html talks about LIMIT & OFFSET, > > without mentioning that is a bad idea. > > Neither does it mention that it is a good idea. > > > can I do that or not (will it become sluggish if I do that) ? > > When you use large OFFSET values, the database must compute all these > rows before throwing them away. Can anyone describe a situation for which this style of LIMIT & OFFSET is advisable from the application's point of view? (The DBMS costs are obvious enough.) From first principles, if the application has OFFSET, it might as well keep the connection handle instead, and just step through the next LIMIT rows. Why put the DBMS through the work of re-executing the query, when sqlite3_step is at the ready? My guess is the answer is going to have something to do with a web framework, and the cost of maintaining open connections, and the fact that OFFSET can be stored in a cookie but a handle cannot. I wonder about that trade-off in general, because some fraction of discarded cursors will be reconstituted as new (inefficient) queries. And for SQLite in particular, it's hard to imagine so many clients that discarding processes is better than letting them consume virtual memory until they time out. But rather than speculate, I'd be interested to hear of real motivations. thanks. --jkl
[sqlite] How to check if connection to main database is still open or closed before new query ?
On Tue, 1 Mar 2016 17:13:29 + a a wrote: > I want to check after a while if the connection is allready closed or > not for the simple reason not to reopen the database but if is open > to run a query or if it is closed to reopen the database and then run > the query. I don't blame you for thinking that's what you want to do, but you really don't. Consider that the situation you describe has many analogs. You cannot check if a file is closed, if a socket is closed, if free(3) has been called on a pointer. More important, suppose you *could* check those things. Could the OS assure you that the file you opened, the socket, or the memory pointed to is the resource you intended? Could it say anything about the state of that resource? On both counts: No. The solution is to organize your program such that there's no question about the state of its resources. If you must have a flag then, as Keith suggests, the most common way is to set the handle to NULL after you close it. But simplest is best. It's often feasible to open the database just once at the beginning and never close it until exit. Unless you have hundreds of databases open concurrently or are working in an extremely constrained environment, the overhead of carrying around a largely unused database handle is negligible. HTH. --jkl
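The flag-as-NULL advice reduces to a few lines (a sketch; names are illustrative):

```python
import sqlite3

db = None  # the handle itself is the flag: None exactly when closed

def query(sql):
    """Open lazily, reuse thereafter."""
    global db
    if db is None:
        db = sqlite3.connect(":memory:")
    return db.execute(sql).fetchall()

rows = query("SELECT 1")
db.close()
db = None   # close and clear together, so the state is never ambiguous
```

The discipline of closing and clearing in one place is what removes the question, not any ability to interrogate the handle.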
[sqlite] Random-access sequences
On Tue, 1 Mar 2016 08:15:25 -0500 Richard Damon wrote: > > The theoretical maximum number of rows in a table is 2^64 > > (18446744073709551616 or about 1.8e+19). This limit is unreachable > > since the maximum database size of 140 terabytes will be reached > > first. A 140 terabytes database can hold no more than approximately > > 1e+13 rows, and then only if there are no indices and if each row > > contains very little data. > > > You can hit 2^63 insertions well before hitting the size limit of the > database if you have also been doing deletions. Yes. If you manage 1,000,000 insertion/second, that's 3.15576 * 10^13/year. You would run out of integers in 584,542 years. To get around that, add an "epoch" column, also integer. Initially it is always zero. Whenever "position" exceeds 2^63, increment "epoch" and reset "position" to zero. That will give you at least twice as many years. --jkl
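The epoch scheme is just a two-field counter; a sketch (constants and names illustrative):

```python
INT64_MAX = 2**63 - 1

def next_position(epoch, position):
    """Advance the (epoch, position) pair, rolling position over
    into epoch instead of overflowing a 64-bit column."""
    if position >= INT64_MAX:
        return epoch + 1, 0
    return epoch, position + 1

epoch, pos = next_position(0, 41)        # ordinary increment
rolled = next_position(0, INT64_MAX)     # rollover into the next epoch
```

Ordering then sorts on (epoch, position) rather than position alone.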
[sqlite] Encrypt the SQL query
On Thu, 25 Feb 2016 14:01:31 +0800 wrote: > Does SQLite provide a good way to encrypt the SQL query strings while > does not affect the performance when executing the queries? If you're worried about the user examining your program image statically, you could encrypt your SQL by whatever means, and decrypt it "just in time": sqlite3_prepare(db, decrypt(sql, key), ... ); Of course, if the key is also present in the image, you're only discouraging the uninterested. (Something I suppose Tim Cook knows a thing or two about...) If you're worried about the user examining the running program -- for example, with ltrace(1) -- then your question is moot, because at some point the encrypted SQL will have to be decrypted before SQLite interprets it. --jkl
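A sketch of the just-in-time idea, with a toy XOR "cipher" standing in for real encryption (everything here is illustrative; as the message says, a key kept in the image only deters the uninterested):

```python
import sqlite3

KEY = b"not-a-secret"   # illustrative only; a real key wouldn't live here

def xor(data, key):
    # XOR is its own inverse, so the same function obfuscates and recovers.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# At build time the SQL is stored obfuscated...
stored = xor(b"SELECT 40 + 2", KEY)

# ...and recovered only at the moment of prepare/execute.
con = sqlite3.connect(":memory:")
answer = con.execute(xor(stored, KEY).decode()).fetchone()[0]
con.close()
```

The plaintext SQL still exists in memory at execute time, which is exactly the ltrace(1) caveat.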
[sqlite] Correlated subquery throwing an error
On Mon, 15 Feb 2016 14:19:12 -0700 Scott Robison wrote: > Each job will take some amount of time to process. The order doesn't > matter as long as all jobs are eventually processed and you have a > single process running the jobs. Limit 1 is a reasonable way to grab > a single job. Reasonable, perhaps, but not logical. The logical approach is to use a feature in the data to select the "single job". One obvious way in your example would be to use min(jobid) or somesuch. --jkl
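The min(jobid) approach in miniature (schema and data illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE jobs (jobid INTEGER PRIMARY KEY, task TEXT)")
con.executemany("INSERT INTO jobs VALUES (?, ?)",
                [(7, "c"), (3, "a"), (5, "b")])
# Select the job by a property of the data, not by LIMIT 1.
claimed = con.execute(
    "SELECT jobid, task FROM jobs "
    "WHERE jobid = (SELECT min(jobid) FROM jobs)").fetchone()
con.execute("DELETE FROM jobs WHERE jobid = ?", (claimed[0],))
remaining = con.execute("SELECT count(*) FROM jobs").fetchone()[0]
con.close()
```

The predicate says exactly which row is wanted, so the answer doesn't depend on whatever order the implementation happens to deliver rows in.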
[sqlite] Correlated subquery throwing an error
On Mon, 15 Feb 2016 14:55:34 -0700 "Keith Medcalf" wrote: > Pretty sure you meant: > > select * from (select min(t) as t from T) as T; Yes, thanks. :-) --jkl
[sqlite] Performance comparison between SQLite and SQL Server?
On Mon, 15 Feb 2016 11:21:06 +0800 wrote: > I am just curious whether there is a performance comparison between > SQLite and SQL Server? Odds are you will never see such a comparison published. If you read your SQL Server EULA, you'll see it specifically prohibits publishing benchmark results. The restriction is not purely evil. It could be argued that the flaws embedded in any testing regimen render the conclusion meaningless. Performance of any kind is notoriously difficult to measure. That's especially true of DBMSs because of their complexity. Optimal DBMS performance is a function of hardware, OS, and DBMS configuration. That's a lot of expertise to bring to bear on making *one* DBMS perform at its best; doing it for N is at least N times harder, especially if you're being scrupulous. Back when SQL was young, people still read magazines about new technology, and benchmarks were a popular feature. I remember poring over them to learn whether Oracle was faster than Sybase, etc. But the more I learned, the better I understood the futility of the question. There were too many moving parts, and the journalists had nowhere near the resources needed to do the job properly. Any decision they rendered was flawed, and in any case became obsolete as the technology evolved. I think it was in answer to such testing that the vendors began including restrictions on published performance measurements in their EULAs. In answer to your question, I would say SQLite performs very well provided you live with its restrictions. In particular: 1. Very little write contention. 2. All processes running on the same machine, using a local filesystem. 3. Queries of limited complexity, or data of limited size. That is, you're limited to one writer at a time, you can't use a network filesystem, and query optimization will hurt with large tables if, for example, subqueries can't be flattened. 
SQL Server has none of those restrictions, and probably keeps pace with SQLite even on its home turf. But the administration of SQL Server is nontrivial. For that reason alone, I would never use it in situations where SQLite would do. --jkl
[sqlite] Correlated subquery throwing an error
On Mon, 15 Feb 2016 08:56:35 +0100 Clemens Ladisch wrote: > I don't know why correlated subqueries cannot use values from the > outer query in the ORDER BY or LIMIT clauses; ORDER BY is not part of SELECT! It's not a relational operator. Per the SQL standard, ORDER BY cannot appear in a subquery. It can appear in only one place: as a kind of post-processor that determines the order in which the rows are delivered to the client. I suspect that's why Postgres ORDER BY doesn't recognize column names in the enclosing scope. If the day ever comes when SQLite can remove language features, internal ORDER BY (and LIMIT) would be at the top of my list, along with double-quotes for strings. --jkl
[sqlite] Correlated subquery throwing an error
On Mon, 15 Feb 2016 10:39:31 +0100 Clemens Ladisch wrote: > > you need to explicitly limit a subquery that is a field and must > > only ever return 1 result if the where clause is ambiguous about it > > Not in SQLite. (It ignores superfluous rows, and returns NULL if > there are no rows.) Yes, but the right way to do it is to use logic instead of brute force. Instead of select * from (select t from T order by t limit 1) as T; use select * from (select min(t) as t from T group by t) as T; The latter has the benefit that it can be easily modified to add COUNT(*) to the subquery, and check for perhaps erroneous cases where COUNT(*) > 1. --jkl
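Both forms are easy to compare side by side, using the corrected subquery from the follow-up message (table and values illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (t INTEGER)")
con.executemany("INSERT INTO T VALUES (?)", [(9,), (4,), (6,)])
# Brute force: sort everything, keep one row.
by_limit = con.execute(
    "SELECT * FROM (SELECT t FROM T ORDER BY t LIMIT 1) AS T").fetchone()
# Logic: ask for the minimum directly.
by_min = con.execute(
    "SELECT * FROM (SELECT min(t) AS t FROM T) AS T").fetchone()
con.close()
```

Same answer either way here, but only the min() form states *which* row is meant.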
[sqlite] IS a SQLite db of small size as good as reliable cache?
On Mon, 01 Feb 2016 06:39:05 -0700 "Keith Medcalf" wrote: > OS/2 had IBM cache technology in it which worked properly. I remember OS/2. I remember that, like VMS, you could back up the whole OS to ... well, floppies, I suppose, and later restore them to a brand new drive, with nothing else required. I couldn't believe my eyes when the equivalent "system restore" process on NT started with 1. Install the OS. Twenty-five years on, that's still SOP on NT, except that nowadays no one expects to restore a backup anyway! But IIRC, NTFS was a Microsoft invention, not connected to OS/2, and borrowed quite heavily from VMS, because David Cutler. So I'm inclined toward your low-wattage theory, because hardly anything Cutler originally provided remains unsmudged. --jkl
[sqlite] Is the first column of a composite primary key, special?
On Tue, 2 Feb 2016 16:19:07 +0100 Yannick Duchêne wrote: > There are also representations. Sometimes there is not really a > value, just an identity which is the only thing offering > sense/meaning, and what may be erroneously seen as a value is rather > a representation. Representation is all. The database -- the whole thing, front to back -- is just a model, an approximation of the real world. As designer, you decide how to model that world, the "enterprise of interest". You decide what "entities" there are, what properties they have, how they are known (identified). The DBMS knows nothing of what is being modelled. It's "just" a logic system. That logic operates on values. It doesn't distinguish between kinds of values, between natural and surrogate keys. That sort of thing contributes to the understandability (and hence utility) of the model, but they're extralogical: outside the theory. > All of this, using ID in place of *some* values -- however at the cost > of more verbose queries Back in the real world -- the computing world, where there's an actual system implementing the math -- yes, some things are faster than others, and some allowances have to be made for the limitations of the implementation and the computer. :-) --jkl