[sqlite] SQLite big tables create index in parallel
On 19 Mar 2016, at 9:36pm, Domingo Alvarez Duarte wrote: > Here it's: (database size 5GB, run twice) That's a nice piece of testing. As you can see from Dr Hipp's answer to another thread, an index is not created by reading the table one row at a time and using each row to modify each index. Instead all the rows are copied/sorted correctly first, and then the sorted copy is used to generate the index all at once. If this system was used to generate multiple indexes at once each of the copies would need to be sorted into a different order. The amount of temporary space (memory or file) would increase with your system, since space would be needed to store all the partially made indexes at once. This would mean more cache (or paging, or swapping, or whatever you call it) activity would be involved, which would slow the system down and wear out your storage system. So I think that although your idea might be efficient for small tables or setups with a lot of memory, it would be less efficient for other setups. If you want to get close to simulating what you asked for, compare the time taken for this: CREATE TABLE t (a REAL, b REAL); BEGIN; generate INSERT lines for lots of random numbers CREATE INDEX t_a (a); CREATE INDEX t_b (b); END; with this: CREATE TABLE t (a REAL, b REAL); BEGIN; CREATE INDEX t_a (a); CREATE INDEX t_b (b); generate INSERT lines for lots of random numbers END; Simon
[sqlite] SQLite big tables create index in parallel
Hello Simon ! Here it's: (database size 5GB, run twice) squilu time-create-index.nut Total number of records??? 11290493 Time spent counting the table items??? 0.070402 Time spent reading the table items??? 20.6519 Time spent indexing the table items??? 43.1917 Time difference indexing - reading =??? 22.5398 squilu time-create-index.nut Total number of records??? 11290493 Time spent counting the table items??? 0.051606 Time spent reading the table items??? 20.7647 Time spent indexing the table items??? 43.0528 Time difference indexing - reading =??? 22.2881 ? program: SQLite3.config_single_thread(); local db = SQLite3("hacker-news-items.db"); db.exec_dml("PRAGMA synchronous = OFF;"); db.exec_dml("drop index if exists items_user_idx;"); local start_time = os.clock(); print("Total number of records", db.exec_get_one("select count(*) from items;")); local time_spent_counting = os.clock() - start_time; print("Time spent counting the table items", time_spent_counting); db.exec_dml("begin;"); start_time = os.clock(); local stmt = dbprepare("select id, by from items"); while(stmt.next_row()) { ??? local id = stmt.col(0); } stmt.finalize(); local time_spent_reading = os.clock() - start_time; print("Time spent reading the table items", time_spent_reading); start_time = os.clock(); db.exec_dml("CREATE INDEX items_user_idx ON items(\"by\");"); local time_spent_indexing = os.clock() - start_time; print("Time spent indexing the table items", time_spent_indexing); print("Time difference indexing - reading =", time_spent_indexing - time_spent_reading); db.exec_dml("commit;"); db.close(); ? > Sat Mar 19 2016 09:56:44 PM CET from "Simon Slavin" > Subject: Re: [sqlite] SQLite big tables create index >in parallel > > On 19 Mar 2016, at 8:54pm, Domingo Alvarez Duarte > wrote: > > >>When you say [The majority of the time in 'CREATE INDEX' is spent writing >>the >> index, not reading the table.] where the data to spend time creating the >> index come from ? >> > Use the SQLite shell tool. > Use the '.timer ON' command. > Write a command to read every row of the table. Time the command > Write a command to create a new index. Time the command. > Subtract one from the other. > > Simon. > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ?
[sqlite] Anyone familiar with SQLite source code: Index creation question
On 19 Mar 2016, at 10:04pm, Richard Hipp wrote: > (B). It's much much faster that way. Thanks. Makes sense. Simon.
[sqlite] SQLite big tables create index in parallel
Hello Simon ! Thanks for reply ! When you say [The majority of the time in 'CREATE INDEX' is spent writing the index, not reading the table.] where the data to spend time creating the index come from ? Cheers ! > Sat Mar 19 2016 08:13:13 PM CET from "Simon Slavin" > Subject: Re: [sqlite] SQLite big tables create index >in parallel > > On 19 Mar 2016, at 11:03am, Domingo Alvarez Duarte > wrote: > > >>Would be nice if sqlite provide a way to create indexes in parallel, I mean >> when working with big tables and creating several indexes the time spent >> scanning the whole database/table is considerable and it's the same for >>each >> "create index", we could have a big time/cpu/disk seek saving if we could >> create several indexes at the same time >> > It actually doesn't help. The majority of the time in 'CREATE INDEX' is >spent writing the index, not reading the table. And since the file structure >of SQLite means that each index resides on a separate set of pages you end up >spending more time juggling pages between caches than you would save. > > If you would like to simulate what happens you can try this: > > 1. CREATE tempTable copying the structure of myTable > 2. DELETE FROM myTable > 3. BEGIN a transaction > 4. CREATE the indexes you want > 5. INSERT INTO myTable SELECT * FROM tempTable > 6. END the transaction > > This is accepted to be slow in SQLite and the instructions are generally to >do the opposite, i.e. that the order 1. 2. 3. 5. 4. 6. will be faster. > > Simon. > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ?
[sqlite] Anyone familiar with SQLite source code: Index creation question
Suppose I have the following table: CREATE TABLE t (a, b); with a few million rows of random data in. Suppose there are no indexes on the table. I create the following index on the table: CREATE INDEX t_b ON t (b); There are two ways to make the index. (A) Go through the table in row order adding each row to the index, modifying the tree as you go. (B) Use the same strategy as you would for the command SELECT * FROM t ORDER BY B using any existing index or creating a temporary one as required. This presents the table's rows to you in index order, allowing you to create the tree structure as you go. Which does SQLite do ? Is there an argument for doing (B) because it means that the tree structure can be written in order which may be more efficient. I'm aware that any temporary index created as part of (B) involves the same work it would do in (A), I'm just curious about the effects. Simon.
[sqlite] SQLite big tables create index in parallel
On 19 Mar 2016, at 8:54pm, Domingo Alvarez Duarte wrote: > When you say [The majority of the time in 'CREATE INDEX' is spent writing the > index, not reading the table.] where the data to spend time creating the > index come from ? Use the SQLite shell tool. Use the '.timer ON' command. Write a command to read every row of the table. Time the command. Write a command to create a new index. Time the command. Subtract one from the other. Simon.
[sqlite] COLLATE NU800_NOCASE as a loadable extension?
Aleksey Tulinov writes: . > > Unfortunately i'm not familiar with SQLite Expert, however > > >2.6 Message : no such collation sequence:_RMNOCASE > > Collation provided by nunicode SQLite extension is called NU800_NOCASE > (or NU800 for case-sensitive collation), that name is need to be used if > you want nunicode to handle "COLLATE" expressions. The OP wants a revision of your loadable extension with the collation name NU800_NOCASE renamed or also named RMNOCASE, to be used with a database created by a software that has a flawed collation so named. I would like to try it, too. Tom
[sqlite] windows binary error
I can't find sqlite3.exe in the windows binaries form the downloads page !!
[sqlite] Article about pointer abuse in SQLite
On Mar 19, 2016 1:19 PM, "James K. Lowden" wrote: > > On Sat, 19 Mar 2016 02:04:35 -0600 > Scott Robison wrote: > > > As he says, there's not real choice between fast and > > > correct > > > > Except that testing can verify something is correct for a given > > environment. > > That's actually not true, on a couple of levels. True, and that was a sloppy phrasing on my part. Please replace "correct" with "sufficiently correct for the use case". If you are writing software for one environment you are in a very different situation than if you are trying to write truly portable software intended to be built and used in any arbitrary environment. > Second, you can't test the future. If the correctness of the code is > subject to change by the compiler's interpretation of the language, how > is the programmer to prevent it? In addition to DRH's comment about the moving target set by evolving standards: even if there was nothing undefined by the standard you can't trust a new release of a compiler or library to have not introduced some bug that impacts you. Hence the need to test the heck out of any intended release. It's not proof of correctness but it is evidence of a likely sufficiently correct program. > > > > finally drive gcc & friends in the direction of working > > > with their users for a change. Or make them irrelevant. > > > > I think they'd continue to be popular with people looking to eek out > > as much performance as possible. > > You may be right. As a consultant I've often felt I was hired to > sprinkle magic pixie performance dust on the system. People want to > believe that performance is found in tools. How come there's no -O5? > > In truth, every performance problem I've encountered was a design > problem, often obvious, always unnecessary. "Use a better compiler" > has never been the solution. Unloading mounds of unnecessary > processing with a pitchfork is. I guess it depends on what you're doing. Some applications (I'm thinking games at the moment) need both quality design and quality code generation. When you know your application is being targeted for a platform you'll utilize undefined behavior if it eeks out a little bit more performance. If necessary you'll hide UB in conditional blocks so that you get the best out of different platforms. I agree that most applications do not need anything like this, and I think it is a good idea to steer clear of UB, but it is not universally true.
[sqlite] SQLite big tables create index in parallel
On 19 Mar 2016, at 11:03am, Domingo Alvarez Duarte wrote: > Would be nice if sqlite provide a way to create indexes in parallel, I mean > when working with big tables and creating several indexes the time spent > scanning the whole database/table is considerable and it's the same for each > "create index", we could have a big time/cpu/disk seek saving if we could > create several indexes at the same time It actually doesn't help. The majority of the time in 'CREATE INDEX' is spent writing the index, not reading the table. And since the file structure of SQLite means that each index resides on a separate set of pages you end up spending more time juggling pages between caches than you would save. If you would like to simulate what happens you can try this: 1. CREATE tempTable copying the structure of myTable 2. DELETE FROM myTable 3. BEGIN a transaction 4. CREATE the indexes you want 5. INSERT INTO myTable SELECT * FROM tempTable 6. END the transaction This is accepted to be slow in SQLite and the instructions are generally to do the opposite, i.e. that the order 1. 2. 3. 5. 4. 6. will be faster. Simon.
[sqlite] Reserved column names
On 19 Mar 2016, at 10:30am, Paul Sanderson wrote: > Is there a list of reserved column names > > I have seen the list or keywords at the link below > > http://www.sqlite.org/lang_keywords.html > > but for instance create table (abort int) will work fine. It may work fine for that statement but you may find that it causes problems later. For instance once you've created a column called 'abort' the command CREATE TRIGGER ON myTable WHEN abort ... may not do what you expect it to do. (I don't know that that specific example is a problem, I'm just illustrating.) So even if one of the words in that list works in one context it's better to avoid it. Simon.
[sqlite] Sqlite do not use index collate nocase
On 03/18/16 00:24 , Keith Medcalf wrote: > The column data is case sensitive and the comparison is case > sensitive. There is no case sensitive index. > > You either need (a) to make the column collate nocase (in which case > any index on the column is also in the same nocase collation and thus > you do not need it in the create index), or (b) specify that the > comparison is using collation nocase. I guess, OP suggested: following queries: SELECT * FROM t WHERE a = b; and SELECT * FROM t WHERE (a COLLATE NOCASE = b) AND a = b; gives same result (if (a = b) is true, then (a COLLATE NOCASE = b) is true too). If only NOCASE index is present, SQLite could've converted first query to second automatically (probably, unless ANALYZE says NOCASE index is "bad"/unspecific?) I'm not *totally* sure, but I think this maybe apply to any collation that obey rules from create_collation.html (`a COLLATE BINARY = b` always implies `a COLLATE WHATEVER = b`). Anyway, it is certainly true for SQLite's NOCASE. Note this works with equality, but does not work with other comparisons: ('b' COLLATE BINARY > 'B') is true, but ('b' COLLATE NOCASE > 'B') is false, so you cannot use non-BINARY collation to accelerate BINARY BETWEEN/ORDER/etc. Of course, it works with BINARY equality and non-BINARY collation index, but does not work with two arbitrary non-binary collations (you cannot use `COLLATE foo` index to accelerate `a COLLATE bar = b`). (BTW, for `BINARY` `GROUP BY`, you, of course, cannot directly use `COLLATE foobar` index, but it may be still advantageous to walk by this index, instead of original table order, as it splits large table into smaller definitely-non-equal subgroups; it may require adding some [non-trivial] code to sqlite3 to take advantage of this property, though; not low-hanging fruit). >> -Original Message- >> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users- >> bounces at mailinglists.sqlite.org] On Behalf Of Domingo Alvarez Duarte >> Sent: Thursday, 17 March, 2016 12:06 >> To: SQLite mailing list >> Subject: [sqlite] Sqlite do not use index collate nocase >> >> Hello ! >> >> In one database I created an index with collate nocase but it seems that >> sqlite do not recognize it as a candidate index for a select. >> >> Bellow is the schema, the real database has more than a million rows. If I >> alter the table to include the collation or remove the collation from the >> index then sqlite does use the index. >> >> I expect an index collate nocase to be valid to queries like the one >> bellow. >> >> >> Cheers ! >> >> Schema: >> >> PRAGMA foreign_keys=OFF; >> BEGIN TRANSACTION; >> CREATE TABLE 'items' ( >> ??? 'id' integer PRIMARY KEY, >> ??? 'parent' INTEGER, >> ??? 'by' text, >> ??? 'score' integer DEFAULT 0, >> ??? 'title' text, >> ??? 'type' text, >> ??? 'url' text, >> ??? 'deleted' BOOLEAN DEFAULT 0, >> ??? 'dead' BOOLEAN DEFAULT 0, >> ??? 'comment' TEXT DEFAULT null, >> ??? 'time' integer NOT NULL >> ); >> CREATE INDEX "items_user_idx" ON "items"("by" COLLATE NOCASE); >> COMMIT; >> >> explain query plan select * from items? where by='doppp'; >> >> SCAN TABLE items
[sqlite] Anyone familiar with SQLite source code: Index creation question
On 3/19/16, Simon Slavin wrote: > > I create the following index on the table: > CREATE INDEX t_b ON t (b); > There are two ways to make the index. > > (A) Go through the table in row order adding each row to the index, > modifying the tree as you go. > > (B) Use the same strategy as you would for the command > SELECT * FROM t ORDER BY B > using any existing index or creating a temporary one as required. > > Which does SQLite do ? (B). It's much much faster that way. -- D. Richard Hipp drh at sqlite.org
[sqlite] Reserved column names
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 19/03/16 03:30, Paul Sanderson wrote: > I know that keywords can be quoted but I am interested in just > those that can be used unquoted (even if not advisable) Out of curiousity, why? My rule of thumb is to always quote (using square brackets) when the query is generated by code, and only quote where reasonable when the query is written by a human. For the APSW shell the following are all quoted: * zero length names (yes SQLite allows them) * if it starts with a digit * if any non-alphanumeric/underscore present (ascii only alphas ok) * if in the SQLite list of keywords Roger -BEGIN PGP SIGNATURE- Version: GnuPG v2 iEYEARECAAYFAlbt2QIACgkQmOOfHg372QRhAQCfc00p/L15AJmx8Zgrr9isuU5H B3cAnim38/I6S3gNsHQ7WZtJKok+T+sY =O8pL -END PGP SIGNATURE-
[sqlite] Article about pointer abuse in SQLite
On 3/19/16, James K. Lowden wrote: > > Second, you can't test the future. If the correctness of the code is > subject to change by the compiler's interpretation of the language, how > is the programmer to prevent it? > Indeed. Every bit of the code examined by Prof. Regehr was well-defined according to K&R. But subsequent revisions of the C-language standards changed that. How does one write code that will comply with language standards that keep changing out from under you? -- D. Richard Hipp drh at sqlite.org
[sqlite] Article about pointer abuse in SQLite
On Sat, 19 Mar 2016 02:04:35 -0600 Scott Robison wrote: > As he says, there's not real choice between fast and > > correct > > Except that testing can verify something is correct for a given > environment. That's actually not true, on a couple of levels. "[T]esting can be used very effectively to show the presence of bugs but never to show their absence." -- EWD303 I think that should be called Dijkstra's Dictum. It's not just quip; it's a concise insight into limits of testing versus proving correctness. Second, you can't test the future. If the correctness of the code is subject to change by the compiler's interpretation of the language, how is the programmer to prevent it? > > finally drive gcc & friends in the direction of working > > with their users for a change. Or make them irrelevant. > > I think they'd continue to be popular with people looking to eek out > as much performance as possible. You may be right. As a consultant I've often felt I was hired to sprinkle magic pixie performance dust on the system. People want to believe that performance is found in tools. How come there's no -O5? In truth, every performance problem I've encountered was a design problem, often obvious, always unnecessary. "Use a better compiler" has never been the solution. Unloading mounds of unnecessary processing with a pitchfork is. Doubtless there are some well tested, highly stable applications run at scale, for which 5% is a measurable and meaningful gain. IMO they're actually the ones driving UB treatment by compiler writers. The other 99% stand to gain from a compiler that emphasizes correctness and predictable behavior. --jkl
[sqlite] Reserved column names
On 3/19/16, Paul Sanderson wrote: > Is there a list of reserved column names > > I have seen the list or keywords at the link below > > http://www.sqlite.org/lang_keywords.html > > but for instance create table (abort int) will work fine. > There is no list. The "official" position is that keywords should never be used as identifiers. Ever. SQLite is very forgiving on this point. It will tolerate the use of many keywords as identifiers in many different contexts. (The Lemon parser generator contains special logic to handle this case.) This grace from the parser is to maximize backwards compatibility. But you should not count on it. If an identifier is a keyword, or looks like something that might someday become a keyword, then you should quote it. -- D. Richard Hipp drh at sqlite.org
[sqlite] COLLATE NU800_NOCASE as a loadable extension?
On 03/17/2016 07:29 PM, Torleif Haug?deg?rd wrote: Torleif, > Now the resultat was this one; > > Exception: > -- >2.1 Date : Thu, 17 Mar 2016 18:27:00 +0100 >2.2 Address : 00AA83F5 >2.3 Module Name : SQLiteExpertPers.exe - (The complete administration > tool for SQLite) >2.4 Module Version: 3.5.92.2512 >2.5 Type : ESQLiteException >2.6 Message : no such collation sequence:_RMNOCASE >. >SQL Statement: > > . >SELECT given FROM NameTable ORDER BY given. > Unfortunately i'm not familiar with SQLite Expert, however >2.6 Message : no such collation sequence:_RMNOCASE Collation provided by nunicode SQLite extension is called NU800_NOCASE (or NU800 for case-sensitive collation), that name is need to be used if you want nunicode to handle "COLLATE" expressions. Here is documentation on nunicode SQLite extension: https://bitbucket.org/alekseyt/nunicode#markdown-header-sqlite3-extension with some usage examples.
[sqlite] error messages from SQLite
Thanks Tim I am not concerned with errors from function calls - just errors with SQL queries at the command line etc. Paul www.sandersonforensics.com skype: r3scue193 twitter: @sandersonforens Tel +44 (0)1326 572786 http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit -Forensic Toolkit for SQLite email from a work address for a fully functional demo licence On 19 March 2016 at 11:20, Tim Streater wrote: > On 19 Mar 2016 at 10:26, Paul Sanderson > wrote: > >> When executing a sql query I often find I have made a typo or been a >> bit dull and I'll get an error message back along the lines of >> >> Error: near "text": syntax error >> >> Is there a way of expanding on this, for instance adding more of the >> following text or a character offset. The last time this happened I >> had a create table... command with about 30 text columns and >> identifying which "text" string the error referred to took a little >> while. > > In my PHP usage of SQLite, I have a shim around the provided functions which > logs the SQL in the event of error. The application then displays it too. The > shim also expects a small string to identify which actual SQLite call was > involved, so I don't have to look through 140 source modules trying to guess > what was going on at the time. > > -- > Cheers -- Tim > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] SQLite big tables create index in parallel
Hello ! Would be nice if sqlite provide a way to create indexes in parallel, I mean when working with big tables and creating several indexes the time spent scanning the whole database/table is considerable and it's the same for each "create index", we could have a big time/cpu/disk seek saving if we could create several indexes at the same time: CREATE INDEXES ON big_table index_1(name), index_2(title), index_3(parent), index_4(city, country); Cheers !
[sqlite] error messages from SQLite
On 19 Mar 2016 at 10:26, Paul Sanderson wrote: > When executing a sql query I often find I have made a typo or been a > bit dull and I'll get an error message back along the lines of > > Error: near "text": syntax error > > Is there a way of expanding on this, for instance adding more of the > following text or a character offset. The last time this happened I > had a create table... command with about 30 text columns and > identifying which "text" string the error referred to took a little > while. In my PHP usage of SQLite, I have a shim around the provided functions which logs the SQL in the event of error. The application then displays it too. The shim also expects a small string to identify which actual SQLite call was involved, so I don't have to look through 140 source modules trying to guess what was going on at the time. -- Cheers -- Tim
[sqlite] Reserved column names
Is there a list of reserved column names I have seen the list or keywords at the link below http://www.sqlite.org/lang_keywords.html but for instance create table (abort int) will work fine. I know that keywords can be quoted but I am interested in just those that can be used unquoted (even if not advisable) Without going through all of them can someone point me at a list? Thanks
[sqlite] error messages from SQLite
When executing a sql query I often find I have made a typo or been a bit dull and I'll get an error message back along the lines of Error: near "text": syntax error Is there a way of expanding on this, for instance adding more of the following text or a character offset. The last time this happened I had a create table... command with about 30 text columns and identifying which "text" string the error referred to took a little while. Thanks
[sqlite] Article about pointer abuse in SQLite
On Mar 18, 2016 11:12 PM, "James K. Lowden" wrote: > > On Fri, 18 Mar 2016 16:33:56 -0600 > Scott Robison wrote: > > > I'd rather have code that might use some "undefined behavior" and > > generates the right answer than code that always conformed to defined > > behavior yet was logically flawed. > > Code that falls under undefined behavior *is* logically flawed, by > definition. Whether or not it works, it's not specified to. The > compiler may have generated perfectly correct machine code, but another > compiler or some future version of your present compiler may not. Perhaps I should have said "undefined behavior as per the standard". Code that does what is intended for the intended target environment utilizing a specific tool chain is not logically flawed just because the standard calls a construct "undefined behavior". > You might share my beef with the compiler writers, though: lots things > that are left undefined shouldn't be. Not just compiler writers but standards organizations. Lots of overlap to be sure, but not 100%. Because hardware architecture > varies, some practices that do work and have worked and are expected to > work on a wide variety of machines are UB. A recent thread on using > void* for a function pointer is an example: dlsym(2) returns a function > pointer defined as void*, but the C standard says void* can only refer > to data, not functions! So I guess casting between function pointers might be arguably safer if the void* form of a function was called void(*)(void) though even then there can be multiple forms of function pointers (like near vs far pointers in x86 real mode). > > Machines exist for which the size of a function pointer is not > sizeof(void*). Source code that assumes they are the same size is not > portable to those architectures. Fine. But a particular compiler > generates code for a particular architecture. On x86 hardware, all > pointers have always been and will always be the same size. All > Linux/Posix code relies on that, too, along with a host of other > assumptions. If that ever changed, a boat load of code would have to be > changed. Why does the compiler writer feel it's in his interest or > mine to warn me about that not-happening eventuality? For the machine > I'm compilng for, the code is *not* in error. For some future machine, > maybe it will be; let's leave that until then. > > I was looking at John Regehr's blog the other day. I think it was > there that I learned that the practice of dropping UB code on the floor > has been going on longer than I'd realized; it's just that gcc has been > more aggressive in recent years. I think it was there I saw this > construction: > > if( p < p + n) > error > > where p is a pointer. On lots of architectures, for large n, p + n can > be negative. The test works. Or did. The C standard says that's > UB, though. It doesn't promise the pointer will go negative. It doesn't > promise it won't. It doesn't promise not to tell your mother about > it. And, in one recent version, it doesn't compile it. Warning? No. > Error? No. Machine code? No! It's UB, so no code is generated (ergo, > no error handling)! Even though the hardware instructions that would > be -- that used to be -- generated work as implied by the code. Yes, these are the sorts of things that are frustrating. > Postel's Law is to be liberal in what you accept and conservative in > what you emit. The compilers have been practicing the opposite, > thwarting common longstanding practice just because they "can". > > Dan Bernstein is calling for a new C compiler that is 100% > deterministic: no UB. All UB per the standard would be defined by the > compiler. And maybe a few goodies, like zero-initialized automatic > (stack) variables. Neat idea, though undefined behavior isn't always bad. Interesting article at http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html that goes into some considerations. > > Such a compiler would enjoy great popularity, even if it imposed, say, > a 5% performance penalty, because C programmers would have greater > confidence in their code working as expected. They'd have some > assurance that the compiler wouldn't cut them off at the knees in its > next release. As he says, there's not real choice between fast and > correct Except that testing can verify something is correct for a given environment. If the "always defined befavior" compiler got off the ground, > may it would finally drive gcc & friends in the direction of working > with their users for a change. Or make them irrelevant. I think they'd continue to be popular with people looking to eek out as much performance as possible.
[sqlite] Article about pointer abuse in SQLite
On Fri, 18 Mar 2016 16:33:56 -0600 Scott Robison wrote: > I'd rather have code that might use some "undefined behavior" and > generates the right answer than code that always conformed to defined > behavior yet was logically flawed. Code that falls under undefined behavior *is* logically flawed, by definition. Whether or not it works, it's not specified to. The compiler may have generated perfectly correct machine code, but another compiler or some future version of your present compiler may not. You might share my beef with the compiler writers, though: lots things that are left undefined shouldn't be. Because hardware architecture varies, some practices that do work and have worked and are expected to work on a wide variety of machines are UB. A recent thread on using void* for a function pointer is an example: dlsym(2) returns a function pointer defined as void*, but the C standard says void* can only refer to data, not functions! Machines exist for which the size of a function pointer is not sizeof(void*). Source code that assumes they are the same size is not portable to those architectures. Fine. But a particular compiler generates code for a particular architecture. On x86 hardware, all pointers have always been and will always be the same size. All Linux/Posix code relies on that, too, along with a host of other assumptions. If that ever changed, a boat load of code would have to be changed. Why does the compiler writer feel it's in his interest or mine to warn me about that not-happening eventuality? For the machine I'm compilng for, the code is *not* in error. For some future machine, maybe it will be; let's leave that until then. I was looking at John Regehr's blog the other day. I think it was there that I learned that the practice of dropping UB code on the floor has been going on longer than I'd realized; it's just that gcc has been more aggressive in recent years. I think it was there I saw this construction: if( p < p + n) error where p is a pointer. On lots of architectures, for large n, p + n can be negative. The test works. Or did. The C standard says that's UB, though. It doesn't promise the pointer will go negative. It doesn't promise it won't. It doesn't promise not to tell your mother about it. And, in one recent version, it doesn't compile it. Warning? No. Error? No. Machine code? No! It's UB, so no code is generated (ergo, no error handling)! Even though the hardware instructions that would be -- that used to be -- generated work as implied by the code. Postel's Law is to be liberal in what you accept and conservative in what you emit. The compilers have been practicing the opposite, thwarting common longstanding practice just because they "can". Dan Bernstein is calling for a new C compiler that is 100% deterministic: no UB. All UB per the standard would be defined by the compiler. And maybe a few goodies, like zero-initialized automatic (stack) variables. Such a compiler would enjoy great popularity, even if it imposed, say, a 5% performance penalty, because C programmers would have greater confidence in their code working as expected. They'd have some assurance that the compiler wouldn't cut them off at the knees in its next release. As he says, there's not real choice between fast and correct If the "always defined befavior" compiler got off the ground, may it would finally drive gcc & friends in the direction of working with their users for a change. Or make them irrelevant. --jkl