[sqlite] SQLite big tables create index in parallel

2016-03-19 Thread Simon Slavin

On 19 Mar 2016, at 9:36pm, Domingo Alvarez Duarte  wrote:

> Here it's: (database size 5GB, run twice)  

That's a nice piece of testing.

As you can see from Dr Hipp's answer to another thread, an index is not created 
by reading the table one row at a time and using each row to modify each index. 
 Instead all the rows are copied/sorted correctly first, and then the sorted 
copy is used to generate the index all at once.

If this system was used to generate multiple indexes at once each of the copies 
would need to be sorted into a different order.  The amount of temporary space 
(memory or file) would increase with your system, since space would be needed 
to store all the partially made indexes at once.

This would mean more cache (or paging, or swapping, or whatever you call it) 
activity would be involved, which would slow the system down and wear out your 
storage system.

So I think that although your idea might be efficient for small tables or 
setups with a lot of memory, it would be less efficient for other setups.

If you want to get close to simulating what you asked for, compare the time 
taken for this:

CREATE TABLE t (a REAL, b REAL);
BEGIN;
generate INSERT lines for lots of random numbers
CREATE INDEX t_a (a);
CREATE INDEX t_b (b);
END;

with this:

CREATE TABLE t (a REAL, b REAL);
BEGIN;
CREATE INDEX t_a (a);
CREATE INDEX t_b (b);
generate INSERT lines for lots of random numbers
END;

Simon


[sqlite] SQLite big tables create index in parallel

2016-03-19 Thread Domingo Alvarez Duarte
Hello Simon !  

Here it's: (database size 5GB, run twice)  

squilu time-create-index.nut
Total number of records??? 11290493
Time spent counting the table items??? 0.070402
Time spent reading the table items??? 20.6519
Time spent indexing the table items??? 43.1917
Time difference indexing - reading =??? 22.5398  

squilu time-create-index.nut 
Total number of records??? 11290493
Time spent counting the table items??? 0.051606
Time spent reading the table items??? 20.7647
Time spent indexing the table items??? 43.0528
Time difference indexing - reading =??? 22.2881  

?  

program:  

SQLite3.config_single_thread();

local db = SQLite3("hacker-news-items.db");

db.exec_dml("PRAGMA synchronous = OFF;");

db.exec_dml("drop index if exists items_user_idx;");

local start_time = os.clock();
print("Total number of records", db.exec_get_one("select count(*) from
items;"));
local time_spent_counting = os.clock() - start_time;
print("Time spent counting the table items", time_spent_counting);

db.exec_dml("begin;");

start_time = os.clock();
local stmt = dbprepare("select id, by from items");

while(stmt.next_row())
{
??? local id = stmt.col(0);
}
stmt.finalize();

local time_spent_reading = os.clock() - start_time;
print("Time spent reading the table items", time_spent_reading);

start_time = os.clock();
db.exec_dml("CREATE INDEX items_user_idx ON items(\"by\");");

local time_spent_indexing = os.clock() - start_time;
print("Time spent indexing the table items", time_spent_indexing);

print("Time difference indexing - reading =", time_spent_indexing -
time_spent_reading);


db.exec_dml("commit;");

db.close();  

?  
>  Sat Mar 19 2016 09:56:44 PM CET from "Simon Slavin"
>  Subject: Re: [sqlite] SQLite big tables create index
>in parallel
>
>  On 19 Mar 2016, at 8:54pm, Domingo Alvarez Duarte
> wrote:
> 
>  
>>When you say [The majority of the time in 'CREATE INDEX' is spent writing
>>the
>> index, not reading the table.] where the data to spend time creating the
>> index come from ?
>> 

>  Use the SQLite shell tool.
> Use the '.timer ON' command.
> Write a command to read every row of the table. Time the command
> Write a command to create a new index. Time the command.
> Subtract one from the other.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> 
>
>  



?



[sqlite] Anyone familiar with SQLite source code: Index creation question

2016-03-19 Thread Simon Slavin

On 19 Mar 2016, at 10:04pm, Richard Hipp  wrote:

> (B).  It's much much faster that way.

Thanks.  Makes sense.

Simon.


[sqlite] SQLite big tables create index in parallel

2016-03-19 Thread Domingo Alvarez Duarte
Hello Simon !  

Thanks for reply !  

When you say [The majority of the time in 'CREATE INDEX' is spent writing the
index, not reading the table.] where the data to spend time creating the
index come from ?


Cheers !  
>  Sat Mar 19 2016 08:13:13 PM CET from "Simon Slavin"
>  Subject: Re: [sqlite] SQLite big tables create index
>in parallel
>
>  On 19 Mar 2016, at 11:03am, Domingo Alvarez Duarte
> wrote:
> 
>  
>>Would be nice if sqlite provide a way to create indexes in parallel, I mean
>> when working with big tables and creating several indexes the time spent
>> scanning the whole database/table is considerable and it's the same for
>>each
>> "create index", we could have a big time/cpu/disk seek saving if we could
>> create several indexes at the same time 
>> 

>  It actually doesn't help. The majority of the time in 'CREATE INDEX' is
>spent writing the index, not reading the table. And since the file structure
>of SQLite means that each index resides on a separate set of pages you end up
>spending more time juggling pages between caches than you would save.
> 
> If you would like to simulate what happens you can try this:
> 
> 1. CREATE tempTable copying the structure of myTable
> 2. DELETE FROM myTable
> 3. BEGIN a transaction
> 4. CREATE the indexes you want
> 5. INSERT INTO myTable SELECT * FROM tempTable
> 6. END the transaction
> 
> This is accepted to be slow in SQLite and the instructions are generally to
>do the opposite, i.e. that the order 1. 2. 3. 5. 4. 6. will be faster.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> 
>
>  



?



[sqlite] Anyone familiar with SQLite source code: Index creation question

2016-03-19 Thread Simon Slavin
Suppose I have the following table:
CREATE TABLE t (a, b);
with a few million rows of random data in.  Suppose there are no indexes on the 
table.

I create the following index on the table:
CREATE INDEX t_b ON t (b);
There are two ways to make the index.

(A) Go through the table in row order adding each row to the index, modifying 
the tree as you go.

(B) Use the same strategy as you would for the command
SELECT * FROM t ORDER BY B
using any existing index or creating a temporary one as required.  This 
presents the table's rows to you in index order, allowing you to create the 
tree structure as you go.

Which does SQLite do ?  Is there an argument for doing (B) because it means 
that the tree structure can be written in order which may be more efficient.

I'm aware that any temporary index created as part of (B) involves the same 
work it would do in (A), I'm just curious about the effects.

Simon.


[sqlite] SQLite big tables create index in parallel

2016-03-19 Thread Simon Slavin

On 19 Mar 2016, at 8:54pm, Domingo Alvarez Duarte  wrote:

> When you say [The majority of the time in 'CREATE INDEX' is spent writing the
> index, not reading the table.] where the data to spend time creating the
> index come from ?

Use the SQLite shell tool.
Use the '.timer ON' command.
Write a command to read every row of the table.  Time the command.
Write a command to create a new index.  Time the command.
Subtract one from the other.

Simon.


[sqlite] COLLATE NU800_NOCASE as a loadable extension?

2016-03-19 Thread Tom Holden
Aleksey Tulinov  writes:
   .
> 
> Unfortunately i'm not familiar with SQLite Expert, however
> 
>  >2.6 Message   : no such collation sequence:_RMNOCASE
> 
> Collation provided by nunicode SQLite extension is called NU800_NOCASE 
> (or NU800 for case-sensitive collation), that name is need to be used if 
> you want nunicode to handle "COLLATE" expressions.


The OP wants a revision of your loadable extension with the collation name 
NU800_NOCASE 
renamed or also named 
RMNOCASE, 
to be used with a database created by a software that 
has a flawed  collation so named. I would like to try it, too.

Tom






[sqlite] windows binary error

2016-03-19 Thread Amr Saber
I can't find sqlite3.exe in the windows binaries form the downloads page !!


[sqlite] Article about pointer abuse in SQLite

2016-03-19 Thread Scott Robison
On Mar 19, 2016 1:19 PM, "James K. Lowden"  wrote:
>
> On Sat, 19 Mar 2016 02:04:35 -0600
> Scott Robison  wrote:
>
> > As he says, there's not real choice between fast and
> > > correct
> >
> > Except that testing can verify something is correct for a given
> > environment.
>
> That's actually not true, on a couple of levels.

True, and that was a sloppy phrasing on my part. Please replace "correct"
with "sufficiently correct for the use case". If you are writing software
for one environment you are in a very different situation than if you are
trying to write truly portable software intended to be built and used in
any arbitrary environment.

> Second, you can't test the future.  If the correctness of the code is
> subject to change by the compiler's interpretation of the language, how
> is the programmer to prevent it?

In addition to DRH's comment about the moving target set by evolving
standards: even if there was nothing undefined by the standard you can't
trust a new release of a compiler or library to have not introduced some
bug that impacts you. Hence the need to test the heck out of any intended
release. It's not proof of correctness but it is evidence of a likely
sufficiently correct program.

>
> > > finally drive gcc & friends in the direction of working
> > > with their users for a change.  Or make them irrelevant.
> >
> > I think they'd continue to be popular with people looking to eek out
> > as much performance as possible.
>
> You may be right.  As a consultant I've often felt I was hired to
> sprinkle magic pixie performance dust on the system.  People want to
> believe that performance is found in tools.  How come there's no -O5?
>
> In truth, every performance problem I've encountered was a design
> problem, often obvious, always unnecessary.  "Use a better compiler"
> has never been the solution.  Unloading mounds of unnecessary
> processing with a pitchfork is.

I guess it depends on what you're doing. Some applications (I'm thinking
games at the moment) need both quality design and quality code generation.
When you know your application is being targeted for a platform you'll
utilize undefined behavior if it eeks out a little bit more performance. If
necessary you'll hide UB in conditional blocks so that you get the best out
of different platforms.

I agree that most applications do not need anything like this, and I think
it is a good idea to steer clear of UB, but it is not universally true.


[sqlite] SQLite big tables create index in parallel

2016-03-19 Thread Simon Slavin

On 19 Mar 2016, at 11:03am, Domingo Alvarez Duarte  wrote:

> Would be nice if sqlite provide a way to create indexes in parallel, I mean
> when working with big tables and creating several indexes the time spent
> scanning the whole database/table is considerable and it's the same for each
> "create index", we could have a big time/cpu/disk seek saving if we could
> create several indexes at the same time 

It actually doesn't help.  The majority of the time in 'CREATE INDEX' is spent 
writing the index, not reading the table.  And since the file structure of 
SQLite means that each index resides on a separate set of pages you end up 
spending more time juggling pages between caches than you would save.

If you would like to simulate what happens you can try this:

1. CREATE tempTable copying the structure of myTable
2. DELETE FROM myTable
3. BEGIN a transaction
4.   CREATE the indexes you want
5.   INSERT INTO myTable SELECT * FROM tempTable
6. END the transaction

This is accepted to be slow in SQLite and the instructions are generally to do 
the opposite, i.e. that the order 1. 2. 3. 5. 4. 6. will be faster.

Simon.


[sqlite] Reserved column names

2016-03-19 Thread Simon Slavin

On 19 Mar 2016, at 10:30am, Paul Sanderson  
wrote:

> Is there a list of reserved column names
> 
> I have seen the list or keywords at the link below
> 
> http://www.sqlite.org/lang_keywords.html
> 
> but for instance create table (abort int) will work fine.

It may work fine for that statement but you may find that it causes problems 
later.  For instance once you've created a column called 'abort' the command

CREATE TRIGGER ON myTable WHEN abort ...

may not do what you expect it to do.  (I don't know that that specific example 
is a problem, I'm just illustrating.)  So even if one of the words in that list 
works in one context it's better to avoid it.

Simon.


[sqlite] Sqlite do not use index collate nocase

2016-03-19 Thread Yuriy M. Kaminskiy

On 03/18/16 00:24 , Keith Medcalf wrote:
> The column data is case sensitive and the comparison is case
> sensitive.  There is no case sensitive index.
>
> You either need (a) to make the column collate nocase (in which case
> any index on the column is also in the same nocase collation and thus
> you do not need it in the create index), or (b) specify that the
> comparison is using collation nocase.

I guess, OP suggested: following queries:

   SELECT * FROM t WHERE a = b;

and

   SELECT * FROM t WHERE (a COLLATE NOCASE = b) AND a = b;

gives same result (if (a = b) is true, then (a COLLATE NOCASE = b) is
true too).

If only NOCASE index is present, SQLite could've converted first query to
second automatically (probably, unless ANALYZE says NOCASE index is
"bad"/unspecific?)

I'm not *totally* sure, but I think this maybe apply to any collation that
obey rules from create_collation.html (`a COLLATE BINARY = b` always
implies `a COLLATE WHATEVER = b`).
Anyway, it is certainly true for SQLite's NOCASE.

Note this works with equality, but does not work with other comparisons:
('b' COLLATE BINARY > 'B') is true, but ('b' COLLATE NOCASE > 'B') is
false, so you cannot use non-BINARY collation to accelerate BINARY
BETWEEN/ORDER/etc.

Of course, it works with BINARY equality and non-BINARY collation
index, but does not work with two arbitrary non-binary collations (you
cannot use `COLLATE foo` index to accelerate `a COLLATE bar = b`).

(BTW, for `BINARY` `GROUP BY`, you, of course, cannot directly use
`COLLATE foobar` index, but it may be still advantageous to walk by this
index, instead of original table order, as it splits large table into
smaller definitely-non-equal subgroups; it may require adding some
[non-trivial] code to sqlite3 to take advantage of this property,
though; not low-hanging fruit).

>> -Original Message-
>> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
>> bounces at mailinglists.sqlite.org] On Behalf Of Domingo Alvarez Duarte
>> Sent: Thursday, 17 March, 2016 12:06
>> To: SQLite mailing list
>> Subject: [sqlite] Sqlite do not use index collate nocase
>> 
>> Hello !
>> 
>> In one database I created an index with collate nocase but it seems that
>> sqlite do not recognize it as a candidate index for a select.
>> 
>> Bellow is the schema, the real database has more than a million rows. If I
>> alter the table to include the collation or remove the collation from the
>> index then sqlite does use the index.
>> 
>> I expect an index collate nocase to be valid to queries like the one
>> bellow.
>> 
>> 
>> Cheers !
>> 
>> Schema:
>> 
>> PRAGMA foreign_keys=OFF;
>> BEGIN TRANSACTION;
>> CREATE TABLE 'items' (
>> ??? 'id' integer PRIMARY KEY,
>> ??? 'parent' INTEGER,
>> ??? 'by' text,
>> ??? 'score' integer DEFAULT 0,
>> ??? 'title' text,
>> ??? 'type' text,
>> ??? 'url' text,
>> ??? 'deleted' BOOLEAN DEFAULT 0,
>> ??? 'dead' BOOLEAN DEFAULT 0,
>> ??? 'comment' TEXT DEFAULT null,
>> ??? 'time' integer NOT NULL
>> );
>> CREATE INDEX "items_user_idx" ON "items"("by" COLLATE NOCASE);
>> COMMIT;
>> 
>> explain query plan select * from items? where by='doppp';
>> 
>> SCAN TABLE items



[sqlite] Anyone familiar with SQLite source code: Index creation question

2016-03-19 Thread Richard Hipp
On 3/19/16, Simon Slavin  wrote:
>
> I create the following index on the table:
>   CREATE INDEX t_b ON t (b);
> There are two ways to make the index.
>
> (A) Go through the table in row order adding each row to the index,
> modifying the tree as you go.
>
> (B) Use the same strategy as you would for the command
>   SELECT * FROM t ORDER BY B
> using any existing index or creating a temporary one as required.
>
> Which does SQLite do ?

(B).  It's much much faster that way.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Reserved column names

2016-03-19 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 19/03/16 03:30, Paul Sanderson wrote:
> I know that keywords can be quoted but I am interested in just
> those that can be used unquoted (even if not advisable)

Out of curiousity, why?

My rule of thumb is to always quote (using square brackets) when the
query is generated by code, and only quote where reasonable when the
query is written by a human.

For the APSW shell the following are all quoted:

* zero length names (yes SQLite allows them)

* if it starts with a digit

* if any non-alphanumeric/underscore present (ascii only alphas ok)

* if in the SQLite list of keywords

Roger

-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iEYEARECAAYFAlbt2QIACgkQmOOfHg372QRhAQCfc00p/L15AJmx8Zgrr9isuU5H
B3cAnim38/I6S3gNsHQ7WZtJKok+T+sY
=O8pL
-END PGP SIGNATURE-


[sqlite] Article about pointer abuse in SQLite

2016-03-19 Thread Richard Hipp
On 3/19/16, James K. Lowden  wrote:
>
> Second, you can't test the future.  If the correctness of the code is
> subject to change by the compiler's interpretation of the language, how
> is the programmer to prevent it?
>

Indeed.  Every bit of the code examined by Prof. Regehr was
well-defined according to K&R.  But subsequent revisions of the
C-language standards changed that.  How does one write code that will
comply with language standards that keep changing out from under you?
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Article about pointer abuse in SQLite

2016-03-19 Thread James K. Lowden
On Sat, 19 Mar 2016 02:04:35 -0600
Scott Robison  wrote:

> As he says, there's not real choice between fast and
> > correct
> 
> Except that testing can verify something is correct for a given
> environment.  

That's actually not true, on a couple of levels.  

"[T]esting can be used very effectively to show the presence of
bugs but never to show their absence."
-- EWD303

I think that should be called Dijkstra's Dictum.  It's not just quip;
it's a concise insight into limits of testing versus proving
correctness.  

Second, you can't test the future.  If the correctness of the code is
subject to change by the compiler's interpretation of the language, how
is the programmer to prevent it?  

> > finally drive gcc & friends in the direction of working
> > with their users for a change.  Or make them irrelevant.
> 
> I think they'd continue to be popular with people looking to eek out
> as much performance as possible.

You may be right.  As a consultant I've often felt I was hired to
sprinkle magic pixie performance dust on the system.  People want to
believe that performance is found in tools.  How come there's no -O5?  

In truth, every performance problem I've encountered was a design
problem, often obvious, always unnecessary.  "Use a better compiler"
has never been the solution.  Unloading mounds of unnecessary
processing with a pitchfork is.  

Doubtless there are some well tested, highly stable applications run at
scale, for which 5% is a measurable and meaningful gain.  IMO they're
actually the ones driving UB treatment by compiler writers.  The
other 99% stand to gain from a compiler that emphasizes correctness and
predictable behavior.  

--jkl



[sqlite] Reserved column names

2016-03-19 Thread Richard Hipp
On 3/19/16, Paul Sanderson  wrote:
> Is there a list of reserved column names
>
> I have seen the list or keywords at the link below
>
> http://www.sqlite.org/lang_keywords.html
>
> but for instance create table (abort int) will work fine.
>

There is no list.

The "official" position is that keywords should never be used as
identifiers.  Ever.

SQLite is very forgiving on this point.  It will tolerate the use of
many keywords as identifiers in many different contexts. (The Lemon
parser generator contains special logic to handle this case.) This
grace from the parser is to maximize backwards compatibility.  But you
should not count on it.  If an identifier is a keyword, or looks like
something that might someday become a keyword, then you should quote
it.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] COLLATE NU800_NOCASE as a loadable extension?

2016-03-19 Thread Aleksey Tulinov
On 03/17/2016 07:29 PM, Torleif Haug?deg?rd wrote:

Torleif,

> Now the resultat was this one;
>
> Exception:
> --
>2.1 Date  : Thu, 17 Mar 2016 18:27:00 +0100
>2.2 Address   : 00AA83F5
>2.3 Module Name   : SQLiteExpertPers.exe - (The complete administration 
> tool for SQLite)
>2.4 Module Version: 3.5.92.2512
>2.5 Type  : ESQLiteException
>2.6 Message   : no such collation sequence:_RMNOCASE
>.
>SQL Statement:
>   
> .
>SELECT given FROM NameTable ORDER BY given.
>

Unfortunately i'm not familiar with SQLite Expert, however

 >2.6 Message   : no such collation sequence:_RMNOCASE

Collation provided by nunicode SQLite extension is called NU800_NOCASE 
(or NU800 for case-sensitive collation), that name is need to be used if 
you want nunicode to handle "COLLATE" expressions.

Here is documentation on nunicode SQLite extension: 
https://bitbucket.org/alekseyt/nunicode#markdown-header-sqlite3-extension with 
some usage examples.


[sqlite] error messages from SQLite

2016-03-19 Thread Paul Sanderson
Thanks Tim

I am not concerned with errors from function calls - just errors with
SQL queries at the command line etc.
Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence


On 19 March 2016 at 11:20, Tim Streater  wrote:
> On 19 Mar 2016 at 10:26, Paul Sanderson  
> wrote:
>
>> When executing a sql query I often find I have made a typo or been a
>> bit dull and I'll get an error message back along the lines of
>>
>> Error: near "text": syntax error
>>
>> Is there a way of expanding on this, for instance adding more of the
>> following text or a character offset. The last time this happened I
>> had a create table... command with about 30 text columns and
>> identifying which "text" string the error referred to took a little
>> while.
>
> In my PHP usage of SQLite, I have a shim around the provided functions which 
> logs the SQL in the event of error. The application then displays it too. The 
> shim also expects a small string to identify which actual SQLite call was 
> involved, so I don't have to look through 140 source modules trying to guess 
> what was going on at the time.
>
> --
> Cheers  --  Tim
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] SQLite big tables create index in parallel

2016-03-19 Thread Domingo Alvarez Duarte
Hello !  

Would be nice if sqlite provide a way to create indexes in parallel, I mean
when working with big tables and creating several indexes the time spent
scanning the whole database/table is considerable and it's the same for each
"create index", we could have a big time/cpu/disk seek saving if we could
create several indexes at the same time:  

CREATE INDEXES ON big_table index_1(name), index_2(title), index_3(parent),
index_4(city, country);  

Cheers !



[sqlite] error messages from SQLite

2016-03-19 Thread Tim Streater
On 19 Mar 2016 at 10:26, Paul Sanderson  wrote:

> When executing a sql query I often find I have made a typo or been a
> bit dull and I'll get an error message back along the lines of
>
> Error: near "text": syntax error
>
> Is there a way of expanding on this, for instance adding more of the
> following text or a character offset. The last time this happened I
> had a create table... command with about 30 text columns and
> identifying which "text" string the error referred to took a little
> while.

In my PHP usage of SQLite, I have a shim around the provided functions which 
logs the SQL in the event of error. The application then displays it too. The 
shim also expects a small string to identify which actual SQLite call was 
involved, so I don't have to look through 140 source modules trying to guess 
what was going on at the time.

--
Cheers  --  Tim


[sqlite] Reserved column names

2016-03-19 Thread Paul Sanderson
Is there a list of reserved column names

I have seen the list or keywords at the link below

http://www.sqlite.org/lang_keywords.html

but for instance create table (abort int) will work fine.

I know that keywords can be quoted but I am interested in just those
that can be used unquoted (even if not advisable)

Without going through all of them can someone point me at a list?

Thanks


[sqlite] error messages from SQLite

2016-03-19 Thread Paul Sanderson
When executing a sql query I often find I have made a typo or been a
bit dull and I'll get an error message back along the lines of

Error: near "text": syntax error

Is there a way of expanding on this, for instance adding more of the
following text or a character offset. The last time this happened I
had a create table... command with about 30 text columns and
identifying which "text" string the error referred to took a little
while.

Thanks


[sqlite] Article about pointer abuse in SQLite

2016-03-19 Thread Scott Robison
On Mar 18, 2016 11:12 PM, "James K. Lowden" 
wrote:
>
> On Fri, 18 Mar 2016 16:33:56 -0600
> Scott Robison  wrote:
>
> > I'd rather have code that might use some "undefined behavior" and
> > generates the right answer than code that always conformed to defined
> > behavior yet was logically flawed.
>
> Code that falls under undefined behavior *is* logically flawed, by
> definition.  Whether or not it works, it's not specified to.  The
> compiler may have generated perfectly correct machine code, but another
> compiler or some future version of your present compiler may not.

Perhaps I should have said "undefined behavior as per the standard". Code
that does what is intended for the intended target environment utilizing a
specific tool chain is not logically flawed just because the standard calls
a construct "undefined behavior".

> You might share my beef with the compiler writers, though: lots things
> that are left undefined shouldn't be.

Not just compiler writers but standards organizations. Lots of overlap to
be sure, but not 100%.

Because hardware architecture
> varies, some practices that do work and have worked and are expected to
> work on a wide variety of machines are UB.  A recent thread on using
> void* for a function pointer is an example: dlsym(2) returns a function
> pointer defined as void*, but the C standard says void* can only refer
> to data, not functions!

So I guess casting between function pointers might be arguably safer if the
void* form of a function was called void(*)(void) though even then there
can be multiple forms of function pointers (like near vs far pointers in
x86 real mode).

>
> Machines exist for which the size of a function pointer is not
> sizeof(void*).  Source code that assumes they are the same size is not
> portable to those architectures.  Fine.  But a particular compiler
> generates code for a particular architecture.  On x86 hardware, all
> pointers have always been and will always be the same size.  All
> Linux/Posix code relies on that, too, along with a host of other
> assumptions. If that ever changed, a boat load of code would have to be
> changed.  Why does the compiler writer feel it's in his interest or
> mine to warn me about that not-happening eventuality?  For the machine
> I'm compilng for, the code is *not* in error.  For some future machine,
> maybe it will be; let's leave that until then.
>
> I was looking at John Regehr's blog the other day.  I think it was
> there that I learned that the practice of dropping UB code on the floor
> has been going on longer than I'd realized; it's just that gcc has been
> more aggressive in recent years.  I think it was there I saw this
> construction:
>
> if( p < p + n)
> error
>
> where p is a pointer.  On lots of architectures, for large n, p + n can
> be negative.  The test works.  Or did.  The C standard says that's
> UB, though. It doesn't promise the pointer will go negative.  It doesn't
> promise it won't.  It doesn't promise not to tell your mother about
> it.  And, in one recent version, it doesn't compile it.  Warning?  No.
> Error? No.  Machine code?  No!  It's UB, so no code is generated (ergo,
> no error handling)!  Even though the hardware instructions that would
> be -- that used to be -- generated work as implied by the code.

Yes, these are the sorts of things that are frustrating.

> Postel's Law is to be liberal in what you accept and conservative in
> what you emit.  The compilers have been practicing the opposite,
> thwarting common longstanding practice just because they "can".
>
> Dan Bernstein is calling for a new C compiler that is 100%
> deterministic: no UB.  All UB per the standard would be defined by the
> compiler.  And maybe a few goodies, like zero-initialized automatic
> (stack) variables.

Neat idea, though undefined behavior isn't always bad. Interesting article
at http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
that goes into some considerations.

>
> Such a compiler would enjoy great popularity, even if it imposed, say,
> a 5% performance penalty, because C programmers would have greater
> confidence in their code working as expected. They'd have some
> assurance that the compiler wouldn't cut them off at the knees in its
> next release.  As he says, there's not real choice between fast and
> correct

Except that testing can verify something is correct for a given environment.

  If the "always defined befavior" compiler got off the ground,
> may it would finally drive gcc & friends in the direction of working
> with their users for a change.  Or make them irrelevant.

I think they'd continue to be popular with people looking to eek out as
much performance as possible.


[sqlite] Article about pointer abuse in SQLite

2016-03-19 Thread James K. Lowden
On Fri, 18 Mar 2016 16:33:56 -0600
Scott Robison  wrote:

> I'd rather have code that might use some "undefined behavior" and
> generates the right answer than code that always conformed to defined
> behavior yet was logically flawed. 

Code that falls under undefined behavior *is* logically flawed, by
definition.  Whether or not it works, it's not specified to.  The
compiler may have generated perfectly correct machine code, but another
compiler or some future version of your present compiler may not.  

You might share my beef with the compiler writers, though: lots things
that are left undefined shouldn't be.  Because hardware architecture
varies, some practices that do work and have worked and are expected to
work on a wide variety of machines are UB.  A recent thread on using
void* for a function pointer is an example: dlsym(2) returns a function
pointer defined as void*, but the C standard says void* can only refer
to data, not functions!  

Machines exist for which the size of a function pointer is not 
sizeof(void*).  Source code that assumes they are the same size is not
portable to those architectures.  Fine.  But a particular compiler
generates code for a particular architecture.  On x86 hardware, all
pointers have always been and will always be the same size.  All
Linux/Posix code relies on that, too, along with a host of other
assumptions. If that ever changed, a boat load of code would have to be
changed.  Why does the compiler writer feel it's in his interest or
mine to warn me about that not-happening eventuality?  For the machine
I'm compilng for, the code is *not* in error.  For some future machine,
maybe it will be; let's leave that until then.  

I was looking at John Regehr's blog the other day.  I think it was
there that I learned that the practice of dropping UB code on the floor
has been going on longer than I'd realized; it's just that gcc has been
more aggressive in recent years.  I think it was there I saw this
construction:

if( p < p + n)
error

where p is a pointer.  On lots of architectures, for large n, p + n can
be negative.  The test works.  Or did.  The C standard says that's
UB, though. It doesn't promise the pointer will go negative.  It doesn't
promise it won't.  It doesn't promise not to tell your mother about
it.  And, in one recent version, it doesn't compile it.  Warning?  No.
Error? No.  Machine code?  No!  It's UB, so no code is generated (ergo,
no error handling)!  Even though the hardware instructions that would
be -- that used to be -- generated work as implied by the code.

Postel's Law is to be liberal in what you accept and conservative in
what you emit.  The compilers have been practicing the opposite,
thwarting common longstanding practice just because they "can".  

Dan Bernstein is calling for a new C compiler that is 100%
deterministic: no UB.  All UB per the standard would be defined by the
compiler.  And maybe a few goodies, like zero-initialized automatic
(stack) variables.  

Such a compiler would enjoy great popularity, even if it imposed, say,
a 5% performance penalty, because C programmers would have greater
confidence in their code working as expected. They'd have some
assurance that the compiler wouldn't cut them off at the knees in its
next release.  As he says, there's not real choice between fast and
correct  If the "always defined befavior" compiler got off the ground,
may it would finally drive gcc & friends in the direction of working
with their users for a change.  Or make them irrelevant.  

--jkl