Re: What is a type error?

2006-07-22 Thread Benjamin Franksen
Darren New wrote:
 Chris Smith wrote:
 Specialized
 language already exist that reliably (as in, all the time) move array
 bounds checking to compile time;
 
 It sounds like this means the programmer has to code up what it means to
 index off an array, yes? Otherwise, I can't imagine how it would work.
 
 x := read_integer_from_stdin();
 write_to_stdout(myarray[x]);
 
 What does the programmer have to do to implement this semantic in the
 sort of language you're talking about? Surely something somewhere along
 the line has to  fail (for some meaning of failure) at run-time, yes?

You should really take a look at Epigram. One of its main features is that
it makes it possible not only to statically /check/ invariants, but also
to /establish/ them.

In your example, of course the program has to check the integer at runtime.
However, in a dependent type system, the type of the value returned from
the check-function can very well indicate whether it passed the test (i.e.
being a valid index for myarray) or not.

Ben
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-19 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  Chris Smith wrote:
  Joachim Durchholz [EMAIL PROTECTED] wrote:
  I *think* I understand Marshall here.  When you are saying assignment,
  you mean assignment to values of attributes within tuples of the cell.
  When Marshall is saying assignment, he seems to mean assigning a
  completely new *table* value to a relation; i.e., wiping out the entire
  contents of the relation and replacing it with a whole new set of
  tuples.  Your assignment is indeed less powerful than DML, whereas
  Marshall's assignment is more powerful than DML.
 
  Exactly.

 Ah well. I never meant that kind of assignment.

Sorry for the confusion.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-18 Thread Joachim Durchholz
Marshall schrieb:
 Chris Smith wrote:
 Joachim Durchholz [EMAIL PROTECTED] wrote:
 I *think* I understand Marshall here.  When you are saying assignment,
 you mean assignment to values of attributes within tuples of the cell.
 When Marshall is saying assignment, he seems to mean assigning a
 completely new *table* value to a relation; i.e., wiping out the entire
 contents of the relation and replacing it with a whole new set of
 tuples.  Your assignment is indeed less powerful than DML, whereas
 Marshall's assignment is more powerful than DML.
 
 Exactly.

Ah well. I never meant that kind of assignment.

Besides, all the aliasing possibilities already happen at the field 
level, so it never occurred to me that whole-table assignment might 
enter into the picture.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-18 Thread Joachim Durchholz
Marshall schrieb:
 No. The variable is the table, not the records.

In your examples, yes.

  Relations are not arrays.

They are, in all ways that matter for aliasing:
They are a collection of mutable data, accessible via selector values.

 Records are not lvalues.

Agreed, but they have identity. An lvalue is an identity that you can 
track in software; record identity is untrackable in standard SQL, but 
it's real enough to create aliasing problems.

 I would propose that variables have identity, and values do not.

What's a variable, by that definition?
It can't be something that has a name in a program, since that would 
exclude variables on the heap.
Another definition would be something that may have a different value 
if I look at it sometime later, but then that would include SQL records 
as well. Besides, the temporal aspect is problematic, since it's unclear 
whether the it some time later is still the same as the it before 
(if you have a copying garbage collector, you'd have to somehow 
establish identity over time, and we're full circle back at defining 
identity).

Personally, I say two things are identical / have the same identity, 
if they are (a) equal and (b) stay equal after changing just one of 
them. That's a nice local definition, and captures exactly those cases 
where aliasing is relevant in the first place (it's actually that 
changing here also changes there aspect that makes aliasing so dangerous).
It also covers SQL records. Well, they do have aliasing problems, or 
problems that are similar enough to them, so I'd say this is a 
surprising but actually interesting and useful result of having that 
definition.

 In part this is via the supplied definition of identity, in which, when
 you change one thing, if something else changes as well, they
 share identity. Since one cannot change values, they necessarily
 lack identity.

I agree that immutable values don't have identity (or, rather, that 
identity isn't an interesting concept for them since it's already 
subsumed under equality).

 It's certainly not storage location:
 if you DELETE a record and then INSERT it with the same values, it may
 be allocated somewhere entirely else, and our intuition would say it's
 not the same (i.e. not identical).
 
 Well, it would depend on how our intuition had been primed. If it
 was via implementation techniques in low level languages, we
 might reach a different conclusion than if our intuition was primed
 via logical models and relation theory.

Sure.

However, since I'm interested in practical consequences for programming, 
I'm looking for definitions that allow me to capture those effects that 
create bugs.
In the case of identity, I'd say that's aliasing.

 Since intuition gives me ambivalent results here, I'll go back to my
 semiformal definition (and take the opportunity to make it a bit more
 precise):
 Two path expressions (lvalues, ...) are aliases if and only if the
 referred-to values compare equal, and if they stay equal after applying
 any operation to the referred-to value through either of the path
 expressions.
 
 Alas, this leaves me confused. I don't see how a path expression
 (in this case, SELECT ... WHERE) can be an l-value. You cannot
 apply imperative operations to the result.

It's not SELECT...WHERE..., it's WHERE... that's an lvalue.
And you can indeed use it with UPDATE, so yes it's an lvalue by my book. 
(A given WHERE clause may become invalid or refer to a different record 
after some updates, so it's not the kind of lvalue that you have in a C 
program.)

  (Also I think the use
 of equality here is too narrow; it is only necessary to show that
 two things both change, not that they change in the same way.)

Sorry for using misleading terminology.
An alias is when two names refer to an identical object (both in real 
life and IMHO in programming), and that's what I wrote.
Aliasing is the kind of problems that you describe.

 I was under the impression you agred that i+2 was not
 a path expression.

Yes, but [i+2] is one.

In the same vein, name = 'John' is an expression, WHERE name = 
'John' is a path expression.

  If our hypothetical language lacks record
 identity, then I would say that any query is simply an expression
 that returns a value, as in i+2.

As long as the data is immutable, yes.
The question is how equality behaves under mutation.

 On the other hand, variable addresses as tangible identities don't hold
 much water anyway.
 Imagine data that's written out to disk at program end, and read back
 in. Further imagine that while the data is read into main memory,
 there's a mechanism that redirects all further reads and writes to the
 file into the read-in copy in memory, i.e. whenever any program changes
 the data, all other programs see the change, too.
 Alternatively, think about software agents that move from machine to
 machine, carrying their data with them. They might be communicating with
 each other, so they need some means of 

Re: What is a type error? [correction]

2006-07-18 Thread Darren New
David Hopwood wrote:
 Darren New wrote:
 
David Hopwood wrote:


public class LoopInitTest {
public static String getString() { return foo; }

public static void main(String[] args) {
String line = getString();
boolean is_last = false;

while (!is_last) {
if (line.charAt(0) == 'q') {
is_last = true;
}

// insert line into inputs (not important for analysis)

if (!is_last) {
line = getString();
}
}
}
}

which compiles without error, because is_last is definitely initialized.

At what point do you think is_last or line would seem to not be
initialized? They're both set at the start of the function, and (given
that it's Java) nothing can unset them.

At the start of the while loop, it's initialized. At the end of the
while loop, it's initialized. So the merge point of the while loop has
it marked as initialized.
 
 
 Apparently, Hermes (at least the version of it described in that paper)
 essentially forgets that is_last has been initialized at the top of the
 loop, and so when it does the merge, it is merging 'not necessarily 
 initialized'
 with 'initialized'.


No, it's not that it forgets. It's that the insert line into inputs 
unitializes line. Hence, line is only conditionally set at the 
bottom of the loop, so it's only conditionally set at the top of the loop.

 This sounds like a pretty easy thing to fix to me (and maybe it was fixed
 later, since there are other papers on Hermes' typestate checking that I
 haven't read yet).

You simply misunderstand the insert line into inputs semantics. Had 
that line actually been commented out in the Hermes code, the loop would 
have compiled without a problem.

That said, in my experience, finding this sort of typestate error 
invariably led me to writing clearer code.

boolean found_ending = false;
while (!found_ending) {
   string line = getString();
   if (line.charAt(0) == 'q')
 found_ending = true;
   else
 insert line into inputs;
}

It seems that's much clearer to me than the contorted version presented 
as the example. If you want it to work like the Java code, where you can 
still access the line variable after the loop, the rearrangement is 
trivial and transparent as well.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error? [correction]

2006-07-18 Thread David Hopwood
Darren New wrote:
 David Hopwood wrote:
 
[...]
 Apparently, Hermes (at least the version of it described in that paper)
 essentially forgets that is_last has been initialized at the top of the
 loop, and so when it does the merge, it is merging 'not necessarily
 initialized' with 'initialized'.
 
 No, it's not that it forgets. It's that the insert line into inputs
 unitializes line. Hence, line is only conditionally set at the
 bottom of the loop, so it's only conditionally set at the top of the loop.
 
 This sounds like a pretty easy thing to fix to me (and maybe it was fixed
 later, since there are other papers on Hermes' typestate checking that I
 haven't read yet).
 
 You simply misunderstand the insert line into inputs semantics.

Yes, you're right, I did misunderstand this.

 Had that line actually been commented out in the Hermes code, the loop would
 have compiled without a problem.
 
 That said, in my experience, finding this sort of typestate error
 invariably led me to writing clearer code.
 
 boolean found_ending = false;
 while (!found_ending) {
   string line = getString();
   if (line.charAt(0) == 'q')
 found_ending = true;
   else
 insert line into inputs;
 }
 
 It seems that's much clearer to me than the contorted version presented
 as the example. If you want it to work like the Java code, where you can
 still access the line variable after the loop, the rearrangement is
 trivial and transparent as well.

I agree.

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 Marshall schrieb:
 Good point. Perhaps I should have said relational algebra +
 variables with assignment. It is interesting to consider
 assignment vs. the more restricted update operators: insert,
 update, delete.
 Actually I see it the other way round: assignment is strictly less
 powerful than DML since it doesn't allow creating or destroying
 variables, while UPDATE does cover assignment to fields.
 
 Oh, my.
 
 Well, for all table variables T, there exists some pair of
 values v and v', such that we can transition the value of
 T from v to v' via assignment, but not by any single
 insert, update or delete.

I fail to see an example that would support such a claim.

On the other hand, UPDATE can assign any value to any field of any 
record, so it's doing exactly what an assignment does. INSERT/DELETE can 
create resp. destroy records, which is what new and delete operators 
would do.

I must really be missing the point.

  Further, it is my understanding
 that your claim of row identity *depends* on the restricted
 nature of DML; if the only mutator operation is assignment,
 then there is definitely no record identity.

Again, I fail to connect.

I and others have given aliasing examples that use just SELECT and UPDATE.

 (However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
 at which point there is not much of a difference.)
 
 I am not sure what this means. Delete can be expressed in
 terms of assignment. (As can insert and update.)

INSERT cannot be expressed in terms of assignment. INSERT creates a new 
record; there's no way that assignment in a language like C can create a 
new data structure!
The same goes for DELETE.

  (Assignment can also be expressed in terms of insert and delete.)

Agreed.

I also realize that this makes it a bit more difficult to nail down the 
nature of identity in a database. It's certainly not storage location: 
if you DELETE a record and then INSERT it with the same values, it may 
be allocated somewhere entirely else, and our intuition would say it's 
not the same (i.e. not identical). (In a system with OID, it would 
even be impossible to recreate such a record, since it would have a 
different OID. I'm not sure whether this makes OID systems better or 
worse at preserving identity, but that's just a side track.)

Since intuition gives me ambivalent results here, I'll go back to my 
semiformal definition (and take the opportunity to make it a bit more 
precise):
Two path expressions (lvalues, ...) are aliases if and only if the 
referred-to values compare equal, and if they stay equal after applying 
any operation to the referred-to value through either of the path 
expressions.

In the context of SQL, this means that identity isn't the location where 
the data is stored. It's also not the values stored in the record - 
these may change, including key data. SQL record identity is local, it 
can be defined from one operation to the next, but there is no such 
thing as a global identity that one can memorize and look up years 
later, without looking at the intermediate states of the store.

It's a gross concept, now that I think about it. Well, or at least 
rather alien for us programmers, who are used to taking the address of a 
variable to get a tangible identity that will stay stable over time.

On the other hand, variable addresses as tangible identities don't hold 
much water anyway.
Imagine data that's written out to disk at program end, and read back 
in. Further imagine that while the data is read into main memory, 
there's a mechanism that redirects all further reads and writes to the 
file into the read-in copy in memory, i.e. whenever any program changes 
the data, all other programs see the change, too.
Alternatively, think about software agents that move from machine to 
machine, carrying their data with them. They might be communicating with 
each other, so they need some means of establishing identity 
(addressing) the memory buffers that they use for communication.

  I don't know what new would be in a value-semantics, relational
 world.

It would be INSERT.

Um, my idea of value semantics is associated with immutable values. 
SQL with INSERT/DELETE/UPDATE certainly doesn't match that definition.

So by my definition, SQL doesn't have value semantics, by your 
definition, it would have value semantics but updates which are enough 
to create aliasing problems, so I'm not sure what point you're making 
here...

 Filters are just like array indexing: both select a subset of variables
 from a collection.
 
 I can't agree with this wording. A filter produces a collection
 value from a collection value. I don't see how variables
 enter in to it.

A collection can consist of values or variables.

And yes, I do think that WHERE is a selection over a bunch of variables 
- you can update records after all, so they are variables! They don't 
have a name, at least none which is guaranteed to be 

Re: What is a type error?

2006-07-17 Thread Rob Warnock
Joachim Durchholz  [EMAIL PROTECTED] wrote:
+---
| INSERT cannot be expressed in terms of assignment. INSERT creates a new 
| record; there's no way that assignment in a language like C can create a 
| new data structure!  The same goes for DELETE.
+---

Well, what about malloc()  free()? I mean, if your
database is a simple linked list, then INSERT is just:

ROW *p = malloc(sizeof *p);
p-field1 = value1;
p-field2 = value2;
...
p-fieldN = valueN;
database = cons(p, database);   /* Common Lisp's CONS */

and DELETE is just:

ROW *p = find_if(predicate_function, database); /* CL's FIND-IF */
database = delete(p, database); /* CL's DELETE */
free(p);

[There are single-pass methods, of course, but...]


-Rob

-
Rob Warnock [EMAIL PROTECTED]
627 26th Avenue URL:http://rpw3.org/
San Mateo, CA 94403 (650)572-2607

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Joachim Durchholz
Chris Smith schrieb:
 David Hopwood [EMAIL PROTECTED] wrote:
 Chris Smith wrote:
 If checked by execution, yes.  In which case, I am trying to get my head 
 around how it's any more true to say that functional languages are 
 compilable postconditions than to say the same of imperative languages.
 
 A postcondition must, by definition [*], be a (pure) assertion about the
 values that relevant variables will take after the execution of a subprogram.
 
 Okay, my objection was misplaced, then, as I misunderstood the original 
 statement.  I had understood it to mean that programs written in pure 
 functional languages carry no additional information except for their 
 postconditions.

Oh, but that's indeed correct for pure functional languages. (Well, they 
*might* carry additional information anyway, but it's not a requirement 
to make it a programming language.)

The answer that I have for your original objection may be partial, but 
here goes anyway:

Postconditions cannot easily capture all side effects.
E.g. assume there's a file-open routine but no way to test whether a 
given file handle was ever opened (callers are supposed to test the 
return code from the open() call).
In an imperative language, that's a perfectly valid (though possibly not 
very good) library design.
Now the postcondition would have to say something like from now on, 
read() and write() are valid on the filehandle that I returned. I know 
of no assertion language that can express such temporal relationships, 
and even if there is (I'm pretty sure there is), I'm rather sceptical 
that programmers would be able to write correct assertions, or correctly 
interpret them - temporal logic offers several traps for the unwary. 
(The informal postcondition above certainly isn't complete, since it 
doesn't cover close(); I shunned the effort to get a wording that would 
correctly cover sequences like open()-close()-open(), or 
open()-close()-close()-open().)

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Joachim Durchholz
Rob Warnock schrieb:
 Joachim Durchholz  [EMAIL PROTECTED] wrote:
 +---
 | INSERT cannot be expressed in terms of assignment. INSERT creates a new 
 | record; there's no way that assignment in a language like C can create a 
 | new data structure!  The same goes for DELETE.
 +---
 
 Well, what about malloc()  free()?

These do exactly what assignment cannot do.

The correspondence between SQL and C would be:
   INSERT - malloc() (create a new data structure)
   DELETE - free() (destroy a data structure)
   UPDATE - assignment (change data within a data structure)

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Chris Smith
Joachim Durchholz [EMAIL PROTECTED] wrote:
 I fail to see an example that would support such a claim.
 
 On the other hand, UPDATE can assign any value to any field of any 
 record, so it's doing exactly what an assignment does. INSERT/DELETE can 
 create resp. destroy records, which is what new and delete operators 
 would do.
 
 I must really be missing the point.

I *think* I understand Marshall here.  When you are saying assignment, 
you mean assignment to values of attributes within tuples of the cell.  
When Marshall is saying assignment, he seems to mean assigning a 
completely new *table* value to a relation; i.e., wiping out the entire 
contents of the relation and replacing it with a whole new set of 
tuples.  Your assignment is indeed less powerful than DML, whereas 
Marshall's assignment is more powerful than DML.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Marshall
Chris Smith wrote:
 Joachim Durchholz [EMAIL PROTECTED] wrote:
  I fail to see an example that would support such a claim.
 
  On the other hand, UPDATE can assign any value to any field of any
  record, so it's doing exactly what an assignment does. INSERT/DELETE can
  create resp. destroy records, which is what new and delete operators
  would do.
 
  I must really be missing the point.

 I *think* I understand Marshall here.  When you are saying assignment,
 you mean assignment to values of attributes within tuples of the cell.
 When Marshall is saying assignment, he seems to mean assigning a
 completely new *table* value to a relation; i.e., wiping out the entire
 contents of the relation and replacing it with a whole new set of
 tuples.  Your assignment is indeed less powerful than DML, whereas
 Marshall's assignment is more powerful than DML.

Exactly.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  Joachim Durchholz wrote:
  Marshall schrieb:
  Good point. Perhaps I should have said relational algebra +
  variables with assignment. It is interesting to consider
  assignment vs. the more restricted update operators: insert,
  update, delete.
  Actually I see it the other way round: assignment is strictly less
  powerful than DML since it doesn't allow creating or destroying
  variables, while UPDATE does cover assignment to fields.
 
  Oh, my.
 
  Well, for all table variables T, there exists some pair of
  values v and v', such that we can transition the value of
  T from v to v' via assignment, but not by any single
  insert, update or delete.

 I fail to see an example that would support such a claim.

variable T : unary relation of int
T = { 1, 2, 3 };  // initialization
T := { 4, 5 };   // assignment

The above transition of the value of T cannot be be
done by any one single insert, update or delete.
Two would suffice, however. (In fact, any assignement
can be modeled at a full delete followed by an insert
of the new value.)


 On the other hand, UPDATE can assign any value to any field of any
 record,

Yes.

 so it's doing exactly what an assignment does.

No. The variable is the table, not the records. Relations are not
arrays.
Records are not lvalues.


 INSERT/DELETE can
 create resp. destroy records, which is what new and delete operators
 would do.

 I must really be missing the point.

   Further, it is my understanding
  that your claim of row identity *depends* on the restricted
  nature of DML; if the only mutator operation is assignment,
  then there is definitely no record identity.

 Again, I fail to connect.

 I and others have given aliasing examples that use just SELECT and UPDATE.

Sure, but update's semantics are defined in a per-record way,
which is consistent with record identity. Assignment's isn't.


  (However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
  at which point there is not much of a difference.)
 
  I am not sure what this means. Delete can be expressed in
  terms of assignment. (As can insert and update.)

 INSERT cannot be expressed in terms of assignment. INSERT creates a new
 record; there's no way that assignment in a language like C can create a
 new data structure!
 The same goes for DELETE.

I was intendind to be discussing a hypothetical relation-based
language,
so while I generally agree with you statement about C, I don't see
how it applies.


   (Assignment can also be expressed in terms of insert and delete.)

 Agreed.

 I also realize that this makes it a bit more difficult to nail down the
 nature of identity in a database.

I would propose that variables have identity, and values do not.
In part this is via the supplied definition of identity, in which, when
you change one thing, if something else changes as well, they
share identity. Since one cannot change values, they necessarily
lack identity.


 It's certainly not storage location:
 if you DELETE a record and then INSERT it with the same values, it may
 be allocated somewhere entirely else, and our intuition would say it's
 not the same (i.e. not identical).

Well, it would depend on how our intuition had been primed. If it
was via implementation techniques in low level languages, we
might reach a different conclusion than if our intuition was primed
via logical models and relation theory.


 (In a system with OID, it would
 even be impossible to recreate such a record, since it would have a
 different OID. I'm not sure whether this makes OID systems better or
 worse at preserving identity, but that's just a side track.)

OIDs are something of a kludge, and they break set semantics.


 Since intuition gives me ambivalent results here, I'll go back to my
 semiformal definition (and take the opportunity to make it a bit more
 precise):
 Two path expressions (lvalues, ...) are aliases if and only if the
 referred-to values compare equal, and if they stay equal after applying
 any operation to the referred-to value through either of the path
 expressions.

Alas, this leaves me confused. I don't see how a path expression
(in this case, SELECT ... WHERE) can be an l-value. You cannot
apply imperative operations to the result. (Also I think the use
of equality here is too narrow; it is only necessary to show that
two things both change, not that they change in the same way.)

I was under the impression you agred that i+2 was not
a path expression. If our hypothetical language lacks record
identity, then I would say that any query is simply an expression
that returns a value, as in i+2.


 In the context of SQL, this means that identity isn't the location where
 the data is stored. It's also not the values stored in the record -
 these may change, including key data. SQL record identity is local, it
 can be defined from one operation to the next, but there is no such
 thing as a global identity that one can memorize and look up years
 

Re: What is a type error?

2006-07-17 Thread Darren New
Chris Smith wrote:

 Darren New [EMAIL PROTECTED] wrote:
 
I'm not sure what linear or uniqueness typing is. It's typestate, and if 
I remember correctly the papers I read 10 years ago, the folks at 
TJWatson that invented Hermes also invented the concept of typestate. 
They at least claim to have coined the term.
 
 Coining the term is one thing, but I feel pretty confident that the idea 
 was not invented in 1986 with the Hermes language, but rather far 
 earlier.

Yes. However, the guys who invented Hermes didn't come up with it out of 
the blue. It was around (in NIL - Network Implementation Language) for 
years beforehand. I read papers about these things in graduate school, 
but I don't know where my photocopies are.

NIL was apparently quite successful, but a niche language, designed by 
IBM for programming IBM routers. Hermes was an attempt years later to 
take the same successful formula and turn it into a general-purpose 
programming system, which failed (I believe) for the same reason that a 
general purpose operating system that can't run C programs will fail.

 Perhaps they may have invented the concept of considering it 
 any different from other applications of types, though.  

 From what I can determine, the authors seem to imply that typestate is 
dataflow analysis modified in (at least) two ways:

1) When control flow joins, the new typestate is the intersection of 
typestates coming into the join, where as dataflow analysis doesn't 
guarantee that. (They imply they think dataflow analysis is allowed to 
say the variable might or might not be initialized here, while 
typestate would ensure the variable is uninitialized.)

2) The user has control over the typestate, so the user can (for 
exmaple) assert a variable is uninitialized at some point, and by doing 
so, make it so.

How this differs from theoretical lambda types and all I couldn't say.

 What is being named here is the overcoming of a limitation that 
 programming language designers imposed upon themselves, whether from not 
 understanding the theoretical research or not believing it important, I 
 don't know.

I believe there's also a certain level of common-senseness needed to 
make a language even marginally popular. :-)  While it's possible that 
there's really no difference between type and typestate at the 
theoretical level, I think most practical programmers would have trouble 
wrapping their head around that, just as programming in an entirely 
recursive pattern when one is used to looping can be disorienting.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Darren New
Joachim Durchholz wrote:
 of no assertion language that can express such temporal relationships, 
 and even if there is (I'm pretty sure there is), I'm rather sceptical 
 that programmers would be able to write correct assertions, or correctly 
 interpret them - temporal logic offers several traps for the unwary. 

FWIW, this is exactly the area to which LOTOS (Language Of Temporal 
Orderering Specifications) is targetted at. It's essentially based on 
CSP, but somewhat extended. It's pretty straightforward to learn and 
understand, too.  Some have even added realtime constraints to it.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Chris Smith
Marshall [EMAIL PROTECTED] wrote:
 We seem to have slipped back from the hypothetical relation language
 with only assignement back to SQL.

I missed the point where we started discussing such a language.  I 
suspect it was while some of us were still operating under the 
misconception that you assignment to attributes of tuples, rather than 
to entire relations.

I don't see how such a language (limited to assignment of entire 
relations) is particularly helpful to consider.  If the relations are to 
be considered opaque, then there's clearly no aliasing going on.  
However, such a language doesn't seem to solve any actual problems.  It 
appears to be nothing other than a toy language, with a fixed finite set 
of variables having only value semantics, no scope, etc.  I assume that 
relational databases will have the equivalent of SQL's update statement; 
and if that's not the case, then I would need someone to explain how to 
accomplish the same goals in the new relational language; i.e. it would 
need some way of expressing transformations of relations, not just 
complete replacement of them with new relations that are assumed to 
appear out of thin air.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Darren New
Marshall wrote:
 I would propose that variables have identity, and values do not.
 In part this is via the supplied definition of identity, in which, when
 you change one thing, if something else changes as well, they
 share identity.

Maybe you gave a better definition the first time, but this one doesn't 
really hold up.

 of equality here is too narrow; it is only necessary to show that
 two things both change, not that they change in the same way.)

If I change X, then Y[X] changes also. I don't think X is identical to Y 
or Y[X], nor is it an alias of either.  I think that's where the 
equality comes into it.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Joe Marshall

Marshall wrote:

 I am having a hard time with this very broad definition of aliasing.

How about this definition:  Consider three variables, i, j, and k, and
a functional equivalence predicate (EQUIVALENT(i, j) returns true if
for every pure function F, F(i) = F(j)).  Now suppose i and j are
EQUIVALENT at some point, then a side effecting function G is invoked
on k, after which i and j are no longer equivalent.  Then there is
aliasing.

This is still a little awkward, but there are three main points:
  1.  Aliasing occurs between variables (named objects).
  2.  It is tied to the notion of equivalence.
  3.  You can detect it when a procedure that has no access to a value
can nonetheless modify the value.

In a call-by-value language, you cannot alias values directly, but if
the values are aggregate data structures (like in Java), you may be
able to modify a shared subcomponent.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Marshall
Chris Smith wrote:
 Marshall [EMAIL PROTECTED] wrote:
  We seem to have slipped back from the hypothetical relation language
  with only assignement back to SQL.

 [...]
 I don't see how such a language (limited to assignment of entire
 relations) is particularly helpful to consider.

I find it useful to consider various language features together and
individually, to see how these features interact. If nothing else,
it furthers my understanding of how languages work.


 If the relations are to
 be considered opaque, then there's clearly no aliasing going on.

Not certain I understand, but I think I agree.


 However, such a language doesn't seem to solve any actual problems.  It
 appears to be nothing other than a toy language, with a fixed finite set
 of variables having only value semantics, no scope, etc.

No, such a language is entirely useful, since relational assignment
is *more* expressive than insert/update/delete. (How did you get no
scope
by the way? And fixed variables? All I said was to change ins/upd/del
to
assignment.)

Also note one could fairly easily write a macro for ins/upd/del if one
had assignement. (See below.)

Now, there are some significant issues with trying to produce a
performant implementation in a distributed environment with strictness.
In fact I will go so far as to say it's impossible. However on a single
machine I don't see any reason to think it would perform any worse,
(although I haven't thought about it deeply.)


 I assume that
 relational databases will have the equivalent of SQL's update statement;
 and if that's not the case, then I would need someone to explain how to
 accomplish the same goals in the new relational language; i.e. it would
 need some way of expressing transformations of relations, not just
 complete replacement of them with new relations that are assumed to
 appear out of thin air.

Consider:

i := i + 1;

Note that the new value of i didn't just appear out of thin air; it was
in fact based on the previous value of i.

Given:
variable T : relation of r;
update_fn : r - r   // takes one record and returns new, updated
record
filter_fn : r - boolean  // indicator function

insert(T, new_rows) = { T := T union new_rows; }

update(T, update_fn, filter_fn) = {
  transaction {
let Tmp = filter(T, filter_fn);  // select only those rows to
update
T := T minus Tmp;
T := T union update_fn(Tmp);
  }
}

delete(T,filter_fn) = { T := T minus filter(T, filter_fn); }

So we can define insert, update and delete in terms of relational
assignment, relational subtraction, and relational union. Type
checking details omitted.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread David Hopwood
Darren New wrote:
 From what I can determine, the authors seem to imply that typestate is
 dataflow analysis modified in (at least) two ways:
 
 1) When control flow joins, the new typestate is the intersection of
 typestates coming into the join, where as dataflow analysis doesn't
 guarantee that. (They imply they think dataflow analysis is allowed to
 say the variable might or might not be initialized here, while
 typestate would ensure the variable is uninitialized.)

Right, but this is a disadvantage of their typestate algorithm. It is why
the example in Figure 2 of
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue4/spe950wk.pdf
fails to check, even though it obviously initializes all variables.

Consider the equivalent Java program:

public class LoopInitTest {
public static void main(String[] args) {
boolean b;
int i;

while (true) {
b = true;
if (b) {
v = 1;
}
v = v + 1;
}
}
}

As it happens, javac rejects this:

LoopInitTest.java:12: variable v might not have been initialized
v = v + 1;
^

but for a different and more trivial reason than the Hermes algorithm.
Change if (b) { v = 1; } to just v = 1;, and the Java version will be
accepted by its definite assignment analysis (which is a dataflow analysis),
but the equivalent Hermes program still would not.

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Chris Smith
Marshall [EMAIL PROTECTED] wrote:
  If the relations are to
  be considered opaque, then there's clearly no aliasing going on.
 
 Not certain I understand, but I think I agree.

My condition, though, was that relations be opaque.  Since you will be 
violating that condition further on down, I just felt that it's useful 
to point that out here.

 No, such a language is entirely useful, since relational assignment
 is *more* expressive than insert/update/delete.

Assignment is more powerful *as* assignment.  However, it is less 
powerful when the task at hand is deriving new relations from old ones.  
Assignment provides absolutely no tools for doing that.  I thought you 
were trying to remove those tools from the language entirely in order to 
remove the corresponding aliasing problems.  I guess I was wrong, since 
you make it clear below that you intend to keep at least basic set 
operations on relations in your hypothetical language.

 Consider:
 
 i := i + 1;
 
 Note that the new value of i didn't just appear out of thin air; it was
 in fact based on the previous value of i.

Right.  That's exactly the kind of thing I thought you were trying to 
avoid.

 So we can define insert, update and delete in terms of relational
 assignment, relational subtraction, and relational union. Type
 checking details omitted.

Then the problem is in the step where you assign the new relation to the 
old relational variable.  You need to check that the new relation 
conforms to the invariants that are expressed on that relational 
variable.  If you model this as assignment of relations (or relation 
values... I'm unclear on the terminology at this point) then naively 
this requires scanning through an entire set of relations in the 
constraint, to verify that the invariant holds.  You've may have avoided 
aliasing in any conventional sense of the word by stretching the word 
itself beyond breaking... but you've only done it by proactively 
accepting its negative consequences.

It remains non-trivial to scan through a 2 GB database table to verify 
that some attribute of every tuple matches some attribute of another 
table, even if you call the entire thing one relational variable.  The 
implementation, of course, isn't at all going to make a copy of the 
entire (possibly several GB) relation and rewrite it all every time it 
makes a change, and it isn't going to give up and rescan all possible 
invariants every time every change is made.  In other words, you've 
risen to a layer of abstraction where the aliasing problem does not 
exist.  The implementation is still going to deal with the aliasing 
problem, which will resurface once you pass over to the other side of 
the abstraction boundary.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error? [correction]

2006-07-17 Thread David Hopwood
David Hopwood wrote:
 Darren New wrote:
 
From what I can determine, the authors seem to imply that typestate is
dataflow analysis modified in (at least) two ways:

1) When control flow joins, the new typestate is the intersection of
typestates coming into the join, where as dataflow analysis doesn't
guarantee that. (They imply they think dataflow analysis is allowed to
say the variable might or might not be initialized here, while
typestate would ensure the variable is uninitialized.)
 
 Right, but this is a disadvantage of their typestate algorithm. It is why
 the example in Figure 2 of
 http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue4/spe950wk.pdf
 fails to check, even though it obviously initializes all variables.
 
 Consider the equivalent Java program:

I mixed up Figures 1 and 2. Here is the Java program that Figure 2 should
be compared to:

public class LoopInitTest {
public static String getString() { return foo; }

public static void main(String[] args) {
String line = getString();
boolean is_last = false;

while (!is_last) {
if (line.charAt(0) == 'q') {
is_last = true;
}

// insert line into inputs (not important for analysis)

if (!is_last) {
line = getString();
}
}
}
}

which compiles without error, because is_last is definitely initialized.

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error? [correction]

2006-07-17 Thread Darren New
David Hopwood wrote:
 
 public class LoopInitTest {
 public static String getString() { return foo; }
 
 public static void main(String[] args) {
 String line = getString();
 boolean is_last = false;
 
 while (!is_last) {
 if (line.charAt(0) == 'q') {
 is_last = true;
 }
 
 // insert line into inputs (not important for analysis)
 
 if (!is_last) {
 line = getString();
 }
 }
 }
 }
 
 which compiles without error, because is_last is definitely initialized.

At what point do you think is_last or line would seem to not be 
initialized? They're both set at the start of the function, and (given 
that it's Java) nothing can unset them.

At the start of the while loop, it's initialized. At the end of the 
while loop, it's initialized. So the merge point of the while loop has 
it marked as initialized.

Now, if the insert line into inputs actually unset line, then yes, 
you're right, Hermes would complain about this.

Alternately, if you say
if (x) v = 1;
if (x) v += 1;
then Hermes would complain when it wouldn't need to. However, that's 
more a limitation of the typestate checking algorithms than the concept 
itself; that is to say, clearly the typestate checker could be made 
sufficiently intelligent to track most simple versions of this problem 
and not complain, by carrying around conditionals in the typestate 
description.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error? [correction]

2006-07-17 Thread Darren New
Darren New wrote:
 Now, if the insert line into inputs actually unset line, then yes, 
 you're right, Hermes would complain about this.

Oh, I see. You translated from Hermes into Java, and Java doesn't have 
the insert into statement. Indeed, the line you commented out is 
*exactly* what's important for analysis, as it unsets line.

Had it been
   insert copy of line into inputs
then you would not have gotten any complaint from Hermes, as it would 
not have unset line there.

In this case, it's equivalent to
if (!is_line) line = getString();
if (!is_line) use line for something...
except the second test is at the top of the loop instead of the bottom.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Marshall
Chris Smith wrote:
 Marshall [EMAIL PROTECTED] wrote:
   If the relations are to
   be considered opaque, then there's clearly no aliasing going on.
 
  Not certain I understand, but I think I agree.

 My condition, though, was that relations be opaque.  Since you will be
 violating that condition further on down, I just felt that it's useful
 to point that out here.

  No, such a language is entirely useful, since relational assignment
  is *more* expressive than insert/update/delete.

 Assignment is more powerful *as* assignment.  However, it is less
 powerful when the task at hand is deriving new relations from old ones.

At the implementation level, it makes some things harder, however
as a logical model, it is more powerful. While this is very much a real
world issue, it is worth noting that it is a performance issue *merely*
and not a semantic issue.


 Assignment provides absolutely no tools for doing that.  I thought you
 were trying to remove those tools from the language entirely in order to
 remove the corresponding aliasing problems.  I guess I was wrong, since
 you make it clear below that you intend to keep at least basic set
 operations on relations in your hypothetical language.

  Consider:
 
  i := i + 1;
 
  Note that the new value of i didn't just appear out of thin air; it was
  in fact based on the previous value of i.

 Right.  That's exactly the kind of thing I thought you were trying to
 avoid.

I was under the impression tat Joachim, for example, did not
consider i+1 as an alias for i.


  So we can define insert, update and delete in terms of relational
  assignment, relational subtraction, and relational union. Type
  checking details omitted.

 Then the problem is in the step where you assign the new relation to the
 old relational variable.  You need to check that the new relation
 conforms to the invariants that are expressed on that relational
 variable.  If you model this as assignment of relations (or relation
 values... I'm unclear on the terminology at this point) then naively
 this requires scanning through an entire set of relations in the
 constraint, to verify that the invariant holds.  You've may have avoided
 aliasing in any conventional sense of the word by stretching the word
 itself beyond breaking... but you've only done it by proactively
 accepting its negative consequences.

 It remains non-trivial to scan through a 2 GB database table to verify
 that some attribute of every tuple matches some attribute of another
 table, even if you call the entire thing one relational variable.  The
 implementation, of course, isn't at all going to make a copy of the
 entire (possibly several GB) relation and rewrite it all every time it
 makes a change, and it isn't going to give up and rescan all possible
 invariants every time every change is made.  In other words, you've
 risen to a layer of abstraction where the aliasing problem does not
 exist.  The implementation is still going to deal with the aliasing
 problem, which will resurface once you pass over to the other side of
 the abstraction boundary.

Yes, these *performance* issues make assignment prohibitive for
real-world use, at least if we are talking about data management
in the large. This is not the same thing as saying the resulting
language is a toy language, though; its semantics are quite
interesting and possibly a better choice for *defining* the semantics
of the imperative operations than directly modelling the imperative
operations. (Or maybe not.) In any event, it's worth thinking about,
even if performance considerations make it not worth implementing.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-17 Thread Chris Smith
Marshall [EMAIL PROTECTED] wrote:
 Yes, these *performance* issues make assignment prohibitive for
 real-world use, at least if we are talking about data management
 in the large. This is not the same thing as saying the resulting
 language is a toy language, though; its semantics are quite
 interesting and possibly a better choice for *defining* the semantics
 of the imperative operations than directly modelling the imperative
 operations. (Or maybe not.) In any event, it's worth thinking about,
 even if performance considerations make it not worth implementing.

My toy language comment was directed at a language that I mistakenly 
thought you were proposing, but that you really weren't.  You can ignore 
it, and all the corresponding comments about assignment being less 
powerful, etc.  I was apparently not skilled at communication when I 
tried to say that in the last message.

It is, perhaps, worth thinking about.  My assertion here (which I think 
I've backed up, but there's been enough confusion that I'm not surprised 
if it was missed) is that the underlying reasons that performance might 
be poor for this language are a superset of the performance problems 
caused by aliasing.  Hence, when discussing the problems caused by 
aliasing for the performance of language implementations (which I 
believe was at some point the discussion here), this isn't a 
particularly useful example.

It does, though, have the nice property of hiding the aliasing from the 
semantic model.  That is interesting and worth considering, but is a 
different conversation; and I don't know how to start it.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error? [correction]

2006-07-17 Thread David Hopwood
Darren New wrote:
 David Hopwood wrote:
 
 public class LoopInitTest {
 public static String getString() { return foo; }

 public static void main(String[] args) {
 String line = getString();
 boolean is_last = false;

 while (!is_last) {
 if (line.charAt(0) == 'q') {
 is_last = true;
 }

 // insert line into inputs (not important for analysis)

 if (!is_last) {
 line = getString();
 }
 }
 }
 }

 which compiles without error, because is_last is definitely initialized.
 
 At what point do you think is_last or line would seem to not be
 initialized? They're both set at the start of the function, and (given
 that it's Java) nothing can unset them.
 
 At the start of the while loop, it's initialized. At the end of the
 while loop, it's initialized. So the merge point of the while loop has
 it marked as initialized.

Apparently, Hermes (at least the version of it described in that paper)
essentially forgets that is_last has been initialized at the top of the
loop, and so when it does the merge, it is merging 'not necessarily initialized'
with 'initialized'.

This sounds like a pretty easy thing to fix to me (and maybe it was fixed
later, since there are other papers on Hermes' typestate checking that I
haven't read yet).

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 Marshall schrieb:
 I would say, records in SQL have value, and their
 identity is exactly their value.
 
 Definitely not. You can have two equal records and update just one of
 them, yielding non-equal records; by my definition (and by intuition),
 this means that the records were equal but not identical.
 
 This sort of thing comes up when one has made the mistake of
 not defining any keys on one's table and needs to rectify the
 situation. It is in fact considered quite a challenge to do.
 My preferred technique for such a situation is to create
 a new table with the same columns and SELECT DISTINCT ...
 INTO ... then recreate the original table. So that way doesn't
 fit your model.

I was mentioning multiple equal records just as an example where the 
records have an identity that's independent of the record values.

 How else might you do it? I suppose there are nonstandard
 extensions, such as UPDATE ... LIMIT. And in fact there
 might be some yucky thing that made it in to the standard
 that will work.

Right, it might be difficult to update multiple records that are exactly 
the same. It's not what SQL was intended for, and difficult to do anyway 
- I was thinking of LIMIT (I wasn't aware that it's nonstandard), and I 
agree that there may be other ways to do it.

However, I wouldn't overvalue that case. The Jane/John Doe example 
posted elsewhere in this thread is strictly within the confines of what 
SQL was built for, yet there is aliasing.

 But what you descrbe is certainly *not* possible in the
 relational algebra; alas that SQL doesn't hew closer
 to it. Would you agree?

Yup, SQL (particularly its update semantics) aren't relational semantics.
Still, it's SQL we've been talking about.

  Would you also agree that
 if a language *did* adhere to relation semantics,
 then relation elements would *not* have identity?
 (Under such a language, one could not have two
 equal records, for example.)

Any identity that one could define within relational semantics would be 
just equality, so no, it wouldn't be very helpful (unless as part of a 
greater framework).
However, that's not really a surprise. Aliasing becomes relevant (even 
observable) only when there's updates, and relational algebra doesn't 
have updates.

   I do not see that they have
 any identity outside of their value. We can uniquely identify
 any particular record via a key, but a table may have more
 than one key, and an update may change the values of one
 key but not another. So it is not possible in general to
 definitely and uniquely assign a mapping from each record
 of a table after an update to each record of the table before
 the update, and if you can't do that, then where
 is the record identity?
 Such a mapping is indeed possible. Simply extend the table with a new
 column, number the columns consecutively, and identify the records via
 that column.
 
 That doesn't work for me. It is one thing to say that for all
 tables T, for all elements e in T, e has identity. It is a different
 thing to say that for all tables T, there exists a table T' such
 that for all elements e in T', e has identity.

Let me continue that argument:
Records in T' have identity.
 From an SQL point-of-view, there's no difference between the records in 
T' and records in other tables, so they must share all properties, in 
particular that of having an identity.

(I agree that's not as convincing as seeing aliasing in action. See below.)

 But even if you don't do that, there's still identity. It is irrelevant
 whether the programs can directly read the value of the identity field;
 the adverse effects happen because updates are in-place. (If every
 update generated a new record, then we'd indeed have no identity.)

 Okay. At this point, though, the term aliasing has become extremely
 general. I believe i+1+1 is an alias for i+2 under this definition.
 No, i+1+1 isn't an alias in itself. It's an expression - to be an
 alias, it would have to be a reference to something.

 However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
 important - 1+1 is replacable by 2 in every context, so this is
 essentially the same as saying a[i+2] is an alias of a[i+2], which is
 vacuously true.
 
 To me, the SQL with the where clause is like i+2, not like a[2].

I'd say WHERE is like [i+2]: neither is valid on its own, it's the 
selector part of a reference into a table.

 A query is simply an expression that makes mention of a variable.
 However I would have to admit that if SQL table elements have
 identity, it would be more like the array example.

OK.

 (The model of SQL I use in my head is one with set semantics,
 and I am careful to avoid running in to bag semantics by, for
 example, always defining at least one key, and using SELECT
 DISTINCT where necessary. But it is true that the strict set
 semantics in my head are not the wobbly bag semantics in
 the standard.)

Set-vs.-bag doesn't affect SQL's 

Re: What is a type error?

2006-07-16 Thread Chris F Clark
Marshall [EMAIL PROTECTED] wrote:
 In general, I feel that records are not the right conceptual
 level to think about.

Unfortunately, they are the right level.  Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go.  The right level
is the lowest level at which the programmer can see and manipulate
values.  Thus, since a programmer can see and manipulate fields within
records in SQL that is the level we need to concern ourselves with
when talking about aliasing in SQL.  In other words, since a SELECT
(or UPDATE) statement can talk about specific fields of specific
records within the table (not directly, but reliably, which I'll
explain in a moment) then those are the primitive objects in SQL
that can be aliased.

What I meant by not directly, but reliably:

If you have a fixed database, and you do two selects which specify the
same sets of fields to be selected and the same keys to select records
with, one expects the two selects to return the same values. You do
not necessarily expect the records to be returned in the same order,
but you do expect the same set of records to be returned and the
records to have the same values in the same fields.  Further, if you
do an update, you expect certain fields of certain records to change
(and be reflected in subsequent selects).

However, therein lies the rub, if you do a select on some records, and
then an update that changes those records, the records you have from
the select have either changed or show outdated values.  If there is
some way to refer to the records you first selected before the update,
then you have an aliasing problem, but maybe one can't do that.  One
could also have an aliasing problem, if one were allowed to do two
updates simultaneously, so that one update could changed records in
the middle of the other changing the records.  However, I don't know
if that's allowed in the relational algebra either. (I think I finally
see your point, and more on that later.)

 I do not see that they have
 any identity outside of their value. We can uniquely identify
 any particular record via a key, but a table may have more
 than one key, and an update may change the values of one
 key but not another. So it is not possible in general to
 definitely and uniquely assign a mapping from each record
 of a table after an update to each record of the table before
 the update, and if you can't do that, then where
 is the record identity?

I don't know if your point of having two keys within one table amkes a
difference.  If one can uniquely identify a record, then it has
identity.  The fact that there is another mapping where one may not be
able to uniquely identify the record is not necessarily relevant.

 But what you descrbe is certainly *not* possible in the
 relational algebra; alas that SQL doesn't hew closer
 to it. Would you agree? Would you also agree that
 if a language *did* adhere to relation semantics,
 then relation elements would *not* have identity?
 (Under such a language, one could not have two
 equal records, for example.)

Ok, here I think I will explain my understanding of your point.  If we
treat each select statement and each update statements as a separate
program, i.e. we can't save records from one select statement to a
later one (and I think in the relational algebra one cannot), then
each single (individual) select statement or update statement is a
functional program.  That is we can view a single select statement as
doing a fold over some incoming data, but it cannot change the data.
Similarly, we can view an update statement as creating a new copy of
the table with mutations in it, but the update statement can only
refer to the original immutable table that was input.  (Well, that's
atleast how I recall the relational algebra.)  It is the second point,
about the semantics of the update statement which is important.  There
are no aliasing issues because in the relational algebra one cannot
refer to the mutated values in an update statement, every reference in
an update statement is refering to the unmutated input (again, that's
my recollection).  (Note, if my recollection is off, and one can refer
to the mutated records in an update statement, perhaps they can
explain why aliasing is not an issue in it.)

Now for some minor points:

 For example, we might ask in C, if we update a[i], will
 the value of b[j] change? And we can't say, because
 we don't know if there is aliasing. But in Fortran, we
 can say that it won't, because they are definitely
 different variables. 

Unfortunately, your statement about FORTRAN is not true, if a and b
are parameters to a subroutine (or members of a common block, or
equivalenced, or ...), then an update of a[i] might change b[j].

This is in fact, an important point, it is in subroutines (and
subroutine like things), where we have local names for things
(bindings) that are actually names for things which may 

Re: What is a type error?

2006-07-16 Thread Dr.Ruud
Chris F Clark schreef:

 If you have a fixed database, and you do two selects which specify the
 same sets of fields to be selected and the same keys to select records
 with, one expects the two selects to return the same values.

When your fixed means read-only, or (fully) locked, then yes.

Modern databases also have modes without locks, so without special
attention (like maybe your fixed) the two selects do not necessarily
return the same set of records.


 Further, if you
 do an update, you expect certain fields of certain records to change
 (and be reflected in subsequent selects).

Doesn't your fixed block updates?


 However, therein lies the rub, if you do a select on some records, and
 then an update that changes those records, the records you have from
 the select have either changed or show outdated values.

Not necessarily outdated: values can be fixed in time for a purpose.
Example: calculating interest is done once per day, but the whole
process takes more than a day.

Some systems implemented a lock plus requery just before the update, to
check for unacceptable changes in the stored data; this to prevent
having to keep locks while waiting.


 If there is
 some way to refer to the records you first selected before the update,
 then you have an aliasing problem, but maybe one can't do that.  One
 could also have an aliasing problem, if one were allowed to do two
 updates simultaneously, so that one update could changed records in
 the middle of the other changing the records.

Some databases allow you to travel back in time: run this query on the
data of 1 year ago. All previous values are kept behind the current
value.

-- 
Affijn, Ruud

Gewoon is een tijger.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:

  But what you descrbe is certainly *not* possible in the
  relational algebra; alas that SQL doesn't hew closer
  to it. Would you agree?

 Yup, SQL (particularly its update semantics) aren't relational semantics.
 Still, it's SQL we've been talking about.

   Would you also agree that
  if a language *did* adhere to relation semantics,
  then relation elements would *not* have identity?
  (Under such a language, one could not have two
  equal records, for example.)

 Any identity that one could define within relational semantics would be
 just equality, so no, it wouldn't be very helpful (unless as part of a
 greater framework).
 However, that's not really a surprise. Aliasing becomes relevant (even
 observable) only when there's updates, and relational algebra doesn't
 have updates.

Good point. Perhaps I should have said relational algebra +
variables with assignment. It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete. I think I see what you mean about the restricted
update operators working on identity.


  Okay. At this point, though, the term aliasing has become extremely
  general. I believe i+1+1 is an alias for i+2 under this definition.
  No, i+1+1 isn't an alias in itself. It's an expression - to be an
  alias, it would have to be a reference to something.
 
  However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
  important - 1+1 is replacable by 2 in every context, so this is
  essentially the same as saying a[i+2] is an alias of a[i+2], which is
  vacuously true.
 
  To me, the SQL with the where clause is like i+2, not like a[2].

 I'd say WHERE is like [i+2]: neither is valid on its own, it's the
 selector part of a reference into a table.

Is it possible you're being distracted by the syntax? WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you? (Always a difficult question because often
times A and B share some qualities and not others, and
what does one call that?)


  However the situation in SQL strikes me as being different.
  If one is speaking of base tables, and one has two queries,
  one has everything one needs to know simply looking at
  the queries. If views are involved, one must further examine
  the definition of the view.

 Sure. Usually, SQL itself is enough abstract that no additional
 abstraction happens; so even if you have aliasing, you usually know
 about it.
 I.e. when compared to what you were saying about Java: It is possible
 to have multiple references to a single Java mutable object and
 not know it., the and not know it part is usually missing.

Yes!


 The picture changes as soon as the SQL statements get hidden behind
 layers of abstraction, say result sets and prepared query handles.

 ... Um... let me also restate the preconditions for aliasing become a
 problem. You need three elements for that:
 1) Identities that can be stored in multiple places.
 2) Updates.
 3) Abstraction (of updatable entities).

 Without identities, there cannot be aliasing.
 Without updates, the operation that makes aliasing problematic is excluded.
 Without abstraction, you may have aliasing but can clearly identify it,
 and don't have a problem.

Aha! That description works very well for me.


 I admit that identity cannot be reliably established in SQL. The
 identity of a record isn't available (unless via OID extensions).
 However, it is there and it can have effect.

Yes, I think I see what you mean now. This is in part a consequence
of the restricted update operators.

Thanks,

Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Marshall
Chris F Clark wrote:
 Marshall [EMAIL PROTECTED] wrote:
  In general, I feel that records are not the right conceptual
  level to think about.

 Unfortunately, they are the right level.  Actually,the right level
 might even be lower, the fields within a record, but that's moving
 even farther away from the direction you wish to go.  The right level
 is the lowest level at which the programmer can see and manipulate
 values.

But how is this not always the bit?



  I do not see that they have
  any identity outside of their value. We can uniquely identify
  any particular record via a key, but a table may have more
  than one key, and an update may change the values of one
  key but not another. So it is not possible in general to
  definitely and uniquely assign a mapping from each record
  of a table after an update to each record of the table before
  the update, and if you can't do that, then where
  is the record identity?

 I don't know if your point of having two keys within one table amkes a
 difference.  If one can uniquely identify a record, then it has
 identity.  The fact that there is another mapping where one may not be
 able to uniquely identify the record is not necessarily relevant.

The issue with two keys is that the keys may not *agree* on the
mapping before vs. after the update.


  For example, we might ask in C, if we update a[i], will
  the value of b[j] change? And we can't say, because
  we don't know if there is aliasing. But in Fortran, we
  can say that it won't, because they are definitely
  different variables.

 Unfortunately, your statement about FORTRAN is not true, if a and b
 are parameters to a subroutine (or members of a common block, or
 equivalenced, or ...), then an update of a[i] might change b[j].

Ah, common blocks. How that takes me back!

But your point is a good one; I oversimplified.


 Marshall again:
  Fair enough. I could only remember three, but I was sure someone
  else could name more. :-)

 There actual are some more, but very rare, for example there was call-
 by-text-string, which is sort of like call-by-name, but allows a
 parameter to reach into the calling routine and mess with local
 variables therein.  Most of the rare ones have really terrible
 semantics, and are perhaps best forgotten except to keep future
 implementors from choosing them.  For example, call-by-text-string is
 really easy to implement in a naive interpreter that reparses each
 statement at every invocation, but but it exposes the implementation
 properties so blatantly that writing a different interpretor is nearly
 impossible.

 If I recall correctly, there are also some call-by- methods that take
 into account networks and addressing space issues--call-by-reference
 doesn't work if one cannot see the thing being referenced.  Of coruse,
 one must then ask whether one considers remote-procedure-call and/or
 message-passing to be the same thing as a local call.

 One last nit, Call-by-value is actually call-by-copy-in-copy-out when
 generalized.

 There are some principles that one can use to organize the parameter
 passing methods.  However, the space is much richer than people
 commonly know about (and that is actually a good thing, that most
 people aren't aware of the richness, simplicity is good).


Fair enough.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Chris Smith
We Chris's stick together, as always.

Marshall [EMAIL PROTECTED] wrote:
  Unfortunately, they are the right level.  Actually,the right level
  might even be lower, the fields within a record, but that's moving
  even farther away from the direction you wish to go.  The right level
  is the lowest level at which the programmer can see and manipulate
  values.
 
 But how is this not always the bit?

First of all, we should be consistent in speaking about the logical 
language semantics, which may not include bits anyway.

That said, if bits were the primitive concept of data in a language, 
then we'd be able to get away with talking about higher-level concepts 
because we agree to always manipulate a larger structure (a byte, for 
example) as a pure value.  If we were using bitwise operators, then we 
wouldn't be able to get away with it, for example.  That said, it's also 
quite possible to consider aliasing on higher levels as well; it's just 
not possible to point out the lack of aliasing for higher levels of 
abstraction, and thus conclude that no aliasing exists.  Aliasing is 
still possible for entities within those layers of abstraction.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Chris Smith
Marshall [EMAIL PROTECTED] wrote:
 Is it possible you're being distracted by the syntax? WHERE is a
 binary operation taking a relation and a filter function. I don't
 think of filters as being like array indexing; do they appear
 analogous to you? (Always a difficult question because often
 times A and B share some qualities and not others, and
 what does one call that?)

This is definitely a good question.  Nevertheless, I think we're fooling 
ourselves if we decide that we've made progress by defining part of the 
problem out of existence.  The original problem was that when there are 
several ways of getting access to something that is identical (rather 
than just equal), this causes problems for invariant checking.  These 
several ways of accessing data are being called 'aliasing' -- I have no 
comment on whether that's standard usage or not, because I don't know.  
I suspect that aliasing is normally used for something closer to the 
physical level.  The point is that whatever aliasing really means, 
what we're discussing is having two paths by which one may access 
identical data.

So there are infinitely complex ways in which this can occur.  I can 
have pointers that are aliased.  I can have integers, which by 
themselves are just plain values, but can alias each other as indices 
into an array.  I can have strings that do the same thing for an 
associative array.  I can, of course, have arbitrarily more complex data 
structures with exactly the same issue.  I can also have two different 
WHERE clauses, which when applied to the same relation yield exactly the 
same set of tuples.  The problem arises when code is written to update 
some value (or set of tuples, in the SQL case, since definitions differ 
there) using one pointer/int/string/WHERE clause/etc, and at the same 
time an invariant was written using the other pointer/int/WHERE/etc, the 
result is that either of:

a) The compiler has to be able to prove that the two are not aliased

b) The compiler has to re-evaluate the invariant from scratch when this 
operation is performed.

(Yeah, there's actually a whole continuum between a and b, based on more 
limited abilities of the compiler to prove things.  I'm oversimplifying, 
and I think it's rather benign in this case.)

So in this sense, a WHERE clause is a binary operation in exactly the 
same way that an array indexing expression is a binary operation, and 
both are capable of aliasing in at least this logical sense.

Now it's certainly true that languages with unrestricted pointers are 
less successful at limiting the scope of re-evaluation of invariants.  
In the array or SQL relation cases, there's a limited set of value/tuple 
modifications that might cause the invariant to need rechecking 
(specifically those that point to the same array, or to the relation or 
one of its views).  A language with unrestricted untyped pointers, if it 
doesn't get lucky with the capture analysis, may have to re-evaluate all 
invariants in the application on any given assignment.  Nevertheless, 
the re-evaluation of the invariant still needs to be done.

-- 
Chris Smith - Lead Software Developer /Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Joachim Durchholz
Marshall schrieb:
 
 Good point. Perhaps I should have said relational algebra +
 variables with assignment. It is interesting to consider
 assignment vs. the more restricted update operators: insert,
 update, delete.

Actually I see it the other way round: assignment is strictly less 
powerful than DML since it doesn't allow creating or destroying 
variables, while UPDATE does cover assignment to fields.
(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE, 
at which point there is not much of a difference.)

 Okay. At this point, though, the term aliasing has become extremely
 general. I believe i+1+1 is an alias for i+2 under this definition.
 No, i+1+1 isn't an alias in itself. It's an expression - to be an
 alias, it would have to be a reference to something.

 However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
 important - 1+1 is replacable by 2 in every context, so this is
 essentially the same as saying a[i+2] is an alias of a[i+2], which is
 vacuously true.
 
 To me, the SQL with the where clause is like i+2, not like a[2].
 I'd say WHERE is like [i+2]: neither is valid on its own, it's the
 selector part of a reference into a table.
 
 Is it possible you're being distracted by the syntax?

I think that's very, very unlikely ;-)

  WHERE is a
 binary operation taking a relation and a filter function. I don't
 think of filters as being like array indexing; do they appear
 analogous to you?

Yes.

Filters are just like array indexing: both select a subset of variables 
from a collection. In SQL, you select a subset of a table, in a 
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of 
filtering you can apply, while array indexing allows filtering just by 
ordinal position. However, the relevant point is that both select things 
that can be updated.)

 I admit that identity cannot be reliably established in SQL. The
 identity of a record isn't available (unless via OID extensions).
 However, it is there and it can have effect.
 
 Yes, I think I see what you mean now. This is in part a consequence
 of the restricted update operators.

I don't think so. You can update SQL records any way you want.
The unavailability of a record identity is due to the fact that, well, 
it's unavailable in standard SQL.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
 
  Good point. Perhaps I should have said relational algebra +
  variables with assignment. It is interesting to consider
  assignment vs. the more restricted update operators: insert,
  update, delete.

 Actually I see it the other way round: assignment is strictly less
 powerful than DML since it doesn't allow creating or destroying
 variables, while UPDATE does cover assignment to fields.

Oh, my.

Well, for all table variables T, there exists some pair of
values v and v', such that we can transition the value of
T from v to v' via assignment, but not by any single
insert, update or delete. The reverse is not true. I
would consider that a solid demonstration of which
one is more powerful. Further, it is my understanding
that your claim of row identity *depends* on the restricted
nature of DML; if the only mutator operation is assignment,
then there is definitely no record identity.


 (However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
 at which point there is not much of a difference.)

I am not sure what this means. Delete can be expressed in
terms of assignment. (As can insert and update.) (Assignment
can also be expressed in terms of insert and delete.) I
don't know what new would be in a value-semantics, relational
world. So I would say it's assignment vs. INSERT+UPDATE+DELETE,
but perhaps I'm not understanding what you mean.

Assignment by itself is complete. Insert and delete together
are complete.


  Okay. At this point, though, the term aliasing has become extremely
  general. I believe i+1+1 is an alias for i+2 under this definition.
  No, i+1+1 isn't an alias in itself. It's an expression - to be an
  alias, it would have to be a reference to something.
 
  However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
  important - 1+1 is replacable by 2 in every context, so this is
  essentially the same as saying a[i+2] is an alias of a[i+2], which is
  vacuously true.
  
  To me, the SQL with the where clause is like i+2, not like a[2].
  I'd say WHERE is like [i+2]: neither is valid on its own, it's the
  selector part of a reference into a table.
 
  Is it possible you're being distracted by the syntax?

 I think that's very, very unlikely ;-)

Well, I just thought I'd ask. :-)


   WHERE is a
  binary operation taking a relation and a filter function. I don't
  think of filters as being like array indexing; do they appear
  analogous to you?

 Yes.

 Filters are just like array indexing: both select a subset of variables
 from a collection.

I can't agree with this wording. A filter produces a collection
value from a collection value. I don't see how variables
enter in to it. One can filter either a collection constant or
a collection variable; if one speaks of filtering a collection
variable, on is really speaking of filtering the collection value
that the variable currently contains; filtering is not an operation
on the variable as such, the way the address of operator is.
Note you can't update the result of a filter.


 In SQL, you select a subset of a table, in a
 programming language, you select a subset of an array.

 (The SQL selection mechanism is far more flexible in the kinds of
 filtering you can apply, while array indexing allows filtering just by
 ordinal position. However, the relevant point is that both select things
 that can be updated.)

When you have been saying select things that can be updated
I have been assuming you meant that one can derive values
from variables, and that some other operation can update that
variable, causing the expression, if re-evaluated, to produce
a different value. However the phrase also suggests that
you mean that the *result* of the select can *itself* be
updated. Which one do you mean? (Or is there a third
possibility?)


  I admit that identity cannot be reliably established in SQL. The
  identity of a record isn't available (unless via OID extensions).
  However, it is there and it can have effect.
 
  Yes, I think I see what you mean now. This is in part a consequence
  of the restricted update operators.

 I don't think so. You can update SQL records any way you want.
 The unavailability of a record identity is due to the fact that, well,
 it's unavailable in standard SQL.

I disagree. The concept of record-identity is quite tenuous, and
depends on specifics of SQL semantics. (In fact I'm not entirely
convinced it's definitely there even in SQL.) It certainly does not
hold in, say, set theory. Set elements do not have identity; they
only have value. If we have a set-semantics language that
supports set variables with assignment, we still do not have
element-identity.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Chris Smith
David Hopwood [EMAIL PROTECTED] wrote:
 Chris Smith wrote:
  If checked by execution, yes.  In which case, I am trying to get my head 
  around how it's any more true to say that functional languages are 
  compilable postconditions than to say the same of imperative languages.
 
 A postcondition must, by definition [*], be a (pure) assertion about the
 values that relevant variables will take after the execution of a subprogram.

Okay, my objection was misplaced, then, as I misunderstood the original 
statement.  I had understood it to mean that programs written in pure 
functional languages carry no additional information except for their 
postconditions.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Chris Smith
Darren New [EMAIL PROTECTED] wrote:
 I'm not sure what linear or uniqueness typing is. It's typestate, and if 
 I remember correctly the papers I read 10 years ago, the folks at 
 TJWatson that invented Hermes also invented the concept of typestate. 
 They at least claim to have coined the term.

Coining the term is one thing, but I feel pretty confident that the idea 
was not invented in 1986 with the Hermes language, but rather far 
earlier.  Perhaps they may have invented the concept of considering it 
any different from other applications of types, though.  I still have 
trouble delineating how to consider typestate different from ordinary 
types in formal terms, unless I make the formal model complex enough to 
include some kind of programmer-defined identifiers such as variables.  
The distinction is not at all relevant to common type system models 
based on the lambda calculus.

While acknowledging, on the one hand, that the word typestate is used 
to describe this, I also object that types have *always* been assigned 
to expressions in differing type environments.  Otherwise, it would be 
impossible to assign types to lambda abstractions in the simply typed 
lambda calculus, the simplest of all generally studied type systems.  
What is being named here is the overcoming of a limitation that 
programming language designers imposed upon themselves, whether from not 
understanding the theoretical research or not believing it important, I 
don't know.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-16 Thread Zoltan Somogyi
Andreas Rossberg [EMAIL PROTECTED] writes:
Uh, aliasing all over the place! Actually, I think that logic 
programming, usually based on deep unification, brings by far the worst 
incarnation of aliasing issues to the table.

I agree that deep unification, as implemented in Prolog, brings with it
lots of aliasing problems. However, these problems have been recognized
from the seventies, and people have tried to solve them. There have
been a huge variety of mode systems added to various dialects of Prolog
over the years, which each attempt to tackle (various parts of) the aliasing
problem, none of them fully successfully in my view, since their designers
usually actually *wanted* to retain at least *some* of the expressive power
of the logic variable.

Some non-Prolog logic languages have departed from this approach, the earliest
being the Relational Language. To tie this message to another thread, the
Relational Language had a strict mode system that is as identical as possible
to the notion of typestate in NIL and Hermes given the limitations of
comparing declarative apples with imperative oranges, but significantly
earlier, 1979 vs 1986 IIRC.

Marshall wrote:
I have explored the OO path to its bitter end and am
convinced it is not the way. So what is left? Uniqueness
types and logic programming, I suppose. I enjoy logic
programming but it doesn't seem quite right. But notice:
no pointers there!  And it doesn't seem to suffer from the
lack.

Of course, the main logic programming language today that disallows the use
of unification for arbitrary aliasing is Mercury. It enforces this through
a strong mode system. Our motivation for the design of this mode system
was precisely to eliminate the problems Andreas identified above. However
it has also turned out to be an excellent foundation for Mercury's notion of
unique modes, its equivalent of uniqueness types, which Mercury uses to
express I/O.

Zoltan Somogyi [EMAIL PROTECTED] http://www.cs.mu.oz.au/~zs/
Department of Computer Science and Software Engineering, Univ. of Melbourne
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-15 Thread Chris F Clark
Joachim Durchholz wrote:
 
  You can have aliasing without pointers; e.g. arrays are fully sufficient.
  If i = j, then a [i] and a [j] are aliases of the same object.

Marshall [EMAIL PROTECTED] writes:

 I am having a hard time with this very broad definition of aliasing.
 Would we also say that a[1+1] and a[2] are aliases? It seems
 to me, above, that we have only a, and with only one variable
 there can be no aliasing.

As you can see from the replies, all these things which you do not
consider aliasing are considered aliasing (by most everyone else, in
fact these are exactly what is meant by aliasing).  There is good
reason for this.  Let us explain this by looking again at the SQL
example that has been kicked around some.

SELECT * FROM persons WHERE name = John
 and
SELECT * FROM persons WHERE surname = Doe

Some set of records are shared by both select clauses.  Those records
are the aliased records.  If the set is empty there is no problem.  If
we do not update the records from one of the select clauses, we also
have no problem.  However, if we update the records for one of the
select clauses and the set of aliased records is not empty, the
records returned by the other select clause have changed.  Therefore,
everything we knew to be true about the other select clause may or
may not still be true.  It's the may not case we are worried about.

For example, in the imperative Mi-5 Q asks Moneypenny to run the
first query (name = John) to get a list of agents for infiltrating
Spectre.  In another department, they're doing sexual reassignments
and updating the database with the second query (surname = Doe) and
making the name become Jane (along with other changes to make the
transformation correct).  So, when Q select John Doe from the first
list and sends that agent on a mission Q is really sending Jane Doe
and Spectre realizes that the agent is one and kills her.  Not a happy
result.

In the functional Mi-5, the sexual reassignment group, makes clones
of the agents reassigned, so that there is now both a John Doe and a
Jane Doe.  Of course, the payroll is a little more bloated, so that
when Q says John Doe isn't needed for any further assignments, the
Garbage Collection department does a reduction in force and gets rid
of John Doe.  The good news is that they didn't send Jane Doe to her
death.

The problem with aliasing, we want the things we knew true in one part
of our program, i.e. the Johns on Q's list are still Johns and not
Janes, to be true after we've run another part of our program.  If the
second part of our program can change what was true after the first
part of our program, we can't depend on the results from the first
part of our program, i.e. Q can send Jane Doe into certain death.

The problem is compounded by the complexities of our programs.  When Q
selected his list, he didn't know of the department doing the
reassignments (and couldn't know because they were part of a
top-secret project).  So, since bureaucracies grow to meet the
increasing needs of the bureaucracy, often the solution is increased
complexity, more regulations and paperwork to fill out, semaphores and
locks to hold on critical data, read-only data, status requests, etc.
All to keep Q from killing Jane by accident.  Sometimes they work. the
reassignment department has to wait 30-days for the clearance to
perform their operation in which time John Doe completes the
infiltration of Spectre saves the world from destruction and is ready
for his next assignment.

The key point is that each record (and even each field in each record)
if it can be known by two names, is an alias.  It is not sufficient to
talk about whole variables as not being aliased if there is some way
to refer to some part of the variable and change that part of the
variable.  Thus, a[1+1] is an alias for a[2] if you can have one part
of the code talking about one and another part of the code talking
about the other.

To put it one still final way, consider the following code;
assert(sorted(array a));
a[1] = 2;
assert(sorted(array a)); // is this still true?

Because a[1] is an alias of array a, you cannot be certain that the
2nd assert will not fire (fail) without additional analysis.

assert(sorted(array a));
assert(a[0] = 2);
assert(2 = a[2]);
a[1] = 2;
assert(sorted(array a)); // this is still true!

-Chris
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-15 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:

  In some cases, you need an additional level of conceptual indirection -
  instead of *doing* the updates, you write a function that *describes* them.
 
  But then what do you do with that function?

 I pass it to an engine that's imperative.

 However, that engine doesn't need to do aliasing anymore.

 In other words, I'm separating mutability and aliasing, so that they
 can't combine and explode.

This would seem to be identical to what I am proposing.


   Let's say I have an
  employee database. John Smith just got hired on 1/1/2006 with
  a salary of $10,000. I need to record this fact somewhere. How
  do I do that without variables? Current-employees is a variable.
  Even if I have the space to keep all historical data, so I'm not
  deleting anything, I still have to have a variable for the latest
  version of the accumulated data. I can solve this without
  pointers, but I can't solve it without variables.

 As I said elsewhere, the record has an identity even though it isn't
 explicit in SQL.

H. What can this mean?

In general, I feel that records are not the right conceptual
level to think about.

In any event, I am not sure what you mean by non-explicit
identity. I would say, records in SQL have value, and their
identity is exactly their value. I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

(The situation is even more dire when one considers that
SQL actually uses bag semantics and not set semantics.)


[...]

  I should like to learn more about these. I have some vague
  perception of the existence of linear logic, but not much
  else. However, I also already have an excellent solution
  to the pointer aliasing problem, so I'm less motivated.

 Pointers are just the most obvious form of aliasing. As I said
 elsewhere, there are others: array indexing, by-reference parameters, or
 even WHERE clauses in SQL statements. In fact, anything that selects an
 updatable piece of data from a collection is an alias, and incurs the
 same problems as pointers, though they may come in different disguises.

Okay. At this point, though, the term aliasing has become extremely
general. I believe i+1+1 is an alias for i+2 under this definition.
That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.


  Actually SQL has references - they are called primary keys, but they
  are references nevertheless.
 
  I strongly object; this is quite incorrect. I grant you that from the
  50,000 foot level they appear identical, but they are not. To
  qualify as a reference, there need to be reference and dereference
  operations on the reference datatype; there is no such operation
  is SQL.
 
  Would you say the relational algebra has references?

 Yes.

  Or, consider the classic prolog ancestor query. Let's say we're
  setting up as follows
 
  father(bob, joe).
  father(joe, john).
 
  Is joe a reference, here? After all, when we do the ancestor
  query for john, we'll see his father is joe and then use that to
  find joe's father. Keys in SQL are isomorphic to joe in the
  above prolog.

 Agreed. They are both references ;-P

Again, by generalizing the term this far, I am concerned with a
loss of precision. If joe in the prolog is a references, then
reference is just a term for data that is being used in a
certain way. The conection with a specfic address space
has been lost in favor of the full domain of the datatype.


  (Some SQL dialects also offer synthetic
  ID fields that are guaranteed to remain stable over the lifetime of a
  record.
 
  Primary keys are updatable; there is nothing special about them.

 Right - I was wrong with identifying keys and identities. In fact the
 identity of an SQL record cannot be directly obtained (unless via those
 ID fields).
 The records still have identities. It's possible to have two WHERE
 clauses that refer to the same record, and if you update the record
 using one WHERE clause, the record returned by the other WHERE clause
 will have changed, too.

Is this any different from saying that an expression that includes
a variable will produce a different value if the variable changes?

In other words, i+1 will return a different value if we update i.

It seems odd to me to suggest that i+1 has identity. I can see
that i has identity, but I would say that i+1 has only value.

But perhaps the ultimate upshoot of this thread is that my use
of terminology is nonstandard.


  With a repeatable read isolation level, you actually return to a
  declarative view of the database: 

Re: What is a type error?

2006-07-15 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 As I said elsewhere, the record has an identity even though it isn't
 explicit in SQL.
 
 H. What can this mean?
 
 In general, I feel that records are not the right conceptual
 level to think about.

They are, when it comes to aliasing of mutable data. I think it's 
justified by the fact that aliased mutable data has a galling tendency 
to break abstraction barriers. (More on this on request.)

 In any event, I am not sure what you mean by non-explicit
 identity.

The identity isn't visible from inside SQL. (Unless there's an OID 
facility available, which *is* an explicit identity.)

  I would say, records in SQL have value, and their
 identity is exactly their value.

Definitely not. You can have two equal records and update just one of 
them, yielding non-equal records; by my definition (and by intuition), 
this means that the records were equal but not identical.

  I do not see that they have
 any identity outside of their value. We can uniquely identify
 any particular record via a key, but a table may have more
 than one key, and an update may change the values of one
 key but not another. So it is not possible in general to
 definitely and uniquely assign a mapping from each record
 of a table after an update to each record of the table before
 the update, and if you can't do that, then where
 is the record identity?

Such a mapping is indeed possible. Simply extend the table with a new 
column, number the columns consecutively, and identify the records via 
that column.

But even if you don't do that, there's still identity. It is irrelevant 
whether the programs can directly read the value of the identity field; 
the adverse effects happen because updates are in-place. (If every 
update generated a new record, then we'd indeed have no identity.)

 Okay. At this point, though, the term aliasing has become extremely
 general. I believe i+1+1 is an alias for i+2 under this definition.

No, i+1+1 isn't an alias in itself. It's an expression - to be an 
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly 
important - 1+1 is replacable by 2 in every context, so this is 
essentially the same as saying a[i+2] is an alias of a[i+2], which is 
vacuously true.

There's another aspect here. If two expressions are always aliases to 
the same mutable, that's usually easy to determine; this kind of 
aliasing is usually not much of a problem.
What's more of a problem are those cases where there's occasional 
aliasing. I.e. a[i] and a[j] may or may not be aliases of each other, 
depending on the current value of i and j, and *that* is a problem 
because the number of code paths to be tested doubles. It's even more of 
a problem because testing with random data will usually not uncover the 
case where the aliasing actually happens; you have to go around and 
construct test cases specifically for the code paths that have aliasing. 
Given that references may cross abstraction barriers (actually that's 
often the purpose of constructing a reference in the first place), this 
means you have to look for your test cases at multiple levels of 
software abstraction, and *that* is really, really bad.

 That is so general that I am concerned it has lost its ability to
 identify problems specific to pointers.

If the reference to pointers above means references, then I don't 
know about any pointer problems that cannot be reproduced, in one form 
or the other, in any of the other aliasing mechanisms.

 Again, by generalizing the term this far, I am concerned with a
 loss of precision. If joe in the prolog is a references, then
 reference is just a term for data that is being used in a
 certain way. The conection with a specfic address space
 has been lost in favor of the full domain of the datatype.

Aliasing is indeed a more general idea that goes beyond address spaces.

However, identity and aliasing can be defined in fully abstract terms, 
so I welcome this opportunity to get rid of a too-concrete model.

 The records still have identities. It's possible to have two WHERE
 clauses that refer to the same record, and if you update the record
 using one WHERE clause, the record returned by the other WHERE clause
 will have changed, too.
 
 Is this any different from saying that an expression that includes
 a variable will produce a different value if the variable changes?

Yes.
Note that the WHERE clause properly includes array indexing (just set up 
a table that has continuous numeric primary key, and a single other column).

I.e. I'm not talking about how a[i] is an alias of a[i+1] after updating 
i, I'm talking about how a[i] may be an alias of a[j].

 It seems odd to me to suggest that i+1 has identity.

It doesn't (unless it's passed around as a closure, but that's 
irrelevant to this discussion).
i does have identity. a[i] does have identity. a[i+1] does have 
identity.
Let me say that for 

Re: What is a type error?

2006-07-15 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  Joachim Durchholz wrote:
  As I said elsewhere, the record has an identity even though it isn't
  explicit in SQL.
 
  H. What can this mean?
 
  In general, I feel that records are not the right conceptual
  level to think about.

 They are, when it comes to aliasing of mutable data. I think it's
 justified by the fact that aliased mutable data has a galling tendency
 to break abstraction barriers. (More on this on request.)

I can see how pointers, or other kinds of aliasing
(by my more restricted definition) break abstraction barries;
it is hard to abstract something that can change out from
under you uncontrollably.


  In any event, I am not sure what you mean by non-explicit
  identity.

 The identity isn't visible from inside SQL. (Unless there's an OID
 facility available, which *is* an explicit identity.)

Agreed about OIDs.


   I would say, records in SQL have value, and their
  identity is exactly their value.

 Definitely not. You can have two equal records and update just one of
 them, yielding non-equal records; by my definition (and by intuition),
 this means that the records were equal but not identical.

This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.

How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

[The above paragraph is the part of this post that I
really care about; feel free to ignore the rest if it's naive
or annoying or whatever, but please do respond to this.]


   I do not see that they have
  any identity outside of their value. We can uniquely identify
  any particular record via a key, but a table may have more
  than one key, and an update may change the values of one
  key but not another. So it is not possible in general to
  definitely and uniquely assign a mapping from each record
  of a table after an update to each record of the table before
  the update, and if you can't do that, then where
  is the record identity?

 Such a mapping is indeed possible. Simply extend the table with a new
 column, number the columns consecutively, and identify the records via
 that column.

That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.


 But even if you don't do that, there's still identity. It is irrelevant
 whether the programs can directly read the value of the identity field;
 the adverse effects happen because updates are in-place. (If every
 update generated a new record, then we'd indeed have no identity.)

  Okay. At this point, though, the term aliasing has become extremely
  general. I believe i+1+1 is an alias for i+2 under this definition.

 No, i+1+1 isn't an alias in itself. It's an expression - to be an
 alias, it would have to be a reference to something.

 However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
 important - 1+1 is replacable by 2 in every context, so this is
 essentially the same as saying a[i+2] is an alias of a[i+2], which is
 vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.

(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)


 There's another aspect here. If two expressions are always aliases to
 the same mutable, that's usually easy to determine; this kind of
 aliasing is usually not much of a problem.
 What's more of a problem are those cases where there's occasional
 aliasing. I.e. a[i] and a[j] may or may not be aliases of each other,
 depending on the current value of i and j, and *that* is a problem
 because the number of code paths to be tested doubles. It's even more of
 a problem because testing with random data will 

Re: What is a type error?

2006-07-14 Thread George Neuner
On 13 Jul 2006 08:45:49 -0700, Marshall [EMAIL PROTECTED]
wrote:

On the other hand, there is no problem domain for which pointers
are a requirement. I agree they are deucedly convenient, though.


I would argue that pointers/references _are_ a requirement for I/O.  I
know of no workable method for interpreting raw bits as meaningful
data other than to overlay a typed template upon them.

Categorically disallowing address manipulation functionally cripples
the language because an important class of programs (system programs)
cannot be written.

Of course, languages can go overboard the other way too.  IMO, C did
not need to provide address arithmetic at the language level,
reinterpretable references and array indexing would have sufficed for
any use.  Modula 3's type safe view is an example of getting it right.

It is quite reasonable to say I don't write _ so I don't need
[whatever language feature enables writing it].  It is important,
however, to be aware of the limitation and make your choice
deliberately.


George
--
for email reply remove / from address
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Marshall
George Neuner wrote:
 On 13 Jul 2006 08:45:49 -0700, Marshall [EMAIL PROTECTED]
 wrote:
 
 On the other hand, there is no problem domain for which pointers
 are a requirement. I agree they are deucedly convenient, though.
 

 I would argue that pointers/references _are_ a requirement for I/O.  I
 know of no workable method for interpreting raw bits as meaningful
 data other than to overlay a typed template upon them.

I think I know what you mean. I agree that pointers are necessary
for, e.g., device drivers. So I have to weaken my earlier statement.


 Categorically disallowing address manipulation functionally cripples
 the language because an important class of programs (system programs)
 cannot be written.

That's fair, although I could argue how important systems programming
is these days. (And C/C++ are cock-of-the-walk there anyway.)


 Of course, languages can go overboard the other way too.  IMO, C did
 not need to provide address arithmetic at the language level,
 reinterpretable references and array indexing would have sufficed for
 any use.  Modula 3's type safe view is an example of getting it right.

 It is quite reasonable to say I don't write _ so I don't need
 [whatever language feature enables writing it].  It is important,
 however, to be aware of the limitation and make your choice
 deliberately.

Agreed.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Marshall
Chris Smith wrote:
 Marshall [EMAIL PROTECTED] wrote:
   What you are asking for is some subset of identity, and I've not yet
   succeeded in understanding exactly what it is or what its limits are...
   except that so far, it seems to have everything to do with pointers or
   aliasing.
 
  Perhaps it is specifically first-class identity, rather than identity
  per se.

 As in: the value of one variable can (be/refer to/depend on) the
 identity of another variable?  I can certainly see this as as
 reasonable concept to consider.

Yes, I think it's important. We've become so accustomed, in
modern languages, to considering variables as first-class,
that we cannot see any more that many of the problems
attributed to variables (such as aliasing) are actually
problems only with *first-class* variables. The historical
reality that non-first-class variables are associated with
more antique languages (Fortran/pre-OO) makes us shy
away from even considering removing first-class status
from variables. But I think it's an important point in the
design space (although I certainly respect Andreas'
reluctance; to paraphrase, first class or bust. :-) )

After all, what are the alternatives? Purely-functional
languages remove themselves from a large class of
problems that I consider important: data management.
I have explored the OO path to its bitter end and am
convinced it is not the way. So what is left? Uniqueness
types and logic programming, I suppose. I enjoy logic
programming but it doesn't seem quite right. But notice:
no pointers there! And it doesn't seem to suffer from the
lack. I really don't know enough about uniqueness types
to evaluate them. But again, I already have my proposed
solution: variables but no first-class variables. (And yes,
I'm acutely aware of how problematic that makes
closures. :-()


   I'm not yet convinced that this is any different from a language with
   standard pointer aliasing.  If I have two tables A and B, and a foreign
   key from A into B, then I run into the same problems with enforcing
   constraints that I would see with a pointer model... when I update a
   relation, I need to potentially check every other relation that contains
   a foreign key into it, in order to ensure that its constraints are not
   violated by that constraint.  That's the same thing that is being
   pointed out as a negative consequence of aliasing in other languages.
 
  No, that's not the same thing. What you are describing here is
  not an aliasing issue, but simply the consequences of allowing
  constraints to mention more than one variable.

  A foreign key constraint is a multi-variable constraint.
  Specifically, a foreign key from table A, attribute a
  to table B, attribute b is the constraint:
 
  forall a in A, exists b in B such that a = b.
 
  Note that two variables, A and B, are referenced in
  the constraint.

 There's confusion here coming from different usages of the word
 variable.  Let us talk instead of values, and of the abstract structures
 that gives them meaning.

I don't think that will work. We cannot discuss constraints, nor
aliasing, without bringing variables in to the picture. (Aliasing
may exist without variables but it is a non-problem then.)


 In both cases (invariants in a hypothetical
 imperative language, and in a relational database), the constraints make
 reference to these structures of values (relations, for example, or
 various kinds of data structures), and not to the individual values or
 objects that they contain.

I am not certain what this means.


 In both cases, the problem is not that we
 don't know what structures to check to verify the invariant; rather,
 it's that we have to check ALL of the values in that structure.

But not all values in all structures, as you do in the presence
of aliasing.


 As someone pointed out, this is to be expected in a world of mutable
 things with identity that are globally locatable.  It is simple fact
 that if I tell you I spoke to Barbara's husband, you may need to trace
 down who Barbara's husband is before you could discover that, for
 example, maybe I actually spoke to your boss, or to your nephew's best-
 friend's father.  If databases are capable of modeling these kinds of
 relationships (and of course they are), then they are as susceptible to
 aliasing -- in a logical sense that avoids mention of pointer -- as
 anyone else.

I don't see that they're the same thing. Similar is some ways, yes,
but not the same.

As I said much earlier, the terminology is problematic, and it may
be that we (or just I) simply lack the precision of terms necessary
for the conversation.

 
Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Joachim Durchholz
Marshall schrieb:
 What about my example of SQL? Mutation, no pointers, no aliasing.
 Yet: useful.

Sorry, but SQL does have aliasing.

E.g. if you have records that have name=John, surname=Doe, the 
statements
   SELECT * FROM persons WHERE name = John
and
   SELECT * FROM persons WHERE name = Doe
are aliases of each other.

The alias is actually in the WHERE clause. And this *can* get you into 
trouble if you have something that does
   UPDATE ... WHERE name = John
and
   UPDATE ... WHERE surname = Doe
e.g. doing something with the Johns, then updating the names of all 
Does, and finally restoring the Johns (but not recognizing that changing 
the names of all Does might have changed your set of Johns).

Conceptually, this is just the same as having two different access path 
to the same memory cell. Or accessing the same global variable through a 
call-by-reference parameter and via its global name.

BTW with views, you get not just aliasing but what makes aliasing really 
dangerous. Without views, you can simply survey all the queries that you 
are working with and lexically compare table and field names to see 
whether there's aliasing. With views, the names that you see in a 
lexical scope are not enough to determine aliasing.
E.g. if you use a view that builds upon the set of Johns but aren't 
aware of that (possibly due to abstraction barriers), and you change the 
name field of all Does, then you're altering the view without a chance 
to locally catch the bug. That's just as bad as with any pointer 
aliasing problem.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Andreas Rossberg
Darren New wrote:
 Andreas Rossberg wrote:
 
 Yes, technically you are right. But this makes a pretty weak notion of 
 mutability. All stateful data structures had to stay within their 
 lexical scope, and could never be passed to a function. 
 
 Not really. The way Hermes handles this is with destructive assignment. 
 Each variable holds a value, and you (almost) cannot have multiple 
 variables referring to the same value.

OK, this is interesting. I don't know Hermes, is this sort of like a 
dynamically checked equivalent of linear or uniqueness typing?

 If you want to assign Y to X, you use
X := Y
 after which Y is considered to be uninitialized. If you want X and Y to 
 have the same value, you use
X := copy of Y
 after which X and Y have the same value but are otherwise unrelated, and 
 changes to one don't affect the other.

Mh, but if I understand correctly, this seems to require performing a 
deep copy - which is well-known to be problematic, and particularly 
breaks all kinds of abstractions. Not to mention the issue with 
uninitialized variables that I would expect occuring all over the place. 
So unless I'm misunderstanding something, this feels like trading one 
evil for an even greater one.

- Andreas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Andreas Rossberg
Marshall wrote:
 
 After all, what are the alternatives? Purely-functional
 languages remove themselves from a large class of
 problems that I consider important: data management.

Maybe, but I have yet to see how second-class variables are really more 
adequate in dealing with it.

And note that even with second-class state you can still have aliasing 
issues - you just need mutable arrays and pass around indices. Keys in 
databases are a more general form of the same problem.

 I have explored the OO path to its bitter end and am
 convinced it is not the way. So what is left? Uniqueness
 types and logic programming, I suppose. I enjoy logic
 programming but it doesn't seem quite right. But notice:
 no pointers there!  And it doesn't seem to suffer from the
 lack.

Uh, aliasing all over the place! Actually, I think that logic 
programming, usually based on deep unification, brings by far the worst 
incarnation of aliasing issues to the table.

- Andreas

-- 
Andreas Rossberg, [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Joachim Durchholz
Marshall schrieb:
 void foo() {
   int i = 0;
   int j = 0;
   j = 1;
   i = 2;
   // check value of j here. It is still 1, no matter what you filled
// in above.
   // The assignment to i cannot be made to affect the value of j.
 }
 
 Those two local primitive variables cannot be made to have the same
 identity.

Sure. To state it more clearly, they cannot be aliases.

  But you can update them, so this is an example of mutability
 without the possibility of identity.

You're being a bit sloppy with terminology here. Identity in the 
phrase above refers to two entities, in the sense of i and j cannot be 
identical.
Identity is actually a property of a single entity, namely that what 
remains constant regardless of what you do with the entity.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Joachim Durchholz
Marshall schrieb:
 By your definition, pointer and variable are synonyms. That doesn't
 seem like a good idea to me. (What if i and j end up in registers?
 I have not heard it said that registers have addresses.)

There are no registers in the JVM ;-P

More specifically, where Joe said pointer he really meant reference. 
i refers to a variable, and you really want to make sure that the first 
i refers to the same variable as the second i, so whatever is referred 
to by i is really an identity.

 Those two local primitive variables cannot be made to have the same
 identity. But you can update them, so this is an example of mutability
 without the possibility of identity.
 The identity is temporal:  You use the same variable name at two
 different times.  Do you intend for the second `i' to mean the same
 variable as the first `i'?
 
 Okay, so i and i have the same identity.

Vacuous but true.

 I suppose you could argue that the language's namespace is an
 address-space, and variable names are addresses.

Correct.

 At this point, though, you've made the concept of identity so broad
 that it is now necessarily a property of all languages that use named
 values, whether updatable or not. I think you would also have to call
 3 a void pointer by this scheme.

It's not a void pointer, it's a pointer to an integer, but you're 
correct: 3, when written in a program text, is a reference to the 
successor of the successor of the successor of zero.

If you find that silly and would like to stick with equality, then 
you're entirely correct. Natural numbers are entirely immutable by 
definition, and I'm definitely not the first one to observe that 
identity and equality are indistinguishable for immutable objects.

 Clearly there is *some* difference between a language which allows
 explicit pointers and thus aliasing and one that doesn't.

You can have aliasing without pointers; e.g. arrays are fully sufficient.
If i = j, then a [i] and a [j] are aliases of the same object.

(After I observed that, I found it no longer a surprise that array 
optimizations are what Fortran compiler teams sink most time into. The 
aliasing problems are *exactly* the same as those with pointers in C - 
though in practice, the Fortranistas have the advantage that the 
compiler will usually know that a [i] and b [j] cannot be aliases of 
each other, so while they have the same problems, the solutions give the 
Fortranistas more leverage.)

  What term would you use? First-class variables?

I think it's more a quasi-continuous spectrum.

On one side, we have alias-free toy languages (no arrays, no pointers, 
no call-by-reference, just to name the most common sources of aliasing).

SQL is slightly more dangerous: it does have aliases, but they are 
rarely a problem (mostly because views aren't a very common thing, and 
even if they are used, they aren't usually so abstract that they really 
hide the underlying dependencies).

Next are languages with arrays and call-by-reference. Pascal without 
pointers would be a candidate.
Here, aliasing occurs, and it can and does hide behind abstraction 
barriers, but it's not a serious problem unless the software becomes 
*really* large.

The end of the line are languages that use references to mutable data 
everywhere. All OO languages are a typical example of that.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 It's just that I know that it's viable to give up destructive updates.
 Giving up pointers is a far more massive restriction.
 
 Oddly, I feel the opposite. While it's true there are many domains
 for which purely functional programming is a fine choice, there
 are some domains for which it is insufficient. Any kind of data
 managament, for example, requires that you be able to update
 the information.

Sure, the data itself cannot be managed non-destructively.

However, the programs that operate on the data need not do destructive 
updates internally. That way, I can have pointers inside the programs 
without giving up aliasing.
(Aliasing is really, really, really important. It's the reason why 
mathematicians can handle infinite objects - they copy just the alias, 
i.e. the name or description, instead of the object itself. It's one of 
the things that make abstraction useful.)

 Right. To me the response to this clear: give up pointers. Imperative
 operations are too useful to give up; indeed they are a requirement
 for certain problems.
 I don't know any.
 In some cases, you need an additional level of conceptual indirection -
 instead of *doing* the updates, you write a function that *describes* them.
 
 But then what do you do with that function?

I pass it to an engine that's imperative.

However, that engine doesn't need to do aliasing anymore.

In other words, I'm separating mutability and aliasing, so that they 
can't combine and explode.

  Let's say I have an
 employee database. John Smith just got hired on 1/1/2006 with
 a salary of $10,000. I need to record this fact somewhere. How
 do I do that without variables? Current-employees is a variable.
 Even if I have the space to keep all historical data, so I'm not
 deleting anything, I still have to have a variable for the latest
 version of the accumulated data. I can solve this without
 pointers, but I can't solve it without variables.

As I said elsewhere, the record has an identity even though it isn't 
explicit in SQL.
(SQL's data manipulation language is generally not considered half as 
clean as the query language; I think the core of the problemsis that 
DML combines mutability and identity.)

 I should like to learn more about these. I have some vague
 perception of the existence of linear logic, but not much
 else. However, I also already have an excellent solution
 to the pointer aliasing problem, so I'm less motivated.

Pointers are just the most obvious form of aliasing. As I said 
elsewhere, there are others: array indexing, by-reference parameters, or 
even WHERE clauses in SQL statements. In fact, anything that selects an 
updatable piece of data from a collection is an alias, and incurs the 
same problems as pointers, though they may come in different disguises.

 Actually SQL has references - they are called primary keys, but they
 are references nevertheless.
 
 I strongly object; this is quite incorrect. I grant you that from the
 50,000 foot level they appear identical, but they are not. To
 qualify as a reference, there need to be reference and dereference
 operations on the reference datatype; there is no such operation
 is SQL.
 
 Would you say the relational algebra has references?

Yes.

 Or, consider the classic prolog ancestor query. Let's say we're
 setting up as follows
 
 father(bob, joe).
 father(joe, john).
 
 Is joe a reference, here? After all, when we do the ancestor
 query for john, we'll see his father is joe and then use that to
 find joe's father. Keys in SQL are isomorphic to joe in the
 above prolog.

Agreed. They are both references ;-P

 (Some SQL dialects also offer synthetic
 ID fields that are guaranteed to remain stable over the lifetime of a
 record.
 
 Primary keys are updatable; there is nothing special about them.

Right - I was wrong with identifying keys and identities. In fact the 
identity of an SQL record cannot be directly obtained (unless via those 
ID fields).
The records still have identities. It's possible to have two WHERE 
clauses that refer to the same record, and if you update the record 
using one WHERE clause, the record returned by the other WHERE clause 
will have changed, too.

 With a repeatable read isolation level, you actually return to a
 declarative view of the database: whatever you do with it, you won't see
 it until you commit the transaction. (As soon as you commit, the
 declarative peace is over and you better watch out that your data
 doesn't become inconsistent due to aliasing.)
 
 Alas, transaction isolation levels are a performance hack.
 I cannot defend them on any logical basis. (Aside: did you mean
 serializable, rather than repeatable read?)

Possibly. There are so many isolation levels that I have to look them up 
whenever I want to get the terminology 100% correct.

 The only thing that can really be done about it is not adding it
 artificially into a program. In those cases where aliasing is part of
 the 

Re: What is a type error?

2006-07-14 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  What about my example of SQL? Mutation, no pointers, no aliasing.
  Yet: useful.

 Sorry, but SQL does have aliasing.

Well. I suppose we do not have an agreed upon definition
of aliasing, so it is hard to evaluate either way. I would
propose using the same one you used for identity:

if there are two variables and modifying one also modifies
the other, then there is aliasing between them.

I avoided mentioning equality to include, for example,
having an array i that is an alias to a subset of array j.
Please feel free to critque this definition, and/or supply
an alternative.


 E.g. if you have records that have name=John, surname=Doe, the
 statements
SELECT * FROM persons WHERE name = John
 and
SELECT * FROM persons WHERE name = Doe
 are aliases of each other.

 The alias is actually in the WHERE clause.

Not by my definition, because there is only one variable here.


 And this *can* get you into
 trouble if you have something that does
UPDATE ... WHERE name = John
 and
UPDATE ... WHERE surname = Doe
 e.g. doing something with the Johns, then updating the names of all
 Does, and finally restoring the Johns (but not recognizing that changing
 the names of all Does might have changed your set of Johns).

The fact that some person might get confused about the
semantics of what they are doing does not indicate aliasing.
It is easy enough to do an analysis of your updates and
understand what will happen; this same analysis is impossible
with two arbitrary pointers, unless one has a whole-program
trace. That strikes me as a significant difference.


 Conceptually, this is just the same as having two different access path
 to the same memory cell. Or accessing the same global variable through a
 call-by-reference parameter and via its global name.

There are similarities, but they are not the same.


 BTW with views, you get not just aliasing but what makes aliasing really
 dangerous. Without views, you can simply survey all the queries that you
 are working with and lexically compare table and field names to see
 whether there's aliasing. With views, the names that you see in a
 lexical scope are not enough to determine aliasing.
 E.g. if you use a view that builds upon the set of Johns but aren't
 aware of that (possibly due to abstraction barriers), and you change the
 name field of all Does, then you're altering the view without a chance
 to locally catch the bug. That's just as bad as with any pointer
 aliasing problem.

It is certainly aliasing, but I would not call it just as bad. There
are
elaborate declarative constraint mechanisms in place, for example.
And the definition of the view itsef is a declarative, semantic
entity. Whereas a pointer is an opaque, semantics-free address.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Marshall
Andreas Rossberg wrote:
 Marshall wrote:
 
  After all, what are the alternatives? Purely-functional
  languages remove themselves from a large class of
  problems that I consider important: data management.

 Maybe, but I have yet to see how second-class variables are really more
 adequate in dealing with it.

 And note that even with second-class state you can still have aliasing
 issues - you just need mutable arrays and pass around indices. Keys in
 databases are a more general form of the same problem.

So for array a, you would claim that a[5] is an alias for
(a part of) a? That seems to stretch the idea of aliasing
to me. With these two expressions, it is obvious enough
what is going on; with two arbitrary pointers, it is not.

It seems to me the source of the problem is the opacity
of pointers, but perhaps I am missing something.


  I have explored the OO path to its bitter end and am
  convinced it is not the way. So what is left? Uniqueness
  types and logic programming, I suppose. I enjoy logic
  programming but it doesn't seem quite right. But notice:
  no pointers there!  And it doesn't seem to suffer from the
  lack.

 Uh, aliasing all over the place! Actually, I think that logic
 programming, usually based on deep unification, brings by far the worst
 incarnation of aliasing issues to the table.

Hmmm. Can you elaborate just a bit?


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  void foo() {
int i = 0;
int j = 0;
j = 1;
i = 2;
// check value of j here. It is still 1, no matter what you filled
 // in above.
// The assignment to i cannot be made to affect the value of j.
  }
 
  Those two local primitive variables cannot be made to have the same
  identity.

 Sure. To state it more clearly, they cannot be aliases.

Yes.


   But you can update them, so this is an example of mutability
  without the possibility of identity.

 You're being a bit sloppy with terminology here. Identity in the
 phrase above refers to two entities, in the sense of i and j cannot be
 identical.
 Identity is actually a property of a single entity, namely that what
 remains constant regardless of what you do with the entity.

It is a fair critque. I'll try to be more precise.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Marshall
Joachim Durchholz wrote:

 You can have aliasing without pointers; e.g. arrays are fully sufficient.
 If i = j, then a [i] and a [j] are aliases of the same object.

I am having a hard time with this very broad definition of aliasing.
Would we also say that a[1+1] and a[2] are aliases? It seems
to me, above, that we have only a, and with only one variable
there can be no aliasing.

A further question:

given a 32 bit integer variable x, and offsets i and j (equal as in
the above example) would you say that

   x = (1  i)
and
   x = (1  j)

are aliased expressions for setting a particular bit in x?

I am not being facetious; I am trying to understand the limits
of your definition for aliasing.


 (After I observed that, I found it no longer a surprise that array
 optimizations are what Fortran compiler teams sink most time into. The
 aliasing problems are *exactly* the same as those with pointers in C -
 though in practice, the Fortranistas have the advantage that the
 compiler will usually know that a [i] and b [j] cannot be aliases of
 each other, so while they have the same problems, the solutions give the
 Fortranistas more leverage.)

I don't understand this paragraph. On the one hand, it seems you
are saying that C and Fortran are identically burdened with the
troubles caused by aliasing, and then a moment later it seems
you are saying the situation is distinctly better with Fortran.


   What term would you use? First-class variables?

 I think it's more a quasi-continuous spectrum.

 On one side, we have alias-free toy languages (no arrays, no pointers,
 no call-by-reference, just to name the most common sources of aliasing).

 SQL is slightly more dangerous: it does have aliases, but they are
 rarely a problem (mostly because views aren't a very common thing, and
 even if they are used, they aren't usually so abstract that they really
 hide the underlying dependencies).

 Next are languages with arrays and call-by-reference. Pascal without
 pointers would be a candidate.
 Here, aliasing occurs, and it can and does hide behind abstraction
 barriers, but it's not a serious problem unless the software becomes
 *really* large.

 The end of the line are languages that use references to mutable data
 everywhere. All OO languages are a typical example of that.

Now with this, it appears you are agreeing that SQL has an advantage
vis-a-vis aliasing compared to OO languages. Yes? If so, we are
agreeing on the part I care about, and the specifics of just what
we call aliasing are not so important to me.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread rossberg
Marshall wrote:
 Andreas Rossberg wrote:
 
  And note that even with second-class state you can still have aliasing
  issues - you just need mutable arrays and pass around indices. Keys in
  databases are a more general form of the same problem.

 So for array a, you would claim that a[5] is an alias for
 (a part of) a? That seems to stretch the idea of aliasing
 to me.

Not at all, I'd say. In my book, aliasing occurs whenever you have two
different lvalues that denote the same mutable cell. I think it would
be rather artificial to restrict the definition to lvalues that happen
to be variables or dereferenced pointers.

So of course, a[i] aliases a[j] when i=j, which in fact is a well-known
problem in some array or matrix routines (e.g. copying array slices
must always consider overlapping slices as special cases).

  Uh, aliasing all over the place! Actually, I think that logic
  programming, usually based on deep unification, brings by far the worst
  incarnation of aliasing issues to the table.

 Hmmm. Can you elaborate just a bit?

Logic variables immediately become aliases when you unify them.
Afterwards, after you bind one, you also bind the other - or fail,
because both got already bound the other way round. Unfortunately,
unification also is a deep operation, that is, aliasing can be induced
into variables deeply nested into some terms that happen to get
unified, possibly without you being aware of any of this (the
unification, nor the variable containment). To avoid accidental
aliasing you essentially must keep track of all potential unifications
performed by any given predicate (including transitively, by its
subgoals), which totally runs square to all concerns of modularity and
abstraction.

I've never been much of a logic programmer myself, but our language
group has a dedicated LP and CP background, and people here have
developed very strong opinions about the adequateness of unrestricted
logic variables and particularly unification in practice. I remember
somebody calling Prolog the worst imperative language ever invented.

- Andreas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Darren New
Andreas Rossberg wrote:
 OK, this is interesting. I don't know Hermes, is this sort of like a 
 dynamically checked equivalent of linear or uniqueness typing?

I'm not sure what linear or uniqueness typing is. It's typestate, and if 
I remember correctly the papers I read 10 years ago, the folks at 
TJWatson that invented Hermes also invented the concept of typestate. 
They at least claim to have coined the term.

It's essentially a dataflow analysis that allows you to do the same 
sorts of things that don't read variables that may not yet have been 
assigned to, except that you could annotate that variables change to 
the state of uninitialized after they've already been initialized.

 Mh, but if I understand correctly, this seems to require performing a 
 deep copy - which is well-known to be problematic, and particularly 
 breaks all kinds of abstractions.

Um, no, because there are no aliases. There's only one name for any 
given value, so there's no deep copy problems. A deep copy and a 
shallow copy are the same thing. And there are types of values you can 
assign but not copy, such as callmessages (which are problematic to copy 
for the same reason a stack frame would be problematic to copy).

I believe, internally, that there were cases where the copy was deep 
and cases where it was shallow, depending on the surrounding code. 
Making a copy of a table and passing it to another process had to be a 
deep copy (given that a column could contain another table, for 
example). Making a copy of a table and using it for read-only purposes 
in the same process would likely make a shallow copy of the table. 
Iterating over a table and making changes during the iteration made a 
copy-on-write subtable, then merged it back into the original table when 
it was done the loop, since the high-level semantic definition of 
looping over a table is that you iterate over a copy of the table.

The only thing close to aliases are references to some other process's 
input ports (i.e., multiple client-side sockets connected to a 
server-side socket). If you want to share data (such as a file system or 
program library), you put the data in a table in a process, and you hand 
out client-side connections to the process. Mostly, you'd define such 
connections as accepting a data value (for the file contents) with the 
parameter being undefined upon return from the call, and the file name 
as being read-only, for example. If you wanted to store the file, you 
could just pass a pointer to its data (in the implementation). If you 
wanted a copy of it, you would either copy it and pass the pointer, or 
you'd pass the pointer with a flag indicating it's copy-on-write, or you 
could pass the pointer and have the caller copy it at some point before 
returning, depending on what the caller did with it. The semantics were 
high-level with the intent to allow the compiler lots of leeway in 
implementation, not unlike SQL.

 Not to mention the issue with 
 uninitialized variables that I would expect occuring all over the place. 

The typestate tracks this, and prevents you from using uninitialized 
variables. If you do a read (say, from a socket) and it throws an end 
of data exception, the compiler prevents you from using the buffer you 
just tried but failed to read.

Indeed, that's the very point of it all. By doing this, you can run 
untrusted code in the same address space as trusted code, and be 
assured that the compiler will prevent the untrusted code from messing 
up the trusted code. The predecessor of Hermes (NIL) was designed to let 
IBM's customers write efficient networking code and emulations and such 
that ran in IBM's routers, without the need for expensive (in 
performance or money) hardware yet with the safety that they couldn't 
screw up IBM's code and hence cause customer service problems.

 So unless I'm misunderstanding something, this feels like trading one 
 evil for an even greater one.

In truth, it was pretty annoying. But more because you wound up having 
to write extensive declarations and compile the declarations before 
compiling the code that implements them and such. That you didn't get to 
use uninitialized variables was a relatively minor thing, especially 
given that many languages nowadays complain about uninitialized 
variables, dead code, etc. But for lots of types of programs, it let you 
do all kinds of things with a good assurance that they'd work safely and 
efficiently. It was really a language for writing operating systems in, 
when you get right down to it.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Joe Marshall

Marshall wrote:
 Joe Marshall wrote:
  Marshall wrote:
  
   Consider the following Java fragment:
  
   void foo() {
 int i = 0;
 int j = 0;
  
 // put any code here you want
  
 j = 1;
 i = 2;
 // check value of j here. It is still 1, no matter what you filled in
   above.
 // The assignment to i cannot be made to affect the value of j.
  
   }
 
  True, but you have hidden the pointers.  Semantically, the identifiers
  i and j refer not to integers but to locations that hold integers.  The
  assignment modifies the location.

 What the implementation looks like shouldn't affect how we speak
 of the logical model. In the above code, there are no pointers.

 By your definition, pointer and variable are synonyms. That doesn't
 seem like a good idea to me. (What if i and j end up in registers?
 I have not heard it said that registers have addresses.)

You probably won't like this, but

The example you give above is a bit ambiguous as to whether or not it
shows `true' mutation.  So what do I mean by `true' mutation?  The code
above could be transformed by alpha-renaming into an equivalent version
that does no mutation:

void foo() {
   int i = 0;
   int j = 0;

   // put any code here you want

   int jj = 1;
   int ii = 2;
   // rename all uses of i to ii and j to jj after this point.

 }

Once I've done the renaming, we're left with a pure function.  I claim
that this sort of `mutation' is uninteresting because it can be
trivially eliminated through renaming.  Since we can eliminate it
through renaming, we can use a simplified semantics that does not
involve a `store', and use a substitution model of evaluation.

The more interesting kind of mutation cannot be eliminated through
renaming.  The semantic model requires the notion of a stateful
`store', and the model is not referentially transparent: substitution
does not work.  These qualities are what make mutation problematic.  My
claim is that this kind of mutation cannot be had without some model of
pointers and references.

There is a further distinguishing characteristic:  Can I write a
program that detects the mutation (and acts differently depending on
whether there is mutation or not)?  It is unclear from your example
above whether you intended this to be possible, but certainly there is
no way for a caller of foo to tell that foo reassigns an internal
variable.  If there a procedure is extrinsically a pure function, then
there is no real meaning to any `mutation' that goes on inside; there
is always a way to turn the internal mutation into a pure function
through alpha-renaming.


 Clearly there is *some* difference between a language which allows
 explicit pointers and thus aliasing and one that doesn't. What term
 would you use? First-class variables?

My argument to you in re this statement:
 Again, I disagree: it is posible to have mutability without
 pointers/identity/objects.

is that mutability without a notion of pointers, identity, or objects
is trivially eliminated through alpha renaming and thus isn't the
`mutability' of interest.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 Marshall schrieb:
 What about my example of SQL? Mutation, no pointers, no aliasing.
 Yet: useful.
 Sorry, but SQL does have aliasing.
 
 Well. I suppose we do not have an agreed upon definition
 of aliasing, so it is hard to evaluate either way. I would
 propose using the same one you used for identity:
 
 if there are two variables and modifying one also modifies
 the other, then there is aliasing between them.

I think that's an adequate example.
For a definition, I'd say it's a bit too narrow unless we use a fairly 
broad definition for variable. I.e. in a C context, I'd say that a-b 
is a variable, too, as would be foo(blah)-x.

 I avoided mentioning equality to include, for example,
 having an array i that is an alias to a subset of array j.

This would mean that there's aliasing between, say, a list of 
transaction records and the balance of the account (since modifying the 
list of transactions will change the balance, unless the object isn't 
properly encapsulated).
For purposes of this discussion, it's probably fair to say that's a 
form of aliasing, too, even though it's quite indirect.

 E.g. if you have records that have name=John, surname=Doe, the
 statements
SELECT * FROM persons WHERE name = John
 and
SELECT * FROM persons WHERE name = Doe
 are aliases of each other.

Argh... I made a most braindamaged, stupid mistake here. The second 
SELECT should have used the *surname* field, so it would be

 SELECT * FROM persons WHERE surname = Doe

Then, if there's a record that has name = John, surname = Doe, the 
two WHERE clauses have aliasing in the form of overlapping result sets.

 The alias is actually in the WHERE clause.
 
 Not by my definition, because there is only one variable here.

Sorry, my wording was sloppy.

I meant to say that 'in SQL, you identify records via clauses like WHERE 
name = John, i.e. WHERE clauses are a kind of identity'.

This is still not precise - the identity of an SQL record isn't 
explicitly accessible (except the nonstandard OID facility that some SQL 
engines offer). Nevertheless, they do have an identity, and there's a 
possibility of aliasing - if you change all Johns, you may also change a 
Doe.

 And this *can* get you into
 trouble if you have something that does
UPDATE ... WHERE name = John
 and
UPDATE ... WHERE surname = Doe
 e.g. doing something with the Johns, then updating the names of all
 Does, and finally restoring the Johns (but not recognizing that changing
 the names of all Does might have changed your set of Johns).
 
 The fact that some person might get confused about the
 semantics of what they are doing does not indicate aliasing.
 It is easy enough to do an analysis of your updates and
 understand what will happen; this same analysis is impossible
 with two arbitrary pointers, unless one has a whole-program
 trace. That strikes me as a significant difference.

Sure. I said that aliases in SQL aren't as bad as in other programs.

Once you get abstraction mixed in, the analysis becomes less 
straightforward. In the case of SQL, views are such an abstraction 
facility, and they indeed can obscure what you're doing and make the 
analysis more difficult. If it's just SQL we're talking about, you 
indeed have to look at the whole SQL to check whether there's a view 
that may be involved in the queries you're analysing, so the situation 
isn't *that* different from pointers - it's just not a problem because 
the amount of code is so tiny!

 Conceptually, this is just the same as having two different access path
 to the same memory cell. Or accessing the same global variable through a
 call-by-reference parameter and via its global name.
 
 There are similarities, but they are not the same.

What are the relevant differences? How does the semantics of a WHERE 
clause differ from that of a pointer, in terms of potential aliasing?

My line of argument would be this:
Pointers can be simulated using arrays, so it's no surprise that arrays 
can emulate all the aliasing of pointers.
Arrays can be simulated using WHERE (with SELECT and UPDATE), so I'd say 
that SQL can emulate all the aliasing of arrays, and hence that of pointers.

 BTW with views, you get not just aliasing but what makes aliasing really
 dangerous. Without views, you can simply survey all the queries that you
 are working with and lexically compare table and field names to see
 whether there's aliasing. With views, the names that you see in a
 lexical scope are not enough to determine aliasing.
 E.g. if you use a view that builds upon the set of Johns but aren't
 aware of that (possibly due to abstraction barriers), and you change the
 name field of all Does, then you're altering the view without a chance
 to locally catch the bug. That's just as bad as with any pointer
 aliasing problem.
 
 It is certainly aliasing, but I would not call it just as bad. There
 are
 elaborate declarative constraint mechanisms in 

Re: What is a type error?

2006-07-14 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 You can have aliasing without pointers; e.g. arrays are fully sufficient.
 If i = j, then a [i] and a [j] are aliases of the same object.
 
 I am having a hard time with this very broad definition of aliasing.
 Would we also say that a[1+1] and a[2] are aliases? It seems
 to me, above, that we have only a, and with only one variable
 there can be no aliasing.

a[1] is a variable, too. You can assign values to it.

Inside function foo when called as foo(a[1]), it may even be referenced 
with yet another name and updated in isolation; foo may not even know 
it's an array element. (This is sort-of true in C, and definitely true 
in languages that don't have pointer arithmetic. If you pass a[1] to a 
call-by-reference parameter in Pascal, the callee has no way to access 
the parameter as if it were an array element.)

 A further question:
 
 given a 32 bit integer variable x, and offsets i and j (equal as in
 the above example) would you say that
 
x = (1  i)
 and
x = (1  j)
 
 are aliased expressions for setting a particular bit in x?

Yes.

 I am not being facetious; I am trying to understand the limits
 of your definition for aliasing.

Understood and appreciated.

 (After I observed that, I found it no longer a surprise that array
 optimizations are what Fortran compiler teams sink most time into. The
 aliasing problems are *exactly* the same as those with pointers in C -
 though in practice, the Fortranistas have the advantage that the
 compiler will usually know that a [i] and b [j] cannot be aliases of
 each other, so while they have the same problems, the solutions give the
 Fortranistas more leverage.)
 
 I don't understand this paragraph. On the one hand, it seems you
 are saying that C and Fortran are identically burdened with the
 troubles caused by aliasing, and then a moment later it seems
 you are saying the situation is distinctly better with Fortran.

That's exactly the situation.
Aliasing is present in both language, but in Fortran, the guarantee that 
two names aren't aliased are easier to come by.
Even more non-aliasing guarantees exist if the language has a more 
expressive type system. In most cases, if you know that two expressions 
have different types, you can infer that they cannot be aliases.

Of course, the life of compiler writers if of less importance to us; I 
added them only because the reports of Fortran compilers having aliasing 
problems due to array indexes made me aware that pointers aren't the 
only source of aliasing. That was quite a surprise to me then.

 Now with this, it appears you are agreeing that SQL has an advantage
 vis-a-vis aliasing compared to OO languages. Yes?

Sure.
I think that making an aliasing bug in SQL is just as easy as in any 
pointer-ridden language, but transactions and constraints (plus the 
considerably smaller typical size of SQL code) make it far easier to 
diagnose any such bugs.

 If so, we are agreeing on the part I care about, and the specifics of
 just what we call aliasing are not so important to me.

I'm not sure whether I'm getting the same mileage out of this, but the 
relevance of a problem is obviously a function on what one is doing on a 
day-to-day basis, so I can readily agree with that :-)

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-14 Thread Chris Smith
Marshall wrote...
 I am having a hard time with this very broad definition of aliasing.
 Would we also say that a[1+1] and a[2] are aliases? It seems
 to me, above, that we have only a, and with only one variable
 there can be no aliasing.

The problem with this (and with the relational one as well) is that the 
practical problems don't go away by redefining variable to mean larger 
collections of values.

 given a 32 bit integer variable x, and offsets i and j (equal as in
 the above example) would you say that
 
x = (1  i)
 and
x = (1  j)
 
 are aliased expressions for setting a particular bit in x?

I don't know about Joachim, but yes, I would.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Andreas Rossberg
Marshall wrote:
 
 Okay, sure. But for the problem you describe, both imperativeness
 and the presence of pointers is each necessary but not sufficient;
 it is the two together that causes the problem. So it strikes
 me (again, a very minor point) as inaccurate to describe this as
 a problem with imperative languages per se.
 
 [...]
 
 Right. To me the response to this clear: give up pointers. Imperative
 operations are too useful to give up; indeed they are a requirement
 for certain problems. Pointers on the other hand add nothing except
 efficiency and a lot of confusion. They should be considered an
 implementation technique only, hidden behind some pointerless
 computational model.

Don't get yourself distracted by the low-level notion of pointer. The 
problem *really* is mutability and the associated notion of identity, 
which explicit pointers just exhibit on a very low level.

When you have a language with mutable types (e.g. mutable arrays) then 
objects of these types have identity, which is observable through 
assignment. This is regardless of whether identity is an explicit 
concept (like it becomes with pointers and comparison of pointer values, 
i.e. addresses).

Consequently, you cannot possibly get rid of aliasing issues without 
getting rid of (unrestricted) mutability. Mutability implies object 
identity implies aliasing problems.

On the other hand, pointers are totally a futile concept without 
mutability: if everything is immutable, it is useless to distinguish 
between an object and a pointer to it.

In other words, pointers are essentially just an *aspect* of mutability 
in lower-level languages. On a sufficiently high level of abstraction, 
it does not make much sense to differentiate between both concepts - 
pointers are just mutable objects holding other mutable objects 
(immutable pointer types exist, but are only interesting if you also 
have pointer arithmetics - which, however, is largely equivalent to 
arrays, i.e. not particularly relevant either).

- Andreas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 Marshall schrieb:
 Joachim Durchholz wrote:
 Marshall schrieb:
 I can see the lack of a formal model being an issue, but is the
 imperative bit really all that much of an obstacle? How hard
 is it really to deal with assignment? Or does the issue have
 more to do with pointers, aliasing, etc.?
 Actually aliasing is *the* hard issue.
 Okay, sure. Nice explanation.

 But one minor point: you describe this as an issue with imperative
 languages. But aliasing is a problem associated with pointers,
 not with assignment.
 Aliasing is not a problem if the aliased data is immutable.
 
 Okay, sure. But for the problem you describe, both imperativeness
 and the presence of pointers is each necessary but not sufficient;
 it is the two together that causes the problem. So it strikes
 me (again, a very minor point) as inaccurate to describe this as
 a problem with imperative languages per se.

Sure.
It's just that I know that it's viable to give up destructive updates. 
Giving up pointers is a far more massive restriction.

 Right. To me the response to this clear: give up pointers. Imperative
 operations are too useful to give up; indeed they are a requirement
 for certain problems.

I don't know any.
In some cases, you need an additional level of conceptual indirection - 
instead of *doing* the updates, you write a function that *describes* them.

  Pointers on the other hand add nothing except
 efficiency and a lot of confusion. They should be considered an
 implementation technique only, hidden behind some pointerless
 computational model.
 
 I recognize that this is likely to be a controversial opinion.

Indeed.

In fact modes are a way to restrict pointer aliasing.

 I heartily support immutability as the default, for these and other
 reasons.

OK, then we're in agreement here.

 Some functional languages restrict assignment so that there can exist at
 most a single reference to any mutable data structure. That way, there's
 still no aliasing problems, but you can still update in place where it's
 really, really necessary.
 
 Are we speaking of uniqueness types now? I haven't read much about
 them, but it certainly seems like an intriguing concept.

Yup.
It's called modes in some other languages (Mercury or Clean IIRC).

 I know of no professional language that doesn't have references of some
 kind.
 
 Ah, well. I suppose I could mention prolog or mercury, but then
 you used that troublesome word professional. So I will instead
 mention a language which, if one goes by number of search results
 on hotjobs.com for xxx progammer for different value of xxx, is
 more popular than Java and twice as popular as C++. It lacks
 pointers (although I understand they are beginning to creep in
 in the latest version of the standard.) It also posesses a quite
 sophisticated set of facilities for declarative integrity constraints.
 Yet for some reason it is typically ignored by language designers.
 
 http://hotjobs.yahoo.com/jobseeker/jobsearch/search_results.html?keywords_all=sql+programmer

Oh, right. SQL is an interesting case of getting all the problems of 
pointers without having them ;-)

Actually SQL has references - they are called primary keys, but they 
are references nevertheless. (Some SQL dialects also offer synthetic 
ID fields that are guaranteed to remain stable over the lifetime of a 
record. Seems like SQL is imperative enough that programmers want this, 
else the SQL vendors wouldn't have added the feature...)
SQL also has updates.
The result: updates with undefined semantics. E.g. if you have a numeric 
key field, UPDATE commands that increment the key by 1 will fail or work 
depending on the (unspecified) order in which UPDATE touches the 
records. You can have even more fun with updatable views.
With a repeatable read isolation level, you actually return to a 
declarative view of the database: whatever you do with it, you won't see 
it until you commit the transaction. (As soon as you commit, the 
declarative peace is over and you better watch out that your data 
doesn't become inconsistent due to aliasing.)


Aliasing isn't really related to specific programming practices. If two 
accountants chat, and one talks about the hot blonde secretaire and the 
other about his adorable wife, you can imagine the awkwardness that 
ensues as soon as they find out they're talking about the same person!
The only thing that can really be done about it is not adding it 
artificially into a program. In those cases where aliasing is part of 
the modelled domain, you really have to carefully inspect all 
interactions and never, never, never dream about abstracting it away.


Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Joachim Durchholz
Darren New schrieb:
 Joachim Durchholz wrote:
 Actually, in a functional programming language (FPL), you write just 
 the postconditions and let the compiler generate the code for you.
 
 Certainly. And my point is that the postcondition describing all valid 
 chess boards reachable from this one is pretty much going to be as big 
 as an implementation for generating it, yes?

Yes. It's a classical case where the postcondition and the code that 
fulfils it are essentially the same.

  The postcondition will
 still have to contain all the rules of chess in it, for example. At best 
 you've replaced loops with some sort of universal quanitifier with a 
 such that phrase.

Correct.

OTOH, isn't that the grail that many people have been searching for: 
programming by simply declaring the results that they want to see?

 Anyway, I expect you could prove you can't do this in the general case. 
 Otherwise, you could just write a postcondition that asserts the output 
 of your function is machine code that when run generates the same 
 outputs as the input string would. I.e., you'd have a compiler that can 
 write other compilers, generated automatically from a description of the 
 semantics of the input stream and the semantics of the machine the code 
 is to run on. I'm pretty sure we're not there yet, and I'm pretty sure 
 you start running into the limits of computability if you do that.

No, FPLs are actually just that: compilable postconditions.
Computability issues aren't more or less a factor than with other kinds 
of compilers: they do limit what you can do, but these limits are loose 
enough that you can do really useful stuff within them (in particular, 
express all algorithms).

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Joachim Durchholz
Marshall schrieb:
 David Hopwood wrote:
 This property is, after all, not something that the program should depend on.
 It is determined by how good the static checker currently is, and we want to 
 be
 able to improve checkers (and perhaps even allow them to regress slightly in
 order to simplify them) without changing programs.
 
 Hmmm. I have heard that argument before and I'm conflicted.

I'd want several things.

A way for me to indicate what assertions must be proven statically.
Highlighting (be it compiler messages or flashing colors in an IDE) that 
marks assertions that *will* break.
And highlighting for assertions that *may* break.
In the language, a (possibly) simplicistic inference engine definition 
that gives me minimum guarantees about the things that it will be able 
to prove; if something is out of the reach of the engine, a 
straightforward way to add intermediate assertions until the inference 
succeeds.

(Plus diagnostics that tell me where the actual error may be, whether 
it's a bug in the code or an omission in the assertions. That's probably 
the hardest part of it all.)

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Marshall
Andreas Rossberg wrote:
 Marshall wrote:
 
  Okay, sure. But for the problem you describe, both imperativeness
  and the presence of pointers is each necessary but not sufficient;
  it is the two together that causes the problem. So it strikes
  me (again, a very minor point) as inaccurate to describe this as
  a problem with imperative languages per se.
 
  [...]
 
  Right. To me the response to this clear: give up pointers. Imperative
  operations are too useful to give up; indeed they are a requirement
  for certain problems. Pointers on the other hand add nothing except
  efficiency and a lot of confusion. They should be considered an
  implementation technique only, hidden behind some pointerless
  computational model.

 Don't get yourself distracted by the low-level notion of pointer. The
 problem *really* is mutability and the associated notion of identity,
 which explicit pointers just exhibit on a very low level.

 When you have a language with mutable types (e.g. mutable arrays) then
 objects of these types have identity, which is observable through
 assignment. This is regardless of whether identity is an explicit
 concept (like it becomes with pointers and comparison of pointer values,
 i.e. addresses).

Hmmm, well, I cannot agree. You've defined away the pointers
but then slipped them back in again by assumption (objects
of these types have identity.)

First let me say that the terminology is somewhat problematic.
For the specific issue being discussed here, pointers, identity,
and objects are all the same concept. (I agree that pointer
connotes a low-level construct, however.) Sometimes I think
of this issue as being one with first class variables. An object
with mutable fields is a variable, and if we have pointers or
references or any way to have two different pathways to
that object/those variables, then we run in to the aliasing problem.

However if the mutable types are not first class, then there
is no way to have the aliasing. Thus, if we do not have pointers
or objects or identity but retain mutability, there is no aliasing
problem.


 Consequently, you cannot possibly get rid of aliasing issues without
 getting rid of (unrestricted) mutability. Mutability implies object
 identity implies aliasing problems.

Mutability by itself does not imply identity. I agree that mutability
plus
identity implies aliasing problems, however.


 On the other hand, pointers are totally a futile concept without
 mutability: if everything is immutable, it is useless to distinguish
 between an object and a pointer to it.

Agreed.


 In other words, pointers are essentially just an *aspect* of mutability
 in lower-level languages.

Again, I disagree: it is posible to have mutability without
pointers/identity/objects.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  Joachim Durchholz wrote:
  Marshall schrieb:
  Joachim Durchholz wrote:
  Marshall schrieb:
  I can see the lack of a formal model being an issue, but is the
  imperative bit really all that much of an obstacle? How hard
  is it really to deal with assignment? Or does the issue have
  more to do with pointers, aliasing, etc.?
  Actually aliasing is *the* hard issue.
  Okay, sure. Nice explanation.
 
  But one minor point: you describe this as an issue with imperative
  languages. But aliasing is a problem associated with pointers,
  not with assignment.
  Aliasing is not a problem if the aliased data is immutable.
 
  Okay, sure. But for the problem you describe, both imperativeness
  and the presence of pointers is each necessary but not sufficient;
  it is the two together that causes the problem. So it strikes
  me (again, a very minor point) as inaccurate to describe this as
  a problem with imperative languages per se.

 Sure.
 It's just that I know that it's viable to give up destructive updates.
 Giving up pointers is a far more massive restriction.

Oddly, I feel the opposite. While it's true there are many domains
for which purely functional programming is a fine choice, there
are some domains for which it is insufficient. Any kind of data
managament, for example, requires that you be able to update
the information.

On the other hand, there is no problem domain for which pointers
are a requirement. I agree they are deucedly convenient, though.


  Right. To me the response to this clear: give up pointers. Imperative
  operations are too useful to give up; indeed they are a requirement
  for certain problems.

 I don't know any.
 In some cases, you need an additional level of conceptual indirection -
 instead of *doing* the updates, you write a function that *describes* them.

But then what do you do with that function? Let's say I have an
employee database. John Smith just got hired on 1/1/2006 with
a salary of $10,000. I need to record this fact somewhere. How
do I do that without variables? Current-employees is a variable.
Even if I have the space to keep all historical data, so I'm not
deleting anything, I still have to have a variable for the latest
version of the accumulated data. I can solve this without
pointers, but I can't solve it without variables.


   Pointers on the other hand add nothing except
  efficiency and a lot of confusion. They should be considered an
  implementation technique only, hidden behind some pointerless
  computational model.
 
  I recognize that this is likely to be a controversial opinion.

 Indeed.

 In fact modes are a way to restrict pointer aliasing.

I should like to learn more about these. I have some vague
perception of the existence of linear logic, but not much
else. However, I also already have an excellent solution
to the pointer aliasing problem, so I'm less motivated.


  I heartily support immutability as the default, for these and other
  reasons.

 OK, then we're in agreement here.

  Some functional languages restrict assignment so that there can exist at
  most a single reference to any mutable data structure. That way, there's
  still no aliasing problems, but you can still update in place where it's
  really, really necessary.
 
  Are we speaking of uniqueness types now? I haven't read much about
  them, but it certainly seems like an intriguing concept.

 Yup.
 It's called modes in some other languages (Mercury or Clean IIRC).

Cool.


  I know of no professional language that doesn't have references of some
  kind.
 
  Ah, well. I suppose I could mention prolog or mercury, but then
  you used that troublesome word professional. So I will instead
  mention a language which, if one goes by number of search results
  on hotjobs.com for xxx progammer for different value of xxx, is
  more popular than Java and twice as popular as C++. It lacks
  pointers (although I understand they are beginning to creep in
  in the latest version of the standard.) It also posesses a quite
  sophisticated set of facilities for declarative integrity constraints.
  Yet for some reason it is typically ignored by language designers.
 
  http://hotjobs.yahoo.com/jobseeker/jobsearch/search_results.html?keywords_all=sql+programmer

 Oh, right. SQL is an interesting case of getting all the problems of
 pointers without having them ;-)

Oh, pooh. SQL has plenty of problems, sure, but the problems
of pointers are not among its faults.


 Actually SQL has references - they are called primary keys, but they
 are references nevertheless.

I strongly object; this is quite incorrect. I grant you that from the
50,000 foot level they appear identical, but they are not. To
qualify as a reference, there need to be reference and dereference
operations on the reference datatype; there is no such operation
is SQL.

Would you say the relational algebra has references?

Or, consider the classic prolog ancestor query. Let's say we're

Re: What is a type error?

2006-07-13 Thread Andreas Rossberg
Marshall wrote:
 
 However if the mutable types are not first class, then there
 is no way to have the aliasing. Thus, if we do not have pointers
 or objects or identity but retain mutability, there is no aliasing
 problem.

Yes, technically you are right. But this makes a pretty weak notion of 
mutability. All stateful data structures had to stay within their 
lexical scope, and could never be passed to a function. For example, 
this essentially precludes object-oriented programming, because you 
could not have objects with state (the alternative, second class 
objects, would be even less objective).

Generally, second-classness is an ad-hoc restriction that can work 
around all kinds of problems, but rarely with satisfactory results. So I 
would tend to say that this is not an overly interesting point in the 
design space. But YMMV.

In other words, pointers are essentially just an *aspect* of mutability
in lower-level languages.
 
 Again, I disagree: it is posible to have mutability without
 pointers/identity/objects.

OK, if you prefer: it is an aspect of first-class mutability - which is 
present in almost all imperative languages I know. :-)

- Andreas

-- 
Andreas Rossberg, [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


[Way off-topic] Re: What is a type error?

2006-07-13 Thread David Hopwood
Joachim Durchholz wrote:
 A concrete example: the
 first thing that Windows does when accepting userland data structures
 is... to copy them; this were unnecessary if the structures were immutable.

It is necessary also because the userland structures are in a different
privilege domain, and the memory backing them is potentially mapped by
other processes, or accessible by other threads of this process, that can
be running in parallel while the current thread is in the kernel.

Also, it tends to be easier and more secure to treat userland structures
as if they were in a different address space (even if for a particular OS,
the kernel would be able to access user mode addresses directly). The
kernel cannot trust that pointers in userland structures are valid, for
instance.

So, in order to avoid a copy when accepting a userland structure:

 - when servicing a kernel call for a given process, the OS must use
   kernel-mode page table mappings that are a superset of the user-mode
   mappings for that process at the point of the call.

 - the memory backing the structure must be immutable in all processes
   that could map it, and the kernel must trust the means by which it is
   made immutable.

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Darren New
Andreas Rossberg wrote:
 Yes, technically you are right. But this makes a pretty weak notion of 
 mutability. All stateful data structures had to stay within their 
 lexical scope, and could never be passed to a function. 

Not really. The way Hermes handles this is with destructive assignment. 
Each variable holds a value, and you (almost) cannot have multiple 
variables referring to the same value.

If you want to assign Y to X, you use
X := Y
after which Y is considered to be uninitialized. If you want X and Y to 
have the same value, you use
X := copy of Y
after which X and Y have the same value but are otherwise unrelated, and 
changes to one don't affect the other.

(As almost an aside, the non-scalar data structures were very similar to 
SQL data tables.)

If you declare (the equivalent to) a function, you can indicate whether 
the paramters matching the arguments are passed destructively, or are 
read-only, or are copy-in-copy-out. So you could declare a function, for 
example, that you pass a table into, and if it's marked as a read-only 
parameter, the compiler ensures the callee does not modify the table and 
the compiler generates code to pass a pointer. One could also mark a 
variable (for example) as uninitialized on entry, intialized on return, 
and uninitalized on the throw of an exception, and this could be used 
(for example) for the read-a-line-from-a-socket routine.

The only value that came close to being shared is an outgoing connection 
to a procedure; the equivalent of the client side of a socket. For 
these, you could make copies, and each copy would point to the same 
receiver. The receiving process could change over time, by passing its 
end of the socket in a message to some other process (live code 
upgrading, for example).

Since everything could be passed as part of a message, including code, 
procedures, tables, and inports and outports (the ends of sockets), 
I don't see that it had any problems with first classness.

 OK, if you prefer: it is an aspect of first-class mutability - which is 
 present in almost all imperative languages I know. :-)

I disagree. It's entirely possible to make sophisticated imperitive 
languages with assignment and without aliasing.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Chris Smith
Joachim Durchholz [EMAIL PROTECTED] wrote:
 OTOH, isn't that the grail that many people have been searching for: 
 programming by simply declaring the results that they want to see?

Possibly.

 No, FPLs are actually just that: compilable postconditions.

This seems to me a bit misleading.  Perhaps someone will explain why I 
should stop thinking this way; but currently I classify statements like 
this in the well, sort of slot of my mind.  If functional programming 
were really just compilable postconditions, then functional programmers 
would be able to skip a good bit of stuff that they really can't.  For 
example, time and space complexity of code is still entirely relevant 
for functional programming.  I can't simply write:

(define fib
  (lambda (x) (if ( x 2) 1 (+ (fib (- x 1)) (fib (- x 2))

and expect the compiler to create an efficient algorithm for me.  This 
is true even though the above is (the LISP transcription of) the most 
natural way to describe the fibonacci sequence from a mathematical 
standpoint.  It still runs in exponential time, and it still matters 
that it runs in exponential time; and LISP programmers adapt their so-
called declarative code to improve time bounds all the time.  This makes 
it harder for me to call it declarative (or compilable postconditions) 
and feel entirely honest.

(Yes, I realize that the above could be optimized in a language that 
does normal order evaluation with common subexpression elimination. and 
become linear-time.  However, that's not true of algorithms in general.  
It is not the case that all that's needed to find an efficient algorithm 
for something is to plug it into a Haskell compiler and observe what 
happens.  Or, if that is the case, there are a few CS professors I know 
who would be quite interested in hearing so.)

 Computability issues aren't more or less a factor than with other kinds 
 of compilers: they do limit what you can do, but these limits are loose 
 enough that you can do really useful stuff within them (in particular, 
 express all algorithms).

Is it really consistent to say that postconditions allow you to express 
algorithms?

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Joachim Durchholz
Marshall schrieb:
 Mutability by itself does not imply identity.

Well, the implication certainly holds from identity to mutability.
The only definition of identity that I found to hold up for all kinds of 
references (pointers, shared-memory identifiers, URLs etc.) is this:

Two pieces of data are identical if and only if:
a) they are equal
b) they stay equal after applying an arbitrary operation to one of them.

This means that for immutable objects, there's no observable difference 
between equality and identity (which I think it just fine).


For the implicaton from mutability to identity, I'm not sure whether 
talking about mutation still makes sense without some kind of identity. 
For example, you need to establish that the object after the mutation is 
still the same in some sense, and this the same concept is exactly 
identity.

  I agree that mutability
 plus identity implies aliasing problems, however.

Then we're agreeing about the most important point anyway.

 In other words, pointers are essentially just an *aspect* of mutability
 in lower-level languages.
 
 Again, I disagree: it is posible to have mutability without
 pointers/identity/objects.

I'm sceptical.
Any examples?

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Chris Smith
Marshall [EMAIL PROTECTED] wrote:
 Hmmm, well, I cannot agree. You've defined away the pointers
 but then slipped them back in again by assumption (objects
 of these types have identity.)
 
 First let me say that the terminology is somewhat problematic.
 For the specific issue being discussed here, pointers, identity,
 and objects are all the same concept. (I agree that pointer
 connotes a low-level construct, however.)

Unless I'm missing your point, I disagree with your disagreement.  
Mutability only makes sense because of object identity (in the generic 
sense; no OO going on here).  Without object identities, mutability is 
useless.  What's the use of changing something if you're not sure you'll 
ever be able to find it again?

You may limit the scope of object identity arbitrarily, even to the 
point that aliasing is impossible (though with lexical closure, that 
gets even more limiting than it may first appear)... but you're just 
trading off power for simplicity, and the really interesting uses of 
mutations are those that allow access to specific objects from any 
number different bits of code, on a program-wide or at least module-wide 
scope.  Most mediocre programmers could replace assignment with 
recursion if that assignment is limited to local variables of a single 
subroutine.  I don't necessarily agree that the result will be a better 
program despite others' conviction on the matter; however, the 
difference certainly isn't worth complicating the language with mutation 
unless you're willing to allow the interesting uses of mutation as well.

 Mutability by itself does not imply identity. I agree that mutability
 plus identity implies aliasing problems, however.

We might have a terminological issue, then.  I'd tend to say that 
mutability definitely does imply identity, but identity doesn't imply 
aliasing.  Same difference.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Darren New
Chris Smith wrote:
 Unless I'm missing your point, I disagree with your disagreement.  
 Mutability only makes sense because of object identity (in the generic 
 sense; no OO going on here). 

Depends what you mean by object.

int x = 6; int y = 5; x = y;

I'd say x was mutable, with no identity problems involved?

Why is it problematic that variables have identity and are mutable? 
Certainly I can later find whatever value I put into x.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Joe Marshall

Marshall wrote:

 Again, I disagree: it is posible to have mutability without
 pointers/identity/objects.

I think you are wrong, but before I make a complete ass out of myself,
I have to ask what you mean by `mutability'.  (And
pointers/identity/objects, for that matter.)

Alan Bawden discusses the phenomenon of `state' in his Ph.D.
dissertation Implementing Distributed Systems Using Linear Naming.
MIT AI Lab Technical Report AITR-1627. March 1993  He makes a
persuasive argument that `state' is associated with cycles in naming.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread David Hopwood
Marshall wrote:
 David Hopwood wrote:
Marshall wrote:

Wouldn't it be possible to do them at compile time? (Although
this raises decidability issues.)

It is certainly possible to prove statically that some assertions cannot fail.

The ESC/Java 2 (http://secure.ucd.ie/products/opensource/ESCJava2/docs.html)
tool for JML (http://www.cs.iastate.edu/~leavens/JML/) is one system that does
this -- although some limitations and usability problems are described in
http://secure.ucd.ie/products/opensource/ESCJava2/ESCTools/papers/CASSIS2004.pdf.
 
 I look forward to reading this. I read a paper on JML a while ago and
 found it quite interesting.
 
Mightn't it also be possible to
leave it up to the programmer whether a given contract
was compile-time or runtime?

That would be possible, but IMHO a better option would be for an IDE to give
an indication (by highlighting, for example), which contracts are dynamically
checked and which are static.

This property is, after all, not something that the program should depend on.
It is determined by how good the static checker currently is, and we want to 
be
able to improve checkers (and perhaps even allow them to regress slightly in
order to simplify them

.. or improve their performance ..

 ) without changing programs.
 
 Hmmm. I have heard that argument before and I'm conflicted.
 
 I can think of more reasons than just runtime safety for which I'd
 want proofs. Termination for example, in highly critical code;
 not something for which a runtime check will suffice.

It is true that some properties cannot be verified directly by a runtime check,
but that does not mean that runtime checks are not indirectly useful in 
verifying
them.

For example, we can check at runtime that a loop variant is strictly decreasing
with each iteration. Then, given that each iteration of the loop body 
terminates,
it is guaranteed that the loop terminates, *either* because the runtime check
fails, or because the variant goes to zero.

In general, we can verify significantly more program properties using a
combination of runtime checks and static proof, than we can using static proof
alone. That may seem like quite an obvious statement, but the consequence is
that any particular property is, in general, not verified purely statically or
purely at runtime.

I am not opposed to being able to annotate an assertion to say that it should
be statically provable and that a runtime check should not be used. However,

 - such annotations should be very lightweight and visually undistracting,
   relative to the assertion itself;

 - a programmer should not interpret such an annotation on a particular 
assertion
   to mean that its static validity is not reliant on runtime checks elsewhere;

 - if the class of assertions that are statically provable changes, then a
   tool should be provided which can *automatically* add or remove these
   annotations (with programmer approval when they are removed).


I'd like to make a couple more comments about when it is sufficient to detect
errors and when it is necessary to prevent them:

 - If a language supports transactions, then this increases the proportion
   of cases in which it is sufficient to detect errors in imperative code.
   When state changes are encapsulated in a transaction, it is much easier
   to recover if an error is detected, because invariants that were true of
   objects used by the transaction when it started will be automatically
   reestablished. (Purely functional code does not need this.)

 - Almost all safety-critical systems have a recovery or safe shutdown
   behaviour which should be triggered when an error is detected in the
   rest of the program. The part of the program that implements this behaviour
   definitely needs to be statically correct, but it is usually only a small
   amount of code.

   Safety-critical systems that must either prevent errors or continue
   functioning in their presence (aircraft control systems, for example) are
   in a separate category that demand *much* greater verification effort. Even
   for these systems, though, it is still useful to detect errors in cases
   where they cannot be prevented. When multiple independent implementations
   of a subsystem are used to check each other, this error detection can be
   used as an input to the decision of which implementation is failing and
   which should take over.

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  Mutability by itself does not imply identity.

 Well, the implication certainly holds from identity to mutability.
 The only definition of identity that I found to hold up for all kinds of
 references (pointers, shared-memory identifiers, URLs etc.) is this:

 Two pieces of data are identical if and only if:
 a) they are equal
 b) they stay equal after applying an arbitrary operation to one of them.

 This means that for immutable objects, there's no observable difference
 between equality and identity (which I think it just fine).

Agreed on all counts.


 For the implicaton from mutability to identity, I'm not sure whether
 talking about mutation still makes sense without some kind of identity.
 For example, you need to establish that the object after the mutation is
 still the same in some sense, and this the same concept is exactly
 identity.

Unless we have some specific mechanism to make two named variables
have the same identity, (distinct from having the same value), then
there
is no aliasing. Pointers or references is one such mechanism; lexical
closures over variables is another. (I don't know of any others.)


   I agree that mutability
  plus identity implies aliasing problems, however.

 Then we're agreeing about the most important point anyway.

Yes.


  In other words, pointers are essentially just an *aspect* of mutability
  in lower-level languages.
 
  Again, I disagree: it is posible to have mutability without
  pointers/identity/objects.
 
 I'm sceptical.
 Any examples?

See next post.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread David Hopwood
Chris Smith wrote:
 Joachim Durchholz [EMAIL PROTECTED] wrote:
 
OTOH, isn't that the grail that many people have been searching for: 
programming by simply declaring the results that they want to see?
 
 Possibly.
 
No, FPLs are actually just that: compilable postconditions.
 
 This seems to me a bit misleading.  Perhaps someone will explain why I 
 should stop thinking this way; but currently I classify statements like 
 this in the well, sort of slot of my mind.  If functional programming 
 were really just compilable postconditions, then functional programmers 
 would be able to skip a good bit of stuff that they really can't.  For 
 example, time and space complexity of code is still entirely relevant 
 for functional programming.  I can't simply write:
 
 (define fib
   (lambda (x) (if ( x 2) 1 (+ (fib (- x 1)) (fib (- x 2))
 
 and expect the compiler to create an efficient algorithm for me.

This is true, but note that postconditions also need to be efficient
if we are going to execute them.

That is, the difference you've pointed out is not a difference between
executable postconditions and functional programs. Both the inefficient
functional definition of 'fib' and an efficient one are executable
postconditions. In order to prove that the efficient implementation is
as correct as the inefficient one, we need to prove that, treated as
postconditions, the former implies the latter.

(In this case a single deterministic result is required, so the former
will be equivalent to the latter.)

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Marshall
Joe Marshall wrote:
 Marshall wrote:
 
  Again, I disagree: it is posible to have mutability without
  pointers/identity/objects.

 I think you are wrong, but before I make a complete ass out of myself,
 I have to ask what you mean by `mutability'.  (And
 pointers/identity/objects, for that matter.)

Responding to requests for examples from Joachim, Joe, and Chris

The very simple example is the one Darren New already mentioned.

Consider the following Java fragment:

void foo() {
  int i = 0;
  int j = 0;

  // put any code here you want

  j = 1;
  i = 2;
  // check value of j here. It is still 1, no matter what you filled in
above.
  // The assignment to i cannot be made to affect the value of j.

}


Those two local primitive variables cannot be made to have the same
identity. But you can update them, so this is an example of mutability
without the possibility of identity.

Earlier I also mentioned SQL tables as an example, although SQL
supports *explicit* aliasing via views.


 Alan Bawden discusses the phenomenon of `state' in his Ph.D.
 dissertation Implementing Distributed Systems Using Linear Naming.
 MIT AI Lab Technical Report AITR-1627. March 1993  He makes a
 persuasive argument that `state' is associated with cycles in naming.

I would like to read that, but my brain generally runs out of gas at
about 21
pages, so it's about an order of magnitude bigger than I can currently
handle. :-( As to cycles in naming that's certainly an issue. But it
it
a requirement for state? Back to Java locals, it seems to me they meet
the standard definition of state, despite the lack of cycles.

As to pointers/references, I earlier mentioned the existence of the
reference/dereference operations as being definitional. Note that
one can go to some lengths to obscure them, but they're still there.
For example, Java has the reference and dereference operators;
Java's . operator is actually C's - operator.

I am not so bold/foolish as to attempt a defintion of object however.
:-)


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Joe Marshall

Marshall wrote:

 Consider the following Java fragment:

 void foo() {
   int i = 0;
   int j = 0;

   // put any code here you want

   j = 1;
   i = 2;
   // check value of j here. It is still 1, no matter what you filled in
 above.
   // The assignment to i cannot be made to affect the value of j.

 }

True, but you have hidden the pointers.  Semantically, the identifiers
i and j refer not to integers but to locations that hold integers.  The
assignment modifies the location.

 Those two local primitive variables cannot be made to have the same
 identity. But you can update them, so this is an example of mutability
 without the possibility of identity.

The identity is temporal:  You use the same variable name at two
different times.  Do you intend for the second `i' to mean the same
variable as the first `i'?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Chris Smith
Darren New [EMAIL PROTECTED] wrote:
 Chris Smith wrote:
  Unless I'm missing your point, I disagree with your disagreement.  
  Mutability only makes sense because of object identity (in the generic 
  sense; no OO going on here). 
 
 Depends what you mean by object.
 
 int x = 6; int y = 5; x = y;
 
 I'd say x was mutable, with no identity problems involved?

The variable x definitely has identity that's independent of its value.  
Some might call that a problem in and of itself, as it complicates the 
formal model of the language and makes it difficult to predict what 
result will be produced by normal order evaluation.

On the other hand, this thread seems to be using identity to mean 
identity with potential for aliasing, in which case it is vacuously 
true that eliminating identity also prevents the problems that arise 
from aliasing.  It is true, and I agree on this with Marshall, that 
eliminating the potential for aliasing solves a lot of problems with 
checking invariants.  I also see, though, that the majority (so far, I'd 
say all) of the potential uses for which it's worth introducing mutation 
into an otherwise mutation-free language allow the possibility of 
aliasing, which sorta makes me wonder whether this problem is worth 
solving.  I'd like to see an example of code that would be harder to 
write without mutation, but which can obey any restriction that's 
sufficient to prevent aliasing.

 Why is it problematic that variables have identity and are mutable? 
 Certainly I can later find whatever value I put into x.

I simply found the language confusing.  I said it would be nonsensical 
for a language to have mutation without identity.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Chris Smith
David Hopwood [EMAIL PROTECTED] wrote:
 This is true, but note that postconditions also need to be efficient
 if we are going to execute them.

If checked by execution, yes.  In which case, I am trying to get my head 
around how it's any more true to say that functional languages are 
compilable postconditions than to say the same of imperative languages.  
In both cases, some statement is asserted through a description of a 
means of computing it.  There may be a distinction worth making here, 
but I'm missing it so far.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Rob Warnock
Marshall [EMAIL PROTECTED] wrote:
+---
| Joachim Durchholz wrote:
|  Actually SQL has references - they are called primary keys, but they
|  are references nevertheless.
| 
| I strongly object; this is quite incorrect. I grant you that from the
| 50,000 foot level they appear identical, but they are not.
+---

Agreed. The only thing different about primary keys from any other
key is uniqueness -- a selection by primary key will return only one
record. Other than that constraint, many databases treat them exactly
the same as non-primary keys [e.g., can form indexes on them, etc.].

+---
| To qualify as a reference, there need to be reference and dereference
| operations on the reference datatype; there is no such operation is SQL.
+---

Not in ANSI SQL92, say, but there might be in most SQL databases!
[See below re OIDs. Also, SQL:1999 had a REF type that was essentially
and OID.]

+---
| Would you say the relational algebra has references?
+---

Don't confuse SQL  relational algebra!! You'll get real
relational algebraists *way* bent out of shape if you do that!

+---
|  (Some SQL dialects also offer synthetic ID fields that are
|  guaranteed to remain stable over the lifetime of a record.
| 
| Primary keys are updatable; there is nothing special about them.
+---

I think he's probably talking about OIDs (object IDs). Most
current SQL-based databases provide them, usually as a normally-
invisible system column that doesn't show up when you say
SELECT * FROM, but that *does* appear if you say SELECT oid, *,
and may be used as a primary key even on tables with no actual
primary key:

rpw3=# select * from toy limit 4;
   c1   |  c2   |   c3   | upd 
+---++-
 fall   | tape  | My Favorite Thanksgiving   |  16
 xmas   | book  | My Favorite Christmas  |   2
 xmas   | video | The Grinch who Stole Christmas |   4
 summer | book  | Unusual 4ths of July   |  17
(4 rows)

rpw3=# select oid, * from toy limit 4;
  oid  |   c1   |  c2   |   c3   | upd 
---++---++-
 19997 | fall   | tape  | My Favorite Thanksgiving   |  16
 19998 | xmas   | book  | My Favorite Christmas  |   2
 1 | xmas   | video | The Grinch who Stole Christmas |   4
 2 | summer | book  | Unusual 4ths of July   |  17
(4 rows)

rpw3=# select * from toy where oid = 19998;
  c1  |  c2  |  c3   | upd 
  --+--+---+-
   xmas | book | My Favorite Christmas |   2
   (1 row)

rpw3=# insert into toy values ('fall','book','Glory Road');
INSERT 32785 1

rpw3=# select oid, * from toy where oid = 32785;
  oid  |  c1  |  c2  | c3 | upd 
---+--+--++-
 32785 | fall | book | Glory Road |  21
 (1 row)

rpw3=# 

See http://www.postgresql.org/docs/8.1/static/datatype-oid.html
for how PostgreSQL treats OIDs [including some critical limitations].


-Rob

-
Rob Warnock [EMAIL PROTECTED]
627 26th Avenue URL:http://rpw3.org/
San Mateo, CA 94403 (650)572-2607

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Marshall
Joe Marshall wrote:
 Marshall wrote:
 
  Consider the following Java fragment:
 
  void foo() {
int i = 0;
int j = 0;
 
// put any code here you want
 
j = 1;
i = 2;
// check value of j here. It is still 1, no matter what you filled in
  above.
// The assignment to i cannot be made to affect the value of j.
 
  }

 True, but you have hidden the pointers.  Semantically, the identifiers
 i and j refer not to integers but to locations that hold integers.  The
 assignment modifies the location.

What the implementation looks like shouldn't affect how we speak
of the logical model. In the above code, there are no pointers.

By your definition, pointer and variable are synonyms. That doesn't
seem like a good idea to me. (What if i and j end up in registers?
I have not heard it said that registers have addresses.)


  Those two local primitive variables cannot be made to have the same
  identity. But you can update them, so this is an example of mutability
  without the possibility of identity.

 The identity is temporal:  You use the same variable name at two
 different times.  Do you intend for the second `i' to mean the same
 variable as the first `i'?

Okay, so i and i have the same identity. I suppose you could argue
that the language's namespace is an address-space, and variable names
are addresses. At this point, though, you've made the concept of
identity
so broad that it is now necessarily a property of all languages that
use
named values, whether updatable or not. I think you would also have
to call 3 a void pointer by this scheme.

Clearly there is *some* difference between a language which allows
explicit pointers and thus aliasing and one that doesn't. What term
would you use? First-class variables?


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Marshall
Chris Smith wrote:
 Darren New [EMAIL PROTECTED] wrote:
  Chris Smith wrote:
   Unless I'm missing your point, I disagree with your disagreement.
   Mutability only makes sense because of object identity (in the generic
   sense; no OO going on here).
 
  Depends what you mean by object.
 
  int x = 6; int y = 5; x = y;
 
  I'd say x was mutable, with no identity problems involved?

 The variable x definitely has identity that's independent of its value.

I'm not sure what you mean by that.

  I also see, though, that the majority (so far, I'd
 say all) of the potential uses for which it's worth introducing mutation
 into an otherwise mutation-free language allow the possibility of
 aliasing, which sorta makes me wonder whether this problem is worth
 solving.

What about my example of SQL? Mutation, no pointers, no aliasing.
Yet: useful.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Chris Smith
Marshall [EMAIL PROTECTED] wrote:
 Chris Smith wrote:
  Darren New [EMAIL PROTECTED] wrote:
   Chris Smith wrote:
Unless I'm missing your point, I disagree with your disagreement.
Mutability only makes sense because of object identity (in the generic
sense; no OO going on here).
  
   Depends what you mean by object.
  
   int x = 6; int y = 5; x = y;
  
   I'd say x was mutable, with no identity problems involved?
 
  The variable x definitely has identity that's independent of its value.
 
 I'm not sure what you mean by that.

I mean, simply, that when you can assign a value to a variable, then you 
care that it is that variable and not a different one.  That's identity 
in the normal sense of the word.  The code elsewhere in the procedure is 
able to access the value of 'x', and that has meaning even though you 
don't know what value 'x' has.  This has definite implications, and is a 
useful concept; for example, it means that the pure lambda calculus no 
longer sufficient to express the semantics of the programming language, 
but instead something else is required.

What you are asking for is some subset of identity, and I've not yet 
succeeded in understanding exactly what it is or what its limits are... 
except that so far, it seems to have everything to do with pointers or 
aliasing.

   I also see, though, that the majority (so far, I'd
  say all) of the potential uses for which it's worth introducing mutation
  into an otherwise mutation-free language allow the possibility of
  aliasing, which sorta makes me wonder whether this problem is worth
  solving.
 
 What about my example of SQL? Mutation, no pointers, no aliasing.
 Yet: useful.

I'm not yet convinced that this is any different from a language with 
standard pointer aliasing.  If I have two tables A and B, and a foreign 
key from A into B, then I run into the same problems with enforcing 
constraints that I would see with a pointer model... when I update a 
relation, I need to potentially check every other relation that contains 
a foreign key into it, in order to ensure that its constraints are not 
violated by that constraint.  That's the same thing that is being 
pointed out as a negative consequence of aliasing in other languages.  
For example, executing:

UPDATE P SET x = 5 WHERE y = 43;

may result in the database having to re-evaluate the constraint that 
says that for all P(x, y, z), x must be less than 4 when z = 17.  One 
difference is that while in general purpose programming languages this 
appears to be a daunting task, databases are set up to do these kinds of 
things all the time and contain optimizations for it... but the problem 
is still the same, and it would still present the same difficulties in 
doing formal proofs that running the above UPDATE statement doesn't 
violate any invariants.

(If I'm wrong about that, please let me know; I'd very interested if 
that's so.)

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread David Hopwood
Chris Smith wrote:
 David Hopwood [EMAIL PROTECTED] wrote:
 
This is true, but note that postconditions also need to be efficient
if we are going to execute them.
 
 If checked by execution, yes.  In which case, I am trying to get my head 
 around how it's any more true to say that functional languages are 
 compilable postconditions than to say the same of imperative languages.

A postcondition must, by definition [*], be a (pure) assertion about the
values that relevant variables will take after the execution of a subprogram.

If a subprogram is intended to have side effects, then its postcondition
can describe the results of the side effects, but must not reexecute them.


[*] E.g. see
http://www.spatial.maine.edu/~worboys/processes/hoare%20axiomatic.pdf,
although the term postcondition was introduced later than this paper.

-- 
David Hopwood [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Marshall
Chris Smith wrote:
 Marshall [EMAIL PROTECTED] wrote:
  Chris Smith wrote:
   Darren New [EMAIL PROTECTED] wrote:
Chris Smith wrote:
 Unless I'm missing your point, I disagree with your disagreement.
 Mutability only makes sense because of object identity (in the generic
 sense; no OO going on here).
   
Depends what you mean by object.
   
int x = 6; int y = 5; x = y;
   
I'd say x was mutable, with no identity problems involved?
  
   The variable x definitely has identity that's independent of its value.
 
  I'm not sure what you mean by that.

 I mean, simply, that when you can assign a value to a variable, then you
 care that it is that variable and not a different one.  That's identity
 in the normal sense of the word.

I guess it is, now that you mention it.


 The code elsewhere in the procedure is
 able to access the value of 'x', and that has meaning even though you
 don't know what value 'x' has.  This has definite implications, and is a
 useful concept; for example, it means that the pure lambda calculus no
 longer sufficient to express the semantics of the programming language,
 but instead something else is required.

 What you are asking for is some subset of identity, and I've not yet
 succeeded in understanding exactly what it is or what its limits are...
 except that so far, it seems to have everything to do with pointers or
 aliasing.

Perhaps it is specifically first-class identity, rather than identity
per se.


I also see, though, that the majority (so far, I'd
   say all) of the potential uses for which it's worth introducing mutation
   into an otherwise mutation-free language allow the possibility of
   aliasing, which sorta makes me wonder whether this problem is worth
   solving.
 
  What about my example of SQL? Mutation, no pointers, no aliasing.
  Yet: useful.

 I'm not yet convinced that this is any different from a language with
 standard pointer aliasing.  If I have two tables A and B, and a foreign
 key from A into B, then I run into the same problems with enforcing
 constraints that I would see with a pointer model... when I update a
 relation, I need to potentially check every other relation that contains
 a foreign key into it, in order to ensure that its constraints are not
 violated by that constraint.  That's the same thing that is being
 pointed out as a negative consequence of aliasing in other languages.

No, that's not the same thing. What you are describing here is
not an aliasing issue, but simply the consequences of allowing
constraints to mention more than one variable.

In our i-and-j example above, suppose there was a constraint
such that i  j. We have to re-check this constraint if we
update either i or j. That's not the same thing as saying
that i and j are aliased.

A foreign key constraint is a multi-variable constraint.
Specifically, a foreign key from table A, attribute a
to table B, attribute b is the constraint:

forall a in A, exists b in B such that a = b.

Note that two variables, A and B, are referenced in
the constraint. In general, any constraint on two
variables will have to be rechecked upon update
to either.


 For example, executing:

 UPDATE P SET x = 5 WHERE y = 43;

 may result in the database having to re-evaluate the constraint that
 says that for all P(x, y, z), x must be less than 4 when z = 17.

I don't see any aliasing in this example either.

But consider how much worse this problem is if real aliasing
is possible. We have some pointer variable, and initialize
it with the return value from some function. We don't know
anything about what variable is involved. We update through
the pointer. What constraints must we recheck? Apparently
all of them; unless we have perfect alias analysis, we can't
tell what variables are affected by our update.


 One
 difference is that while in general purpose programming languages this
 appears to be a daunting task, databases are set up to do these kinds of
 things all the time and contain optimizations for it... but the problem
 is still the same, and it would still present the same difficulties in
 doing formal proofs that running the above UPDATE statement doesn't
 violate any invariants.

 (If I'm wrong about that, please let me know; I'd very interested if
 that's so.)

I'm interested to hear your reply.


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-13 Thread Chris Smith
Marshall [EMAIL PROTECTED] wrote:
  What you are asking for is some subset of identity, and I've not yet
  succeeded in understanding exactly what it is or what its limits are...
  except that so far, it seems to have everything to do with pointers or
  aliasing.
 
 Perhaps it is specifically first-class identity, rather than identity
 per se.

As in: the value of one variable can (be/refer to/depend on) the 
identity of another variable?  I can certainly see this as as 
reasonable concept to consider.

  I'm not yet convinced that this is any different from a language with
  standard pointer aliasing.  If I have two tables A and B, and a foreign
  key from A into B, then I run into the same problems with enforcing
  constraints that I would see with a pointer model... when I update a
  relation, I need to potentially check every other relation that contains
  a foreign key into it, in order to ensure that its constraints are not
  violated by that constraint.  That's the same thing that is being
  pointed out as a negative consequence of aliasing in other languages.
 
 No, that's not the same thing. What you are describing here is
 not an aliasing issue, but simply the consequences of allowing
 constraints to mention more than one variable.

 A foreign key constraint is a multi-variable constraint.
 Specifically, a foreign key from table A, attribute a
 to table B, attribute b is the constraint:
 
 forall a in A, exists b in B such that a = b.
 
 Note that two variables, A and B, are referenced in
 the constraint.

There's confusion here coming from different usages of the word 
variable.  Let us talk instead of values, and of the abstract structures 
that gives them meaning.  In both cases (invariants in a hypothetical 
imperative language, and in a relational database), the constraints make 
reference to these structures of values (relations, for example, or 
various kinds of data structures), and not to the individual values or 
objects that they contain.  In both cases, the problem is not that we 
don't know what structures to check to verify the invariant; rather, 
it's that we have to check ALL of the values in that structure.

As someone pointed out, this is to be expected in a world of mutable 
things with identity that are globally locatable.  It is simple fact 
that if I tell you I spoke to Barbara's husband, you may need to trace 
down who Barbara's husband is before you could discover that, for 
example, maybe I actually spoke to your boss, or to your nephew's best-
friend's father.  If databases are capable of modeling these kinds of 
relationships (and of course they are), then they are as susceptible to 
aliasing -- in a logical sense that avoids mention of pointer -- as 
anyone else.

 I don't see any aliasing in this example either.
 

Actually, this was probably a bad example.  Let's stick to the others 
involving relationships between tuples.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Andreas Rossberg
David Hopwood wrote:
 George Neuner wrote:

All of this presupposes that you have a high level of confidence in
the compiler.  I've been in software development for going in 20 years
now and worked 10 years on high performance, high availability
systems.  In all that time I have yet to meet a compiler ... or
significant program of any kind ... that is without bugs, noticeable
or not.
 
 [...]
 
 One of the main reasons for this, IMHO, is that many compilers place too
 much emphasis on low-level optimizations of dubious merit. For C and
 Java, I've taken to compiling all non-performance-critical code without
 optimizations, and that has significantly reduced the number of compiler
 bugs that I see. It has very little effect on overall execution performance
 (and compile times are quicker).
 
 I don't think that over-complex type systems are the cause of more than a
 small part of the compiler bug problem. In my estimation, the frequency of
 bugs in different compilers *for the same language* can vary by an order of
 magnitude.

Also, the use of typed intermediate languages within the compiler might 
actually help drastically cutting down on the more severe problem of 
code transformation bugs, notwithstanding the relative complexity of 
suitable internal type systems.

   - Andreas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 Marshall schrieb:
 Now, I'm not fully up to speed on DBC. The contract specifications,
 these are specified statically, but checked dynamically, is that
 right?
 That's how it's done in Eiffel, yes.

   In other words, we can consider contracts in light of
 inheritance, but the actual verification and checking happens
 at runtime, yes?
 Sure. Though, while DbC gives rules for inheritance (actually subtypes),
 these are irrelevant to the current discussion; DbC-minus-subtyping can
 still be usefully applied.
 
 Yes, subtyping. Of course I meant to say subtyping.blush

No need to blush for that, it's perfectly OK in the Eiffel context, 
where a subclass ist always assumed to be a subtype. (This assumption 
isn't always true, which is why Eiffel has some serious holes in the 
type system. The DbC ideas are still useful though, saying subtype 
instead of subclass just makes the concept applicable outside of OO 
languages.)

 I can certainly see how DbC would be useful without subtyping.
 But would there still be a reason to separate preconditions
 from postconditions? I've never been clear on the point
 of differentiating them (beyond the fact that one's covariant
 and the other is contravariant.)

There is indeed.
The rules about covariance and contravariance are just consequences of 
the notion of having a subtype (albeit important ones when it comes to 
designing concrete interfaces).

For example, if a precondition fails, something is wrong about the 
things that the subroutine assumes about its environment, so it 
shouldn't have been called. This means the error is in the caller, not 
in the subroutine that carries the precondition.

The less important consequence is that this should be reflected in the 
error messages.

The more important consequences apply when integrating software.
If you have a well-tested library, it makes sense to switch off 
postcondition checking for the library routines, but not their 
precondition checking.
This applies not just for run-time checking: Ideally, with compile-time 
inference, all postconditions can be inferred from the function's 
preconditions and their code. The results of that inference can easily 
be stored in the precompiled libraries.
Preconditions, on the other hand, can only be verified by checking all 
places where the functions are called.

 Wouldn't it be possible to do them at compile time? (Although
 this raises decidability issues.)
 Exactly, and that's why you'd either uses a restricted assertion
 language (and essentially get something that's somewhere between a type
 system and traditional assertion); or you'd use some inference system
 and try to help it along (not a simple thing either - the components of
 such a system exist, but I'm not aware of any system that was designed
 for the average programmer).
 
 As to the average programmer, I heard this recently on
 comp.databases.theory:
 
 Don't blame me for the fact that competent programming, as I view it
 as an intellectual possibility, will be too difficult for the
 average programmer  -you must not fall into the trap of rejecting
 a surgical technique because it is beyond the capabilities of the
 barber in his shop around the corner.   -- EWD512

Given the fact that we have far more need for competently-written 
programs than for competent surgery, I don't think that we'll be able to 
follow that idea.

 Mightn't it also be possible to
 leave it up to the programmer whether a given contract
 was compile-time or runtime?
 I'd agree with that, but I'm not sure how well that would hold up in
 practice.
 
 I want to try it and see what it's like.

So do I :-)

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Joachim Durchholz
Darren New schrieb:
 As far as I understand it, Eiffel compilers don't even make use of 
 postconditions to optimize code or eliminate run-time checks (like null 
 pointer testing).

That's correct.

I think a large part of the reasons why this isn't done is that Eiffel's 
semantics is (a) too complicated (it's an imperative language after 
all), and (b) not formalized, which makes it difficult to assess what 
optimizations are safe and what aren't.

(Reason (c) is that Eiffel compiler vendors don't have the manpower for 
this kind of work, mostly in quantity but also, to some degree, in 
quality: people with a solid language semantics background tend to be 
repelled by the language inventor's know-it-all deny-the-problems 
don't-bother-me-with-formalisms attitude. He still has moved the state 
of the art ahead - mostly by pointing out a certain class of problems in 
OO designs and explaining them lucidly, and proposing solutions that 
hold up better than average even if still fundamentally flawed.)

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Marshall
Joachim Durchholz wrote:
 Darren New schrieb:
  As far as I understand it, Eiffel compilers don't even make use of
  postconditions to optimize code or eliminate run-time checks (like null
  pointer testing).

 That's correct.

 I think a large part of the reasons why this isn't done is that Eiffel's
 semantics is (a) too complicated (it's an imperative language after
 all), and (b) not formalized, which makes it difficult to assess what
 optimizations are safe and what aren't.

I can see the lack of a formal model being an issue, but is the
imperative bit really all that much of an obstacle? How hard
is it really to deal with assignment? Or does the issue have
more to do with pointers, aliasing, etc.?


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:

  I can certainly see how DbC would be useful without subtyping.
  But would there still be a reason to separate preconditions
  from postconditions? I've never been clear on the point
  of differentiating them (beyond the fact that one's covariant
  and the other is contravariant.)

 There is indeed.
 The rules about covariance and contravariance are just consequences of
 the notion of having a subtype (albeit important ones when it comes to
 designing concrete interfaces).

 For example, if a precondition fails, something is wrong about the
 things that the subroutine assumes about its environment, so it
 shouldn't have been called. This means the error is in the caller, not
 in the subroutine that carries the precondition.

 The less important consequence is that this should be reflected in the
 error messages.

 The more important consequences apply when integrating software.
 If you have a well-tested library, it makes sense to switch off
 postcondition checking for the library routines, but not their
 precondition checking.
 This applies not just for run-time checking: Ideally, with compile-time
 inference, all postconditions can be inferred from the function's
 preconditions and their code. The results of that inference can easily
 be stored in the precompiled libraries.
 Preconditions, on the other hand, can only be verified by checking all
 places where the functions are called.

Interesting!

So, what exactly separates a precondition from a postcondition
from an invariant? I have always imagined that one writes
assertions on parameters and return values, and those
assertions that only reference parameters were preconditions
and those which also referenced return values were postconditions.
Wouldn't that be easy enough to determine automatically?

And what's an invariant anyway?


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Darren New
Marshall wrote:
 So, what exactly separates a precondition from a postcondition
 from an invariant?

A precondition applies to a routine/method/function and states the 
conditions under which a function might be called. For example, a 
precondition on stack.pop might be not stack.empty, and 
socket.read might have a precondition of socket.is_open, and 
socket.open_a_new_connection might have a precondition of 
socket.is_closed.

A postcondition applies to a routine/method/function and states (at 
least part of) what that routine guarantees to be true when it returns, 
assuming it is called with true preconditions. So not stack.empty 
would be a postcondition of stack.push. If your postconditions are 
complete, they tell you what the routine does.

An invariant is something that applies to an entire object instance, any 
time after the constructor has completed and no routine within that 
instance is running (i.e., you're not in the middle of a call). There 
should probably be something in there about destructors, too. So an 
invariant might be stack.is_empty == (stack.size == 0) or perhaps 
socket.is_open implies (socket.buffer != null) or some such.

The first problem with many of these sorts of things are that in 
practice, there are lots of postconditions that are difficult to express 
in any machine-readable way. The postcondition of socket.read is that 
the buffer passed as the first argument has valid data in the first n 
bytes, where n is the return value from socket.read, unless 
socket.read returned -1.  What does valid mean there? It has to 
match the bytes sent by some other process, possibly on another machine, 
disregarding the bytes already read.

You can see how this can be hard to formalize in any particularly useful 
way, unless you wind up modelling the whole universe, which is what most 
of the really formalized network description techniques do.

Plus, of course, without having preconditions and postconditions for OS 
calls, it's difficult to do much formally with that sort of thing.

You can imagine all sorts of I/O that would be difficult to describe, 
even for things that are all in memory: what would be the postcondition 
for screen.draw_polygon? Any set of postconditions on that routine 
that don't mention that a polygon is now visible on the screen would be 
difficult to take seriously.

There are also problems with the complexity of things. Imagine a 
chess-playing game trying to describe the generate moves routine. 
Precondition: An input board with a valid configuration of chess pieces. 
Postcondition: An array of boards with possible next moves for the 
selected team.  Heck, if you could write those as assertions, you 
wouldn't need the code.

-- 
   Darren New / San Diego, CA, USA (PST)
 This octopus isn't tasty. Too many
 tentacles, not enough chops.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Joachim Durchholz
Marshall schrieb:
 I can see the lack of a formal model being an issue, but is the
 imperative bit really all that much of an obstacle? How hard
 is it really to deal with assignment? Or does the issue have
 more to do with pointers, aliasing, etc.?

Actually aliasing is *the* hard issue.
Just one data point: C compiler designers spend substantial effort to 
determine which data structures cannot have aliasing to be able to apply 
optimizations.

This can bite even with runtime assertion checking.
For example, ordinarily one would think that if there's only a fixed set 
of routines that may modify some data structure A, checking the 
invariants of that structure need only be done at the exit of these 
routines.
Now assume that A has a reference to structure B, and that the 
invariants involve B in some way. (E.g. A.count = A-B.count.)
There's still no problem with that - until you have a routine that 
exports a reference to B, which gets accessed from elsewhere, say via C.
Now if I do
   C-B.count = 27
I will most likely break A's invariant. If that assignment is in some 
code that's far away from the code for A, debugging can become 
exceedingly hard: the inconsistency (i.e. A violating its invariant) can 
live for the entire runtime of the program.

So that innocent optimization don't check all the program's invariants 
after every assignment immediately breaks with assignment (unless you 
do some aliasing analysis).

In an OO language, it's even more difficult to establish that some data 
structure cannot be aliased: even if the code for A doesn't hand out a 
reference to B, there's no guarantee that some subclass of A doesn't. 
I.e. the aliasing analysis has to be repeated whenever there's a new 
subclass, which may happen at link time when you'd ordinarily don't want 
to do any serious semantic analysis anymore.

There's another way around getting destroyed invariants reported at the 
time the breakage occurs: lock any data field that goes into an 
invariant (or a postcondition, too). The rationale behind this is that 
from the perspective of assertions, an alias walks, smells and quacks 
just like concurrent access, so locking would be the appropriate answer. 
The problem here is that this means that updates can induce *very* 
expensive machinery - imagine an invariant that says A-balance is the 
sum of all the transaction-amount fields in the transaction list of A, 
it would cause all these -amount fields to be locked as soon as a 
transaction list is hooked up with the amount. OTOH one could argue that 
such a locking simply makes cost in terms of debugging time visible as 
runtime overhead, which isn't entirely a Bad Thing.


There are all other kinds of things that break in the presence of 
aliasing; this is just the one that I happen to know best :-)

Without mutation, such breakage cannot occur. Invariants are just the 
common postconditions of all the routines that may construct a structure 
of a given type.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Joachim Durchholz
Marshall schrieb:
 So, what exactly separates a precondition from a postcondition
 from an invariant? I have always imagined that one writes
 assertions on parameters and return values, and those
 assertions that only reference parameters were preconditions
 and those which also referenced return values were postconditions.
 Wouldn't that be easy enough to determine automatically?

Sure.
Actually this can be done in an even simpler fashion; e.g. in Eiffel, it 
is (forgive me if I got the details wrong, it's been several years since 
I last coded in it):

   foo (params): result_type is
   require
 -- list of assertions; these become preconditions
   do
 -- subroutine body
   ensure
 -- list of assertions; these become postconditions
   end

 And what's an invariant anyway?

In general, it's a postcondition to all routines that create or modify a 
data structure of a given type.

Eiffel does an additional check at routine entry, but that's just a 
last-ditch line of defense against invariant destruction via aliases, 
not a conceptual thing. Keep aliases under control via modes (i.e. 
unaliasable marks on local data to prevent aliases from leaving the 
scope of the data structure), or via locking, or simply by disallowing 
destructive updates, and you don't need the checks at routine entry anymore.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Marshall
Joachim Durchholz wrote:
 Marshall schrieb:
  I can see the lack of a formal model being an issue, but is the
  imperative bit really all that much of an obstacle? How hard
  is it really to deal with assignment? Or does the issue have
  more to do with pointers, aliasing, etc.?

 Actually aliasing is *the* hard issue.

Okay, sure. Nice explanation.

But one minor point: you describe this as an issue with imperative
languages. But aliasing is a problem associated with pointers,
not with assignment. One can have assignment, or other forms
of destructive update, without pointers; they are not part of the
definition of imperative. (Likewise, one can have pointers without
assignment, although I'm less clear if the aliasing issue is as
severe.)


Marshall

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is a type error?

2006-07-12 Thread Joachim Durchholz
Marshall schrieb:
 Joachim Durchholz wrote:
 Marshall schrieb:
 I can see the lack of a formal model being an issue, but is the
 imperative bit really all that much of an obstacle? How hard
 is it really to deal with assignment? Or does the issue have
 more to do with pointers, aliasing, etc.?
 Actually aliasing is *the* hard issue.
 
 Okay, sure. Nice explanation.
 
 But one minor point: you describe this as an issue with imperative
 languages. But aliasing is a problem associated with pointers,
 not with assignment.

Aliasing is not a problem if the aliased data is immutable.

  One can have assignment, or other forms
 of destructive update, without pointers; they are not part of the
 definition of imperative.

Sure.
You can have either of destructive updates and pointers without 
incurring aliasing problems. As soon as they are combined, there's trouble.

Functional programming languages often drop assignment entirely. (This 
is less inefficient than one would think. If everything is immutable, 
you can freely share data structures and avoid some copying, and you can 
share across abstraction barriers. In programs with mutable values, 
programmers are forced to choose the lesser evil of either copying 
entire data structures or doing a cross-abstraction analysis of who 
updates what elements of what data structure. A concrete example: the 
first thing that Windows does when accepting userland data structures 
is... to copy them; this were unnecessary if the structures were immutable.)

Some functional languages restrict assignment so that there can exist at 
most a single reference to any mutable data structure. That way, there's 
still no aliasing problems, but you can still update in place where it's 
really, really necessary.

I know of no professional language that doesn't have references of some 
kind.

Regards,
Jo
-- 
http://mail.python.org/mailman/listinfo/python-list


  1   2   >