Re: What is a type error?
"Marshall" <[EMAIL PROTECTED]> wrote: > In general, I feel that "records" are not the right conceptual > level to think about. Unfortunately, they are the right level. Actually,the right level might even be lower, the fields within a record, but that's moving even farther away from the direction you wish to go. The right level is the lowest level at which the programmer can see and manipulate values. Thus, since a programmer can see and manipulate fields within records in SQL that is the level we need to concern ourselves with when talking about aliasing in SQL. In other words, since a SELECT (or UPDATE) statement can talk about specific fields of specific records within the table (not directly, but reliably, which I'll explain in a moment) then those are the "primitive objects" in SQL that can be aliased. What I meant by "not directly, but reliably": If you have a fixed database, and you do two selects which specify the same sets of fields to be selected and the same keys to select records with, one expects the two selects to return the same values. You do not necessarily expect the records to be returned in the same order, but you do expect the same set of records to be returned and the records to have the same values in the same fields. Further, if you do an update, you expect certain fields of certain records to change (and be reflected in subsequent selects). However, therein lies the rub, if you do a select on some records, and then an update that changes those records, the records you have from the select have either changed or show outdated values. If there is some way to refer to the records you first selected before the update, then you have an aliasing problem, but maybe one can't do that. One could also have an aliasing problem, if one were allowed to do two updates simultaneously, so that one update could changed records in the middle of the other changing the records. However, I don't know if that's allowed in the relational algebra either. (I think I finally see your point, and more on that later.) > I do not see that they have > any identity outside of their value. We can uniquely identify > any particular record via a key, but a table may have more > than one key, and an update may change the values of one > key but not another. So it is not possible in general to > definitely and uniquely assign a mapping from each record > of a table after an update to each record of the table before > the update, and if you can't do that, then where > is the record identity? I don't know if your point of having two keys within one table amkes a difference. If one can uniquely identify a record, then it has identity. The fact that there is another mapping where one may not be able to uniquely identify the record is not necessarily relevant. > But what you descrbe is certainly *not* possible in the > relational algebra; alas that SQL doesn't hew closer > to it. Would you agree? Would you also agree that > if a language *did* adhere to relation semantics, > then relation elements would *not* have identity? > (Under such a language, one could not have two > equal records, for example.) Ok, here I think I will explain my understanding of your point. If we treat each select statement and each update statements as a separate program, i.e. we can't save records from one select statement to a later one (and I think in the relational algebra one cannot), then each single (individual) select statement or update statement is a functional program. 
That is, we can view a single select statement as doing a fold over some incoming data, but it cannot change the data. Similarly, we can view an update statement as creating a new copy of the table with mutations in it, but the update statement can only refer to the original immutable table that was input. (Well, that's at least how I recall the relational algebra.)

It is the second point, about the semantics of the update statement, which is important. There are no aliasing issues because in the relational algebra one cannot refer to the mutated values in an update statement; every reference in an update statement is referring to the unmutated input (again, that's my recollection). (Note, if my recollection is off, and one can refer to the mutated records in an update statement, perhaps someone can explain why aliasing is not an issue in it.)

Now for some minor points:

> For example, we might ask in C, if we update a[i], will
> the value of b[j] change? And we can't say, because
> we don't know if there is aliasing. But in Fortran, we
> can say that it won't, because they are definitely
> different variables.

Unfortunately, your statement about FORTRAN is not true: if a and b are parameters to a subroutine (or members of a common block, or equivalenced, or ...), then an update of a[i] might change b[j]. This is in fact an important point: it is in subroutines (and subroutine-like things), where we have local names for things (bindings), that those names can actually refer to the same object.
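A minimal Python sketch of the relational-algebra reading above (not from the original thread; the table layout and helper names are invented for illustration). The "update" reads only the original, immutable table and returns a new one, so nothing can observe a half-mutated state:

    def update(table, pred, change):
        """A pure 'UPDATE': table is a tuple of dict records, never mutated."""
        return tuple({**rec, **change} if pred(rec) else rec for rec in table)

    persons = (
        {"name": "John", "surname": "Doe"},
        {"name": "John", "surname": "Smith"},
    )
    renamed = update(persons, lambda r: r["surname"] == "Doe", {"name": "Jane"})
    assert persons[0]["name"] == "John"    # the input table is unchanged
    assert renamed[0]["name"] == "Jane"    # the "mutation" is a fresh copy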
Re: What is a type error?
Joachim Durchholz wrote:
> You can have aliasing without pointers; e.g. arrays are fully sufficient.
> If i = j, then a [i] and a [j] are aliases of the same object.

"Marshall" <[EMAIL PROTECTED]> writes:
> I am having a hard time with this very broad definition of aliasing.
> Would we also say that a[1+1] and a[2] are aliases? It seems
> to me, above, that we have only a, and with only one variable
> there can be no aliasing.

As you can see from the replies, all these things which you do not consider aliasing are considered aliasing (by most everyone else; in fact, these are exactly what is meant by aliasing). There is good reason for this. Let us explain this by looking again at the SQL example that has been kicked around some.

>> SELECT * FROM persons WHERE name = "John"
>> and
>> SELECT * FROM persons WHERE surname = "Doe"

Some set of records is shared by both select clauses. Those records are the aliased records. If the set is empty, there is no problem. If we do not update the records from one of the select clauses, we also have no problem. However, if we update the records for one of the select clauses and the set of aliased records is not empty, the records returned by the other select clause have changed. Therefore, everything we knew to be true about the "other" select clause may or may not still be true. It's the "may not" case we are worried about.

For example, in the "imperative MI-5", Q asks Moneypenny to run the first query (name = John) to get a list of agents for infiltrating Spectre. In another department, they're doing sexual reassignments and updating the database with the second query (surname = Doe), making the name become Jane (along with other changes to make the transformation correct). So, when Q selects "John Doe" from the first list and sends that agent on a mission, Q is really sending "Jane Doe", and Spectre realizes who the agent is and kills her. Not a happy result.

In the "functional MI-5", the sexual reassignment group makes clones of the agents reassigned, so that there is now both a "John Doe" and a "Jane Doe". Of course, the payroll is a little more bloated, so when Q says "John Doe" isn't needed for any further assignments, the "Garbage Collection" department does a reduction in force and gets rid of "John Doe". The good news is that they didn't send Jane Doe to her death.

The problem with aliasing: we want the things we knew true in one part of our program (i.e. the Johns on Q's list are still Johns and not Janes) to still be true after we've run another part of our program. If the second part of our program can change what was true after the first part of our program, we can't depend on the results from the first part of our program; i.e. Q can send Jane Doe into certain death. The problem is compounded by the complexities of our programs. When Q selected his list, he didn't know of the department doing the reassignments (and couldn't know, because they were part of a top-secret project).

So, since bureaucracies grow to meet the increasing needs of the bureaucracy, often the solution is increased complexity: more regulations and paperwork to fill out, semaphores and locks to hold on critical data, read-only data, status requests, etc. All to keep Q from killing Jane by accident. Sometimes they work: the reassignment department has to wait 30 days for the clearance to perform their operation, in which time John Doe completes the infiltration of Spectre, saves the world from destruction, and is ready for his next assignment.
The key point is that each record (and even each field in each record), if it can be known by two names, is an alias. It is not sufficient to talk about "whole" variables as not being aliased if there is some way to refer to some part of the variable and change that part of the variable. Thus, a[1+1] is an alias for a[2] if you can have one part of the code talking about one and another part of the code talking about the other.

To put it one final way, consider the following code:

    assert(sorted(array a));
    a[1] = 2;
    assert(sorted(array a)); // is this still true?

Because a[1] is an alias of array a, you cannot be certain that the 2nd assert will not fire (fail) without additional analysis.

    assert(sorted(array a));
    assert(a[0] <= 2);
    assert(2 <= a[2]);
    a[1] = 2;
    assert(sorted(array a)); // this is still true!

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
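A direct Python rendering of the assertion example (not from the original post; sorted_ stands in for the pseudocode's sorted predicate). Writing through a[1] writes through an alias of "part of" the property sorted(a), which is why the unguarded version needs analysis:

    def sorted_(xs):
        return all(x <= y for x, y in zip(xs, xs[1:]))

    a = [3, 4, 5]
    assert sorted_(a)
    a[1] = 2                  # the write goes through the alias a[1]
    assert not sorted_(a)     # [3, 2, 5]: the unguarded assert would fire

    b = [1, 3, 5]
    assert sorted_(b) and b[0] <= 2 <= b[2]   # the extra guarding checks
    b[1] = 2
    assert sorted_(b)         # provably still true: 1 <= 2 <= 5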
Re: What is Expressiveness in a Computer Language
"Greg Buchholz" <[EMAIL PROTECTED]> writes: > Chris F Clark wrote: > > Thus, as we traverse a list, the first element might be an integer, > > the second a floating point value, the third a sub-list, the fourth > > and fifth, two more integers, and so on. If you look statically at > > the head of the list, we have a very wide union of types going by. > > However, perhaps there is a mapping going on that can be discerned. > > For example, perhaps the list has 3 distinct types of elements > > (integers, floating points, and sub-lists) and it circles through the > > types in the order, first having one of each type, then two of each > > type, then four of each type, then eight, and so on. > > Sounds like an interesting problem. Although not the exact type > specified above, here's something pretty similar that I could think of > implementing in Haskell. (I don't know what a "sub-list" is, for > example). Maybe some Haskell wizard could get rid of the tuples? > > > data Clark a b c = Nil | Cons a (Clark b c (a,a)) deriving Show > > clark = (Cons 42 (Cons 3.14 (Cons "abc" > (Cons (1,2) (Cons (1.2,3.4) (Cons ("foo","bar") > (Cons ((9,8),(7,6)) (Cons ((0.1,0.2),(0.3,0.4)) Nil > > main = print clark Very impressive. It looks right to me and simple enough to understand. I must find the time to learn a modern FP language. Can you write a fold for this that prints the data as a binary tree of triples? I have to believe it isn't that hard -Chris -- http://mail.python.org/mailman/listinfo/python-list
Re: What is Expressiveness in a Computer Language
I wrote:
> The important thing is the dynamism of lisp allowed one to write
> polymorphic programs, before most of us knew the term.

Chris Smith <[EMAIL PROTECTED]> writes:
> Sure. In exchange for giving up the proofs of the type checker, you
> could write all kinds of programs. To this day, we continue to write
> programs in languages with dynamic checking features because we don't
> have powerful enough type systems to express the corresponding type
> system.

And to me the question is what kinds of types apply to these dynamic programs, where in fact you may have to solve the halting problem to know exactly when some statement is executed. I expect that some programs have type signatures that exceed the expressibility of any static system (that is not Turing complete). Therefore, we need something that "computes" the appropriate type at run-time, because we need full Turing power to compute it. To me such a system is a "dynamic type system". I think dynamic tags are a good approximation, because they only compute what type the expression has this time.

> I believe that, in fact, it would be trivial to imagine a type system
> which is capable of expressing that type. Okay, not trivial, since it
> appears to be necessary to conceive of an infinite family of integer
> types with only one value each, along with range types, and
> relationships between them; but it's probably not completely beyond the
> ability of a very bright 12-year-old who has someone to describe the
> problem thoroughly and help her with the notation.

Well, it looks like you are right, in that I have now seen a Haskell program (elsewhere in this thread) that looks essentially correct. I wanted something that was simple enough that one could see that it could be computed, but which was complex enough that it had to be computed (and couldn't be statically expressed with a system that did no "type computations"). Perhaps there is no such beast. Or perhaps I just can't formulate it. Or perhaps we have static type checkers which can do computations of unbounded complexity. However, I thought that one of the characteristics of type systems was that they did not allow unbounded complexity and weren't Turing complete.

> You would, of course, need a suitable type system first. For example,
> it appears to me that there is simply no possible way of expressing what
> you've described in Java, even with the new generics features. Perhaps
> it's possible in ML or Haskell (I don't know). My point is that if you
> were allowed to design a type system to meet your needs, I bet you could
> do it.

Or I could do as I think the dynamic programmers do: dispense with trying to formulate a sufficiently general type system and just check the tags at the appropriate points.

> Sure. The important question, then, is whether there exists any program
> bug that can't be formulated as a type error.

If you allow Turing-complete type systems, then I would say no--every bug can be reformulated as a type error. If you require your type system to be less powerful, then some bugs must escape it.

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
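A rough Python illustration of "computing the type at run time" (not from the original post; it echoes the range types and one-value integer types in the quoted text). Each call to in_range manufactures one member of an infinite family of integer types, and the check only establishes the type a value has this time:

    def in_range(lo, hi):
        """Make a checker for the 'type' of ints in [lo, hi]."""
        def check(x):
            if not (isinstance(x, int) and lo <= x <= hi):
                raise TypeError(f"{x!r} is not an int in [{lo}, {hi}]")
            return x
        return check

    digit = in_range(0, 9)         # one member of the infinite family
    exactly_42 = in_range(42, 42)  # a type with a single value

    d = digit(7)                   # the dynamic check passes this time
    answer = exactly_42(42)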
Re: What is Expressiveness in a Computer Language
I wrote:
> These informal systems, which may not prove what they claim to prove,
> are my concept of a "type system".

Chris Smith <[EMAIL PROTECTED]> replied:
> Okay, that works. I'm not sure where it gets us, though

Ok, we'll get there. First, we need to step back in time, to when there was roughly algol, cobol, fortran, and lisp. Back then, at least as I understood things, there were only a few types that people generally understood: integer, real, and (perhaps) pointer. Now, with algol or fortran, things were generally only of the first two types, and variables were statically declared to be one or the other. With lisp, any cell could hold any one of the 3, and some clever implementor added "tag bits" to distinguish which the cell held. As I understood it, the tag bits weren't originally for type correctness so much as they facilitated garbage collection. The garbage collector didn't know what type of data you put in a cell and had to determine which cells contained pointers, so that the mark-and-sweep algorithms could sweep by following the pointers (and not following the integers that looked like pointers). Still, these tag bits did preserve the "dynamic" type, in the sense that we knew types from the other languages.

As I remember it, sophistication with types really didn't occur as a phenomenon for the general programmer until the introduction of Pascal (and to a lesser extent PL/I). Moreover, as I recall it (perhaps because I didn't learn the appropriate dialect), lisp dialects kept with the more general notion of type (lists and tuples) and eschewed low-level bit-fiddling where we might want to talk about a 4-bit integer representing 0 .. 15 or -8 .. 7.

The important thing is the dynamism of lisp allowed one to write polymorphic programs before most of us knew the term. However, we still had a notion of type: integers and floating point numbers were still different, and one could not portably use integer operations on floating point values or vice versa. However, one could check the tag and "do the right thing", and many lisp primitives did just that, making them polymorphic.

The way most of us conceived (or were taught to conceive) of the situation was that there still were types; they were just associated with values, and the type laws we all knew and loved still worked--they just worked dynamically, upon the types of the operand values that were presented at the time.

Can this be made rigorous? Perhaps. Instead of viewing the text of the program statically, let us view it dynamically, that is, by introducing a time (or execution) dimension. This is easier if you take an imperative view of the dynamic program and imagine things having an operational semantics, which is why we stepped back in time in the first place, so that we are in a world where imperative programming is the default model for most programmers.

Thus, as we traverse a list, the first element might be an integer, the second a floating point value, the third a sub-list, the fourth and fifth, two more integers, and so on. If you look statically at the head of the list, we have a very wide union of types going by. However, perhaps there is a mapping going on that can be discerned. For example, perhaps the list has 3 distinct types of elements (integers, floating points, and sub-lists) and it circles through the types in the order, first having one of each type, then two of each type, then four of each type, then eight, and so on. The world is now patterned.
However, I don't know how to write a static type annotation that describes exactly that type. That may just be my lack of experience in writing annotations. However, it's well within my grasp to imagine the appropriate type structure, even though **I** can't describe it formally. More importantly, I can easily write something which checks the tags and sees whether the data corresponds to my model. And this brings us to the point: if the data that my program encounters doesn't match the model I have formulated, then something is of the wrong "type". Equally importantly, the dynamic tags have helped me discover that type error.

Now, the example may have seemed arbitrary to you, and it was in some sense arbitrary. However, if one imagines a complete binary tree with three kinds of elements, stored in memory as rows per depth, one can get exactly the structure I described. And if one were rewriting that unusual representation to a more normal one, one might want exactly the "type check" I proposed to validate that the input binary tree was actually well formed.

Does this help explain dynamic typing as a form of typing?

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
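The tag check described above, as a small Python sketch (not from the original post; the three "kinds" stand in for the integers, floating points, and sub-lists of the example). A static annotation for this pattern is hard to write, but the dynamic check is a few lines:

    def check_pattern(xs):
        """Check one int, one float, one list; then two of each; then four..."""
        kinds = (int, float, list)
        i, run = 0, 1
        while i < len(xs):
            for kind in kinds:
                for _ in range(run):
                    if i >= len(xs):
                        return True        # a prefix of the pattern is fine
                    if not isinstance(xs[i], kind):
                        raise TypeError(f"element {i}: expected "
                                        f"{kind.__name__}, got "
                                        f"{type(xs[i]).__name__}")
                    i += 1
            run *= 2                       # the next "row" is twice as wide
        return True

    check_pattern([1, 2.0, [], 3, 4, 5.0, 6.0, [7], [8]])   # passes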
Re: What is Expressiveness in a Computer Language
Chris F Clark <[EMAIL PROTECTED]> (I) wrote:
> Do you reject that there could be something more general than what a
> type theorist discusses? Or do you reject calling such things a type?

Chris Smith <[EMAIL PROTECTED]> wrote:
> I think that the correspondence is partly in the wrong direction to
> describe it that way. ...
> What I can't agree to is that what you propose is actually more general.
> It is more general in some ways, and more limited in others. As such,
> the best you can say is that it is analogous to formal types in some ways,
> but certainly not that it's a generalization of them.

Yes, I see your point.

Me:
> I'm particularly interested if something unsound (and perhaps
> ambiguous) could be called a type system.

Chris Smith:
> Yes, although this is sort of a pedantic question in the first place, I
> believe that an unsound type system would still generally be called a
> type system in formal type theory. However, I have to qualify that with
> a clarification of what is meant by unsound. We mean that it DOES
> assign types to expressions, but that those types are either not
> sufficient to prove what we want about program behavior, or they are not
> consistent, so that a program that was well typed may reduce to a poorly
> typed program during its execution. Obviously, such type systems don't
> prove anything at all, but they probably still count as type systems.

Yes, and you see mine. These informal systems, which may not prove what they claim to prove, are my concept of a "type system". That's because at some point, there is a person reasoning informally about the program. (There may also be people reasoning formally about the program, but I am going to neglect them.) It is to the people who are reasoning informally about the program that we wish to communicate that there is a type error. That is, we want someone who is dealing with the program informally to realize that there is an error, and that this error is somehow related to some input not being kept within proper bounds (inside the domain set) and that as a result the program will compute an unexpected and incorrect result.

I stressed the informal aspect because the intended client of the program is almost universally *NOT* someone who is thinking rigorously about some mathematical model, but is dealing with some "real world" phenomena which they wish to model and may not have a completely formed concrete representation of. Even the author of the program is not likely to have a completely formed rigorous model in mind, only some approximation to such a model.

I also stress the informality because, beyond a certain nearly trivial level of complexity, people are not capable of dealing with truly formal systems. As a compiler person, I often have to read sections of language standards to try to discern what the standard is requiring the compiler to do. Many language standards attempt to describe some formal model. However, most are so imprecise and ambiguous when they attempt to do so that the result is only slightly better than not trying at all. Only when one understands the standard in the context of typical practice can one begin to fathom what the standard is trying to say and why they say it the way they do.

A typical example of this is the grammars for most programming languages: they are expressed in a BNF variant. However, most of the BNF descriptions omit important details and gloss over certain inconsistencies. Moreover, most BNF descriptions are totally unsuitable for a formal translation (e.g.
an LL or LR parser generator, or even an Earley or CYK one). Yet, when considered in the context of typical implementations, the errors in the formal specification can easily be overlooked and resolved to produce what the standard authors intended.

For example, a few years ago I wrote a Verilog parser by transliterating the grammar in the standard (IEEE 1364-1995) into the notation used by my parser generator, Yacc++. It took a couple of days to do so. The resulting parser essentially worked on the first try. However, the reason it did so is that I had a lot of outside knowledge when I was making the transliteration. There were places where I knew that this part of the grammar was for lexing and that part was for parsing, and I cut the grammar correctly into two parts, despite there being nothing in the standard which described that dichotomy. There were other parts, like the embedded preprocessor, that I knew were borrowed from C, so that I could use the resolutions of issues the way a C compiler would, to guide resolving the ambiguities in the Verilog standard. There were other parts where I could tell the standard was simply written wrong, that is, where the grammar did not specify the correct BNF, because the BNF they had written did not specify something which would make sense.
Re: What is Expressiveness in a Computer Language
Chris F Clark (I) wrote:
> I'm particularly interested if something unsound (and perhaps
> ambiguous) could be called a type system. I definitely consider such
> things type systems.

"Marshall" <[EMAIL PROTECTED]> wrote:
> I don't understand. You are saying you prefer to investigate the
> unsound over the sound? ...
> Again, I cannot understand this. In a technical realm, vagueness
> is the opposite of understanding.

At the risk of injecting too much irrelevant philosophy into the discussion, I will with great trepidation reply.

First, in the abstract: No, understanding is approximating. The world is inherently vague. We make false symbolic models of the world which are consistent, but at some level they do not reflect reality, because reality isn't consistent. Only by abstracting away the inherent infinite amount of subtlety present in the real universe can we come to comprehensible models. Those models can be consistent, but they are not the universe. The models, in their consistency, prove things which are not true about the real universe.

Now, in the concrete: In my work, productivity is ultimately important. Therefore, we live by the 80-20 rule in numerous variations. One of the things we do to achieve productivity is simplify things. In fact, we are more interested in an unsound simplification that is right 80% of the time but takes only 20% of the effort to compute than in a completely sound (100% right) model which takes 100% of the time to compute (which is 5 times as long). We are playing the probabilities.

It's not that we don't respect the sound underpinning, the model which is consistent and establishes certain truths. However, what we want is the rule of thumb which is far simpler and approximates the sound model with reasonable accuracy. In particular, we accept two types of unsoundness in the model. One, we accept a model which gives wrong answers which are detectable. We code tests to catch those cases, and use a different model to get the right answer. Two, we accept a model which gets the right answer only when over-provisioned. For example, if we know a loop will occasionally not converge in the n steps that it theoretically should, we will run the loop for n+m steps until the approximation also converges--even if that requires allocating m more copies of some small element than are theoretically necessary. A small waste of a small resource is ok if it saves a waste of a more critical resource--the most critical resource of all being our project timeliness.

We do the same things in our abstract reasoning (including our type model). The problems we are solving are too large for us to understand. Therefore, we make simplifications, simplifications we know are inaccurate and potentially unsound. Our algorithm then solves the simplified model, which we think covers the main portion of the problem. It also attempts to detect when it has encountered a problem that is outside of the simplified region, on the edge so to speak. Now, in general, we can figure out how to handle certain edge cases, and we do it. However, sometimes the edge of one part of the algorithm interacts with an edge of another part of the algorithm and we get a "corner case". Because our model is too simple and the reasoning it allows is thus unsound, the algorithm will occasionally compute the wrong results for some of those corners. We use the same techniques for dealing with them. Dynamic tagging is one way of catching when we have gotten the wrong answer in our type simplification.
Marshall's last point:
> I flipped a coin to see who would win the election; it came
> up "Bush". Therefore I *knew* who was going to win the
> election before it happened. See the problem?

Flipping one coin to determine an election is not playing the probabilities correctly. You need a plausible explanation for why the coin should predict the right answer and a track record of it being correct. If you had a coin that had correctly predicted the previous 42 presidencies and you had an explanation why the coin was right, then it would be credible, and I would be willing to wager that the coin could also tell us who the 44th president would be. One flip and no explanation is not sufficient. (And to the abstract point, to me that is all knowledge is: some convincing amount of examples and a plausible explanation--anyone who thinks they have more is appealing to a "knowledge" of the universe that I don't accept.)

I do not want incredible unsound reasoning, just reasoning that is marginal, within a margin of safety that I can control. Everyone that I have known of has, as far as I can tell, made simplifying assumptions to make the world tractable. Very rarely do we deal with the world completely formally. Look at where that got Russell and Whitehead. I'm just trying to be pragmatic.
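The over-provisioned loop described above can be made concrete with a small Python sketch (not from the original post; the numeric example is invented): an unsound-but-fast method run with a few extra steps of margin, a cheap test that detects a wrong answer, and a sound-but-slow fallback:

    def newton_sqrt(a, steps):
        """Fast but unsound in general: may not have converged."""
        x = a or 1.0
        for _ in range(steps):
            x = 0.5 * (x + a / x)
        return x

    def bisect_sqrt(a, tol=1e-12):
        """Slow but sound for a >= 0."""
        lo, hi = 0.0, max(a, 1.0)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if mid * mid < a else (lo, mid)
        return (lo + hi) / 2

    def fast_sqrt(a, margin=3):
        est = newton_sqrt(a, steps=10 + margin)       # n+m steps, not n
        if abs(est * est - a) < 1e-9 * max(a, 1.0):   # detect a wrong answer
            return est
        return bisect_sqrt(a)                         # a different model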
Re: What is Expressiveness in a Computer Language
Chris Smith <[EMAIL PROTECTED]> writes:
> Unfortunately, I have to again reject this idea. There is no such
> restriction on type theory. Rather, the word type is defined by type
> theorists to mean the things that they talk about.

Do you reject that there could be something more general than what a type theorist discusses? Or do you reject calling such things a type?

Later you write:
> because we could say that anything that checks types is a type system,
> and then worry about verifying that it's a sound type system without
> worrying about whether it's a subset of the perfect type system.

I'm particularly interested if something unsound (and perhaps ambiguous) could be called a type system. I definitely consider such things type systems. However, I like my definitions very general and vague. Your writing suggests the opposite preference. To me, if something works in an analogous way to a known type system, I tend to consider it a "type system". That probably isn't going to be at all satisfactory to someone wanting a more rigorous definition. Of course, to my mind, the rigorous definitions are just an attempt to capture something that is already known informally and put it on a more rational foundation.

> So what is the domain of a function? (Heck, what is a function? ...
> (I need a word here that implies something like a set, but I don't care
> to verify the axioms of set theory. I'm just going to use set. Hope
> that's okay.)

Yes, I meant the typical HS algebra definition of domain, which is a set; same thing for function. More rigorous definitions can be substituted as necessary and appropriate.

> 1. The domain is the set of inputs to that expression which are going to
> produce a correct result.
>
> 2. The domain is the set of inputs that I expected this expression to
> work with when I wrote it.
>
> 3. The domain is the set of inputs for which the expression has a
> defined result within the language semantics.
>
> So the problem, then, more clearly stated, is that we need something
> stronger than #3 and weaker than #1, but #2 includes some psychological
> nature that is problematic to me (though, admittedly, FAR less
> problematic than the broader uses of psychology to date.)

Actually, I like 2 quite well. There is some set in my mind when I'm writing a particular expression. It is likely an ill-defined set, but I don't notice that. That set is exactly the "type". I approximate that set and its relationships to other sets in my program with the typing machinery of the language I am using. That is the static (and well-founded) type. It is, however, only an approximation to the ideal "real" type. If I could make that imaginary mental set concrete (and well-defined), that would be exactly my concept of type.

Now, that overplays the conceptual power in my mind. My mental image is influenced by my knowledge of the semantics of the language and also any implementations I may have used. Thus, I will weaken 2 to be closer to 3, because I don't expect a perfectly ideal type system, just an approximation. To the extent that I am aware of gotchas in the language or a relevant implementation, I may write extra code to try to limit things to model 1. Note that in my fallible mind, 1 and 2 are identical. In my hubris, I expect that the extra checks I have made have restricted my inputs to where models 3 and 1 coincide. Expanding that a little: I expect the language to catch type errors where I violate model 3.
To the extent model 3 differs from model 1 and my code hasn't added extra checks to catch it, I have bugs resulting from undetected type errors, or perhaps modelling errors. To me they are type errors, because I expect that there is a typing system that captures 1 for the program I am writing, even though I know that that is not generally true.

As I reflect on this more, I really do like 2, in the sense that I believe there is some abstract Platonic set that is an ideal type (like 1), and that is what the type system is approximating. To the extent that languages approximate either 1 or 2, those facilities are type facilities of a language. That puts me very close to your (rejected) definition of type as the well-defined semantics of the language. Except I think of types as the sets of values that the language provides, so it isn't the entire semantics of the language, just that part which divides values into discrete sets which are appropriate to different operations (and detects inappropriate values).

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
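A tiny Python example (not from the original post) may make the three domains concrete: the language semantics (domain 3) accept any int argument here, but the domain the author has in mind (2, ideally 1) is the non-negative ints, so an extra check narrows 3 toward 1:

    def factorial(n):
        # extra check: enforce model 1/2, not just what the language allows
        if not isinstance(n, int) or n < 0:
            raise TypeError("factorial is only defined on non-negative ints")
        out = 1
        for k in range(2, n + 1):
            out *= k
        return out

    factorial(5)     # inside all three domains
    # factorial(-1)  # fine by model 3, a "type error" by models 1 and 2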
Re: What is Expressiveness in a Computer Language
Chris Smith <[EMAIL PROTECTED]> writes:
> I thought about this in the context of reading Anton's latest post to
> me, but I'm just throwing out an idea.

I wrote:
> I think there is some sense of convergence here.

Apologies for following up to my own post, but I forgot to describe the convergence. The convergence is that there is something more general than what type theorists talk about when discussing types. Type theory is an abstraction and systemization of reasoning about types. This more general reasoning doesn't need to be sound; it just needs to be based around "types" and computations. My own reasoning uses such an unsound system. It has clear roots in sound reasoning, and I wish I could make it sound, but I cannot. In any case, dynamic tagging helps support my unsound reasoning by providing evidence when I have made an error.

I hope this makes it less mystical for Marshall.

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
Re: What is Expressiveness in a Computer Language
Chris Smith <[EMAIL PROTECTED]> writes:
> I thought about this in the context of reading Anton's latest post to
> me, but I'm just throwing out an idea.

I think there is some sense of convergence here. In particular, I reason about my program using "unsound types". That is, I reason about my program using types which I can (at least partially) formalize, but for which there is no sound axiomatic system. Now, these types are locally sound, i.e. I can prove properties that hold for regions of my programs. However, the complexity of the programs prevents me from formally proving the entire program is correct (or even entirely type-safe)--I generally work in a statically, but weakly, typed language. Sometimes, for performance reasons, the rules need to be bent--and the goal is to bend the rules, but not break them. At other times, I want to strengthen the rules, to make parts of the program more provably correct.

How does this work in practice? Well, there is a set of ideal types I would like to use in my program. Those types would prove that my program is correct. However, because there are parts of the program that for performance reasons need the ideal types widened, I use something less tight: pragmatic types. The system can validate that I have not broken the pragmatic types. That is not tight enough to prove the program correct, but it provides some level of security. To improve my confidence in the program, I borrow the tagging concept to supplement the static type system. At key points in the program I can check the type tag to ensure that the data being manipulated actually matches the unprovable ideal type system. If it doesn't, I have a witness to the error in my reasoning. It is not perfect, but it is better than nothing.

Why am I in this state? Because I believe that most interesting programs are beyond my ability to formally prove them, or at least to prove them in "reasonable" time bounds. Therefore, I prove (universally quantify) some properties of my programs, but many other (and most of the "interesting") properties I can only existentially quantify--it worked in the cases that it was tested on. It is a corollary of this that most programs have bugs--places where the formal reasoning was incomplete (did not reason down to the relevant interesting property) and the informal reasoning was wrong.

Thus, the two forms of reasoning--static typing to check the formal properties that it can, and dynamic tagging to catch the errors that it missed--work like a belt and suspenders system. I would like the static type system to catch more errors, but I need it to express more to do so, and many of the properties that are interesting require n-squared operations to validate.

Equally important, the dynamic tags help when working in an essentially untrustworthy world. Not only do our own programs have bugs, so do the systems we run them on, both hardware and software. Moreover, alpha radiation and quantum effects can change "constants" in a program (some of my work has to do with soft-error rates on chips, where we are trying to guarantee that the chip will have a reasonable MTBF for certain algorithms). That's not to ignore the effects of malware and other ways your program can be made less reliable. Yes, I want a well-proven program, but I also want some guarantee that the program is actually executing the algorithm specified. That requires a certain amount of "redundant" run-time checks. Knuth had a quote about a program only being proven and not tested that reflects the appropriate cynicism.
I consider any case where a program gives a function a value outside of its domain a "type error", because an ideal (but unachievable) type system would have prevented it. That does not make all errors type errors, because if I give a program an input within its domain and it mis-computes the result, that is not a type error.

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
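A small Python sketch of the belt-and-suspenders scheme (not from the original post; the names are invented). The parameter carries only the loose "pragmatic" type, while a run-time tag check at a key point witnesses the tighter "ideal" type (a sorted list) that the static checker cannot express:

    def check_sorted(xs):
        """Run-time witness for the ideal type 'sorted list'."""
        assert all(a <= b for a, b in zip(xs, xs[1:])), "tag check: unsorted"
        return xs

    def binary_search(xs, x):
        """Pragmatic type: 'list'. Ideal type: 'sorted list'."""
        xs = check_sorted(xs)            # check the tag at a key point
        lo, hi = 0, len(xs)
        while lo < hi:
            mid = (lo + hi) // 2
            if xs[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        return lo                        # insertion point for x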
Re: What is Expressiveness in a Computer Language
Pascal Costanza wrote:
> Consider a simple expression like 'a + b': In a dynamically typed
> language, all I need to have in mind is that the program will attempt to
> add two numbers. In a statically typed language, I additionally need to
> know that there must be a guarantee that a and b will always hold numbers.

"Marshall" <[EMAIL PROTECTED]> replied:
> I would not expect that the dynamic programmer will be
> thinking that this code will have two numbers most of the
> time but sometimes not, and fail. I would expect that in both
> static and dynamic, the thought is that that code is adding
> two numbers, with the difference being the static context
> gives one a proof that this is so.

I don't think the problem is down at the atom level. It's really at the structure (tuple) level. Let us view 'a + b' as applying the function + to some non-atomic objects a and b. The question is how well we specify what a and b are. From my limited exposure to programs that use dynamic typing, I would surmise that developers who use dynamic typing have very broad and open definitions of the "types" of a and b, and they don't want the type system complaining that they haven't nailed the definitions down quite yet. And, if you read some of the arguments in this topic, one assumes that they want to even be able to correct the "types" of a and b at run-time, when they find out that they have made a mistake, without taking the system down.

So, for a concrete example, let b be a "file" type (either text file or directory), and we get a whole system up and running with those two types. But then we discover a need for "symbolic links". So, we want to change the file type to have 3 sub-types. In a dynamic type system, because the file "type" is loose (it's operationally defined by what function one applies to b), this isn't an issue. As each function is recoded to deal with the 3rd sub-type, the program becomes more functional. If we need to demo the program before we have worked out how symbolic links work for some operation, it is not a problem. Run the application, but don't exercise that combination of type and operation. In many ways, this can also be done in languages with type inference. I don't understand the process by which one does it, so I can't explain it. Perhaps someone else will--please.

Back to the example: the file type is not an atomic type, but generally some structure/tuple/list/record/class with members or fields. A function which renames files by looking at only the name field may work perfectly adequately with the new symbolic link subtype without change, because the new subtype has the same name field and uses it the same way. A spell-check function which works on text files using the "contents" field might need to be recoded for symbolic links if the contents field for that subtype is "different" (e.g. the name of the target). The naive spell-check function might appear to work, but really do the wrong thing (i.e. checking if the target file name is made of legal words). Thus, the type problems are not at the atomic level (where contents is a string), but at the structure level, where one needs to know which fields have which meanings for which subtypes.

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
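A Python rendering of the file example (not from the original post; the dict layout is invented for illustration) shows why the problems live at the structure level: rename only touches the name field, so it survives the new subtype unchanged, while the naive spell check silently does the wrong thing on symbolic links:

    text = {"kind": "text",    "name": "notes.txt", "contents": "helo world"}
    link = {"kind": "symlink", "name": "alias.txt", "contents": "notes.txt"}

    def rename(f, new_name):            # works for all three subtypes
        f["name"] = new_name

    def naive_spell_check(f, words):    # "works", but checks a target *name*
        return [w for w in f["contents"].split() if w not in words]

    def spell_check(f, words):          # recoded to know about the subtypes
        if f["kind"] != "text":
            return []
        return naive_spell_check(f, words)

    print(spell_check(text, {"hello", "world"}))   # ['helo']: a real typo
    print(spell_check(link, {"hello", "world"}))   # []: not checked at all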
Re: What is Expressiveness in a Computer Language
Chris F Clark schrieb:
> In that sense, a static type system is eliminating tags, because the
> information is pre-computed and not explicitly stored as a part of the
> computation. Now, you may not view the tag as being there, but in my
> mind if there exists a way of performing the computation that requires
> tags, the tag was there and that tag has been eliminated.

Joachim Durchholz replied:
> On a semantic level, the tag is always there - it's the type (and
> definitely part of an axiomatic definition of the language).
> Tag elimination is "just" an optimization.

I agree the tag is always there in the abstract. However, for the work I do, the optimization of the tag at runtime is important, and we specifically change things into types when we know the system can do that optimization, because then we are getting the system to do the work we would have to do, and validating that the job is done correctly. So, I care that the tag is eliminated in practice (and remains in theory--I have to have both).

In the end, I'm trying to fit things which are "too big" and "too slow" into much less space and time; using types to help me reliably make my program smaller and faster is just one trick. It's a really great and non-obvious one though, and one I'm glad I learned. Any algebra I can learn that helps me solve my problems better is appreciated.

However, I also know that my way of thinking about it is fringe. Most people don't think that the purpose of types is to help one write reliably tighter code. Still, knowing about dynamic typing (tagging) and static typing helped me understand this trick. Thus, conflating the two meanings may at some level be confusing. However, for me, they aided understanding something that I needed to learn.

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
Re: What is Expressiveness in a Computer Language
Chris F Clark wrote:
> A static
> type system eliminates some set of tags on values by syntactic
> analysis of annotations (types) written with or as part of the program
> and detects some of the disallowed computations (statically) at compile
> time.

Andreas replied:
> Explicit annotations are not a necessary ingredient of a type system,
> nor is "eliminating tags" very relevant to its function.

While this is true, I disagree at some level with the second part. By eliminating tags, I mean allowing one to perform "type safe" computations without requiring the values to be tagged. One could argue that the tags were never there. However, many of the interesting polymorphic computations require either that the values be tagged or that some other process assures that at each point one can determine a priori which variant of the computation correctly applies.

To me, a static type system is one which does this a priori determination. A dynamic type system does not do it a priori and instead includes explicit information in the values being computed, to select the correct variant computations. In that sense, a static type system is eliminating tags, because the information is pre-computed and not explicitly stored as a part of the computation. Now, you may not view the tag as being there, but in my mind, if there exists a way of performing the computation that requires tags, the tag was there, and that tag has been eliminated.

To put it another way, I consider the tags to be axiomatic. Most computations involve some decision logic that is driven by distinct values that have previously been computed. The separation of the values which drive the computation one way versus another is a tag. That tag can potentially be eliminated by some a priori computation. In what I do, it is very valuable to move information from being explicitly represented in the computed result into the tag, so that I often have distinct "types" (i.e. tags) for an empty list, a list with one element, a list with two elements, and a longer list.

In that sense, I agree with Chris Smith's assertion that "static typing" is about asserting general properties of the algorithm/data. These assertions are important to the way I am manipulating the data. They are part of my type model, but they may not be part of anyone else's, and to me, to pass an empty list to a function requiring a list with two elements is a "type error", because it is something I expect the type system to detect as incorrect. The fact that no one else may have that expectation does not seem relevant to me.

Now, to carry this farther: since I expect the type system to validate that certain values are of certain types and are only used in certain contexts, I am happy when it does not require certain "tags" to be actually present in the data. However, because other bits of code are polymorphic, I do expect certain values to require tags. In the end, this is still a win for me. I had certain data elements that in the abstract had to be represented explicitly. I have encoded that information into the type system, and in some cases the type system is not using any bits in the computed representation to hold that information. Whenever that happens, I win, and that solves one of the problems that I need solved.

Thus, a type system is a way for me to express certain axioms about my algorithm. A dynamic type system encodes those facts as part of the computation.
A static type system pre-calculates certain "theorems" from my axioms and uses those theorems to allow my algorithm to be computed without all the facts being stored as part of the computation.

Hopefully, this makes my point of view clear.

-Chris

*****************************************************************************
Chris Clark                    Internet : [EMAIL PROTECTED]
Compiler Resources, Inc.       Web Site : http://world.std.com/~compres
23 Bailey Rd                   voice    : (508) 435-5016
Berlin, MA 01503 USA           fax      : (978) 838-0263 (24 hours)
-----------------------------------------------------------------------------
--
http://mail.python.org/mailman/listinfo/python-list
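For one way a checker can catch the empty-list-where-two-are-required error statically, here is a hedged Python sketch (not from the original post; it leans on tuples because Python lists carry no length in their type). A tool such as mypy can reject the bad call without any run-time tag being consulted:

    from typing import Tuple

    Empty = Tuple[()]          # distinct "types" for distinct lengths
    One = Tuple[int]
    Two = Tuple[int, int]

    def needs_two(p: Two) -> int:
        a, b = p               # no tag test: the length lives in the type
        return a + b

    needs_two((1, 2))          # well typed
    # needs_two(())            # flagged statically by e.g. mypy; the length
    #                          # "tag" occupies no bits in the value itself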
Re: What is Expressiveness in a Computer Language
Chris Smith wrote:
> Marshall <[EMAIL PROTECTED]> wrote:
> > While I am quite sympathetic to this point, I have to say that
> > this horse left the barn quite some time ago.
>
> I don't think so. Perhaps it's futile to go scouring the world for uses
> of the phrase "dynamic type" and eliminating them. It's not useless to
> point out when the term is used in a particularly confusing way, though,
> as when it's implied that there is some class of "type errors" that is
> strictly a subset of the class of "errors". Terminology is often
> confused for historical reasons, but incorrect statements eventually get
> corrected.

Ok, so you (Chris Smith) object to the term "dynamic type". From a historical perspective it makes sense, particularly in the sense of tags. A dynamic type system involves associating tags with values and expresses variant operations in terms of those tags, with certain disallowed combinations checked (dynamically) at run-time. A static type system eliminates some set of tags on values by syntactic analysis of annotations (types) written with or as part of the program and detects some of the disallowed computations (statically) at compile time.

Type errors are the errors caught by one's personal sense of which annotations are expressible, computable, and useful. Thus, each person has a distinct sense of which errors can (and/or should) be statically caught.

A personal example: I read news using Emacs, which, as most readers in these newsgroups will know, is a dialect of lisp that includes primitives to edit files. Most of emacs is quite robust, perhaps due to its being lisp. However, some commands (reading news being one of them) are exceptionally fragile and fail in the most undebuggable ways, often just complaining about "nil being an invalid argument" (or "stack overflow in regular expression matcher").

This is particularly true when I am composing lisp code. If I write some lisp code and make a typographical error, the code may appear to run on some sample case and then fail, due to a path not explored in my sample case, when applied to real data. I consider such an error to be a "type error", because I believe that if I used a language that required more type annotations, the compiler would have caught my typographical error (and no, I'm not making a pun on type and typographical).

Because I do a lot of this--it is easy enough for me to conjure up a small lisp macro to make some edit that it is a primary tool in my toolkit--I wish I could make my doing so more robust. It would not bother me to have to type more to do so. I simply make too many stupid errors to use emacs lisp as effectively as I would like. Now, this has nothing to do with real lisp programming, and so I do not wish to offend those who do that. However, I personally would like a statically typed way of writing macros (and, more importantly, of annotating some of the existing emacs code) to remove some of the fragilities that I experience. I'm not taking advantage of the exploratory nature of lisp, except in the sense that the exploratory nature has created lots of packages which mostly work most of the time.

Now, I will return this group to its much more erudite discussion of the issue.

Thank you,
-Chris

*****************************************************************************
Chris Clark                    Internet : [EMAIL PROTECTED]
Compiler Resources, Inc.       Web Site : http://world.std.com/~compres
23 Bailey Rd                   voice    : (508) 435-5016
Berlin, MA 01503 USA           fax      : (978) 838-0263 (24 hours)
-----------------------------------------------------------------------------
--
http://mail.python.org/mailman/listinfo/python-list
Re: A critic of Guido's blog on Python's lambda
Kenny replied to me saying:
> Yep. But with Cells the dependency graph is just a shifting record of
> who asked who, shifting because all of a sudden some outlier data will
> enter the system and a rule will branch to code for the first time,
> and suddenly "depend on" some new other cell (new as in never
> before used by this cell). This is not subject to static analysis
> because, in fact, lexically everyone can get to everything else, what
> with closures, first-class functions, runtime branching we cannot
> predict... fuggedaboutit.
>
> So we cannot say, OK, here is "the graph" of our application
> model. All we can do is let her rip and cross our fingers. :)

Yes, if you have Turing completeness in your dependency graph, the problem is unsolvable. However, it's like the static v. dynamic typing debate: you can pick how much you want to allow your graph to be dynamic versus how much "safety" you want. In particular, I suspect that in many applications, one can compute the set of potentially problematic dependencies (and that set will be empty). It's just a matter of structuring and annotating them correctly, just as one can create type systems that work for ML and Haskell. Of course, if you treat your cell references like C pointers, then you get what you deserve.

Note that you can even run the analysis dynamically, recomputing whether the graph is cycle-free as each dependency changes. Most updates have local effect. Moreover, if you have used topological sort to compute an ordering, as well as proving cycle-free-ness, an edge is only potentially problematic when it goes from a later vertex in the order to an earlier one. I wouldn't be surprised to find efficient algorithms for calculating and updating a topological sort already in the literature.

It is worth noting that in typical chip circuitry there are constructions, generally called "busses", where the flow of information is sometimes "in" via an edge and sometimes "out" via the same edge, and we can model them in a cycle-free manner.

If you want to throw up your hands and say the problem is intractable in general, you can. However, in my opinion, one doesn't have to give up quite that easily.

-Chris
--
http://mail.python.org/mailman/listinfo/python-list
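A sketch of that dynamic analysis in Python (not Cells itself; the class and its methods are invented for illustration): keep a topological order beside the graph; a new edge u -> v is only potentially problematic when v currently precedes u, and only then do we search for a cycle and re-sort:

    class DepGraph:
        def __init__(self):
            self.succ = {}     # vertex -> set of dependents
            self.pos = {}      # vertex -> position in a topological order

        def add_vertex(self, v):
            self.succ.setdefault(v, set())
            self.pos.setdefault(v, len(self.pos))

        def add_edge(self, u, v):
            self.add_vertex(u)
            self.add_vertex(v)
            if self.pos[u] < self.pos[v]:   # respects current order: cheap
                self.succ[u].add(v)
                return
            if self._reaches(v, u):         # local search only when needed
                raise ValueError(f"edge {u}->{v} would create a cycle")
            self.succ[u].add(v)
            self._resort()                  # restore a topological order

        def _reaches(self, src, dst, seen=None):
            seen = set() if seen is None else seen
            if src == dst:
                return True
            seen.add(src)
            return any(self._reaches(w, dst, seen)
                       for w in self.succ[src] if w not in seen)

        def _resort(self):
            # simple full re-sort by depth-first postorder; incremental
            # algorithms exist in the literature, as noted above
            order, seen = [], set()
            def visit(v):
                if v not in seen:
                    seen.add(v)
                    for w in self.succ[v]:
                        visit(w)
                    order.append(v)
            for v in self.succ:
                visit(v)
            self.pos = {v: i for i, v in enumerate(reversed(order))}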
Re: A critic of Guido's blog on Python's lambda
David C Ullrich asked:
> Q: How do we ensure there are no loops in the dependencies?
>
> Do we actually run the whole graph through some algorithm
> to verify there are no loops?

The question you are asking is: is the dependency graph a "directed acyclic graph" (commonly called a DAG)? One algorithm to determine whether it is, is called "topological sort". That algorithm tells you where there are cycles in your graph, if any, and also tells you the order of the dependencies, i.e. if x is updated, what you have to update downstream AND THE ORDER you have to perform the downstream computations in.

We use this algorithm for solving just the kind of dataflow problems you are talking about in our circuit design tools. Circuit designs have one-way dependencies that we want to sort and resolve--similarly, we don't want cycles in our circuits, except ones that pass through clocked flip-flops. We solve such problems on circuits with millions of gates, i.e. enough gates to represent the CPU of your computer or a disk controller chip or a router.

I believe there are also algorithms that allow you to construct only acyclic (the technical term for non-looping) graphs and don't require you to enter the vertexes (vertices if you prefer) of the graph in any specific order, and in the worst case you can always run the topological sort on any graph and determine if the graph is cyclic. The area is well-studied, and you can find a variety of algorithms that solve most interesting graph problems, as they all occur over and over in numerous diverse fields.

Hope this helps,

-Chris

*****************************************************************************
Chris Clark                    Internet : [EMAIL PROTECTED]
Compiler Resources, Inc.       Web Site : http://world.std.com/~compres
23 Bailey Rd                   voice    : (508) 435-5016
Berlin, MA 01503 USA           fax      : (978) 838-0263 (24 hours)
-----------------------------------------------------------------------------
--
http://mail.python.org/mailman/listinfo/python-list
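As a concrete companion to the answer above (not from the original post), Kahn's algorithm is one standard topological sort: it yields the downstream update order and detects cycles in the same pass:

    from collections import deque

    def topo_sort(succ):
        """succ maps each vertex to the vertices that depend on it."""
        indeg = {v: 0 for v in succ}
        for v in succ:
            for w in succ[v]:
                indeg[w] += 1
        ready = deque(v for v, d in indeg.items() if d == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for w in succ[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    ready.append(w)
        if len(order) != len(succ):
            raise ValueError("dependency graph has a cycle")
        return order

    # If x feeds y and z, and both feed out, updates must run in this order:
    print(topo_sort({"x": ["y", "z"], "y": ["out"], "z": ["out"], "out": []}))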
Re: Programming challenge: wildcard exclusion in cartesian products
Yes, there is literature on the generating side of the regular expression/FSM model. In fact, the matching problem and the generating problem are exactly equivalent. A slight variation of the definition of how a matcher works turns it into a generator, and vice versa. To directly generate (rather than match) from an FSM, one simply does a walk (e.g. depth-first search or breadth-first search) over the machine, writing out (rather than matching) the symbols used as labels. Thus, all the theorems about matching apply to generating, and also in reverse.

You can see this to some extent from the problem posed. If one generates Sigma* and subtracts out the elements from some regular language (say by matching them), that is exactly equivalent (in strings generated) to generating the complement of the regular language.

In fact, it is quite easy (with the correct regular expression tool, i.e. one that handles regular expression differences) to take the problems posed and generate provably minimal (i.e. provably maximally efficient) generation (or matching) programs as FSMs. The provably minimal FSM won't go down any paths that don't have some sequence that generates an "accept" value. It is worth noting that the regular languages are closed under the difference operator, so the language resulting from subtracting one regular expression from another is still a regular language, which can be used to prove that the machine is minimal.

Therefore, while the output can be exponentially larger than the input, one should expect that implementations should be able to generate the output in linear time (relative to the size of the output), since FSMs run in linear time relative to the string they are processing (whether generating or matching). Under a reasonable model of computation, one cannot do better than linear in the size of the string processed.

I'm sure if you asked on comp.theory, you would get tons of other relevant facts from someone who understands automata theory better than I. Note, if one does ask there, one should correct the notation. The "*" symbol was used as in globbing, not as commonly used in regular expressions. The notation ".*" (as someone corrected in one of their replies) is the normal notation for what the original poster wanted by "*".

Hope this helps,

-Chris

*****************************************************************************
Chris Clark                    Internet : [EMAIL PROTECTED]
Compiler Resources, Inc.       Web Site : http://world.std.com/~compres
23 Bailey Rd                   voice    : (508) 435-5016
Berlin, MA 01503 USA           fax      : (978) 838-0263 (24 hours)
-----------------------------------------------------------------------------
--
http://mail.python.org/mailman/listinfo/python-list
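To illustrate the matcher-as-generator point, a short Python sketch (not from the original post; the DFA encoding is invented for the example). A breadth-first walk that emits labels enumerates the machine's language in length order; the example machine accepts the complement of Sigma* ab Sigma*, i.e. the strings with no "ab":

    from collections import deque

    def generate(dfa, start, accepting, max_len):
        """dfa: state -> {symbol: state}. Yield accepted strings by length."""
        queue = deque([(start, "")])
        while queue:
            state, s = queue.popleft()
            if state in accepting:
                yield s
            if len(s) < max_len:
                for sym, nxt in sorted(dfa[state].items()):
                    queue.append((nxt, s + sym))

    # states: 0 = no pending "a", 1 = just saw "a", 2 = saw "ab" (reject sink)
    dfa = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 2, "b": 2}}
    print(list(generate(dfa, 0, {0, 1}, 3)))
    # ['', 'a', 'b', 'aa', 'ba', 'bb', 'aaa', 'baa', 'bba', 'bbb']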