Re: Termination and type systems
David Hopwood <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:
> That's interesting, but linear typing imposes some quite severe
> restrictions on programming style. From the example of 'h' on page 2,
> it's clear that the reason for the linearity restriction is just to
> ensure polynomial-time termination, and it is not needed if we only
> want to prove termination.

Yes. It's just an example of what can actually be done with a type
system.

>> It's already hard enough to guarantee termination with the extra
>> information present in the type annotation. If this information is
>> not present, then the language probably has to be restricted so
>> severely to ensure termination that it is more or less useless.
>
> I think you're overestimating the difficulty of the problem. The fact
> that *in general* it can be arbitrarily hard to prove termination can
> obscure the fact that for large parts of a typical program, proving
> termination is usually trivial.

Yes. The problem is the small parts where it's not trivial (and the
size of this part depends on the actual problem the program is trying
to solve). Either you make the language so restrictive that these
parts are hard (see your remark above) or impossible to write, or
you'll be left with a language where the compiler cannot prove
termination of those small parts, so it's not a type system in the
usual sense.

Of course one could write an optional tool that automatically proves
termination of the easy parts, and leaves the rest to the programmer,
but again, that's a different thing.

- Dirk
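To make the distinction concrete, here is a small illustrative sketch
(hypothetical example functions, not from the thread): one function
whose termination is trivially provable, and one where no proof is
known.

    def length(xs):
        # Trivial to prove terminating: each recursive call receives a
        # strictly shorter list, so list length is a decreasing,
        # well-founded measure.
        if not xs:
            return 0
        return 1 + length(xs[1:])

    def collatz_steps(n):
        # Termination for all n > 0 is the open Collatz conjecture, so
        # any sound termination checker must reject this function or
        # require an annotation.
        steps = 0
        while n != 1:
            if n % 2:
                n = 3 * n + 1
            else:
                n = n // 2
            steps += 1
        return steps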
Re: Termination and type systems
David Hopwood <[EMAIL PROTECTED]> wrote:
> Marshall wrote:
>> David Hopwood wrote:
>>> A type system that required an annotation on all subprograms that
>>> do not provably terminate, OTOH, would not impact expressiveness
>>> at all, and would be very useful.
>>
>> Interesting. I have always imagined doing this by allowing an
>> annotation on all subprograms that *do* provably terminate.

Maybe the paper "Linear types and non-size-increasing polynomial time
computation" by Martin Hofmann is interesting in this respect. From
the abstract:

  We propose a linear type system with recursion operators for
  inductive datatypes which ensures that all definable functions are
  polynomial time computable. The system improves upon previous such
  systems in that recursive definitions can be arbitrarily nested; in
  particular, no predicativity or modality restrictions are made.

It does not only ensure termination, but termination in polynomial
time, so you can use those types to carry information about this as
well.

> If the annotation marks not-provably-terminating subprograms, then it
> calls attention to those subprograms. This is what we want, since it
> is less safe/correct to use a nonterminating subprogram than a
> terminating one (in some contexts).

That would certainly be nice, but it may be almost impossible to do in
practice. It's already hard enough to guarantee termination with the
extra information present in the type annotation. If this information
is not present, then the language probably has to be restricted so
severely to ensure termination that it is more or less useless.

- Dirk
Re: Programming challenge: wildcard exclusion in cartesian products
Dinko Tenev <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:
>> [A lot of stuff]
>> Now clearer?
>
> Let's leave it there, and take a break.

Maybe it would help to just take a concrete example, and work through
it. Then you'll see exactly what happens.

- Dirk
Re: Programming challenge: wildcard exclusion in cartesian products
Dinko Tenev <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:
>> [One cannot escape exponential behaviour]
>>
>> But you cannot get rid of this. Consider S = {a, b}, W = {a}. Then
>> there are |S|^n result elements for n > 1, and you have to enumerate
>> all of them.
>
> Yes, but then, they are in the target set.

Which is the point. If they are in the target set, you have to
enumerate them. If the target set is of exponential size with respect
to n, then you'll need exponential time to do that.

> The point here is whether you can generate S^n - W in
> Theta( n * |S^n - W| ), which may be dramatically different from
> Theta( n * |S^n| ).

Exactly. Hence, you use a construction that guarantees that the time
needed is proportional to n * |S^n - W|: every step you do will be
necessary to produce at least one word in the output set. Now, if
|S^n - W| is still exponential, then you'll still need exponential
time. But nevertheless, that's the best you can hope for.

>> The automaton is:
>>
>>   S: a -> A, b -> B
>>   A: a -> A
>>   B: b -> B
>
> The target set is specified as S^n - W, where W is everything
> matching (.*a.*b|.*b.*a). Following the construction procedure in
> point, this exclusion set is matched exactly by my DFA with S initial
> and F final. Then, swapping final and non-final states makes
> {S, A, B} final, and F non-final. Your DFA above may be equivalent,
> but to me it is far from clear exactly what algorithm would build it
> from the given data.

Well, it's just the result of the minimization algorithm, where my
variant of the algorithm also prunes away the stuck state, which can
never produce any output.

>> The point is not to cut efficiently, the point is to enumerate only
>> those words that are actually in the result set.
>
> No, this is not the point. Naive filtering already does that.

No, it doesn't. Naive filtering will always look at the complete input
set, so no matter what size |S^n - W| actually is, it will always take
time proportional to |S^n|.

> By cutting efficiently I mean skipping over search sub-trees that
> don't contain any results from the target set.

Yes. Consider a different example: With the wildcard expressions
W = { b*, aa*, ab* }, you'll get S^* - W = { a }. The resulting
minimal FSM will just accept 'a' (start state, one final state, and
the stuck state if you insist on it), so you skip over every other
subtree when enumerating results from that automaton.

And for the previous example, you'll need something like 2*n time to
enumerate the output set instead of 2^n, because once you're in the
a-branch, you're producing only a's, and you're pruning away all the
subtrees that start with a b. Similarly in the b-branch.

Now clearer?

- Dirk
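For concreteness, a small sketch of such an enumeration in Python (the
dictionary encoding of the pruned automaton is an assumption for
illustration, not code from the thread):

    # The complement DFA from the example, with the stuck state pruned;
    # all remaining states are final.
    dfa = {
        'S': {'a': 'A', 'b': 'B'},
        'A': {'a': 'A'},
        'B': {'b': 'B'},
    }
    finals = set(['S', 'A', 'B'])

    def words(state, n, prefix=''):
        if n == 0:
            if state in finals:
                yield prefix
            return
        # Missing transitions simply end the branch: this is the
        # pruning that makes the work proportional to n instead of 2^n
        # for this automaton.
        for letter, target in sorted(dfa[state].items()):
            for w in words(target, n - 1, prefix + letter):
                yield w

    print(list(words('S', 4)))    # ['aaaa', 'bbbb']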
Re: Programming challenge: wildcard exclusion in cartesian products
[EMAIL PROTECTED] wrote:
> Call a wc 'free' if it satisfies the property that every letter 'a'
> in it appears only in the form '*a*', and 'anchored' otherwise.

Would '*ab*' be free or anchored?

> What if all wc's are free? How does this affect the DFA?

I don't know. The important point here is the interaction of all the
wc's. I don't think properties like this reduce the complexity of the
interaction in an obvious way.

> Does it minimize nontrivially?

I am not sure what you mean by this. Every DFA minimizes to some other
DFA, which is unique up to renaming of states. Now the question is
whether that minimization reduces the complexity enough to be
noticeable (maybe that's what you mean by "nontrivially"). In general,
I don't think one can say much about that without looking at concrete
examples.

- Dirk
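For reference, a compact sketch of minimization by partition refinement
(Moore's algorithm), assuming a complete DFA given as a transition
dictionary; illustrative code, not from the thread:

    def minimize(trans, finals):
        # trans: state -> {letter: state}, assumed complete;
        # finals: the set of final states.
        states = set(trans)
        alphabet = set(a for moves in trans.values() for a in moves)
        # Start from the final/non-final split, refine until stable.
        partition = [b for b in (states & finals, states - finals) if b]
        while True:
            def block_of(s):
                for i, b in enumerate(partition):
                    if s in b:
                        return i
            refined = []
            for block in partition:
                # Split the block by which block each letter leads to.
                groups = {}
                for s in block:
                    key = tuple(block_of(trans[s][a])
                                for a in sorted(alphabet))
                    groups.setdefault(key, set()).add(s)
                refined.extend(groups.values())
            if len(refined) == len(partition):
                # Each block is one state of the minimal DFA.
                return partition
            partition = refined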
Re: Programming challenge: wildcard exclusion in cartesian products
Dinko Tenev <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:
>> If more time during preprocessing is allowed, another idea is to
>> treat the wildcard expressions as regular expressions, convert each
>> into a finite state machine, construct the intersection of all these
>> state machines, minimize it and then swap final and non-final
>> states.
>
> Given the requirements, did you mean taking the *union* and swapping
> states? Or maybe swapping states first, and then taking the
> intersection?

Whatever the requirements were. Take your pick. :-)

>> Then you can use the resulting automaton to efficiently enumerate
>> S^n - W. In the above case, the resulting FSM would have just three
>> states.
>
> I don't see immediately how exactly this is going to work. Unless I'm
> very much mistaken, a FSA in the classical sense will accept or
> reject only after the whole sequence has been consumed, and this
> spells exponential times.

Exponential with respect to what? You just recursively walk the
automaton for every word of length n, so at worst you'll have to check
every word in S^n once. Hence, the complexity is not worse than the
generate-and-test approach (it's in fact better, since testing is
trivial). However, if the result set is simple (as in the example),
then the resulting FSM will have states that don't have transitions
for every letter, so I guess the average case will be a lot better.

> For improved asymptotic complexity in this case, you need to be able
> to at least reject in mid-sequence,

One can do that if there is no valid transition out of some state.

I guess one could even optimize on that: In the minimal automaton,
every state has some transition sequence that will end up in a final
state. Annotate each state with the minimum number of steps for such a
sequence. While walking the automaton, keep the maximum number of
letters still to produce in a variable. If this number is smaller than
the annotation, stop and backtrack. This should guarantee that only
those states are visited that can really produce some output.

- Dirk
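The annotation step can be sketched as a backwards breadth-first search
from the final states (illustrative Python; the helper name is
hypothetical):

    from collections import deque

    def min_steps_to_final(dfa, finals):
        # dfa: state -> {letter: state}.  Returns, for each state, the
        # length of a shortest transition sequence into a final state.
        reverse = {}
        for state, moves in dfa.items():
            for target in moves.values():
                reverse.setdefault(target, set()).add(state)
        dist = dict((s, 0) for s in finals)
        queue = deque(finals)
        while queue:
            s = queue.popleft()
            for pred in reverse.get(s, ()):
                if pred not in dist:
                    dist[pred] = dist[s] + 1
                    queue.append(pred)
        return dist

While enumerating words of length at most n, one would then refuse to
enter a state whose distance exceeds the number of letters still to be
produced.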
Re: Programming challenge: wildcard exclusion in cartesian products
Dinko Tenev <[EMAIL PROTECTED]> wrote:
>>> I don't see immediately how exactly this is going to work. Unless
>>> I'm very much mistaken, a FSA in the classical sense will accept or
>>> reject only after the whole sequence has been consumed, and this
>>> spells exponential times.
>>
>> Exponential with respect to what?
>
> With respect to sequence length.

But you cannot get rid of this. Consider S = {a, b}, W = {a}. Then
there are |S|^n result elements for n > 1, and you have to enumerate
all of them. The best thing one can hope for is to actually process
only those elements that are really in the result set. With the FSM,
you're getting at least close to that.

>> However, if the result set is simple (as in the example), then the
>> resulting FSM will have states that don't have transitions for every
>> letter, so I guess the average case will be a lot better.
>
> I don't believe this case should ever arise out of the current
> definition of the problem: labels are specified explicitly only for
> the excluded subset, and you have to accept everything else by
> default.

If the result set is {a^n, b^n | n \in N}, then you have an automaton
where exactly this happens. Since minimal automata are unique up to
renaming of states, that's the result you'll get.

>>> For improved asymptotic complexity in this case, you need to be
>>> able to at least reject in mid-sequence,
>>
>> One can do that if there is no valid transition out of some state.
>
> One possibly can, but such states should never be encountered (see
> above.)

The automaton is:

  S: a -> A, b -> B
  A: a -> A
  B: b -> B

All the states are final. A and B have just one transition, so you'll
be able to generate either a^n or b^n efficiently.

> Looking at the minimal DFA for the above example, it may be more
> appropriate to detect being in a state from which there's no path to
> a final state:

This cannot happen, because minimizing removes all those states. (Or,
in case you have a different definition of automaton in mind: there'll
be just one stuck state, where all transitions that never go to a
final state end up. Remove this state, and you'll end up with the
other definition.)

>> I guess one could even optimize on that: In the minimal automaton,
>> every state has some transition sequence that will end up in a final
>> state. Annotate each state with the minimum number of steps for such
>> a sequence. While walking the automaton, keep the maximum number of
>> letters still to produce in a variable. If this number is smaller
>> than the annotation, stop and backtrack.
>
> I don't think this could cut efficiently for patterns like *a*b.

The point is not to cut efficiently, the point is to enumerate only
those words that are actually in the result set. If you are
enumerating all words shorter than a given length, the above method
should guarantee this.

- Dirk
Re: Programming challenge: wildcard exclusion in cartesian products
[Had to drop alt.comp.lang.haskell, otherwise my news server doesn't
accept it]

Dinko Tenev <[EMAIL PROTECTED]> wrote:
> OK, here's a case that will make your program run in exponential
> time: S = { a, b }, W = { *a*b, *b*a } -- on my machine, it starts
> getting ugly as soon as n is 15 or so. Note that
> S^n - W = { a^n, b^n }. In general, whenever all the patterns in the
> set match against the last position, your current implementation is
> guaranteed to have to sift through all of S^n.
>
> I'd say the very idea of checking against a blacklist is
> fundamentally flawed, as far as performance is concerned.

If more time during preprocessing is allowed, another idea is to treat
the wildcard expressions as regular expressions, convert each into a
finite state machine, construct the intersection of all these state
machines, minimize it and then swap final and non-final states. Then
you can use the resulting automaton to efficiently enumerate S^n - W.
In the above case, the resulting FSM would have just three states.

And it doesn't really matter what language you use to implement this
algorithm, it's the idea that counts. Notation aside, all
implementations will be quite similar.

- Dirk
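The two automaton operations involved can be sketched as follows
(illustrative Python, assuming complete DFAs represented as
(transitions, start, finals) triples; by De Morgan, intersecting the
complements or complementing the union both yield the same result, as
clarified elsewhere in the thread):

    def product(dfa1, dfa2, combine):
        # Standard product construction; combine decides which product
        # states are final.
        (t1, s1, f1), (t2, s2, f2) = dfa1, dfa2
        start = (s1, s2)
        trans, finals = {}, set()
        todo = [start]
        while todo:
            p, q = todo.pop()
            if (p, q) in trans:
                continue
            trans[(p, q)] = {}
            if combine(p in f1, q in f2):
                finals.add((p, q))
            for letter in t1[p]:
                target = (t1[p][letter], t2[q][letter])
                trans[(p, q)][letter] = target
                todo.append(target)
        return trans, start, finals

    def intersection(dfa1, dfa2):
        return product(dfa1, dfa2, lambda a, b: a and b)

    def union(dfa1, dfa2):
        return product(dfa1, dfa2, lambda a, b: a or b)

    def complement(dfa):
        # Swapping final and non-final states complements the language
        # only if the DFA is complete (no missing transitions).
        trans, start, finals = dfa
        return trans, start, set(trans) - finals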
Re: Programming challenge: wildcard exclusion in cartesian products
[EMAIL PROTECTED] wrote:
> What are some good references for finite state machines?
> Minimization?

A classic is "Introduction to Automata Theory, Languages, and
Computation" by Hopcroft and Ullman. But any other book about finite
state machines should cover these topics, too. Chances are good that
you can just google for a detailed explanation.

- Dirk
Re: Puzzling OO design problem
George Sakkis <[EMAIL PROTECTED]> wrote:
>> A1 <- A2 <- A3 <- A4 <- ...
>>  ^     ^     ^     ^
>>  |     |     |     |
>> B1 <- B2 <-  + <- B4 <- ...
>>  ^     ^     ^     ^
>>  |     |     |     |
>> C1 <-  + <- C3 <-  + <- ...
>>  ^     ^     ^     ^
>>  |     |     |     |
>> D1 <- D2 <-  + <- D4 <- ...
>>
>> The solution is simply to include C3 in the list of parents of D4,
>> as in D4(C3,B4,D2). So for every hole in a column, you have to
>> include the first class (or classes, if the hole spans multiple
>> rows) to the left of the hole as parents of the class just below the
>> hole, in order from bottom to top.
>
> Nice. I had taken for granted that you need to fill in the holes
> (D3,B3,C2), either manually or automa[tg]ically, but if you allow a
> class to inherit from more than two bases, you can pick a set of
> parents that does the job, without any boilerplate code or
> __metaclass__ magic.
>
> The downside of this approach is that it's even harder to see the big
> picture, as in the schematic notation above; remember that each
> column is a different version that resides in a separate module, so
> it's not obvious which classes should be the parents of each
> variation.

It's obvious if you know which versions are available, and which
aren't. If you don't have this information, and you're only looking at
each module locally, then it's probably even safer (i.e., less likely
to confuse a casual reader of the source) to just generate all the
dummy classes manually. But that sort of makes your original problem
irrelevant :-)

- Dirk
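To check that the recipe actually linearizes, here is a runnable
sketch of the lattice with stub classes (the class bodies are
hypothetical; the parent orders are one choice that satisfies Python's
C3 linearization):

    class A1(object): pass
    class A2(A1): pass
    class A3(A2): pass
    class A4(A3): pass

    class B1(A1): pass
    class B2(B1, A2): pass
    class B4(B2, A4): pass        # B3 is a hole: fall back to B2

    class C1(B1): pass
    class C3(C1, B2):             # C2 and B3 are holes: use C1 and B2
        def g(self):
            return "C3.g"

    class D1(C1): pass
    class D2(D1, B2): pass        # C2 is a hole: B2 stands in for it
    class D4(C3, B4, D2): pass    # the example from the post

    assert D4().g() == "C3.g"     # C3's methods are visible in D4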
Re: Puzzling OO design problem
George Sakkis <[EMAIL PROTECTED]> wrote:
> 1. There is a (single inheritance) hierarchy of domain classes, say
>    A <- B <- .. <- Z (arrows point to the parent in the inheritance
>    tree).
> 2. This hierarchy evolved over time to different versions for each
>    class. So for example, version 1's hierarchy would be
>    A_v1 <- B_v1 <- .. <- Z_v1.
> 3. At runtime, only one version is selected by a factory function.
>
> Up to this point, the inheritance graph would be the following:
>
>    A <- A_V1 <- ... <- A_Vn
>    ^      ^             ^
>    |      |             |
>    B <- B_V1 <- ... <- B_Vn
>    .      .             .
>    .      .             .
>    ^      ^             ^
>    |      |             |
>    Z <- Z_V1 <- ... <- Z_Vn

Interesting problem.

> This could be implemented either with multiple inheritance (e.g.
> B_V1(B,A_V1))

To help myself think about that, let's make a somewhat complicated
example, somewhere in the middle of the graph, with the possibility of
introducing a hole at B3. I also shorten A_Vn to An etc. Consider the
subgraph (with nonstandard annotations of method definitions after the
bar, to save space, as explained below):

   A2|f <- A3|f
     ^       ^
     |       |
    B2  <-  B3
     ^       ^
     |       |
   C2|g <- C3|h

Assume a method g that is present in C2, but not changed in C3. Now g
calls a method f, which is inherited unchanged in C2 from A2 (and not
changed in B2, B3 or C3, either), but changed in A3. Finally, let h
call g in C3.

As the inheritance columns have priority here, one would expect g to
call f in A3, and not in A2, for example. So what you need is that
every method, even if not originating from the real class, is looked
up first in the column above the real class, then in the column to the
left of that, and so on.

OK. Multiple inheritance can often select the priority for conflicting
methods. If you can specify that you want column priority for each
class, you're fine.

> or using the bridge design pattern |Z| times, one per each row.

When I read your description above, I also immediately thought "bridge
pattern", but then I tried to write down the details, and got stuck.
How would you do it?

> Now the problem is that there are 'holes' in this inheritance
> lattice: Not all versions introduced new variations of all types;
> [...] My first thought was to create all the missing classes
> dynamically, but it's somewhat obscure and it may not be that simple.
> Is there a more elegant solution, either a general design pattern or
> some clever python metaprogramming hack ?

In the most general case, you need to access the whole upper-left
subgraph at class creation time, collect all methods defined in this
subgraph with column priority, and overwrite or add to that any
methods defined in the newly defined class.

I don't know enough about the intricacies of Python's class creation
to make a concrete suggestion, but I'd think that would be possible
with the help of __metaclass__. You would need some sort of repository
for the complete inheritance. One way to do that would be to create
the chain A ... Z first, letting A inherit from some special class
with __metaclass__ set, and then store an array of all versions
somewhere inside the class namespace. You'd also need some special
syntax to create a new version of a class (say, again inheriting from
some special class with __metaclass__ set). You could set the version
inside the class definition, and then let the __metaclass__ routine
disable the normal inheritance mechanism, and add missing methods as
appropriate. This could for example look like

  class A(Versioning):
      ...

  class B(A):
      ...

  class C(B):
      def h(): ...

  class A2(NewVersion, A):
      __version__ = 2
      def f(): ...

  class B2(NewVersion, B):
      __version__ = 2

  class C2(NewVersion, C):
      __version__ = 2
      def g():
          ... f() ...
  class A3(NewVersion, A):
      __version__ = 3
      def f(): ...

  class C3(NewVersion, C):
      __version__ = 3
      def h():
          ... g() ...

with a hole at B3, as in the example. C3 will get g from C2 and f from
A3, and hence the call chain will work correctly. Also, C3 will have
no base classes (or maybe only the __metaclass__ ones); the inherited
classes A, B, C are just used by the class creation process to find
out where to look in the inheritance matrix.

Others who know more about the class creation mechanism will no doubt
improve this suggestion, point out my errors, and tell you how to
implement it :-)

Note that you're basically completely changing the normal inheritance
mechanism, and the class objects will be larger, because they'll have
to copy all the necessary methods. But I cannot think of any pattern
that would give similar flexibility.

- Dirk
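The "collect methods with column priority" step can be illustrated
without metaclass machinery by a plain factory that flattens the
matrix into a new class namespace (build_class and matrix are
hypothetical names; a real __metaclass__ version would hide this
behind the class statements above):

    def build_class(name, rows, matrix, version):
        # rows: row names from base to derived, e.g. ['A', 'B', 'C'];
        # matrix[row][v]: dict of methods, absent where there's a hole.
        namespace = {}
        for v in range(1, version + 1):     # older columns first, and
            for row in rows:                # upper rows first, so that
                cell = matrix[row].get(v)   # newer versions and lower
                if cell:                    # rows win on update()
                    namespace.update(cell)
        return type(name, (object,), namespace)

    matrix = {
        'A': {2: {'f': lambda self: 'A2.f'},
              3: {'f': lambda self: 'A3.f'}},
        'B': {},                            # hole at B3
        'C': {2: {'g': lambda self: self.f()}},
    }
    C3 = build_class('C3', ['A', 'B', 'C'], matrix, 3)
    print(C3().g())    # 'A3.f': g from C2 finds f from A3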
Re: list of unique non-subset sets
Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> [[EMAIL PROTECTED]]
>> OK, so I need to be more precise. Given a list of sets, output the
>> largest list of sets (from this list, order does not matter) such
>> that:
>> 1) there is no set that is a PROPER subset of another set in this
>>    list
>> 2) no two sets have exactly the same members (100% overlap)
>
> [Bengt Richter]
>> two from the above come out length 5:
>
> With multiple outputs satisfying the constraints, I would suspect
> that there is something wrong with the original spec (not as a
> stand-alone problem, but as a component of a real application).

I don't know what the application is trying to do, but if I understand
it correctly, he is asking for something called a maximal anti-chain.

For any partial order (e.g. the subset relation), a chain is a set of
elements which are all comparable to each other (hence there is a
linear or total order on this set). In a similar way, an anti-chain is
a set of elements that are mutually incomparable. Anti-chains
themselves can be ordered by inclusion, and therefore maximal
anti-chains can be found, but in general there is no unique greatest
anti-chain. So the spec is perfectly OK and makes a lot of sense
mathematically; it's just that there is no unique solution.

Maybe googling for "maximal anti-chain" digs up some more information
(I haven't tried).

- Dirk
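For illustration, one valid output is easy to compute: remove
duplicates, then keep only the maximal elements (sets that are a
proper subset of nothing else in the list). This is a sketch of one
anti-chain satisfying both constraints, not necessarily the largest:

    def maximal_sets(sets):
        unique = []
        for s in sets:
            if s not in unique:             # rule 2: no duplicates
                unique.append(s)
        # rule 1: drop every proper subset (s < t) of another set.
        return [s for s in unique
                if not any(s < t for t in unique)]

    print(maximal_sets([set([1]), set([1, 2]), set([2, 3]),
                        set([1, 2])]))
    # -> [set([1, 2]), set([2, 3])]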
Re: lambda closure question
Ted Lilley <[EMAIL PROTECTED]> wrote:
> As a side note, I'm familiar with the term "currying" from a friend
> who learned ML and Scheme quite some time ago. Not sure if that's the
> true origin, but it was a sufficiently different context from Python
> (or at least I thought) that I didn't want to rely on its meaning. I
> was also sufficiently unsure of its _exact_ meaning,

The exact meaning is reasonably well explained, together with the
origins, on Wikipedia:

  http://en.wikipedia.org/wiki/Currying

That meaning shouldn't change when applied to different computer
languages. I can understand the temptation to confuse partial
application with currying (for languages like Python, which take
tuples of arguments by default, currying is one way to make partial
application possible; supplying default arguments is another), but
nonetheless they are different things.

- Dirk
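A short illustration of the difference (functools.partial is the
standard library helper; curry2 is a hypothetical one written for this
example):

    from functools import partial

    def add(x, y):
        return x + y

    add_three = partial(add, 3)    # partial application: fix x = 3
    print(add_three(4))            # 7

    def curry2(f):
        # Currying proper: a two-argument function becomes a chain of
        # one-argument functions, applied one at a time.
        return lambda x: lambda y: f(x, y)

    print(curry2(add)(3)(4))       # 7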