Re: Termination and type systems

2006-06-27 Thread Dirk Thierbach
David Hopwood [EMAIL PROTECTED] wrote:
 Dirk Thierbach wrote:

 That's interesting, but linear typing imposes some quite severe
 restrictions on programming style. From the example of 'h' on page 2,
 it's clear that the reason for the linearity restriction is just to
 ensure polynomial-time termination, and is not needed if we only want
 to prove termination.

Yes. It's just an example of what can actually be done with a type system.

 It's already hard enough to guarantee termination with the extra
 information present in the type annotation. If this information is
 not present, then the language probably has to be restricted so
 severely to ensure termination that it is more or less useless.

 I think you're overestimating the difficulty of the problem. The fact
 that *in general* it can be arbitrarily hard to prove termination can
 obscure the fact that for large parts of a typical program, proving
 termination is usually trivial.

Yes. The problem is the small parts where it's not trivial (and the
size of those parts depends on the actual problem the program is trying
to solve). Either you make the language so restrictive that these
parts are hard (see your remark above) or impossible to write, or
you'll be left with a language where the compiler cannot prove
termination of those small parts, so it's not a type system in the
usual sense.

Of course one could write an optional tool that automatically proves
termination of the easy parts, and leaves the rest to the programmer,
but again, that's a different thing.

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Termination and type systems

2006-06-26 Thread Dirk Thierbach
David Hopwood [EMAIL PROTECTED] wrote:
 Marshall wrote:
 David Hopwood wrote:

 A type system that required an annotation on all subprograms that
 do not provably terminate, OTOH, would not impact expressiveness
 at all, and would be very useful.

 Interesting. I have always imagined doing this by allowing an
 annotation on all subprograms that *do* provably terminate. 

Maybe the paper "Linear types and non-size-increasing polynomial time
computation" by Martin Hofmann is interesting in this respect.
From the abstract:

  We propose a linear type system with recursion operators for inductive
  datatypes which ensures that all definable functions are polynomial
  time computable. The system improves upon previous such systems in
  that recursive definitions can be arbitrarily nested; in particular,
  no predicativity or modality restrictions are made.

It not only ensures termination, but termination in polynomial time,
so you can use those types to carry this information as well.

 If the annotation marks not-provably-terminating subprograms, then it
 calls attention to those subprograms. This is what we want, since it is
 less safe/correct to use a nonterminating subprogram than a terminating
 one (in some contexts).

That would certainly be nice, but it may be almost impossible to do in
practice. It's already hard enough to guarantee termination with the
extra information present in the type annotation. If this information
is not present, then the language probably has to be restricted so
severely to ensure termination that it is more or less useless.

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-25 Thread Dirk Thierbach
Dinko Tenev [EMAIL PROTECTED] wrote:
 Dirk Thierbach wrote:

 [A lot of stuff]

 Now clearer?

 Let's leave it there, and take a break.

Maybe it would help to just take a concrete example, and work through
it. Then you'll see exactly what happens.

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-24 Thread Dirk Thierbach
Dinko Tenev [EMAIL PROTECTED] wrote:
 Dirk Thierbach wrote:

[One cannot escape exponential behaviour]

 But you cannot get rid of this. Consider S = {a, b}, W = {a}.
 Then there are |S|^n result elements for n > 1, and you have to enumerate
 all of them.

 Yes, but then, they are in the target set.  

Which is the point. If they are in the target set, you have to enumerate
them. If the target set is of exponential size with respect to n,
then you'll need exponential time to do that.

 The point here is whether
 you can generate S^n - W in Theta( n * |S^n - W| ), which may be
 dramatically different from Theta( n * |S^n| ).

Exactly. Hence, you use a construction that guarantees that the time
needed is proportional to n*|S^n - W|: every step you do will be
necessary to produce at least one word in the output set. Now, if
|S^n - W| is still exponential, then you'll still need exponential
time. But nevertheless, that's the best you can hope for.

 The automaton is:

 S: a -> A, b -> B
 A: a -> A
 B: b -> B

 The target set is specified as S^n - W, where W is everything matching
 (.*a.*b|.*b.*a).  Following the construction procedure in point, this
 exclusion set is matched exactly by my DFA with S initial and F final.
 Then, swapping final and non-final states makes {S, A, B} final, and F
 non-final.  Your DFA above may be equivalent, but to me it is far from
 clear exactly what algorithm would build it from the given data.

Well, it's just the result of the minimization algorithm, where my
variant of the algorithm just prunes away the stuck state, which can
never produce any output.

 The point is not to cut efficiently, the point is to enumerate
 only those words that are actually in the result set.

 No, this is not the point.  Naive filtering already does that. 

No, it doesn't. Naive filtering will always look at the complete
input set, so no matter what size |S^n - W| actually is, it will
always take time proportional to |S^n|.

  By cutting efficiently I mean skipping over search sub-trees that
 don't contain any results from the target set.

Yes. Consider a different example: With the wildcard expressions
W = { b*, aa*, ab* }, you'll get S^* - W = { a }. The resulting
minimum FSM will just accept 'a' (start state, one final state, and
the stuck state if you insist on it), so you skip over every
other subtree when enumerating results from that automaton.

And for the previous example, you'll need something like 2*n time
to enumerate the output set instead of 2^n, because once you're in
the a-branch, you're producing only a's, and you're pruning
away all the subtrees that start with a b. Similarly in the 
b-branch.
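
To make that concrete, here is a small Python sketch (not code from the
thread; the dict representation and the names are my own) that enumerates
the words of length n accepted by the minimal automaton quoted near the
top of this post, with the stuck state simply left out so dead subtrees
are never entered:

AUTOMATON = {                  # minimal DFA for S^n - W, stuck state omitted
    'S': {'a': 'A', 'b': 'B'},
    'A': {'a': 'A'},
    'B': {'b': 'B'},
}

def enum_words(dfa, state, n, prefix=''):
    # every state of this particular automaton is final, so any prefix
    # reached here can be completed; missing transitions are the pruned
    # subtrees (everything that would lead into the stuck state)
    if n == 0:
        yield prefix
        return
    for letter, target in dfa[state].items():
        yield from enum_words(dfa, target, n - 1, prefix + letter)

print(list(enum_words(AUTOMATON, 'S', 4)))   # -> ['aaaa', 'bbbb']

For automata where a final state can become unreachable within the
remaining length, the distance annotation discussed in the posts further
down is needed on top of this.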

Now clearer?

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-24 Thread Dirk Thierbach
[EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Call a wc 'free' if it satisfies the property that every letter 'a' in
 it appears only in the form '*a*', and 'anchored' otherwise.

Would '*ab*' be free or anchored?

 What if all wc's are free? How does this affect the DFA?

I don't know. The important point here is the interaction of all
the wc's. I don't think properties like this reduce the complexity
of the interaction in an obvious way.

 Does it minimize nontrivially?

I am not sure what you mean by this. Every DFA minimizes to some
other DFA, which is unique up to renaming of states. Now the
question is whether that minimization reduces the complexity enough
to be noticeable (maybe that's what you mean by nontrivially).
In general, I don't think one can say much about that without
looking at concrete examples.

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-23 Thread Dirk Thierbach
Dinko Tenev [EMAIL PROTECTED] wrote:
 Dirk Thierbach wrote:
 If more time during preprocessing is allowed, another idea is to
 treat the wildcard expressions as regular expressions, convert
 each into a finite state machine, construct the intersection of
 all these state machines, minimize it and then swap final and non-final
 states.

 Given the requirements, did you mean taking the *union* and swapping
 states?  Or maybe swapping states first, and then taking the
 intersection?

Whatever the requirements were. Take your pick. :-)

 Then you can use the resulting automaton to efficiently
 enumerate S^n - W. In the above case, the resulting FSM would have just
 three states.

 I don't see immediately how exactly this is going to work.  Unless I'm
 very much mistaken, a FSA in the classical sense will accept or reject
 only after the whole sequence has been consumed, and this spells
 exponential times.  

Exponential with respect to what? You just recursively walk the
automaton for every word of length n, so at most you'll have to check
every word in S^n once. Hence, the complexity is not worse than the
generate-and-test approach (it's in fact better, since testing is
trivial).

However, if the result set is simple (as in the example), then the
resulting FSM will have states that won't have transitions for every
letter, so I guess the average case will be a lot better.

 For improved asymptotic complexity in this case,
 you need to be able to at least reject in mid-sequence, 

One can do that if there is no valid transition out of some state.

I guess one could even optimize on that: in the minimal automaton,
every state has some transition sequence that will end up in a final
state. Annotate each state with the minimum number of steps for
such a sequence. While walking the automaton, keep the maximum
number of letters still to produce in a variable. If this number is
smaller than the number in the annotation, stop and backtrack.

This should guarantee that only those states are visited that really
produce some output. 
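
In Python, this could look roughly as follows (my own sketch, with
invented names; the automaton is assumed to be a dict of transition dicts
plus a set of final states). The annotation is computed once by a
breadth-first search over the reversed transitions:

from collections import deque

def min_steps_to_final(dfa, finals):
    rev = {}                                   # reversed transition relation
    for state, trans in dfa.items():
        for target in trans.values():
            rev.setdefault(target, set()).add(state)
    dist = {s: 0 for s in finals}
    queue = deque(finals)
    while queue:
        s = queue.popleft()
        for p in rev.get(s, ()):
            if p not in dist:
                dist[p] = dist[s] + 1
                queue.append(p)
    return dist        # states missing here can never reach a final state

def enum_bounded(dfa, finals, dist, state, n, prefix=''):
    if n == 0:
        if state in finals:
            yield prefix
        return
    for letter, target in dfa[state].items():
        # stop and backtrack if fewer letters remain than the annotation says
        # are needed to reach a final state from the target
        if dist.get(target, n + 1) <= n - 1:
            yield from enum_bounded(dfa, finals, dist, target, n - 1,
                                    prefix + letter)

min_steps_to_final is computed once per automaton; enum_bounded then never
enters a subtree from which no final state is reachable within the
remaining number of steps.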

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-23 Thread Dirk Thierbach
Dinko Tenev [EMAIL PROTECTED] wrote:
  I don't see immediately how exactly this is going to work.  Unless I'm
  very much mistaken, a FSA in the classical sense will accept or reject
  only after the whole sequence has been consumed, and this spells
  exponential times.

 Exponential with respect to what?

 With respect to sequence length.  

But you cannot get rid of this. Consider S = {a, b}, W = {a}.
Then there are |S|^n result elements for n > 1, and you have to enumerate
all of them.

The best thing one can hope for is to just actually process those
elements that are really in the result set. With the FSM, you're
getting at least close to that.

 However, if the result set is simple (as in the example), then the
 resulting FSM will have states that won't have transitions for every
 letter, so I guess the average case will be a lot better.

 I don't believe this case should ever arise out of the current
 definition of the problem: labels are specified explicitly only for the
 excluded subset, and you have to accept everything else by default.

If the result set is {a^n, b^n | n \in N}, then you have an automaton
where exactly this happens. Since minimal automata are unique
up to renaming of states, that's the result you'll get.

  For improved asymptotic complexity in this case,
  you need to be able to at least reject in mid-sequence,

 One can do that if there is no valid transition out of some state.

 One possibly can, but such states should never be encountered (see
 above.)

The automaton is:

S: a -> A, b -> B
A: a -> A
B: b -> B

All the states are final. A and B have just one transition, so
you'll be able to generate either a^n or b^n efficiently.

 Looking at the minimal DFA for the above example, it may be more
 appropriate to detect being in a state from which there's no path to a
 final state:

This cannot happen, because minimizing removes all those states.
(Or, in case you have a different definition of automaton in mind:
There'll be just one stuck state, where all transitions that never
go to a final state end up. Remove this state, and you'll end up
with the other definition).

 I guess one could even optimize on that: in the minimal automaton,
 every state has some transition sequence that will end up in a final
 state. Annotate each state with the minimum number of steps for
 such a sequence. While walking the automaton, keep the maximum
 number of letters still to produce in a variable. If this number is
 smaller than the number in the annotation, stop and backtrack.

 I don't think this could cut efficiently for patterns like *a*b.

The point is not to cut efficiently; the point is to enumerate
only those words that are actually in the result set. If you are
enumerating all words shorter than a given length, the above method
should guarantee this.

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-22 Thread Dirk Thierbach
[Had to drop alt.comp.lang.haskell, otherwise my news server doesn't accept it]

Dinko Tenev [EMAIL PROTECTED] wrote:
 OK, here's a case that will make your program run in exponential time:
 S = { a, b }, W = { *a*b, *b*a } -- on my machine, it starts getting
 ugly as soon as n is 15 or so.  Note that S^n - W = { a^n, b^n }.

 In general, whenever all the patterns in the set match against the last
 position, your current implementation is guaranteed to have to sift
 through all of S^n.  I'd say the very idea of checking against a
 blacklist is fundamentally flawed, as far as performance is concerned.

If more time during preprocessing is allowed, another idea is to
treat the wildcard expressions as regular expressions, convert
each into a finite state machine, construct the intersection of
all these state machines, minimize it and then swap final and non-final
states. Then you can use the resulting automaton to efficiently 
enumerate S^n - W. In the above case, the resulting FSM would have just 
three states.

And it doesn't really matter what language you use to implement this
algorithm; it's the idea that counts. Notation aside, all
implementations will be quite similar.
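
For concreteness, here is a rough Python sketch of one way to get such an
automaton (my own names and representation, and with the minimization step
left out): run one small matcher per wildcard pattern in parallel, and make
a product state final exactly when no pattern has matched completely. That
gives a DFA for the words outside all the patterns, i.e. for S^n - W.

def closure(pat, positions):
    # epsilon closure: a '*' may be skipped without consuming a letter
    seen, todo = set(positions), list(positions)
    while todo:
        i = todo.pop()
        if i < len(pat) and pat[i] == '*' and i + 1 not in seen:
            seen.add(i + 1)
            todo.append(i + 1)
    return frozenset(seen)

def advance(pat, positions, letter):
    nxt = set()
    for i in positions:
        if i < len(pat):
            if pat[i] == '*':
                nxt.add(i)               # '*' absorbs the letter
            elif pat[i] == letter:
                nxt.add(i + 1)           # literal matched, advance
    return closure(pat, nxt)

def complement_dfa(patterns, alphabet):
    # a product state is a tuple with one position set per pattern
    start = tuple(closure(p, {0}) for p in patterns)
    dfa, finals, todo = {}, set(), [start]
    while todo:
        state = todo.pop()
        if state in dfa:
            continue
        dfa[state] = {}
        if not any(len(p) in s for p, s in zip(patterns, state)):
            finals.add(state)            # no pattern matched completely
        for letter in alphabet:
            nxt = tuple(advance(p, s, letter)
                        for p, s in zip(patterns, state))
            dfa[state][letter] = nxt
            todo.append(nxt)
    return dfa, finals, start

Minimizing the result, or annotating it as discussed in the follow-ups
above, is then what lets the enumeration skip the dead subtrees.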

- Dirk



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-22 Thread Dirk Thierbach
[EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 What are some good references for finite state machines? Minimization?

A classic is "Introduction to Automata Theory, Languages, and Computation"
by Hopcroft and Ullman. But any other book about finite state machines
should cover these topics, too. Chances are good that you can just
google for a detailed explanation.

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Puzzling OO design problem

2005-04-10 Thread Dirk Thierbach
George Sakkis [EMAIL PROTECTED] wrote:
A1 - A2 - A3 - A4 - ...
|    |    |    |
B1 - B2 - +  - B4 - ...
|    |    |    |
C1 - +  - C3 - +  - ...
|    |    |    |
D1 - D2 - +  - D4 - ...
|    |    |    |

 The solution is simply to include C3 in the list of parents of D4,
 as in D4(C3,B4,D2). So for every hole in a column, you have to
 include the first class (or classes, if the hole spans multiple
 rows) to the left of the hole as parents of the class just below
 the hole, in order from bottom to top.

 Nice. I had taken for granted that you need to fill in the holes
 (D3,B3,C2), either manually or automa[tg]ically, but if you allow a
 class to inherit from more than two bases, you can pick a set of
 parents that does the job, without any boilerplate code or
 __metaclass__ magic. The downside of this approach is that it's even
 harder to see the big picture, as in the schematic notation above;
 remember that each column is a different version that resides in a
 separate module, so it's not obvious which classes should be the
 parents of each variation. 

It's obvious if you know which versions are available, and which
aren't. If you don't have this information, and you're only looking at
each module locally, then it's probably even safer (i.e., less likely
to confuse a casual reader of the source) to just generate all the
dummy classes manually. But that sort of makes your original problem
irrelevant :-)
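
For what it's worth, a toy two-row illustration of the extra-base trick
quoted above (class names are invented, and this is not your actual
hierarchy; it only shows the idea):

class A1:
    def f(self): return 'A1.f'

class B1(A1):
    def g(self): return self.f()

class A2(A1):
    def f(self): return 'A2.f'

# version 2 has a hole at B2: no new behaviour for B

class A3(A2):
    def f(self): return 'A3.f'

class B3(B1, A3):        # B1 stands in for the missing B2
    pass

print(B3().g())          # -> 'A3.f': g comes from B1, f from the newest A

Whether the resulting method resolution order always gives the priority
one wants is exactly what the longer post below worries about.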

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Puzzling OO design problem

2005-04-09 Thread Dirk Thierbach
George Sakkis [EMAIL PROTECTED] wrote:

 1. There is a (single inheritance) hierarchy of domain classes, say
 A-B-..-Z (arrows point to the parent in the inheritance tree).
 2. This hierarchy evolved over time to different versions for each
 class. So for example, version's 1 hierarchy would be A_v1 -B_v1
 -..-Z_v1.
 3. At runtime, only one version is selected by a factory function.

 Up to this point, the inheritance graph would be the following:
 
 A - A_V1 ... - A_Vn
 ^   ^          ^
 |   |          |
 B - B_V1 ... - B_Vn
 .   .          .
 .   .          .
 .   .          .
 ^   ^          ^
 |   |          |
 Z - Z_V1 ... - Z_Vn

Interesting problem.

 This could be implemented either with multiple inheritance (e.g.
 B_V1(B,A_V1)) 

To help myself think about that, let's make a somewhat complicated
example, somewhere in the middle of the graph, with the possibility of
introducing a hole at B3. I also shorten A_Vn to An etc. Consider the
subgraph (with nonstandard annotations of method definitions after the
bar, to save space, as explained below):

A2 | f   -   A3 | f
^            ^
|            |
B2       -   B3
^            ^
|            |
C2 | g   -   C3 | h

Assume a method g that is present in C2 but not changed in C3. Now g
calls a method f, which is inherited unchanged in C2 from A2 (and not
changed in B2, B3 or C3, either), but changed in A3. Finally, let h
call g in C3. As the inheritance columns have priority here, one
would then expect g to call f in A3, and not in A2, for example.

So what you need is that every method, even if not originating from
the real class, is looked up first in the column above the real
class, then in the column left to that, and so on.

Ok. Multiple inheritance can often select priority for conflicting
methods. If you can specify that you want column priority for
each class, you're fine.

 or using the bridge design pattern |Z| times, one per each row.

When I read your description above, I also immediately thought
"bridge pattern", but then I tried to write down the details,
and got stuck. How would you do it?

 Now the problem is that there are 'holes' in this
 inheritance lattice: Not all versions introduced new variations of
 all types; [...]

 My first thought was to create all the missing classes dynamically,
 but it's somewhat obscure and it may not be that simple. Is there a
 more elegant solution, either a general design pattern or some
 clever python metaprogramming hack ?

In the most general case, you need to access the whole upper left
subgraph at class creation time, collect all methods defined in this
subgraph with column priority, and overwrite or add to that any
methods defined in the newly defined class.

I don't know enough about the intricacies of Python's class creation
to make a concrete suggestion, but I'd think that would be possible
with the help of __metaclass__. You would need some sort of
repository for the complete inheritance. One way to do that would
be to create the chain A ... Z first, letting A inherit from some
special class with __metaclass__ set, and then store an array
of all versions somewhere inside the class namespace.

You'd also need some special syntax to create a new version of a class
(say, again inheriting from some special class with __metaclass__
set).  You could set the version inside the class definition, and then
let the __metaclass__ routine disable the normal inheritance
mechanism, and add missing methods as appropriate.

This could, for example, look like:

class A(Versioning):
  ...

class B(A):
  ...

class C(B):
  def h(self): ...
  ...

class A2(NewVersion,A):
  __version__ = 2
  def f(self): ...

class B2(NewVersion,B):
  __version__ = 2

class C2(NewVersion,C): 
  __version__ = 2
  def g(self): ... self.f() ...

class A3(NewVersion,A):
  __version__ = 3
  def f(self): ...

class C3(NewVersion,C): 
  __version__ = 3
  def h(self): ... self.g() ...

with a hole at B3, as in the example. C3 will get g from C2 and f from
A3, and hence the call chain will work correctly. Also, C3 will have no
base classes (or maybe only the __metaclass__ ones); the inherited
classes A, B, C are just used by the class creation process to find
out where to look in the inheritance matrix.

Others who know more about the class creation mechanism will no doubt
improve this suggestion, point out my errors, and tell you how to
implement it :-) Note that you're basically completely changing the
normal inheritance mechanism, and the class objects will be larger,
because they'll have to copy all the necessary methods.
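
Very roughly, and in current Python syntax rather than the __metaclass__
spelling above, such a mechanism might look like the following (the
registry, the helper names and the restriction to non-dunder attributes
are my own simplifications; it is only meant to illustrate the idea, not
to be a finished design):

_grid = {}                  # (row base class, version) -> versioned class

class NewVersion:
    pass                    # marker class, as in the sketch above

class VersioningMeta(type):
    def __new__(meta, name, bases, ns):
        if NewVersion not in bases:
            # unversioned classes (the chain A, B, C, ...) are made normally
            return super().__new__(meta, name, bases, ns)
        version = ns['__version__']
        row = next(b for b in bases if isinstance(b, VersioningMeta))
        merged = {}
        # column priority: this version's column bottom-up, then the column
        # to its left, and so on; finally the unversioned chain itself
        for v in range(version, 1, -1):
            for r in _row_chain(row):
                earlier = _grid.get((r, v))
                if earlier is not None:
                    _merge(merged, earlier)
        for r in _row_chain(row):
            _merge(merged, r)
        merged.update(ns)                     # the new definitions win
        cls = super().__new__(meta, name, (), merged)
        _grid[(row, version)] = cls
        return cls

class Versioning(metaclass=VersioningMeta):
    pass

def _row_chain(row):
    # the unversioned chain from `row` up to (but excluding) Versioning
    return [c for c in row.__mro__ if c not in (Versioning, object)]

def _merge(merged, cls):
    # copy plain (non-dunder) attributes, keeping earlier entries
    for key, value in vars(cls).items():
        if not key.startswith('__'):
            merged.setdefault(key, value)

Used on the example above, with invented method bodies:

class A(Versioning):
    def f(self): return 'A.f'

class B(A):
    pass

class C(B):
    def h(self): return 'C.h'

class A2(NewVersion, A):
    __version__ = 2
    def f(self): return 'A2.f'

class C2(NewVersion, C):
    __version__ = 2
    def g(self): return self.f()

class A3(NewVersion, A):
    __version__ = 3
    def f(self): return 'A3.f'

class C3(NewVersion, C):
    __version__ = 3
    def h(self): return self.g()

print(C3().h())   # -> 'A3.f': g is taken from C2, f from A3; B3 is a hole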

I cannot think of any pattern that would give similar flexibility.

- Dirk

  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list of unique non-subset sets

2005-03-19 Thread Dirk Thierbach
Raymond Hettinger [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] ]

 OK, so I need to be more precise.
 Given a list of sets, output the largest list of sets (from this list,
 order does not matter) such that:
 1) there is no set that is a PROPER subset of another set in this list
 2) no two sets have exactly the same members (100% overlap)

 [Bengt Richter]
 two from the above come out length 5:

 With multiple outputs satisfying the constraints, I would suspect
 that there is something wrong with the original spec (not as a
 stand-alone problem, but as component of a real application).

I don't know what the application is trying to do, but if I understand
it correctly, he is asking for something called a maximal anti-chain.

For any partial order (e.g., the subset relation), a chain is a set of
elements which are all comparable (and hence there is a linear or
total order on this set). In a similar way, an anti-chain is a set
whose elements are pairwise incomparable. Anti-chains themselves can
be ordered by subset inclusion, and therefore maximal anti-chains
(ones that cannot be extended any further) exist, but in general
there is no unique greatest anti-chain.

So the spec is perfectly ok and makes a lot of sense mathematically;
it's just that there is no unique solution.

Maybe googling for maximal anti-chain digs up some more information
(I haven't tried).
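
For the simpler reading of the spec (keep one copy of each distinct set
and drop every set that is a proper subset of another), a small Python
sketch, with my own code and names, that returns the family of maximal
elements, which is one such anti-chain:

def maximal_sets(sets):
    distinct = set(map(frozenset, sets))               # 2) drop duplicates
    return [set(s) for s in distinct
            if not any(s < t for t in distinct)]       # 1) no proper subsets

print(maximal_sets([{1, 2}, {1, 2}, {1}, {2, 3}]))     # -> [{1, 2}, {2, 3}] (any order)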

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: lambda closure question

2005-02-20 Thread Dirk Thierbach
Ted Lilley [EMAIL PROTECTED] wrote:
 As a side note, I'm familiar with the term currying from a friend who
 learned ML and Scheme quite some time ago.  Not sure if that's the true
 origin, but it was a sufficiently different context from Python (or at
 least I thought) that I didn't want to rely on its meaning.  I was also
 sufficiently unsure of its _exact_ meaning,

The exact meaning is reasonably well explained, together with the
origins, on Wikipedia:

http://en.wikipedia.org/wiki/Currying

That meaning shouldn't change if applied to different computer languages.
I can understand the temptation to confuse partial application with
currying (for languages like Python which take tuples of arguments
by default, currying is one way to make partial application possible;
supplying default arguments is another), but nonetheless they are different 
things.
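
A tiny Python illustration of the distinction (my own example): currying
turns a function of several arguments into a chain of one-argument
functions, while partial application just fixes some arguments of the
original function.

from functools import partial

def add(x, y):                 # ordinary uncurried function
    return x + y

def curried_add(x):            # curried form: one argument at a time
    return lambda y: x + y

add_five = partial(add, 5)     # partial application of the uncurried add
assert add_five(3) == curried_add(5)(3) == 8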

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list