Re: [Haskell-cafe] design question: decision tree from "Programming Collective Intelligence"

2010-08-04 Thread S. Doaitse Swierstra
I have added the permutation parsers from uulib to uu-parsinglib:

http://hackage.haskell.org/packages/archive/uu-parsinglib/2.5.1.1/doc/html/Text-ParserCombinators-UU-Perms.html,

where you find reference to the paper

Doaitse


On 22 jun 2010, at 09:24, Stephen Tetley wrote:

> Hello
> 
> Maybe "permutation trees" are a viable starting point?
> 
> See the paper "Parsing Permutation Phrases" which appears to be on CiteSeer.
> 
> Some slides are also here - the data type definitions and Functor
> instance for permutation trees are on page 18 (pdf index page 19):
> http://www.comlab.ox.ac.uk/jeremy.gibbons/wg21/meeting56/loeh-slides.pdf
> 
> An alternative implementation for applicative functors is here:
> http://hackage.haskell.org/package/action-permutations
> 
> Note the use of existentials here is pretty cunning, I didn't get very
> far the time I attempted to use the technique for my own purposes.
> 
> Best wishes
> 
> Stephen
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] design question: decision tree from "Programming Collective Intelligence"

2010-06-24 Thread Jason Dagit
On Mon, Jun 21, 2010 at 11:22 AM, Daniel Lyons wrote:

> Hi,
>
> I'm having a little trouble figuring out precisely how to port the decision
> tree code from the book "Programming Collective Intelligence." You can see
> the code here:
> http://code.google.com/p/memothing/source/browse/trunk/PCI/ch7/treepredict.py?r=29
>
> The design issue is that this code depends on dynamic typing pretty
> heavily. He has a table of heterogenous data. Each column has the same type,
> but that's implicit, not explicit. He depends on all of the values
> supporting "==", which he uses to get a list of distinct values from each
> column. These values are used by the divide set function which takes a
> column and a value as parameters. Based on the type of the value, he chooses
> a different partitioning function; for strings, it's just equality again,
> but for numeric types he uses >=. The decision tree nodes then collect not
> just the left and right branches but also the column number and value on
> which the split was performed.
>
> I have thought about this for several days and can't seem to come to a
> design that I like, so I'm wondering how others would approach the problem.
> I guess you could implement it as a list of lists of some data type that is
> either a string or numeric, but this feels a bit like a cop-out. How would
> you support creating a decision tree with different types of data? I imagine
> possibilities using existential quantification, SYB, Data.Data and other
> stuff, but I don't know how to make it work. I wonder if there is an obvious
> functional design hiding in here that doesn't depend on any fancy stuff, but
> I'm blinded to it by having read and understood the Python version of the
> code.
>

Perhaps I'm missing the point of your objection but I would go with the
simplest design possible that works here.

For example, if you know you will only have numbers and strings then use
something like this:
data Foo = N Int | S String

If want to leave the set of possibilities open, you could make a type class
that has functions for the operations you'll want to use on the data.  Then
you have at least two possibilities.
1. If you're okay with the collections being homogeneous then you're done.
 Just make the table parameterized over your type class.
2. If you want the collections to be heterogeneous then I would try to
discourage you because it will become harder to maintain and reason about
your code :) Having said that, there are ways to move forward in that
direction if you feel it's necessary.  It sounds like you're already aware
of those solutions.

Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] design question: decision tree from "Programming Collective Intelligence"

2010-06-22 Thread Stephen Tetley
Hello

Maybe "permutation trees" are a viable starting point?

See the paper "Parsing Permutation Phrases" which appears to be on CiteSeer.

Some slides are also here - the data type definitions and Functor
instance for permutation trees are on page 18 (pdf index page 19):
http://www.comlab.ox.ac.uk/jeremy.gibbons/wg21/meeting56/loeh-slides.pdf

An alternative implementation for applicative functors is here:
http://hackage.haskell.org/package/action-permutations

Note the use of existentials here is pretty cunning, I didn't get very
far the time I attempted to use the technique for my own purposes.

Best wishes

Stephen
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] design question: decision tree from "Programming Collective Intelligence"

2010-06-21 Thread Daniel Lyons
Hi,

I'm having a little trouble figuring out precisely how to port the decision 
tree code from the book "Programming Collective Intelligence." You can see the 
code here: 
http://code.google.com/p/memothing/source/browse/trunk/PCI/ch7/treepredict.py?r=29

The design issue is that this code depends on dynamic typing pretty heavily. He 
has a table of heterogenous data. Each column has the same type, but that's 
implicit, not explicit. He depends on all of the values supporting "==", which 
he uses to get a list of distinct values from each column. These values are 
used by the divide set function which takes a column and a value as parameters. 
Based on the type of the value, he chooses a different partitioning function; 
for strings, it's just equality again, but for numeric types he uses >=. The 
decision tree nodes then collect not just the left and right branches but also 
the column number and value on which the split was performed.

I have thought about this for several days and can't seem to come to a design 
that I like, so I'm wondering how others would approach the problem. I guess 
you could implement it as a list of lists of some data type that is either a 
string or numeric, but this feels a bit like a cop-out. How would you support 
creating a decision tree with different types of data? I imagine possibilities 
using existential quantification, SYB, Data.Data and other stuff, but I don't 
know how to make it work. I wonder if there is an obvious functional design 
hiding in here that doesn't depend on any fancy stuff, but I'm blinded to it by 
having read and understood the Python version of the code.

Anybody have any suggestions? I'd greatly appreciate any help.

Thanks,

— 
Daniel Lyons

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe