On Monday, July 24, 2017 at 10:04:36 PM UTC+2, Daniel Prager wrote:
> Jon wrote:
> > Aside: if I read Daniel's solution correctly, he avoids the first issue by 
> >assuming that it's a binary classification task (that is, that there are 
> >only two classes).
> 
> 
> Yep: I'm assuming binary classification.
> 
> 
> David wrote:
> > Out of curiosity, how much of that 0.5 seconds is overhead?  Could you run 
> >a simple 'add 1 and 1' procedure and see how long it takes?
> 
> 
> 
> I'm not exactly sure what you mean. Please feel free to profile however you 
> like on the supplied code.
> 
> My observation (primarily to Zelphir) on performance is that lists don't seem 
> like a bad choice for this algorithm.
> 
> If it hadn't been reasonably quick I might have tried replacing the dataset 
> (a list of lists) with a list of vectors, but otherwise I'd be looking at 
> modifying the exhaustive, greedy algorithm itself for possible speedups 
> rather than data structures.
> 
> Zelphir:
> > Maybe you could put it in a repository, so that other people are more 
> >likely to find your code.
> 
> If I ever get back into ML I might, but I don't have the time to do a proper 
> write-up.
> 
> Please feel free to include it in your github repository, with or without 
> attribution.
> 
> 
> Dan

With my list-of-vectors implementation I only get down to:

cpu time: 996 real time: 994 gc time: 52

on my machine. I don't know what kind of machine you have, but I guess that with 
such small data sets the choice of data structure does not matter much, and the 
list-of-lists implementation is faster, at least for low-dimensional data :) 
Either vectors involve a bit of overhead, or you made some other optimization 
that I still have to add to my code (perhaps assuming binary classification, 
though I don't think that should make much of a difference; I might try it soon).
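To make the binary-classification assumption concrete, here is a minimal sketch of what it can buy, assuming a Gini-impurity split criterion (the thread does not say which criterion either implementation uses), rows stored as lists with the class label last, and labels 0/1 in the binary case. `gini-general` and `gini-binary` are hypothetical names, not functions from either implementation. The general version counts labels in a hash table; the binary version needs only one running sum, because with p the share of label 1 the impurity 1 - p^2 - (1 - p)^2 simplifies to 2p(1 - p).

```racket
#lang racket/base
;; Hypothetical helpers for illustration only.
;; Assumption: each row is a list whose last element is the class label,
;; and every row set passed in is non-empty.
(require racket/list) ; for `last`

;; General Gini impurity: count each label in a hash, then
;; compute 1 - sum over classes of p_k^2.
(define (gini-general rows)
  (define n (length rows))
  (define counts (make-hash))
  (for ([row (in-list rows)])
    (hash-update! counts (last row) add1 0))
  (- 1.0
     (for/sum ([c (in-hash-values counts)])
       (define p (/ c n))
       (* p p))))

;; Binary special case, assuming labels are exactly 0 or 1:
;; summing the labels counts the 1s, and the impurity is 2p(1 - p).
(define (gini-binary rows)
  (define n (length rows))
  (define ones (for/sum ([row (in-list rows)]) (last row)))
  (define p (/ ones n))
  (exact->inexact (* 2 p (- 1 p))))
```

For example, `(gini-binary '((1 2 0) (2 3 1) (3 1 1) (0 0 0)))` and `(gini-general '((1 2 0) (2 3 1) (3 1 1) (0 0 0)))` both give `0.5`. Any saving from the binary version would come from skipping the hash-table allocation and updates for every candidate split; whether that is noticeable on data sets this small would need measuring.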

I added your code as a new file and added a comment at the top of the file:

#|
Attribution:

This implementation of decision trees in Racket was written by Daniel Prager and
was originally shared at:

https://groups.google.com/forum/#!topic/racket-users/cPuTr8lrXCs

With permission it was added to the project.
|#

My project on Github is GPLv3.



I think I have implemented the suggestions made so far in this discussion, except 
for memoizing columns, which could be another big time saver.
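Column memoization could look roughly like the following minimal sketch. It assumes the dataset is a list of vectors and that the split search asks for the same column of the same subset of rows more than once. `make-column-cache` is a hypothetical helper, not part of either implementation in this thread.

```racket
#lang racket/base
;; Hypothetical helper for illustration only.
;; Assumption: rows are vectors, and a subset of rows is passed around as
;; the same list object while the split search runs.

;; Returns a procedure (rows col) -> list of the values in column `col`.
;; Results are cached per rows object (compared with eq?) and column index.
(define (make-column-cache)
  ;; Weak keys: once a subset of rows is no longer referenced elsewhere,
  ;; its cached columns can be garbage-collected.
  (define cache (make-weak-hasheq))
  (lambda (rows col)
    ;; One inner table per rows object, mapping column index -> values.
    (define per-rows (hash-ref! cache rows make-hasheqv))
    (hash-ref! per-rows col
               (lambda ()
                 (for/list ([row (in-list rows)])
                   (vector-ref row col))))))
```

Usage: create one cache with `(define column-of (make-column-cache))` and call `(column-of rows 0)` wherever a column is needed; the second call with the same `rows` object returns the stored list instead of walking the rows again. Keying on `eq?` rather than `equal?` is deliberate: hashing a whole row set by content would cost about as much as extracting the column in the first place.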

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
