Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Philip McGrath
If I'm following correctly, I think that's what I was trying to do, but I'm
unclear how to give `make-deserialize-info` a variant of `make-adder` that
has a contract. The initial example with `define/contract` was the closest
I've come: it at least reported violations in terms of `make-adder` rather
than `+`, but (as I now understand) it blamed the `server` module for all
violations.

-Philip

On Sun, Jul 23, 2017 at 9:27 PM, Matthew Flatt  wrote:

> The original example had an explicit deserializer:
>
> At Sun, 23 Jul 2017 19:54:43 -0500, Philip McGrath wrote:
> >   (define deserialize-info:adder-v0
> > (make-deserialize-info make-adder
> >(λ () (error 'adder
> > "can't have cycles"
>
> You're constructing the deserializer with `make-adder` --- the variant
> from inside the `server` module, so it doesn't have a contract.
>
> I think this is where you want to draw a new boundary by giving
> `make-deserialize-info` a variant of `make-adder` that has a contract.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Jon Zeppieri
On Sun, Jul 23, 2017 at 10:09 PM, Jon Zeppieri  wrote:
>
> Even after implementing my own suggestions, it's still much slower
> than the Python example it was based on. Maybe there's an algorithmic
> problem somewhere (aside from the vector iteration I mentioned
> before). At any rate, I'm intrigued now... -J

And... it turned out to be a very small thing indeed. When you iterate
over the class labels, you're supposed to iterate over the set of
*distinct* class labels. In the Python source, this is:

   class_values = list(set(row[-1] for row in dataset))

In your code, you have:

   (let* ([class-labels (data-get-col data label-column-index)] ...)

... where `data-get-col` returns a list the same length as `data`.
And that's where the huge slowdown comes from, since it means many
more iterations in `gini-index`.
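
A minimal sketch of that fix, assuming rows are vectors with the class label in the last column. The `data-get-col` here is a hypothetical stand-in for the one in the repository, and the sample rows are made up for illustration:

```racket
#lang racket

;; Hypothetical stand-in for the original `data-get-col`:
;; returns one value per row, so its result is as long as `data`.
(define (data-get-col data col-index)
  (for/list ([row (in-list data)])
    (vector-ref row col-index)))

;; Made-up sample data: two features followed by a class label.
(define data
  (list (vector 2.77 1.78 0)
        (vector 1.73 1.17 0)
        (vector 7.50 3.16 1)
        (vector 9.00 3.34 1)))

(define label-column-index 2)

;; One label per row (what the slow version iterated over).
(define all-labels (data-get-col data label-column-index))

;; Only the distinct labels, mirroring Python's `list(set(...))`.
(define class-labels (remove-duplicates all-labels))
```

With the distinct labels, the loop in `gini-index` runs once per class rather than once per row.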

-Jon



Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Matthew Flatt
The original example had an explicit deserializer:

At Sun, 23 Jul 2017 19:54:43 -0500, Philip McGrath wrote:
>   (define deserialize-info:adder-v0
>     (make-deserialize-info make-adder
>                            (λ () (error 'adder
>                                         "can't have cycles"))))

You're constructing the deserializer with `make-adder` --- the variant
from inside the `server` module, so it doesn't have a contract.

I think this is where you want to draw a new boundary by giving
`make-deserialize-info` a variant of `make-adder` that has a contract.
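
One possible way to build such a variant is the `contract` form with explicitly named parties. This is only a sketch: the name `make-adder/protected` and the party names `'server` and `'deserialized-client` are made up for illustration, and passing the wrapped constructor as the first argument of `make-deserialize-info` is an assumption, not something tested in this thread.

```racket
#lang racket

(struct adder (base)
  #:property prop:procedure
  (λ (this x)
    (+ (adder-base this) x)))

;; Wrap the raw constructor in a contract whose two parties are named
;; explicitly. The positive party vouches for the constructor; the
;; negative party is whoever ends up applying the resulting adder.
(define make-adder/protected
  (contract (-> natural-number/c (-> natural-number/c natural-number/c))
            adder
            'server                 ; positive party (hypothetical name)
            'deserialized-client))  ; negative party (hypothetical name)

;; Correct use goes through unchanged.
((make-adder/protected 5) 3)

;; Misuse raises a contract error with blame information instead of
;; an error from `+`.
(define misuse-message
  (with-handlers ([exn:fail:contract:blame? exn-message])
    ((make-adder/protected 5) 'not-a-number)))
(displayln misuse-message)
```

Because the bad argument is supplied to the function that the constructor returns, the violation is charged to the negative party of the wrapping contract rather than to the module that defined the constructor.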



Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Matthias Felleisen

I see. Not surprisingly, serialization strips the contract of (structural) 
functions, as you can see with this slightly different example: 


#lang racket

(module server racket
  (require racket/serialize)
  (provide (contract-out
            [adder (-> natural-number/c (-> natural-number/c
                                            natural-number/c))]))

  (serializable-struct adder (base)
    #:property prop:procedure
    (λ (this x)
      (+ (adder-base this) x))))

(require 'server racket/serialize)

;; would report a contract violation in terms of adder
;; and blame this module
;((adder 5) 'not-a-number)

;; reports a contract violation in terms of +

(define f (adder 5))
(with-handlers ((exn? (λ (xn) (displayln xn))))
  (f 'not-a-number))

((deserialize (serialize f)) 'not-a-number)



> On Jul 23, 2017, at 10:06 PM, Philip McGrath  wrote:
> 
> Here is the problem with serialization, without my attempts to mitigate it:
> 
> #lang racket
> 
> (module server racket
>   (require racket/serialize)
>   (provide (contract-out
> [adder (-> natural-number/c (-> natural-number/c
> natural-number/c))]))
>   (serializable-struct adder (base)
> #:property prop:procedure
> (λ (this x)
>   (+ (adder-base this) x
> (require 'server racket/serialize)
> 
> ;; would report a contract violation in terms of adder
> ;; and blame this module
> ;((adder 5) 'not-a-number)
> 
> ;; reports a contract violation in terms of +
> ((deserialize (serialize (adder 5))) 'not-a-number)
> 
> -Philip
> 
> On Sun, Jul 23, 2017 at 9:02 PM, Matthias Felleisen  > wrote:
> [replying to myself]
> 
> 
> > On Jul 23, 2017, at 9:58 PM, Matthias Felleisen  > > wrote:
> >
> >
> > At some point I wrote all this up for the contract doc (as the opening 
> > paragraphs). I can’t see it right now.
> 
> 
> Still there:
> 
>http://docs.racket-lang.org/guide/contract-boundaries.html 
> 
> 
> 
> 
> 



Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Jon Zeppieri
On Sun, Jul 23, 2017 at 7:30 PM, Zelphir Kaltstahl
 wrote:
> Hi Racket Users,
>
> The last few days I've been working on implementing decision trees in Racket 
> and I've been following the following guide: 
> http://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/
>
> Now I have the following code: https://github.com/ZelphirKaltstahl/racket-ml
>
> I also wrote some tests, I think for every procedure so far.
>
> However, my implementation seems very very slow. It seems each iteration of 
> `iter-features` takes way too much time.
>
> I've tried to stick to the guide and sometimes "outsourced" some procedure.
>
> I started out with using vectors, as I thought I might gain better 
> performance than from lists. In the code I introduced an abstraction layer, 
> which provides things like `data-length`, so that I could in theory change 
> the representation of data and only change those accessors/getters. In the 
> test cases I sometimes did not use the abstraction though.
>
> So far I am not having much side effects in the code and I'd like to avoid 
> them and unsafe operations.
>
> A small `TEST-DATA` set is in the code and another data set I downloaded from 
> the data set repositories. When running with `TEST-DATA` to calculate the 
> best split, it only takes a few milliseconds, while it takes minutes with the 
> other `data-set`.
>
> How can I make my code more efficient, without changing the basic logic of it?
> Should I not use vectors (what else?)?
> Would I gain anything from using typed Racket or flonums?
>


Even after implementing my own suggestions, it's still much slower
than the Python example it was based on. Maybe there's an algorithmic
problem somewhere (aside from the vector iteration I mentioned
before). At any rate, I'm intrigued now... -J



Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Philip McGrath
Here is the problem with serialization, without my attempts to mitigate it:

#lang racket

(module server racket
  (require racket/serialize)
  (provide (contract-out
            [adder (-> natural-number/c (-> natural-number/c
                                            natural-number/c))]))
  (serializable-struct adder (base)
    #:property prop:procedure
    (λ (this x)
      (+ (adder-base this) x))))
(require 'server racket/serialize)

;; would report a contract violation in terms of adder
;; and blame this module
;((adder 5) 'not-a-number)

;; reports a contract violation in terms of +
((deserialize (serialize (adder 5))) 'not-a-number)

-Philip

On Sun, Jul 23, 2017 at 9:02 PM, Matthias Felleisen 
wrote:

> [replying to myself]
>
>
> > On Jul 23, 2017, at 9:58 PM, Matthias Felleisen 
> wrote:
> >
> >
> > At some point I wrote all this up for the contract doc (as the opening
> paragraphs). I can’t see it right now.
>
>
> Still there:
>
>http://docs.racket-lang.org/guide/contract-boundaries.html
>
>
>
>



Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Matthias Felleisen
[replying to myself]


> On Jul 23, 2017, at 9:58 PM, Matthias Felleisen  wrote:
> 
> 
> At some point I wrote all this up for the contract doc (as the opening 
> paragraphs). I can’t see it right now. 


Still there: 

   http://docs.racket-lang.org/guide/contract-boundaries.html





Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Matthias Felleisen

> On Jul 23, 2017, at 9:43 PM, Philip McGrath  wrote:
> 
> Aha — so it isn't really an issue with serialization at all. If I (now) 
> understand this correctly, when a function produces a contracted higher-order 
> result, it is the responsibility of the caller of the original function to 
> ensure that the result function is always applied to appropriate arguments. 
> That would explain why this version blames intermediary:
> 
> #lang racket
> 
> (module server racket
>   (provide (contract-out
> [adder (-> natural-number/c (-> natural-number/c
> natural-number/c))]))
>   (struct adder (base)
> #:property prop:procedure
> (λ (this x)
>   (+ (adder-base this) x
> (module intermediary racket
>   (require (submod ".." server))
>   (provide add5)
>   (define add5
> (adder 5)))
> (require 'intermediary)
> (add5 'not-a-number)
> 
> I had previously intuited that the obligation would be on the caller of the 
> result function, whoever that might be.


A contract is always between two parties.  For define/contract, it’s the 
definition and the surrounding module (which is btw a nested contract party). 
If a party promises to always apply some function to an odd number (say) but 
then hands out the function to other parties — without protection — it is its 
fault if the function is misused (abused). 

For module exports, it’s obviously the module and its client(s). Here server 
and intermediary enter into a contract that obliges the latter to apply adder 
to a natural number and the function that it creates also to an N. The 
intermediary module uses add5 correctly but then hands out the result w/o 
protection. So when the client (main module) misuses add5, intermediary must be 
blamed. It promised to hand an N to the (curried) second argument of adder and 
didn’t. — The fix is to either not hand out add5 or to equip it with a contract 
so that the client (main) module is also obliged to call it with an N. 
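
The second fix can be sketched as follows. This is only an illustration of equipping `add5` with its own contract; the `misuse-message` helper is a hypothetical addition used to show the error without aborting the program.

```racket
#lang racket

(module server racket
  (provide (contract-out
            [adder (-> natural-number/c
                       (-> natural-number/c natural-number/c))]))
  (struct adder (base)
    #:property prop:procedure
    (λ (this x)
      (+ (adder-base this) x))))

(module intermediary racket
  (require (submod ".." server))
  ;; Export add5 under its own contract, so that whoever requires this
  ;; module takes on the obligation to pass a natural number.
  (provide (contract-out
            [add5 (-> natural-number/c natural-number/c)]))
  (define add5
    (adder 5)))

(require 'intermediary)

;; Correct use still works.
(add5 3)

;; The misuse is now caught at the boundary between intermediary and
;; this module, before the server's contract is ever reached.
(define misuse-message
  (with-handlers ([exn:fail:contract:blame? exn-message])
    (add5 'not-a-number)))
(displayln misuse-message)
```

The printed message should name the requiring (main) module as the blamed party rather than intermediary, since the outermost contract checks the argument first.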

At some point I wrote all this up for the contract doc (as the opening 
paragraphs). I can’t see it right now. 


> When serialization is in the mix, is there a correct way for server to 
> protect itself from instances of adder being abused after they are 
> deserialized? 


Serialization is semantically the identity function. I can’t see how it plays a 
role. 


> 
> 
> -Philip
> 
> On Sun, Jul 23, 2017 at 8:16 PM, Matthias Felleisen  > wrote:
> 
>> On Jul 23, 2017, at 8:54 PM, Philip McGrath > > wrote:
>> 
>> I'm confused about why the following program is blaming the server for the 
>> client's misuse of an applicable struct instance. More generally, I've tried 
>> doing this in several different ways, and I can't figure out how to make 
>> applicable structs that are still protected by contracts after 
>> deserialization and blame the client module for misusing them.
>> 
>> Thanks,
>> Philip
>> 
>> #lang racket
>> 
>> (module server racket
>>   (require racket/serialize)
>>   (provide (contract-out
>> [adder (-> natural-number/c (-> natural-number/c
>> natural-number/c))]))
>>   (struct adder (base)
>> #:property prop:procedure
>> (λ (this x)
>>   (+ (adder-base this) x))
>> #:property prop:serializable
>> (make-serialize-info (λ (this) (vector (adder-base this)))
>>  #'deserialize-info:adder-v0
>>  #f
>>  (or (current-load-relative-directory)
>>  (current-directory
>>   (define/contract make-adder
>> (-> natural-number/c (-> natural-number/c
>>  natural-number/c))
>> adder)
> 
> 
> 
> You defined make-adder with a contract. As far as it is concerned, its 
> contract is with the surrounding module, which is server. Hence if it is 
> misapplied, the server broke the contract of always protecting its entry 
> channels (with a natural-number/c test). 
> 
> 
> 
> 
> 
>>   (define deserialize-info:adder-v0
>> (make-deserialize-info make-adder
>>(λ () (error 'adder
>> "can't have cycles"
>>   (module+ deserialize-info
>> (provide deserialize-info:adder-v0)))
>>   
>> 
>> (require 'server racket/serialize)
>> 
>> ((deserialize (serialize (adder 5))) 'not-a-number)
>> 
> 
> 


Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Philip McGrath
Aha — so it isn't really an issue with serialization at all. If I (now)
understand this correctly, when a function produces a contracted
higher-order result, it is the responsibility of the caller of the original
function to ensure that the result function is always applied to
appropriate arguments. That would explain why this version
blames intermediary:

#lang racket

(module server racket
  (provide (contract-out
            [adder (-> natural-number/c (-> natural-number/c
                                            natural-number/c))]))
  (struct adder (base)
    #:property prop:procedure
    (λ (this x)
      (+ (adder-base this) x))))
(module intermediary racket
  (require (submod ".." server))
  (provide add5)
  (define add5
    (adder 5)))
(require 'intermediary)
(add5 'not-a-number)


I had previously intuited that the obligation would be on the caller of the
result function, whoever that might be.

When serialization is in the mix, is there a correct way for server to
protect itself from instances of adder being abused after they are
deserialized?


-Philip

On Sun, Jul 23, 2017 at 8:16 PM, Matthias Felleisen 
wrote:

>
> On Jul 23, 2017, at 8:54 PM, Philip McGrath 
> wrote:
>
> I'm confused about why the following program is blaming the server for the
> client's misuse of an applicable struct instance. More generally, I've
> tried doing this in several different ways, and I can't figure out how to
> make applicable structs that are still protected by contracts after
> deserialization and blame the client module for misusing them.
>
> Thanks,
> Philip
>
> #lang racket
>
> (module server racket
>   (require racket/serialize)
>   (provide (contract-out
> [adder (-> natural-number/c (-> natural-number/c
> natural-number/c))]))
>   (struct adder (base)
> #:property prop:procedure
> (λ (this x)
>   (+ (adder-base this) x))
> #:property prop:serializable
> (make-serialize-info (λ (this) (vector (adder-base this)))
>  #'deserialize-info:adder-v0
>  #f
>  (or (current-load-relative-directory)
>  (current-directory
>   (define/contract make-adder
> (-> natural-number/c (-> natural-number/c
>  natural-number/c))
> adder)
>
>
>
>
> You defined make-adder with a contract. As far as it is concerned, its
> contract is with the surrounding module, which is server. Hence if it is
> misapplied, the server broke the contract of always protecting its entry
> channels (with a natural-number/c test).
>
>
>
>
>
>   (define deserialize-info:adder-v0
> (make-deserialize-info make-adder
>(λ () (error 'adder
> "can't have cycles"
>   (module+ deserialize-info
> (provide deserialize-info:adder-v0)))
>
>
> (require 'server racket/serialize)
>
> ((deserialize (serialize (adder 5))) 'not-a-number)
>
>
>
>



Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Jon Zeppieri
On Sun, Jul 23, 2017 at 9:07 PM, Jon Zeppieri  wrote:

> Struct update does, however, involve a full copy[...]

Err, immutable struct update, that is (in case it wasn't obvious).



Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Jon Zeppieri
On Sun, Jul 23, 2017 at 7:30 PM, Zelphir Kaltstahl
 wrote:
>
> How can I make my code more efficient, without changing the basic logic of it?
>

In addition to what I wrote before, there are a couple of places where
you're constructing new lists when you don't need to. In `gini-index`,
for example, you don't actually need the intermediate lists that
you're summing over. You can instead use `for*/sum` and make the whole
thing a lot simpler and more efficient:

(define (gini-index subsets class-labels label-column-index)
  (for*/sum ([subset (in-list subsets)]
             [class-label (in-list class-labels)])
    (calc-proportion subset class-label label-column-index)))

(This is under the assumption that you've changed your data
representation to a list, which you really should.)

And you can make a similar simplification to `calc-proportion`.
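
A sketch of what that might look like. The body of `calc-proportion` is an assumption, since the thread never shows it: here it returns p * (1 - p) for one subset and one class label, following the Gini computation in the Python guide, with rows represented as vectors. The sample subsets are made up.

```racket
#lang racket

;; Assumed behaviour (not confirmed by the thread): p * (1 - p), where p
;; is the fraction of rows in `subset` whose label equals `class-label`.
(define (calc-proportion subset class-label label-column-index)
  (define size (length subset))
  (cond
    [(zero? size) 0.0]
    [else
     ;; Count matching rows directly instead of first building an
     ;; intermediate list of labels.
     (define matches
       (for/sum ([row (in-list subset)])
         (if (equal? (vector-ref row label-column-index) class-label) 1 0)))
     (define p (exact->inexact (/ matches size)))
     (* p (- 1.0 p))]))

(define (gini-index subsets class-labels label-column-index)
  (for*/sum ([subset (in-list subsets)]
             [class-label (in-list class-labels)])
    (calc-proportion subset class-label label-column-index)))

;; Made-up example: one pure subset and one evenly mixed subset,
;; with the label in column 1.
(define subsets
  (list (list (vector 1.0 0) (vector 2.0 0))
        (list (vector 3.0 0) (vector 4.0 1))))

(gini-index subsets '(0 1) 1)
```

The pure subset contributes nothing and the mixed one contributes 0.25 per class, so neither function allocates a list just to sum over it.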



Re: [racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Matthias Felleisen

> On Jul 23, 2017, at 8:54 PM, Philip McGrath  wrote:
> 
> I'm confused about why the following program is blaming the server for the 
> client's misuse of an applicable struct instance. More generally, I've tried 
> doing this in several different ways, and I can't figure out how to make 
> applicable structs that are still protected by contracts after 
> deserialization and blame the client module for misusing them.
> 
> Thanks,
> Philip
> 
> #lang racket
> 
> (module server racket
>   (require racket/serialize)
>   (provide (contract-out
> [adder (-> natural-number/c (-> natural-number/c
> natural-number/c))]))
>   (struct adder (base)
> #:property prop:procedure
> (λ (this x)
>   (+ (adder-base this) x))
> #:property prop:serializable
> (make-serialize-info (λ (this) (vector (adder-base this)))
>  #'deserialize-info:adder-v0
>  #f
>  (or (current-load-relative-directory)
>  (current-directory
>   (define/contract make-adder
> (-> natural-number/c (-> natural-number/c
>  natural-number/c))
> adder)



You defined make-adder with a contract. As far as it is concerned, its contract 
is with the surrounding module, which is server. Hence if it is misapplied, the 
server broke the contract of always protecting its entry channels (with a 
natural-number/c test). 





>   (define deserialize-info:adder-v0
> (make-deserialize-info make-adder
>(λ () (error 'adder
> "can't have cycles"
>   (module+ deserialize-info
> (provide deserialize-info:adder-v0)))
>   
> 
> (require 'server racket/serialize)
> 
> ((deserialize (serialize (adder 5))) 'not-a-number)
> 



Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Jon Zeppieri
On Sun, Jul 23, 2017 at 8:43 PM, David Storrs  wrote:
>
>
> On Sun, Jul 23, 2017 at 8:05 PM, Jon Zeppieri  wrote:
>> - Use a struct instead of a hash to represent a split.
>
>
> Oh, are structs faster than hashes?

Can't make a blanket statement, and the best reason to use a struct in
this case has nothing to do with performance; it's simply that the
code wants an associative data structure with a set of labels (keys)
that are known in advance. The keys aren't really data in this case.

Anyhow, on the matter of performance:

First, there's no hashing involved in struct lookup (which is just an
indexed load) or update. That's already a point in the struct's favor.
Struct update does, however, involve a full copy, so overall
performance will depend on just how much data needs to be copied. In
this case, the structs/hashes have only four keys/labels. And
immutable hashes have a more complex representation and a larger
constant-time overhead. So, I would expect structs to perform better
here, but I doubt that the difference is a big deal.
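
To make the comparison concrete, here is a small sketch of both representations. The struct name `split` and its four field names are hypothetical; the thread only says that a split has four keys.

```racket
#lang racket

;; Hypothetical field names for a split with four known labels.
(struct split (index value subsets cost) #:transparent)

(define s (split 0 2.5 '() 0.5))

;; Field access is an indexed load; no hashing is involved.
(split-cost s)

;; Functional update of an immutable struct copies all four fields.
(define s2 (struct-copy split s [cost 0.25]))

;; The same data as an immutable hash keyed by symbols.
(define h (hash 'index 0 'value 2.5 'subsets '() 'cost 0.5))

;; Lookup and update both go through the hash machinery.
(hash-ref h 'cost)
(define h2 (hash-set h 'cost 0.25))
```

With the struct, a misspelled field name is a compile-time error, whereas a misspelled hash key only fails at run time, which is another reason to prefer the struct when the keys are known in advance.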



[racket-users] Blame for contracts on applicable serializable structs

2017-07-23 Thread Philip McGrath
I'm confused about why the following program is blaming the server for the
client's misuse of an applicable struct instance. More generally, I've
tried doing this in several different ways, and I can't figure out how to
make applicable structs that are still protected by contracts after
deserialization and blame the client module for misusing them.

Thanks,
Philip

#lang racket

(module server racket
  (require racket/serialize)
  (provide (contract-out
            [adder (-> natural-number/c (-> natural-number/c
                                            natural-number/c))]))
  (struct adder (base)
    #:property prop:procedure
    (λ (this x)
      (+ (adder-base this) x))
    #:property prop:serializable
    (make-serialize-info (λ (this) (vector (adder-base this)))
                         #'deserialize-info:adder-v0
                         #f
                         (or (current-load-relative-directory)
                             (current-directory))))
  (define/contract make-adder
    (-> natural-number/c (-> natural-number/c
                             natural-number/c))
    adder)
  (define deserialize-info:adder-v0
    (make-deserialize-info make-adder
                           (λ () (error 'adder
                                        "can't have cycles"))))
  (module+ deserialize-info
    (provide deserialize-info:adder-v0)))


(require 'server racket/serialize)

((deserialize (serialize (adder 5))) 'not-a-number)



Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread David Storrs
On Sun, Jul 23, 2017 at 8:05 PM, Jon Zeppieri  wrote:

> On Sun, Jul 23, 2017 at 7:30 PM, Zelphir Kaltstahl
>  wrote:
> > Hi Racket Users,
> >
> > The last few days I've been working on implementing decision trees in
> Racket and I've been following the following guide:
> http://machinelearningmastery.com/implement-decision-tree-
> algorithm-scratch-python/
> >
> > Now I have the following code: https://github.com/
> ZelphirKaltstahl/racket-ml
> >
> > I also wrote some tests, I think for every procedure so far.
> >
> > However, my implementation seems very very slow. It seems each iteration
> of `iter-features` takes way too much time.
> >
> > I've tried to stick to the guide and sometimes "outsourced" some
> procedure.
> >
> > I started out with using vectors, as I thought I might gain better
> performance than from lists. In the code I introduced an abstraction layer,
> which provides things like `data-length`, so that I could in theory change
> the representation of data and only change those accessors/getters. In the
> test cases I sometimes did not use the abstraction though.
> >
> > So far I am not having much side effects in the code and I'd like to
> avoid them and unsafe operations.
> >
> > A small `TEST-DATA` set is in the code and another data set I downloaded
> from the data set repositories. When running with `TEST-DATA` to calculate
> the best split, it only takes a few milliseconds, while it takes minutes
> with the other `data-set`.
> >
> > How can I make my code more efficient, without changing the basic logic
> of it?
> > Should I not use vectors (what else?)?
> > Would I gain anything from using typed Racket or flonums?
> >
>
>
> I haven't taken a close enough look to evaluate the algorithm itself,
> but on a micro-scale:
>
> - Using `vector-take-right` to get the tail of a vector in a loop is
> expensive. From what I can see, you'll be better off representing your
> data as a list of vectors. (I see you have a `data-get-row` function,
> suggesting a need for random access, but you don't appear to be using
> it.)
>
As a general rule, all of your code appears to be using 'generate a new
vector with  modification' functions.  I would probably look at creating
new vectors only for mutation while otherwise reusing the vectors and
instead manipulating a list of indices.



> - Use a struct instead of a hash to represent a split.
>

Oh, are structs faster than hashes?



Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Matthias Felleisen

More generally, this seems to be a raw translation of an algorithm stated in a 
language that has a rather low-level view of data. Perhaps this is needed for 
this world, but I doubt it. 

I teach trees and decision trees to freshman students who have never programmed 
before, and Racket’s forms of data and functions are extremely suitable to this 
domain. 




> On Jul 23, 2017, at 8:05 PM, Jon Zeppieri  wrote:
> 
> On Sun, Jul 23, 2017 at 7:30 PM, Zelphir Kaltstahl
> > wrote:
>> Hi Racket Users,
>> 
>> The last few days I've been working on implementing decision trees in Racket 
>> and I've been following the following guide: 
>> http://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/
>> 
>> Now I have the following code: https://github.com/ZelphirKaltstahl/racket-ml
>> 
>> I also wrote some tests, I think for every procedure so far.
>> 
>> However, my implementation seems very very slow. It seems each iteration of 
>> `iter-features` takes way too much time.
>> 
>> I've tried to stick to the guide and sometimes "outsourced" some procedure.
>> 
>> I started out with using vectors, as I thought I might gain better 
>> performance than from lists. In the code I introduced an abstraction layer, 
>> which provides things like `data-length`, so that I could in theory change 
>> the representation of data and only change those accessors/getters. In the 
>> test cases I sometimes did not use the abstraction though.
>> 
>> So far I am not having much side effects in the code and I'd like to avoid 
>> them and unsafe operations.
>> 
>> A small `TEST-DATA` set is in the code and another data set I downloaded 
>> from the data set repositories. When running with `TEST-DATA` to calculate 
>> the best split, it only takes a few milliseconds, while it takes minutes 
>> with the other `data-set`.
>> 
>> How can I make my code more efficient, without changing the basic logic of 
>> it?
>> Should I not use vectors (what else?)?
>> Would I gain anything from using typed Racket or flonums?
>> 
> 
> 
> I haven't taken a close enough look to evaluate the algorithm itself,
> but on a micro-scale:
> 
> - Using `vector-take-right` to get the tail of a vector in a loop is
> expensive. From what I can see, you'll be better off representing your
> data as a list of vectors. (I see you have a `data-get-row` function,
> suggesting a need for random access, but you don't appear to be using
> it.)
> - Use a struct instead of a hash to represent a split.
> - `split-data` could partition the data list in a single pass, instead
> of making two passes. (You can use the `partition` function from
> racket/list.)
> - If I'm reading this right, for a given data set, you should be able
> to memoize calls to `data-get-col`.
> - From a quick glance, it looks like you're already using floats,
> rather than exact rationals.
> - Are you running this code inside DrRacket? If so, have you timed the
> difference between running it with debugging enabled and with no
> debugging or profiling? (Language -> Choose Language... -> Details)
> 



Re: [racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Jon Zeppieri
On Sun, Jul 23, 2017 at 7:30 PM, Zelphir Kaltstahl
 wrote:
> Hi Racket Users,
>
> The last few days I've been working on implementing decision trees in Racket 
> and I've been following the following guide: 
> http://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/
>
> Now I have the following code: https://github.com/ZelphirKaltstahl/racket-ml
>
> I also wrote some tests, I think for every procedure so far.
>
> However, my implementation seems very very slow. It seems each iteration of 
> `iter-features` takes way too much time.
>
> I've tried to stick to the guide and sometimes "outsourced" some procedure.
>
> I started out with using vectors, as I thought I might gain better 
> performance than from lists. In the code I introduced an abstraction layer, 
> which provides things like `data-length`, so that I could in theory change 
> the representation of data and only change those accessors/getters. In the 
> test cases I sometimes did not use the abstraction though.
>
> So far I don't have many side effects in the code, and I'd like to avoid 
> them and unsafe operations.
>
> A small `TEST-DATA` set is in the code and another data set I downloaded from 
> the data set repositories. When running with `TEST-DATA` to calculate the 
> best split, it only takes a few milliseconds, while it takes minutes with the 
> other `data-set`.
>
> How can I make my code more efficient, without changing the basic logic of it?
> Should I not use vectors (what else?)?
> Would I gain anything from using typed Racket or flonums?
>


I haven't taken a close enough look to evaluate the algorithm itself,
but on a micro-scale:

- Using `vector-take-right` to get the tail of a vector in a loop is
expensive. From what I can see, you'll be better off representing your
data as a list of vectors. (I see you have a `data-get-row` function,
suggesting a need for random access, but you don't appear to be using
it.)
- Use a struct instead of a hash to represent a split.
- `split-data` could partition the data list in a single pass, instead
of making two passes. (You can use the `partition` function from
racket/list.)
- If I'm reading this right, for a given data set, you should be able
to memoize calls to `data-get-col`.
- From a quick glance, it looks like you're already using floats,
rather than exact rationals.
- Are you running this code inside DrRacket? If so, have you timed the
difference between running it with debugging enabled and with no
debugging or profiling? (Language -> Choose Language... -> Details)
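
A minimal sketch of three of the suggestions above (a struct for the split, a single-pass `partition`, and memoized column access), assuming the rows are stored as a list of vectors. The struct fields, the argument order of `split-data`, and the helper `make-column-getter` are illustrative assumptions, not the actual definitions in the racket-ml repository.

```racket
#lang racket/base
(require racket/list) ; provides partition

;; A split as a struct instead of a hash (field names are assumptions).
(struct split (index value left right) #:transparent)

;; data is a list of row vectors. partition walks the list once and
;; returns two values: the rows that satisfy the predicate and the rest.
(define (split-data data index value)
  (partition (λ (row) (< (vector-ref row index) value)) data))

;; For a fixed data set, cache each column the first time it is requested.
(define (make-column-getter data)
  (define cache (make-hasheqv))
  (λ (index)
    (hash-ref! cache index
               (λ () (for/list ([row (in-list data)])
                       (vector-ref row index))))))

;; Tiny made-up data set: two features and a class label per row.
(define rows
  (list (vector 2.7 1.0 0)
        (vector 1.3 3.0 0)
        (vector 3.6 4.0 1)))

(define-values (left right) (split-data rows 0 2.0))
(define a-split (split 0 2.0 left right))
(define get-col (make-column-getter rows))
(define col-1 (get-col 1)) ; computed once, later calls hit the cache
```

Accessing a field is then `(split-left a-split)` rather than a hash lookup, and repeated calls to `(get-col 1)` return the cached list instead of rebuilding it.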



[racket-users] Decision Tree in Racket - Performance

2017-07-23 Thread Zelphir Kaltstahl
Hi Racket Users,

The last few days I've been working on implementing decision trees in Racket 
and I've been following this guide: 
http://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/

Now I have the following code: https://github.com/ZelphirKaltstahl/racket-ml

I also wrote some tests, I think for every procedure so far.

However, my implementation seems very very slow. It seems each iteration of 
`iter-features` takes way too much time.

I've tried to stick to the guide and sometimes "outsourced" some procedure.

I started out with using vectors, as I thought I might gain better performance 
than from lists. In the code I introduced an abstraction layer, which provides 
things like `data-length`, so that I could in theory change the representation 
of data and only change those accessors/getters. In the test cases I sometimes 
did not use the abstraction though.

So far I don't have many side effects in the code, and I'd like to avoid them 
and unsafe operations.

A small `TEST-DATA` set is in the code and another data set I downloaded from 
the data set repositories. When running with `TEST-DATA` to calculate the best 
split, it only takes a few milliseconds, while it takes minutes with the other 
`data-set`.

How can I make my code more efficient, without changing the basic logic of it?
Should I not use vectors (what else?)?
Would I gain anything from using typed Racket or flonums?



Re: [racket-users] Struct declaration conflict if a file is required implicitly

2017-07-23 Thread Alejandro Sanchez
Thank you so much! I feel so stupid now; that file path is a leftover from when 
the directory structure was different. Now it works perfectly.

> On 23 Jul 2017, at 17:43, Ryan Culpepper  wrote:
> 
> On 07/23/2017 07:26 AM, Alejandro Sanchez wrote:
>> Hello everyone,
>> I am working on this project: https://gitlab.com/HiPhish/MsgPack.rkt/
>> I am writing test cases and I ran into a problem with my ‘ext’ structure. It 
>> is declared in the file ‘msgpack/main.rkt’, which is required in the file 
>> ‘msgpack/pack.rkt’ (also in ‘msgpack/unpack.rkt’). For my test case the test 
>> file looks like this:
> 
> It looks like msgpack/pack.rkt requires "../main.rkt" rather than "main.rkt". 
> There isn't a "../main.rkt" checked in, so maybe you have a stale file 
> getting loaded? (It could be at "../main.rkt" or possibly 
> "../compiled/main_rkt.zo".)
> 
> Ryan



Re: [racket-users] Struct declaration conflict if a file is required implicitly

2017-07-23 Thread hiphish
On Sunday, July 23, 2017 at 5:43:51 PM UTC+2, Ryan Culpepper wrote:
> On 07/23/2017 07:26 AM, Alejandro Sanchez wrote:
> > Hello everyone,
> > 
> > I am working on this project: https://gitlab.com/HiPhish/MsgPack.rkt/
> > 
> > I am writing test cases and I ran into a problem with my ‘ext’ structure. 
> > It is declared in the file ‘msgpack/main.rkt’, which is required in the 
> > file ‘msgpack/pack.rkt’ (also in ‘msgpack/unpack.rkt’). For my test case 
> > the test file looks like this:
> 
> It looks like msgpack/pack.rkt requires "../main.rkt" rather than 
> "main.rkt". There isn't a "../main.rkt" checked in, so maybe you have a 
> stale file getting loaded? (It could be at "../main.rkt" or possibly 
> "../compiled/main_rkt.zo".)
> 
> Ryan

Thank you so much! I feel so stupid now; that file path is a leftover from when 
the directory structure was different. Now it works perfectly.



Re: [racket-users] Struct declaration conflict if a file is required implicitly

2017-07-23 Thread Ryan Culpepper

On 07/23/2017 07:26 AM, Alejandro Sanchez wrote:

> Hello everyone,
>
> I am working on this project: https://gitlab.com/HiPhish/MsgPack.rkt/
>
> I am writing test cases and I ran into a problem with my ‘ext’ structure. It is 
> declared in the file ‘msgpack/main.rkt’, which is required in the file 
> ‘msgpack/pack.rkt’ (also in ‘msgpack/unpack.rkt’). For my test case the test 
> file looks like this:


It looks like msgpack/pack.rkt requires "../main.rkt" rather than 
"main.rkt". There isn't a "../main.rkt" checked in, so maybe you have a 
stale file getting loaded? (It could be at "../main.rkt" or possibly 
"../compiled/main_rkt.zo".)
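
To illustrate why a second, stale copy of the file would produce this symptom: `struct` is generative, so every module that evaluates a `struct` form creates its own distinct type, even when the source text is identical. The sketch below simulates two copies of the same definition with two submodules; the names `copy-a`, `copy-b`, and `other-ext?` are hypothetical stand-ins, not part of MsgPack.rkt.

```racket
#lang racket/base

;; First copy of the definition.
(module copy-a racket/base
  (provide (struct-out ext))
  (struct ext (type data)))

;; Second copy with identical source text; only its predicate is
;; exported, under a different name to avoid an import conflict.
(module copy-b racket/base
  (provide (rename-out [ext? other-ext?]))
  (struct ext (type data)))

(require 'copy-a 'copy-b)

(define e (ext 1 (bytes 2 3)))   ; constructed with copy-a's ext
(define same-copy?  (ext? e))        ; predicate from the same copy
(define other-copy? (other-ext? e))  ; predicate from the other copy
```

Here `same-copy?` is `#t` and `other-copy?` is `#f`, which matches the `expected: ext?` contract violation: the value was built by one `ext` and checked by the predicate of another.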


Ryan



[racket-users] Struct declaration conflict if a file is required implicitly

2017-07-23 Thread Alejandro Sanchez
Hello everyone,

I am working on this project: https://gitlab.com/HiPhish/MsgPack.rkt/

I am writing test cases and I ran into a problem with my ‘ext’ structure. It is 
declared in the file ‘msgpack/main.rkt’, which is required in the file 
‘msgpack/pack.rkt’ (also in ‘msgpack/unpack.rkt’). For my test case the test 
file looks like this:

(require
  quickcheck
  rackunit/quickcheck
  (file "../../msgpack/main.rkt")
  (file "../../msgpack/pack.rkt"))

(check-property
  (property ()
    (let ([obj (ext #x01 (bytes #x02))])
      (bytes=? (call-with-output-bytes (λ (out) (pack obj out)))
               (bytes-append (bytes #xD4 (ext-type obj))
                             (ext-data obj))))))

Here is what happens from my understanding: when I required ‘main.rkt’ the 
structure declaration got evaluated, creating all the functions that go along 
with it, including ‘ext’ and ‘ext?’. Then when I required the ‘pack.rkt’ file 
those declarations got evaluated again and created a new set of ext-related 
functions that just happen to have the same name. This is why the object I’m 
trying to pack falls through all the ‘cond’ cases.

This isn’t limited to the test file, I also tried the following on the REPL 
with the same results:

Welcome to Racket v6.9.
> (require msgpack msgpack/pack)
> (define e (ext 1 (bytes 2 3)))
> (define out (open-output-bytes))
> (pack e out)
; Type not supported by MessagePack [,bt for context]
> (pack-ext e out)
; pack-ext: contract violation
;   expected: ext?
;   given: #<ext>
;   in: the 1st argument of
;   (->
;ext?
;(and/c output-port? (not/c port-closed?))
;any)
;   contract from:
;   /msgpack-rkt/msgpack/pack.rkt
;   blaming: top-level
;(assuming the contract is correct)
;   at: /msgpack-rkt/msgpack/pack.rkt:53.5
; [,bt for context]
> ,bt
; pack-ext: contract violation
;   expected: ext?
;   given: #<ext>
;   in: the 1st argument of
;   (->
;ext?
;(and/c output-port? (not/c port-closed?))
;any)
;   contract from:
;   /msgpack-rkt/msgpack/pack.rkt
;   blaming: top-level
;(assuming the contract is correct)
;   at: /msgpack-rkt/msgpack/pack.rkt:53.5
;   context...:
;    /usr/local/Cellar/minimal-racket/6.9/share/racket/collects/racket/contract/private/blame.rkt:159:0: raise-blame-error16
;    /usr/local/Cellar/minimal-racket/6.9/share/racket/collects/racket/contract/private/arrow-val-first.rkt:357:18
;    /usr/local/Cellar/minimal-racket/6.9/share/racket/pkgs/xrepl-lib/xrepl/xrepl.rkt:1448:0
;    /usr/local/Cellar/minimal-racket/6.9/share/racket/collects/racket/private/misc.rkt:88:7
>

Adding a ‘#:prefab’ to the end of the struct declaration does not solve the 
problem. What am I doing wrong? Both the packing and unpacking need the ‘ext’ 
type for conformance with MessagePack. Do I have to make a large umbrella 
module that provides the entire API? I would prefer if users could just 
‘require’ parts of the library as they need them (i.e. only ‘(require 
msgpack/pack)’ if you only want to pack data).
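
For reference, a sketch of the layout I am aiming for, collapsed into submodules so it runs as a single file: one module defines ‘ext’, and both the packing and the unpacking side require that same module. The names ‘ext-def’, ‘packer’, ‘unpacker’, ‘packable?’ and ‘make-ext’ are made up for this sketch; in the real project they would be separate files. As far as I understand, requiring the same module from several places should give every client the same struct type, provided the module paths resolve to the same file.

```racket
#lang racket/base

;; The single place where the struct is defined.
(module ext-def racket/base
  (provide (struct-out ext))
  (struct ext (type data)))

;; Packing side: only checks values, using the shared predicate.
(module packer racket/base
  (require (submod ".." ext-def))
  (provide packable?)
  (define (packable? v) (ext? v)))

;; Unpacking side: only constructs values, using the shared constructor.
(module unpacker racket/base
  (require (submod ".." ext-def))
  (provide make-ext)
  (define (make-ext type data) (ext type data)))

;; A client can require just the parts it needs.
(require 'packer 'unpacker)

(define round-trip-ok? (packable? (make-ext 1 (bytes 2))))
```

In this sketch ‘round-trip-ok?’ is ‘#t’, because both submodules refer to the same ‘ext-def’ module rather than to two copies of it.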
