Re: [pollen] Getting the first N words: speed comparison

2019-03-21 Thread Matthew Butterick
As a Racket rule of thumb, I find that most efforts toward "custom-built loops" 
end in defeat, because the Racket macro expander and JIT compiler are aware of 
better optimizations. If I were writing another book on Racket, it would be 
High-Performance Racket, which I know more about than I used to, but still not 
very much ;)


> On Mar 21, 2019, at 12:15 PM, Joel Dueck wrote:
> 
> Yes, first-words-regex2 is pretty much identical in performance to my longer 
> regex-less version. Thanks for the pointer! I was not familiar with the use 
> of regexp-match functions on an input port. It’s a little wild to me how even 
> using a string port, a general-purpose pattern matching function can be just 
> about as fast as a custom-built loop that knows exactly what it wants. But 
> the regex library has probably been pretty well optimized by now.

-- 
You received this message because you are subscribed to the Google Groups 
"Pollen" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to pollenpub+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [pollen] Getting the first N words: speed comparison

2019-03-21 Thread Joel Dueck
Yes, first-words-regex2 is pretty much identical in performance to my 
longer regex-less version. Thanks for the pointer! I was not familiar with 
the use of regexp-match functions on an input port. It’s a little wild to 
me how even using a string port, a general-purpose pattern matching 
function can be just about as fast as a custom-built loop that knows 
exactly what it wants. But the regex library has probably been pretty well 
optimized by now.

raco test: (submod "test-func.rkt" test)
cpu time: 140 real time: 140 gc time: 9  ; first-words
cpu time: 117 real time: 117 gc time: 0   ; first-words-regex2
"She counted one two silently"
"She counted one two silently"
cpu time: 124 real time: 124 gc time: 0
cpu time: 122 real time: 122 gc time: 0
"Stop! she called. She was"
"Stop she called. She was"
cpu time: 345 real time: 346 gc time: 0
cpu time: 358 real time: 358 gc time: 2
"In a 2005 episode of"
"In a 2005 episode of"
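
For anyone wanting to reproduce this kind of comparison outside Pollen, here is a minimal sketch. It is not the test-func.rkt used above: `sample`, `first-n-eager` and `first-n-incremental` are hypothetical names, and the input is one plain string rather than a list of txexprs. Each call is wrapped in Racket's `time` form, which prints a line in the same "cpu time / real time / gc time" format.

```racket
#lang racket/base
(require racket/list racket/string)

;; Hypothetical input: one long string of 200,000 short words.
(define sample
  (string-join (for/list ([i (in-range 200000)]) "word") " "))

;; Eager: regexp-match* collects every word before the first n are kept.
(define (first-n-eager str n)
  (define all (regexp-match* #px"\\w+" str))
  (string-join (take all (min n (length all))) " "))

;; Incremental: regexp-match on an input port is called at most n times,
;; so the rest of the string is never scanned.
(define (first-n-incremental str n)
  (define ip (open-input-string str))
  (define words
    (for*/list ([i (in-range n)]
                [bs (in-value (regexp-match #px"\\w+" ip))]
                #:break (not bs))
      ;; Matches read from a port come back as byte strings.
      (bytes->string/utf-8 (car bs))))
  (string-join words " "))

;; `time` prints the timing line and returns the value of its body.
(define eager-result (time (first-n-eager sample 5)))
(define incremental-result (time (first-n-incremental sample 5)))
```

The absolute numbers will depend on the machine and input size, so only the relative difference between the two lines is meaningful.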

On Thursday, March 21, 2019 at 12:39:24 AM UTC-5, Matthew Butterick wrote:
>
> On Mar 20, 2019, at 8:19 PM, Joel Dueck wrote:
>
> The regular-expression version is slower, and the gap seems to grow with 
> the size of the first txexpr that you give it.
>
> I am sure both functions could be made much faster. In particular, the 
> regex version matches all the words in each string; there is probably a 
> better pattern that would stop after the first N words.
>
>
> Your regexp-based function is slower because of `regexp-match*`, which 
> eagerly finds all the matches (whether you need them or not), whereas the 
> port-based function is faster because it works incrementally.
>
> But you can do both at the same time, by passing an input port as the 
> argument to `regexp-match`. In this example, the pattern is matched 
> incrementally, and if we don't get enough words, we incrementally process 
> the next txexpr.
>
> (require racket/string)
> (define (first-words-regex2 txs n)
>   (define words
>     (let loop ([txs txs] [n n])
>       ;; tx-strs is a helper that is not defined in this message
>       (define ip (open-input-string (tx-strs (car txs))))
>       ;; each regexp-match call on the port returns the next word as bytes
>       (define words (for*/list ([i (in-range n)]
>                                 [bs (in-value (regexp-match #px"\\w+" ip))]
>                                 #:break (not bs))
>                       (bytes->string/utf-8 (car bs))))
>       (if (= (length words) n)
>           words
>           (append words (loop (cdr txs) (- n (length words)))))))
>   (string-join words " "))
>
>
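
The carry-over step in the quoted function (take what the current txexpr offers, then ask the next one for the remainder) can be tried on its own with the sketch below. It makes two assumptions that are not in the original: the input is a list of plain strings standing in for the result of `tx-strs` on each txexpr, and the loop stops quietly when the list runs out instead of calling `car` on an empty list. `first-words-from-strings` is a hypothetical name.

```racket
#lang racket/base
(require racket/string)

;; strs: list of plain strings (assumed stand-in for tx-strs output).
;; n: number of words wanted.
(define (first-words-from-strings strs n)
  (define words
    (let loop ([strs strs] [n n])
      (cond
        ;; Stop when enough words are collected or the input is exhausted.
        [(or (zero? n) (null? strs)) '()]
        [else
         (define ip (open-input-string (car strs)))
         ;; Pull at most n words from this string, one match at a time.
         (define found
           (for*/list ([i (in-range n)]
                       [bs (in-value (regexp-match #px"\\w+" ip))]
                       #:break (not bs))
             (bytes->string/utf-8 (car bs))))
         ;; Ask the remaining strings only for the words still missing.
         (append found (loop (cdr strs) (- n (length found))))])))
  (string-join words " "))

(first-words-from-strings '("She counted." "One, two" "silently") 5)
;; => "She counted One two silently"
```

Because the pattern is `\w+`, punctuation is dropped from the result, which matches the "Stop she called." line in the test output above.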
