Re: [pollen] Getting the first N words: speed comparison

2019-03-20 Thread Matthew Butterick


> On Mar 20, 2019, at 8:19 PM, Joel Dueck  wrote:
> 
> The regular-expression version is slower, much more so, it seems, the larger 
> the first txexpr that you give it.
> 
> I am sure both functions could be made much faster. In particular the regex 
> version matches all the words in each string, there is probably a better 
> pattern that would stop after the first N words.

Your regexp-based function is slower because of `regexp-match*`, which eagerly 
finds all the matches (whether you need them or not). Whereas the port-based 
function is faster because it works incrementally. 

But you can do both at the same time, by passing an input port as the argument 
to `regexp-match`. In this example, the pattern is matched incrementally, and 
if we don't get enough words, we incrementally process the next txexpr.

(require racket/string)
(define (first-words-regex2 txs n)
  (define words
(let loop ([txs txs][n n])
  (define ip (open-input-string (tx-strs (car txs
  (define words (for*/list ([i (in-range n)]
[bs (in-value (regexp-match #px"\\w+" ip))]
#:break (not bs))
  (bytes->string/utf-8 (car bs
  (if (= (length words) n)
  words
  (append words (loop (cdr txs) (- n (length words)))
  (string-join words " "))

-- 
You received this message because you are subscribed to the Google Groups 
"Pollen" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to pollenpub+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [pollen] Getting the first N words: speed comparison

2019-03-21 Thread Joel Dueck
Yes, first-words-regex2 is pretty much identical in performance to my 
longer regex-less version. Thanks for the pointer! I was not familiar with 
the use of regexp-match functions on an input port. It’s a little wild to 
me how even using a string port, a general-purpose pattern matching 
function can be just about as fast as a custom-built loop that knows 
exactly what it wants. But the regex library has probably been pretty well 
optimized by now.

raco test: (submod "test-func.rkt" test)
cpu time: 140 real time: 140 gc time: 9  ; first-words
cpu time: 117 real time: 117 gc time: 0   ; first-words-regex2
"She counted one two silently"
"She counted one two silently"
cpu time: 124 real time: 124 gc time: 0
cpu time: 122 real time: 122 gc time: 0
"Stop! she called. She was"
"Stop she called. She was"
cpu time: 345 real time: 346 gc time: 0
cpu time: 358 real time: 358 gc time: 2
"In a 2005 episode of"
"In a 2005 episode of"

On Thursday, March 21, 2019 at 12:39:24 AM UTC-5, Matthew Butterick wrote:
>
>
>
> On Mar 20, 2019, at 8:19 PM, Joel Dueck > 
> wrote:
>
> The regular-expression version is slower, much more so, it seems, the 
> larger the first txexpr that you give it.
>
> I am sure both functions could be made much faster. In particular the 
> regex version matches all the words in each string, there is probably a 
> better pattern that would stop after the first N words.
>
>
> Your regexp-based function is slower because of `regexp-match*`, which 
> eagerly finds all the matches (whether you need them or not). Whereas the 
> port-based function is faster because it works incrementally. 
>
> But you can do both at the same time, by passing an input port as the 
> argument to `regexp-match`. In this example, the pattern is matched 
> incrementally, and if we don't get enough words, we incrementally process 
> the next txexpr.
>
> (require racket/string)
> (define (first-words-regex2 txs n)
>   (define words
> (let loop ([txs txs][n n])
>   (define ip (open-input-string (tx-strs (car txs
>   (define words (for*/list ([i (in-range n)]
> [bs (in-value (regexp-match #px"\\w+" ip))]
> #:break (not bs))
>   (bytes->string/utf-8 (car bs
>   (if (= (length words) n)
>   words
>   (append words (loop (cdr txs) (- n (length words)))
>   (string-join words " "))
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Pollen" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to pollenpub+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [pollen] Getting the first N words: speed comparison

2019-03-21 Thread Matthew Butterick
As a Racket rule of thumb, I find that most efforts toward "custom-built loops" 
end in defeat, because the Racket macro expander and JIT compiler are aware of 
better optimizations. If I were writing another book on Racket, it would be 
High-Performance Racket, which I know more about than I used to, but still not 
very much ;)


> On Mar 21, 2019, at 12:15 PM, Joel Dueck  wrote:
> 
> Yes, first-words-regex2 is pretty much identical in performance to my longer 
> regex-less version. Thanks for the pointer! I was not familiar with the use 
> of regexp-match functions on an input port. It’s a little wild to me how even 
> using a string port, a general-purpose pattern matching function can be just 
> about as fast as a custom-built loop that knows exactly what it wants. But 
> the regex library has probably been pretty well optimized by now.

-- 
You received this message because you are subscribed to the Google Groups 
"Pollen" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to pollenpub+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [pollen] Getting the first N words: speed comparison

2019-03-21 Thread Joel Dueck
I could have used such a book during the 2018 Advent of Code. I seem to 
recall I spent a lot of time scrounging for faster ways to do things after 
the simple, “idiomatic” Racket approach (or rather my novice-level 
conceptions of those approaches) resulted in extremely slow code. I was 
more or less successful in most cases[1] but I was never able to match the 
speeds that the Python users said they were achieving.

[1]: https://thenotepad.org/repos/aoc2018/

On Thursday, March 21, 2019 at 7:10:49 PM UTC-5, Matthew Butterick wrote:
>
> As a Racket rule of thumb, I find that most efforts toward "custom-built 
> loops" end in defeat, because the Racket macro expander and JIT compiler 
> are aware of better optimizations. If I were writing another book on 
> Racket, it would be *High-Performance Racket*, which I know more about 
> than I used to, but still not very much ;)
>
>
> On Mar 21, 2019, at 12:15 PM, Joel Dueck > 
> wrote:
>
> Yes, first-words-regex2 is pretty much identical in performance to my 
> longer regex-less version. Thanks for the pointer! I was not familiar with 
> the use of regexp-match functions on an input port. It’s a little wild to 
> me how even using a string port, a general-purpose pattern matching 
> function can be just about as fast as a custom-built loop that knows 
> exactly what it wants. But the regex library has probably been pretty well 
> optimized by now.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Pollen" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to pollenpub+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [pollen] Getting the first N words: speed comparison

2019-03-21 Thread Matthew Butterick
I like Eric Wastl’s work (so much that I licensed one of his puzzles for 
Beautiful Racket). He often designs them to foil naive algorithms. What other 
languages call a list is often better modeled in Racket as a vector. I did the 
initial 2018 puzzles specifically with the goal of making them fast. In one 
case I switched to mutable pairs; in another, a double linked list (rather than 
the usual, which is single linked) to avoid excessive traversal. In general I 
often try to think of algorithmic problems in terms of physical movement, and 
then how to minimize that movement. 

https://github.com/mbutterick/aoc-racket/tree/master/2018

> On Mar 21, 2019, at 7:24 PM, Joel Dueck  wrote:
> 
> I could have used such a book during the 2018 Advent of Code. I seem to 
> recall I spent a lot of time scrounging for faster ways to do things after 
> the simple, “idiomatic” Racket approach (or rather my novice-level 
> conceptions of those approaches) resulted in extremely slow code. I was more 
> or less successful in most cases[1] but I was never able to match the speeds 
> that the Python users said they were achieving.
> 
> [1]: https://thenotepad.org/repos/aoc2018/
> 
>> On Thursday, March 21, 2019 at 7:10:49 PM UTC-5, Matthew Butterick wrote:
>> As a Racket rule of thumb, I find that most efforts toward "custom-built 
>> loops" end in defeat, because the Racket macro expander and JIT compiler are 
>> aware of better optimizations. If I were writing another book on Racket, it 
>> would be High-Performance Racket, which I know more about than I used to, 
>> but still not very much ;)
>> 
>> 
>>> On Mar 21, 2019, at 12:15 PM, Joel Dueck  wrote:
>>> 
>>> Yes, first-words-regex2 is pretty much identical in performance to my 
>>> longer regex-less version. Thanks for the pointer! I was not familiar with 
>>> the use of regexp-match functions on an input port. It’s a little wild to 
>>> me how even using a string port, a general-purpose pattern matching 
>>> function can be just about as fast as a custom-built loop that knows 
>>> exactly what it wants. But the regex library has probably been pretty well 
>>> optimized by now.
>> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Pollen" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to pollenpub+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Pollen" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to pollenpub+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.