> On Mar 20, 2019, at 8:19 PM, Joel Dueck <dueckofe...@gmail.com> wrote:
>
> The regular-expression version is slower, much more so, it seems, the larger
> the first txexpr that you give it.
>
> I am sure both functions could be made much faster. In particular the regex
> version matches all the words in each string, there is probably a better
> pattern that would stop after the first N words.

Your regexp-based function is slower because `regexp-match*` eagerly finds every match in the string, whether you need them or not. The port-based function is faster because it works incrementally.

But you can do both at the same time by passing an input port as the argument to `regexp-match`. In this example, the pattern is matched incrementally, one word per call, and if one txexpr doesn't yield enough words, we move on to the next txexpr and process it the same way.

    (require racket/string)

    (define (first-words-regex2 txs n)
      (define words
        (let loop ([txs txs] [n n])
          (cond
            ;; Ran out of txexprs before collecting n words: stop here.
            [(null? txs) '()]
            [else
             ;; tx-strs is not defined in this message; it is assumed to
             ;; return the text of a txexpr as a single string.
             (define ip (open-input-string (tx-strs (car txs))))
             ;; Each regexp-match call reads from the port only through the
             ;; end of the next word, so at most n words are ever matched.
             (define words
               (for*/list ([i (in-range n)]
                           [bs (in-value (regexp-match #px"\\w+" ip))]
                           #:break (not bs))
                 (bytes->string/utf-8 (car bs))))
             (if (= (length words) n)
                 words
                 ;; Not enough words yet: take the remainder from the next txexpr.
                 (append words (loop (cdr txs) (- n (length words)))))])))
      (string-join words " "))
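
To see the eager and incremental behavior in isolation, here is a minimal sketch that does not involve txexprs at all. The sample string and the variable names are made up for illustration; only `regexp-match*`, `regexp-match`, `open-input-string`, and `read-line` are real Racket functions.

```racket
#lang racket/base

;; Eager: regexp-match* scans the entire string and returns every match.
(define all-words (regexp-match* #px"\\w+" "one two three four five"))

;; Incremental: each regexp-match call on a port consumes input only
;; through the end of the match it returns, so the next call resumes there.
;; Matches against a port come back as byte strings.
(define ip (open-input-string "one two three four five"))
(define first-word (regexp-match #px"\\w+" ip))   ; '(#"one")
(define second-word (regexp-match #px"\\w+" ip))  ; '(#"two")

;; Whatever was not consumed is still waiting in the port.
(define rest-of-input (read-line ip))             ; " three four five"
```

So if you only need the first two words, the port version never looks at "three four five", while `regexp-match*` has already matched all five.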