Re: [racket-users] Word Count program/benchmark performance

2021-03-20 Thread Bogdan Popa
On top of Gustavo's changes, replacing the hash function with Chez's
`equal-hash` saves another 50ms.  That gets the runtime down to around
900ms on my machine, including Racket startup time.  Ignoring startup
time, this version beats the `simple.c` implementation in the original
repo by about 50ms (800ms vs 850ms).

https://gist.github.com/Bogdanp/b9256e1a91de9083830cb616b3659ff8

Re: [racket-users] Re: [ANNOUNCE] Xiden is now in beta

2021-03-20 Thread Sage Gerard
Hi Joel,

> On some future day when Xiden is out of beta, what are authors of “normal” 
> Racket packages doing to make their packages available to Xiden users?

Basically cross-posting to zcpkg.com (don't visit, nothing is there yet). I'd 
expect the bureaucracy of signing/hashing would be handled by a specialized 
client, since I'm trying to keep Xiden open-ended. The value would be in added 
safeguards (better cryptographic hash functions + hosting signatures), and 
additional options for transporting artifacts (e.g. torrents). There's also the 
fact that Xiden handles software in general, not just Racket packages. So I'd 
give this answer to the same question in other mailing lists.

> For example, are we zipping, hashing and signing every “release” and 
> uploading it somewhere (our own web server or a 3rd party catalog)? Or is the 
> typical Xiden user manually creating their own catalogs and packages from 
> others’ code after a thorough vetting?

One can do all of that now. I'm not asking for people to try that here, but it 
would be helpful to know what their experience was like if they tried.

Vetting is a case-by-case problem that deals more with trust than Guix-specific 
knowledge. For example, Racket downloads do not include signatures, and use 
SHA1 digests. The self-hosting example in the repository installs a 
self-contained Racket + Xiden stack, and the Racket installation script is 
signed with my private key. That's why package definitions identify me as a 
provider (as in "distributor") and not as an author. That's an entirely 
different picture from a vetting perspective, because someone might trust a 
Racket distribution simply because it came from racket-lang.org over HTTPS, but 
they might not want to trust my public key outside of the context of that 
example. But if this were a Python source release, I'd just paste in the 
related signature and let the user affirm trust in a public key belonging to 
Pablo G. Salgado.
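
To make the idea of explicitly declaring trust in hash functions concrete, here is a hypothetical default-deny integrity check. It is not Xiden's API: `check-integrity` and the trusted-list policy are assumptions for illustration only. It relies on `sha1-bytes` and `sha256-bytes` from `racket/base` and `bytes->hex-string` from `file/sha1`:

```racket
#lang racket/base
;; Hypothetical sketch of a default-deny integrity check.
;; NOT Xiden's API; names and policy shape are illustrative assumptions.
(require (only-in file/sha1 bytes->hex-string))

;; Hash functions this sketch can compute (both provided by racket/base).
(define chf-table (hash 'sha1 sha1-bytes 'sha256 sha256-bytes))

;; Returns #t when `chf` is in `trusted` and the digest of `data` matches
;; `expected-hex`; otherwise returns a string explaining the rejection.
(define (check-integrity data chf expected-hex trusted)
  (define digest-fn (hash-ref chf-table chf #f))
  (cond
    ;; Default deny: an untrusted hash function is rejected even if it matches.
    [(not (memq chf trusted))
     (format "rejected: ~a is not trusted; add it to the trusted list to consent" chf)]
    [(not digest-fn)
     (format "rejected: no implementation for ~a" chf)]
    [(string=? (bytes->hex-string (digest-fn data)) expected-hex) #t]
    [else "rejected: digest mismatch"]))
```

With `'(sha256)` as the trusted list, a correct SHA-1 digest is still rejected because nobody consented to SHA-1, which mirrors the "Deny All" default described in the announcement.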

Re: [racket-users] Word Count program/benchmark performance

2021-03-20 Thread Gustavo Massaccesi
With two additional tricks I saved like 100ms.

* Saving the output port instead of reading the parameter implicitly each
time.

* Replacing (write (cdr p)) with (write-fx (cdr p)), where

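(require racket/fixnum)

(define (write-fx n [o (current-output-port)])
  ; TODO: Add negatives :)
  (if (fx> n 0)
      (let loop ([n n])
        ; fx>= (not fx>) so that 10, 100, ... print all of their digits
        (when (fx>= n 10)
          (loop (fxquotient n 10)))
        (write-byte (fx+ 48 (fxremainder n 10)) o)) ; 48 is ASCII "0"
      (write-byte 48 o)))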

and at the end of a program something like

(define o (current-output-port))
  (time (for ([p (in-vector items)]
              #:break (not (pair? p)))
          (write-bytes (car p) o)   ; the word
          (write-byte 32 o)         ; a space
          (write-fx (cdr p) o)      ; its count
          (write-byte 10 o)))  ; and a closing )

Gustavo
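
The helper above leaves negative numbers as a TODO. A minimal sketch of one way to handle them, assuming `n` is always a fixnum; `write-fx/signed` is a hypothetical name, not part of Gustavo's code:

```racket
#lang racket/base
;; Sketch only: one way to fill in the "Add negatives" TODO.
;; `write-fx/signed` is a hypothetical name; `n` is assumed to be a fixnum.
(require racket/fixnum)

(define (write-fx/signed n [o (current-output-port)])
  ;; Write the digits of a non-negative fixnum, most significant first.
  (define (digits m)
    (when (fx>= m 10)
      (digits (fxquotient m 10)))
    (write-byte (fx+ 48 (fxremainder m 10)) o))
  (if (fx< n 0)
      ;; Peel off the last digit before negating, so negating the
      ;; most-negative fixnum can never overflow.
      (let ([q (fxquotient n 10)]    ; q <= 0 (truncates toward zero)
            [r (fxremainder n 10)])  ; -9 <= r <= 0
        (write-byte 45 o)            ; ASCII "-"
        (unless (fx= q 0)
          (digits (fx- 0 q)))
        (write-byte (fx- 48 r) o))
      (digits n)))
```

Writing into a string port, e.g. `(let ([o (open-output-string)]) (write-fx/signed -10 o) (get-output-string o))`, is an easy way to check boundary cases like 10 and -10 before wiring the helper into the output loop.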

On Fri, Mar 19, 2021 at 1:22 PM Sam Tobin-Hochstadt 
wrote:

> I went from numbers around 1000 ms to 950 ms to 900 ms. There was
> variance around those numbers, but it was pretty consistent.
>
> For more precise answers, there are a few things you can try. One is
> to measure instructions instead of time (i.e., with perf). Another is to
> run it a bunch of times and take an average. The `hyperfine` tool is
> good for that. But probably the best advice is to make the program
> take longer so differences are more apparent -- variation usually
> increases sub-linearly.
>
> Sam
>
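
Sam's suggestion to run the program many times and average can also be done in-process when only one phase of the program matters. A rough sketch with hypothetical helpers `time-runs` and `summarize`; it measures wall-clock time with `current-inexact-milliseconds`, so it is no substitute for hyperfine (whole-program runs, including startup) or perf (instruction counts):

```racket
#lang racket/base
;; Sketch: repeated in-process timing. `time-runs` and `summarize` are
;; hypothetical helpers, not from the thread.
(require racket/math) ; for sqr

;; Run `thunk` `warmup` times untimed, then `runs` times, returning the
;; wall-clock duration of each timed run in milliseconds.
(define (time-runs thunk #:runs [runs 20] #:warmup [warmup 3])
  (for ([_ (in-range warmup)]) (thunk))
  (for/list ([_ (in-range runs)])
    (collect-garbage) ; start each run from a similar heap state
    (define start (current-inexact-milliseconds))
    (thunk)
    (- (current-inexact-milliseconds) start)))

;; Mean, median, and sample standard deviation of a non-empty list.
(define (summarize samples)
  (define n (length samples))
  (define mean (/ (apply + samples) n))
  (define sorted (sort samples <))
  (define median
    (if (odd? n)
        (list-ref sorted (quotient n 2))
        (/ (+ (list-ref sorted (sub1 (quotient n 2)))
              (list-ref sorted (quotient n 2)))
           2)))
  (define stddev
    (if (< n 2)
        0
        (sqrt (/ (for/sum ([s (in-list samples)]) (sqr (- s mean)))
                 (sub1 n)))))
  (values mean median stddev))
```

Reporting the median and standard deviation next to the mean makes it easier to tell whether a 50 ms improvement is larger than the run-to-run noise Laurent describes.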
> On Fri, Mar 19, 2021 at 12:17 PM Laurent  wrote:
> >
> > Sam: How do you accurately measure such small speed-ups? On my machines,
> > if I run the same program twice, I can sometimes see more than 10% time
> > difference.
> >
> > On Fri, Mar 19, 2021 at 4:10 PM Sam Tobin-Hochstadt <sa...@cs.indiana.edu> wrote:
> >>
> >> Using `#:authentic` and `unsafe-vector*-{ref,set!}` saved about 50 more
> >> ms on my machine.
> >>
> >> Then getting rid of `set!` and just re-binding the relevant variables
> >> produced another 50 ms speedup.
> >>
> >> https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a
> >>
> >> Sam
> >>
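
A small sketch of the two safe ideas in Sam's message, under assumptions: `entry`, `tally`, and `total-and-max` are hypothetical names, not code from the gist, and it deliberately avoids the `unsafe` operations. `#:authentic` declares that instances can never be chaperoned or impersonated, which can let the compiler generate cheaper accessors; `for/fold` re-binds its accumulators on each iteration instead of mutating locals with `set!`:

```racket
#lang racket/base
;; Sketch under assumptions: hypothetical names, not taken from Sam's gist.

;; #:authentic: instances can never be chaperoned or impersonated,
;; so field access does not need to account for wrappers.
(struct entry (word [count #:mutable]) #:authentic)

;; Tally words into a mutable hash of entries.
(define (tally words)
  (define table (make-hash))
  (for ([w (in-list words)])
    (define e (hash-ref table w #f))
    (if e
        (set-entry-count! e (add1 (entry-count e)))
        (hash-set! table w (entry w 1))))
  table)

;; Compute the total and the largest count by re-binding loop variables
;; (for/fold) rather than updating locals with set!.
(define (total-and-max table)
  (for/fold ([total 0] [best 0])
            ([e (in-hash-values table)])
    (values (+ total (entry-count e))
            (max best (entry-count e)))))
```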
> >> On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt
> >>  wrote:
> >> >
> >> > One minor additional suggestion: if you use #:authentic for the
> >> > struct, it will generate slightly better code for the accessors.
> >> >
> >> > Sam
> >> >
> >> > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa  wrote:
> >> >>
> >> >> I updated the gist with some cleanups and additional improvements that
> >> >> get the runtime down to a little over 1s (vs ~350ms for the optimized
> >> >> C and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
> >> >>
> >> >> Pawel Mosakowski writes:
> >> >>
> >> >> > Hi Bogdan,
> >> >> >
> >> >> > This is a brilliant solution and also completely over my head. It
> >> >> > finishes in ~3.75s on my PC and is faster than the Python version,
> >> >> > which basically delegates all the work to C. I will need to spend
> >> >> > some time on understanding it but I am looking forward to learning
> >> >> > something new.
> >> >> >
> >> >> > Many thanks,
> >> >> > Pawel
> >> >> >
> >> >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
> >> >> >
> >> >> >> I managed to get it about as fast as Python by making it really
> >> >> >> imperative and rolling my own hash:
> >> >> >>
> >> >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
> >> >> >>
> >> >> >> Sam Tobin-Hochstadt writes:
> >> >> >>
> >> >> >> > Here are several variants of the code:
> >> >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
> >> >> >> >
> >> >> >> > The enabled version is about the fastest I can get without using
> >> >> >> > `unsafe` (which the rules said not to do). It's possible to
> >> >> >> > optimize a tiny bit more by avoiding sorting, but only a few
> >> >> >> > milliseconds -- it would be more significant if there were more
> >> >> >> > different words.
> >> >> >> >
> >> >> >> > Switching to bytes works correctly for the given task, but wouldn't
> >> >> >> > always work in the case of general UTF-8 input. But those versions
> >> >> >> > appeared not to be faster for me. Also, writing my own
> >> >> >> > string-downcase didn't help. And using a big buffer and doing my
> >> >> >> > own newline splitting didn't help either.
> >> >> >> >
> >> >> >> > The version using just a regexp matching on a port (suggested by
> >> >> >> > Robby) turned out not to be faster either, so my suspicion is that
> >> >> >> > the original slowness is just using regexps for splitting words.
> >> >> >> >
> >> >> >> > Sam
> >> >> >> >
> >> >> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> >> >> >> >  wrote:
> >> >> >> >>
> >> >> >> >> Here's a somewhat-optimized version of the code:
> >> >> >> >>
> >> >> >> >> #lang racket/base
> >> >> >> >> (require racket/string racket/vector racket/port)
> >> >> >> >>
> >> >> >> >> (define h (make-hash))
> >> >> >> >>
> >> >> >> >> (time
> >> >> >> >>  (for* ([l (in-lines)]
> >> >> >> >>         [w (in-list (string-split l))]
> >> >> >> >>         [w* (in-value (string-downcase w))])
> >> >> >> >>    (hash-update! h w* add1 0)))
> >> >> >> >>
> >> >> >> >> (define v
> >> >> >> >> 
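
The quoted code is cut off in the archive after `(define v`. The following is an independent sketch, not a reconstruction of Sam's version, showing one way to go from such a hash table to sorted "word count" lines; `count-words`, `sorted-counts`, and `print-counts` are hypothetical names, and breaking ties alphabetically is an assumption:

```racket
#lang racket/base
;; Independent sketch, NOT a reconstruction of the truncated code above.
(require racket/string)

;; Count lowercased, whitespace-separated words read from an input port.
(define (count-words in)
  (define h (make-hash))
  (for* ([l (in-lines in)]
         [w (in-list (string-split l))])
    (hash-update! h (string-downcase w) add1 0))
  h)

;; (word . count) pairs, most frequent first; ties broken alphabetically.
(define (sorted-counts h)
  (sort (hash->list h)
        (lambda (a b)
          (or (> (cdr a) (cdr b))
              (and (= (cdr a) (cdr b))
                   (string<? (car a) (car b)))))))

;; Print one "word count" line per pair.
(define (print-counts pairs [o (current-output-port)])
  (for ([p (in-list pairs)])
    (write-string (car p) o)
    (write-char #\space o)
    (write (cdr p) o)
    (newline o)))
```

As Sam notes, skipping the full sort would only save a few milliseconds for this input; it matters more as the number of distinct words grows.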

[racket-users] Re: [ANNOUNCE] Xiden is now in beta

2021-03-20 Thread 'Joel Dueck' via Racket Users
Racket’s existing package system doesn’t pose any felt problems for me, but 
I still find this project very interesting. 

On some future day when Xiden is out of beta, what are authors of “normal” 
Racket packages doing to make their packages available to Xiden users? For 
example, are we zipping, hashing and signing every “release” and uploading 
it somewhere (our own web server or a 3rd party catalog)? Or is the typical 
Xiden user manually creating their own catalogs and packages from others’ 
code after a thorough vetting? (Maybe if I were more familiar with Guix I 
would already know the answer to this)

On Friday, March 19, 2021 at 3:56:18 PM UTC-5 Sage Gerard wrote:

> Hi folks,
>
> About a year, 1384 commits, 489 tests, ~10k LOC, and 2" on my waistline 
> later, Xiden is in beta. An update is pending on the default catalog.
>
> https://github.com/zyrolasting/xiden
>
> Xiden is a dependency manager I wrote to support use cases that I could 
> not get working with `raco pkg`.
>
> Dependency management is hard, so Xiden was something I originally didn't 
> want to make. However, it ended up becoming one of my most aspirational 
> projects, and I'm proud of how it ended up. If you could take the time to 
> read a longer email, I'd like to share a bit about how it might be helpful 
> to you.
>
> ***
> Like Guix, Xiden supports deterministic and atomic installations. Unlike 
> Guix, Xiden is cross-platform.
>
> The Racket programs I write no longer have to assume that code comes in 
> collections (outside of the built-in ones).
>
> You can force dependencies of different versions to resolve to the same 
> data to avoid issues with non-eq? bindings [multiver].
>
> Dependencies are accessed by symbolic links with names defined by the 
> dependent. So if two packages are called "uri", you can still install them 
> both under names that are meaningful to you. Dependencies are fulfilled the 
> same way, regardless of whether the dependent is a human or other software.
>
> Explicit, affirmative consent is fundamental to Xiden's workings. The 
> default configuration is zero-trust (a.k.a. "Deny All"). Trust in 
> cryptographic hash functions and public keys (or any bytes lacking either) 
> *must* be declared to authenticate bytes from *any* source (even hard 
> coded!). Not doing so will cause Xiden to reject data, but print an error 
> that helpfully instructs you how to consent to the scenario. For those 
> wanting convenience, there are "blanket" configuration options to consent 
> to every instance of those scenarios. This makes Xiden a way to educate 
> users on the exact shape and nature of the risks they accept with something 
> from the Internet. In this sense, Xiden does not invent anything new with 
> security. It only aims to get ahead of the "Allow Some" arms-race in other 
> dependency managers like NPM.
>
> Customization comes from a plugin module. You can use a plugin to 
> integrate GPG, use a different archive format, or otherwise fill in gaps in 
> Xiden's functionality. Xiden keeps authentication and integrity checking 
> decoupled in this way so that users can transition on their own in the 
> event a smart person finds a collision in a cryptographic hash function 
> (CHF) or cracks a cipher. 
> Similarly, Xiden's data sources are any data type declared with a path to 
> an input port, including queries to a catalog. A neat effect of this is 
> that you can configure your own syntax for data sources in your command 
> lines.
>
> Even though I call Xiden a dependency manager, it is generalized enough to 
> be useful as a component for a CI system, as a self-hosted OS development 
> environment, or even as a back-end for a more specialized dependency 
> manager. 
>
> If this is something that interests you, please consider trying the 
> examples [ex] alongside the guide [guide]. Like all software, Xiden is not 
> perfect, so I depend on your feedback to make Xiden better for you, and to 
> decide what interfaces should be declared stable.
>
> [ex]: https://github.com/zyrolasting/xiden/tree/master/examples
> [guide]: https://docs.racket-lang.org/xiden-guide@xiden/index.html
> [ethos]: 
> https://groups.google.com/g/racket-users/c/4iI-SanIbzk/m/sGHYijLPAAAJ
> [multiver]: 
> https://github.com/zyrolasting/xiden/tree/master/examples/01-differing-versions
>
> --
> ~slg
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/932af4c4-3713-4305-970b-e608e434a71en%40googlegroups.com.