Re: [racket-users] Compilation/Embedding leaves syntax traces

'Paulo Matos' via Racket Users Tue, 25 Sep 2018 23:36:42 -0700

On 25/09/2018 23:44, Philip McGrath wrote:
> On Tue, Sep 25, 2018 at 3:46 PM 'Paulo Matos' via Racket Users
> <racket-users@googlegroups.com> wrote:
> 
>     OK, so I understand now that what I want is an unimplemented feature,
>     but most compilers these days, certainly those based on LLVM and
>     GCC, have a feature called whole-program optimization or link-time
>     optimization. Basically, after compiling each module/file, the compiler
>     loads the whole program into memory and runs the optimizations on the
>     whole thing again, so it can optimize things it couldn't when it only
>     had a per-module/file view.
> 
> 
> I know that `raco demodularize` can flatten a whole program
> (https://docs.racket-lang.org/raco/demod.html). I think I remember
> reading a paper that looked at using this as the basis for whole-program
> optimization, but I don't remember the results of the paper, much less
> anything about doing this in practice.
>  

You are right, and as Ryan mentioned in another post, -g will recompile
after demodularization, in effect doing whole-program optimization. I am
amazed and scared at the same time. :)

If you recall the paper, please let me know.

> 
>     >> My software has several compile-time options that are read from
>     >> environment variables (since I can't think of another way to do it),
>     >> so I define a compile-time variable as:
>     >>
>     >> (define-for-syntax enable-contracts?
>     >>   (and (getenv "S10_ENABLE_CONTRACTS") #true))
>     >>
>     >> And then I create a macro to move this compile-time variable to
>     >> runtime:
>     >> (define-syntax (compiled-with-contracts? stx)
>     >>   (datum->syntax stx enable-contracts?))
>     >>
>     >> I have a few of these, so when I create a distribution, I first
>     >> create an executable with (I use create-embedding-executable but,
>     >> for simplicity, let's say I am using raco):
>     >> S10_ENABLE_CONTRACTS=1 raco exe ...
> 
> 
> More broadly, the thing that first struck me is that most Racket
> programs don't seem to use this sort of build process. I do think they
> should be /able/ to, and there may be good reasons to do so in your
> case, but I've been trying to work out what it is about this that
> strikes me as un-Racket-ey and what the alternatives might be.
> 

I am keen to hear about alternatives. The reason I do it this way is to
minimize friction with clients. Clients in the area of development tools
expect something they can execute and are generally not keen on scripty
invocations like `python foo.py`, so if I said "Please run the program
with `racket s10.rkt`", it would very quickly lower my chances of a sale.
A Racket distribution is essentially the package they expect: something
that looks like an executable (even if it's more or less just a Racket
shell) and that they can run out of the box without thinking much about
dependencies. Product appearance matters, so I want to give them
something they are already used to.

> I think one part of it is that compiling differently based on
> environment variables seems to go against the principle that "Racket
> internalizes extra-linguistic mechanisms"
> (http://felleisen.org/matthias/manifesto/sec_intern.html). For a
> practical example, environment variables are vexing on Mac OS for GUI
> programs.

I'm not sure about Mac OS because I have never used it. Before this
project I hadn't needed to do anything like this, and when I started I
was pointed to how Typed Racket does this for its contracts:
(define-for-syntax enable-contracts? (and (getenv "PLT_TR_CONTRACTS") #t))

in
https://github.com/racket/typed-racket/blob/1825355c4879b6263b0c8fe88b30e11d79fc0d31/typed-racket-lib/typed-racket/utils/utils.rkt#L43

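For readers who haven't seen the pattern, here is a minimal, self-contained sketch of the compile-time flag technique discussed in this thread. It assumes the `S10_ENABLE_CONTRACTS` variable from the original message; `when-contracts` is a hypothetical macro added only to show how a flag that is off at compile time lets the expander drop code entirely. The variable is read when the module is compiled, so an existing .zo file is not refreshed just because the environment variable changed.

```racket
#lang racket/base
;; Sketch of the compile-time flag pattern. S10_ENABLE_CONTRACTS comes from
;; the thread; `when-contracts` is a hypothetical illustration.
(require (for-syntax racket/base))

(provide compiled-with-contracts? when-contracts)

;; Read once, at compile time (phase 1). The result is baked into the
;; compiled module; changing the variable later has no effect until the
;; module is recompiled.
(define-for-syntax enable-contracts?
  (and (getenv "S10_ENABLE_CONTRACTS") #t))

;; Expands to a literal #t or #f, moving the compile-time value to runtime.
(define-syntax (compiled-with-contracts? stx)
  (datum->syntax stx enable-contracts?))

;; Keeps its body only when the flag was set at compile time; otherwise the
;; body is discarded during expansion and the form becomes (void).
(define-syntax (when-contracts stx)
  (syntax-case stx ()
    [(_ body0 body ...)
     (if enable-contracts?
         #'(begin body0 body ...)
         #'(void))]))
```

Because the disabled branch never reaches the compiled code, this is one way to get the "drop unnecessary code" effect without a runtime check.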

So I created a Racket program to package my application, so I can
do something like:
racket create-s10-distribution.rkt --with-contracts --with-statistics ./release-folder

This puts an S10 release in ./release-folder that can be moved to
another system.
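A sketch of how the option handling in such a script might look, using `racket/cmdline`. The function names and the `S10_ENABLE_STATISTICS` variable are hypothetical (only `S10_ENABLE_CONTRACTS` appears in the original message), and the actual packaging step (for example with `create-embedding-executable`) is left out.

```racket
#lang racket/base
;; Hypothetical option handling for a script like create-s10-distribution.rkt.
;; S10_ENABLE_STATISTICS is an assumed variable name.
(require racket/cmdline)

(provide parse-distribution-args apply-env-vars!)

;; Parse the flags; return the environment variables to set before
;; compiling (as an association list) and the destination folder.
(define (parse-distribution-args argv)
  (define env-vars '())
  (define dest
    (command-line
     #:program "create-s10-distribution"
     #:argv argv
     #:once-each
     [("--with-contracts") "Compile with contracts enabled"
      (set! env-vars (cons (cons "S10_ENABLE_CONTRACTS" "1") env-vars))]
     [("--with-statistics") "Compile with statistics enabled"
      (set! env-vars (cons (cons "S10_ENABLE_STATISTICS" "1") env-vars))]
     #:args (release-folder) release-folder))
  (values (reverse env-vars) dest))

;; Export the chosen variables so that modules compiled later in this same
;; process see them through getenv at compile time.
(define (apply-env-vars! env-vars)
  (for ([kv (in-list env-vars)])
    (putenv (car kv) (cdr kv))))
```

Setting the variables with `putenv` before compiling in the same process means the `getenv` calls inside `define-for-syntax` see them; as with the shell approach, stale .zo files from a previous configuration should be removed first, since they are not recompiled just because the environment differs.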

Side note: this is also useful for benchmarking, as I can easily
dockerize it and get it running on a big AWS machine without worrying
about installing Racket, etc.

> 
> Another issue is that this approach means that only one compiled
> configuration exists at a time. In some cases maybe that's right: I've
> had files in which I manually switch the definition of a macro to, say,
> add some `printf`s that I only want when I'm actively working on that
> specific file. But more often, if it makes sense for multiple versions
> of a program to exist—say, your example of different architectures—I
> think it also makes sense for them to be able to exist simultaneously.
> 

At the moment I have no issues with this. I usually run it in debug mode
with, say: --with-contracts --with-places-debug --with-statistics
--with-timing etc.

Then, when I push a change, CI compiles quite a few different
versions and tests them all properly.

When I ship something for a demo or to a client, it's always the
executable distribution, so I want to offer a single configuration that
the client cannot change.

> Most seriously, depending on exactly how you use these compile-time
> environment variables, it seems like you could negate some of the
> benefits of the "separate compilation guarantee," particularly if you
> are assuming that all of your modules are compiled at the same time.
> 

Why would that be a problem?

> In terms of an alternative, I will pass on an approach suggested to me
> by Jack Firth, Tony Garnock-Jones, and others on this list
> (https://groups.google.com/d/msg/racket-users/ftr4GDy7LG0/Qcm2LaNQDAAJ):
> instead of having one "main module" that is your program, define each
> variant of your program as a module that declares its configuration. I
> have been doing this myself, and I'm very happy with the approach. In my
> case the configuration consists of using `parameterize` at runtime, but
> I expect you could find a way to express whatever you need to.
> 
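To make the quoted suggestion concrete, here is a minimal sketch of runtime configuration with parameters; every name in it is hypothetical. Each variant becomes a thin wrapper around one `parameterize`, and in the layout described above that wrapper would be the entire body of its own small module.

```racket
#lang racket/base
;; Hypothetical sketch: configuration as parameters, variants as wrappers.
(provide enable-statistics? backend run-program run-arm-release)

;; Each configuration knob is a parameter with a default value.
(define enable-statistics? (make-parameter #f))
(define backend (make-parameter 'x86))

;; Stand-in for the real program; it just reports its configuration.
(define (run-program)
  (format "backend=~a statistics=~a" (backend) (enable-statistics?)))

;; One variant. In the suggested layout this body would live in its own
;; module (e.g. a hypothetical arm-release.rkt).
(define (run-arm-release)
  (parameterize ([backend 'arm]
                 [enable-statistics? #f])
    (run-program)))
```

The trade-off is that parameters are consulted at run time, so the code for disabled features stays in the program instead of being dropped during expansion.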

That is a possibility, but I run into the problem of having too many
modules, as the number of possible configurations can easily explode as I
add more options and backends to my application. One file per
compile-time configuration is essentially not workable.

My CI script currently creates 48 configurations for testing: 4 boolean
options (2^4 = 16 combinations) times 3 backend architectures that are
either fully supported or under development. If I get a couple of
customers wanting different architectures, I can easily go to 5 or 6
backends and have close to 100 configurations to test. I would probably
have to stop testing _every_ config at some point and choose which
configs are more important, but that seems to come with this kind of
choice: in the GCC world we have the same problem and ended up creating
several tiers of backends/configurations to be able to do a proper job
of testing and releasing.
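The count is a product: each boolean option doubles the number of builds and each backend multiplies it again, so 2^4 × 3 = 48 and 2^4 × 6 = 96. A small sketch that enumerates the combinations, assuming the four flags named earlier (contracts, statistics, timing, places-debug) are the boolean options and using placeholder backend names:

```racket
#lang racket/base
;; Enumerate every build configuration: each boolean option is either on or
;; off, and each combination is built for every backend.
;; Option and backend names are placeholders.
(require racket/list)

(provide all-configurations)

;; Returns a list of (backend (option . on?) ...) entries.
(define (all-configurations options backends)
  ;; One '(#t #f) choice list per boolean option, then the backend list.
  (define choices (append (make-list (length options) '(#t #f))
                          (list backends)))
  (for/list ([combo (in-list (apply cartesian-product choices))])
    ;; combo is (flag1 ... flagN backend).
    (cons (last combo)
          (map cons options (drop-right combo 1)))))
```

For example, `(length (all-configurations '(contracts statistics timing places-debug) '(a b c)))` gives 48, and six backends give 96.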

If I am missing some fundamental way Racket can help with this, I am
open to other options, but it all boils down to moving as much as
possible into compile-time configuration so that I can drop unnecessary
code and make the program as fast as possible.

Thanks,

Paulo

> -Philip
>  
> 
>     >>
>     >> I have a bunch of other options that don't matter for the moment.
>     >>
>     >> One of the things I noticed is that in some cases, when I run my
>     >> executable, compile-time code living inside begin-for-syntax that
>     >> checks whether a variable was defined during compilation is
>     >> triggered, at a point at which I didn't expect any more syntax
>     >> expansion to occur.
>     >>
>     >> I can't really reproduce the issue with a small example yet but I
>     >> noticed something:
>     >>
>     >> main.rkt:
>     >>
>     >> #lang racket
>     >>
>     >> (require (file "arch-choice.rkt"))
>     >>
>     >> (module+ main
>     >> (printf "arch: ~a~n" (get-path)))
>     >>
>     >> arch-choice.rkt:
>     >>
>     >> #lang racket
>     >>
>     >> (provide get-path)
>     >>
>     >> (begin-for-syntax
>     >>   (define arch-path (getenv "ARCH"))
>     >>   (unless arch-path
>     >>     (raise-user-error 'driver "Please define ARCH with a suitable path")))
>     >>
>     >> (define-syntax (get-path stx)
>     >>   (datum->syntax stx arch-path))
>     >>
>     >> Then, to make sure nothing is already compiled, I remove my .zo files:
>     >> $ find . -type f -name '*.zo' -exec rm \{\} \;
>     >>
>     >> Then compile it:
>     >> $ ARCH=foo raco exe main.rkt
>     >>
>     >> In this case, if you run ./main you get 'arch: foo' back, which is
>     >> fine, so I can't reproduce what I see in my software, where with some
>     >> combinations of compile-time options I see:
>     >> 'driver: Please define ARCH environment variable'
>     >>
>     >> which shouldn't even be part of the executable, because it's a
>     >> compile-time string (or so I thought).
>     >>
>     >> So I ran this on the above example:
>     >> $ strings main | grep ARCH
>     >> PLANET-ARCHIVE-FILTER
>     >> ARCH"
>     >> ''Please define ARCH with a suitable path
>     >>
>     >>
>     >> OK, so this agrees with what I see in my program: compile-time error
>     >> strings still exist in the code. Why is that? I thought that only
>     >> fully expanded code (compiled code) would make it into the
>     >> executable file.
>     >>
>     >> Another thing that might help me understand what's going on: is
>     >> there a way to extract the bytecode from the executable and
>     >> decompile it?
>     >>
>     >> Thanks,
>     >>
>     >>
>     >> --
>     >> Paulo Matos
>     >
> 
>     -- 
>     Paulo Matos
> 
>     -- 
>     You received this message because you are subscribed to the Google
>     Groups "Racket Users" group.
>     To unsubscribe from this group and stop receiving emails from it,
>     send an email to racket-users+unsubscr...@googlegroups.com
>     <mailto:racket-users%2bunsubscr...@googlegroups.com>.
>     For more options, visit https://groups.google.com/d/optout.
> 

-- 
Paulo Matos
