civodul pushed a commit to branch master in repository maintenance.

commit 5ffdfc4afd03ec251eec2e7bd1a31186d9c54a14
Author: Ludovic Courtès <l...@gnu.org>
Date:   Thu Jul 6 23:52:34 2017 +0200
    gpce-2017: Fixlets.
---
 doc/gpce-2017/code/gexp-expansion.scm |   4 +-
 doc/gpce-2017/gpce.skb                | 227 +++++++++++++++++++----------------
 doc/gpce-2017/staging.sbib            |   4 +-
 3 files changed, 128 insertions(+), 107 deletions(-)

diff --git a/doc/gpce-2017/code/gexp-expansion.scm b/doc/gpce-2017/code/gexp-expansion.scm
index 111fa81..ecd5637 100644
--- a/doc/gpce-2017/code/gexp-expansion.scm
+++ b/doc/gpce-2017/code/gexp-expansion.scm
@@ -34,7 +34,7 @@
 #~(let ((x 2))
 #$(gen-body #~x))
-⇒ (let ((x0 2))
- (let ((x1 40)) (+ x1 x0)))
+⇝ (let ((x-1bd8-0 2))
+ (let ((x-4f05-0 40)) (+ x-4f05-0 x-1bd8-0)))
 ;;!end-gexp-hygiene
diff --git a/doc/gpce-2017/gpce.skb b/doc/gpce-2017/gpce.skb
index 8101a93..29afd69 100644
--- a/doc/gpce-2017/gpce.skb
+++ b/doc/gpce-2017/gpce.skb
@@ -157,21 +157,21 @@
 (p [GNU Guix is a “functional” package manager that builds upon earlier work on Nix. Guix implements high-level abstractions such as packages and operating system services as domain-specific languages
-(DSL) embedded in Scheme, and it also implements build actions and
+(DSLs) embedded in Scheme. It also implements build actions and
 operating system orchestration in Scheme. This leads to a multi-tier programming environment where embedded code snippets are staged for eventual execution.])
- (p [In this paper we present ,(emph [G-expressions]) or “,(emph
+ (p [This paper presents ,(emph [G-expressions]) or “,(emph
 [gexps])”, the staging mechanism we devised for Guix. We explain our journey from traditional Lisp S-expressions to G-expressions, which augment the former with contextual information and ensure hygienic code staging. We discuss the implementation of gexps and report on our experience using them in a variety of operating system use cases—from package build processes
-to system services. To our knowledge, gexps
-provide a unique way to cover many aspects of OS configuration in a
-single, multi-tier language, and to facilitate code reuse and code
+to system services.
Gexps +provide a novel way to cover many aspects of OS configuration in a +single, multi-tier language, while facilitating code reuse and code sharing.])) ;; See <http://dl.acm.org/ccs/ccs_flat.cfm>. @@ -194,7 +194,7 @@ that software build processes are considered as pure functions: given a set of inputs (compiler, libraries, build scripts, and so on), a package’s build function is assumed to always produce the same result. Build results are stored in an immutable persistent data structure, -the store, implemented as a single directory, ,(tt [/gnu/store]). +the ,(emph [store]), implemented as a single directory, ,(tt [/gnu/store]). Each entry in ,(tt [/gnu/store]) has a file name composed of the hash of all the build inputs used to produce it, followed by a symbolic name. For example, ,(tt [/gnu/store/yr9rk90jf…-gcc-7.1.0]) identifies @@ -202,21 +202,21 @@ a specific build of GCC 7.1. A variant of GCC 7.1, for instance one using different build options or different dependencies, would get a different hash. Thus, each store file name uniquely identifies build results, and build processes are ,(emph [referentially transparent]). -This simplifies the reasoning on complex package compositions, but it +This simplifies reasoning on complex package compositions, and also has nice properties such as supporting transactional upgrades and rollback “for free.” While Guix and Nix are package managers, the -Guix System Distribution (or GuixSD) as well as NixOS extends the +Guix System Distribution (or GuixSD) as well as NixOS extend the functional paradigm to whole operating system deployments ,(ref :bib 'dolstra2010:nixos).]) (p [While Guix implements this functional deployment paradigm pioneered by Nix, we explained in previous work that its implementation departs from Nix in interesting ways ,(ref :bib 'courtes2013:functional). 
First, while Nix relies on a custom -domain-specific language (DSL), the Nix language, Guix instead chooses -to devise a set of DSLs and data structures embedded in the -general-purpose language Scheme. The rationale was that this approach -would ease the development of user interfaces and tools dealing with -packages, and would allow users to benefit from everything a +domain-specific language (DSL), the Nix language, Guix instead +implements a set of DSLs and data structures embedded in the +general-purpose language Scheme. This simplifies +the development of user interfaces and tools dealing with +packages, and allows users to benefit from everything a general-purpose language brings: compiler, debugger, REPL, editor support, libraries, and so on. Four years later, Guix has indeed gained rich tooling that would have been harder to develop for an @@ -264,28 +264,31 @@ perform the build (the ,(emph [build program])), environment variables to be defined, and derivations whose build result it depends on. Derivations are sent to a privileged daemon, which is responsible for building them on behalf of clients. The build daemon creates isolated -environments (isolated ,(emph [containers]) in a chroot) in which it spawns -the build program; since build environments are isolated, this ensures +environments (,(emph [containers]) in a chroot) in which it spawns +the build program; isolated build environments ensure that build programs do not depend on undeclared inputs.]) (p [The second way in which Guix departs from Nix is by using the same language, Scheme, for all its functionality. While package definitions in Nix can embed Bash or Perl snippets to refine build -steps, Guix package definitions would instead embed Scheme snippets. -Consequently, we have two strata of Scheme code: the “host code”, -which provides the package definition, and the “build code”, which is +steps, Guix package definitions instead embed Scheme code. 
+Consequently, we have two strata of Scheme code: the ,(emph [host code]), +which provides the package definition, and the ,(emph [build code]), which is staged for later execution by the build daemon. Our thesis is that this single-language, “multi-tier” approach facilitates code reuse and code sharing among the several tiers, and that it can avoid a whole class of errors in the staged code—as opposed to generation of code in a “foreign” language, which is treated a mere strings where syntactic and semantic errors cannot be detected by the host code.]) - (p [This paper focus on code staging in Guix. Our contribution + (p [This paper focuses on code staging in Guix. Our contribution is twofold: we present G-expressions (or “gexps”), a new code staging mechanism implemented through mere syntactic extensions of the Scheme -language, and its use in several areas of the “orchestration” programs -of the operating system. ,(numref :text [Section] :ident "design") -describes the evolution of code staging in Guix from its inception, as -described in ,(ref :bib 'courtes2013:functional), to gexps. ,(numref +language; we show the use of gexps in several areas of the “orchestration” programs +of the operating system. ,(numref :text [Section] :ident "origins") +discusses the early attempt at code staging in Guix, as +mentioned in ,(ref :bib 'courtes2013:functional), and its shortcomings. +,(numref :text [Section] :ident "gexps") presents the design and +implementation of gexps. +,(numref :text [Section] :ident "experience") reports on our experience using gexps in a variety of areas in Guix and GuixSD. ,(numref :text [Section] :ident "limitations") discusses limitations and future work. 
@@ -293,15 +296,16 @@ Finally ,(numref :text [Section] :ident "related") compares gexps to related work and ,(numref :text [Section] :ident "conclusion") concludes.])) - (chapter :title [Design and Implementation] - :ident "design" + (chapter :title [Early Attempt] + :ident "origins" (p [Scheme is a dialect of Lisp, and Lisp is famous for its homoiconicity—the fact that code has a direct representation as a data structure using the same syntax. “S-expressions” or “sexps”, Lisp’s parenthecal expressions, thus look like they lend themselves to code -staging. In this section we show how we started with sexps to end up -with gexps as an “augmented” version of sexps.]) +staging. +In this section we show how we this early experience made it clear that +we needed an ,(emph [augmented]) version of sexps.]) (section :title [Staging Build Expressions] @@ -316,18 +320,20 @@ with gexps as an “augmented” version of sexps.]) (p [In previous work ,(ref :bib 'courtes2013:functional), we presented our first attempt at writing build expressions in Scheme, -which relied solely on Lisp’s famous quotation mechanism ,(ref :bib +which relied solely on Lisp quotation ,(ref :bib 'bawden1999:quasiquotation). Figure ,(ref :figure "fig-build-sexp") shows an example that creates a derivation that, when built, converts the input image to JPEG, using the ,(tt [convert]) program from the -ImageMagick package. In this example, variable ,(tt [store]) +ImageMagick package—this is equivalent to a three-line makefile, but +referentially transparent. In this example, variable ,(tt [store]) represents the connection to the build daemon. The ,(tt [package-derivation]) function takes the ,(tt [imagemagick]) package object and computes its corresponding derivation, while the ,(tt [add-to-store]) remote procedure call (RPC) instructs the daemon to add the file ,(tt [GuixSD.png]) to ,(tt [/gnu/store]). 
The variable -,(tt [build]) contains our build program as an sexp, thanks to the -apostrophe, which means “quote” in Lisp. Finally, ,(tt +,(tt [build]) contains our build program as an sexp (the +apostrophe is equivalent to ,(tt [quote]); it introduces unevaluated +code). Finally, ,(tt [build-expression->derivation]) takes the build program and computes the corresponding derivation without building it. The user can then make an RPC to the build daemon asking it to build this derivation; @@ -351,20 +357,20 @@ pleasant ,(tt [package]) interface shown in Figure ,(ref :figure derivation and its dependencies, but where does the verbosity come from? First, we have to explicitly call ,(tt [package-derivation]) for each package the expression refers to. Second, we have to -explicitly the inputs with labels at the call site. Third, the build +specify the inputs with labels at the call site. Third, the build code has to use this ,(tt [assoc-ref]) call just to retrieve the ,(tt -[/gnu/store]) file name of its inputs. It is also error-prone: if we +[/gnu/store]) file name of its inputs. It is error-prone: if we omit the ,(tt [#:inputs]) parameter, of if we mispell an input label, we will only find out when we build the derivation.]) (p [Another limitation not visible on a toy example but that -became clear as we developed GuixSD it the cost of carrying this ,(tt +became clear as we developed GuixSD is the cost of carrying this ,(tt [#:inputs]) argument down to the call site. It forces programmers to carry not only the build expression, ,(tt [build]), but also the -corresponding ,(tt [inputs]) argument down to the call site. This -essentially makes it very hard to compose build expressions.]) - (p [While ,(tt [quote]) allowed to easily represent code as -expected, it clearly lacks some of the machinery that would make -staging in Guix more convenient. 
It boilds down to two things: it +corresponding ,(tt [inputs]) argument, and +makes it very hard to compose build expressions.]) + (p [While ,(tt [quote]) allowed us to easily represent code, it +clearly lacked some of the machinery that would make +staging in Guix more convenient. It boils down to two things: it lacks ,(emph [context])—the set of inputs associated with the expression—and it lacks the ability to serialize high-level objects—to replace a reference to a package object with its ,(tt [/gnu/store]) @@ -373,7 +379,8 @@ file name.]))) (chapter :title [G-Expressions] :ident "gexps" - (p [This section describes the design and implementation of + (p [We devised “G-expressions” as a mechanism to address +these shortcomings. This section describes the design and implementation of G-expressions, as well as extensions we added to address new use cases.]) @@ -388,8 +395,7 @@ cases.]) :start ";!begin-imagemagick-gexp" :stop ";!end-imagemagick-gexp"))) - (p [We devised “G-expressions” as a mechanism to address -these shortcomings. In essence, a gexp bundles an sexp and its inputs + (p [In essence, a gexp bundles an sexp and its inputs and outputs, and it can be serialized with ,(tt [/gnu/store]) file names substituted as needed. We first define two operators: @@ -398,8 +404,8 @@ names substituted as needed. We first define two operators: Scheme’s ,(tt [quasiquote]): it allows users to describe unevaluated code.]) (item [,(tt [ungexp]), abbreviated ,(tt [#$]), is the counterpart -of Scheme’s ,(tt [unquote]): it allows quoted to refer to values in -the host language. These values can be of any of Scheme’s primitive +of Scheme’s ,(tt [unquote]): it allows quoted code to refer to values in +the host program. 
These values can be of any of Scheme’s primitive data types, but we are specifically interested in values such as package objects that can be “compiled” to elements in the store.]) (item [,(tt [ungexp-splicing]), abbreviated ,(tt [#$@]), allows a @@ -408,7 +414,7 @@ Scheme’s ,(tt [unquote-splicing]).])) The example in Figure ,(ref :figure "fig-build-sexp"), rewritten as a gexp, is shown in Figure ,(ref :figure "fig-build-gexp"). We have all -the properties we were looking for: the gexp contains carries +the properties we were looking for: the gexp carries information about its inputs that does not need to be passed at the ,(tt [gexp->derivation]) call site, and the reference to ,(tt [imagemagick]), which is bound to a package object, is automatically @@ -419,11 +425,11 @@ is because we implemented ,(tt [gexp->derivation]) as a monadic function in the ,(emph [state monad]), where the state threaded through monadic function calls is that store parameter. The use of a monadic interface is completely orthogonal to the gexp design though, -so we will not insist on it.]) ,(tt [local-file]) returns a new -Scheme record that denotes a file from the local file system to be +so we will not insist on it.]). ,(tt [local-file]) returns a new +record that denotes a file from the local file system to be added to the store.]) (p [Under the hood, ,(tt [gexp->derivation]) converts the -gexp to an sexp, the final build program, stored under ,(tt +gexp to an sexp, the residual build program, and stores it under ,(tt [/gnu/store]). In doing that, it replaces the ,(tt [ungexp]) forms ,(tt [#$imagemagick]) and ,(tt [#$image]) with their corresponding ,(tt [/gnu/store]) file names. The special ,(tt [#$output]) form, @@ -447,7 +453,8 @@ is created.]) lexical scope across stages]) ,(ref :bib '(rhiger2012:hygienic kiselyov2008:metascheme kohlbecker1986:hygienic)).] - (figure :legend [Lexical scope preservation across stages.] 
+ (figure :legend [Lexical scope preservation across stages (⇝ +denotes code generation).] :ident "fig-gexp-hygiene" (prog :line #f @@ -460,7 +467,8 @@ well-known properties of hygienic multi-stage programs: first, binding ,(tt [x]) in one stage (outside the gexp) is distinguished from binding ,(tt [x]) in another stage (inside the gexp); second, binding ,(tt [x]) introduced inside ,(tt [gen-body]) does not shadow binding -,(tt [x]) in the outer gexp thanks to the renaming of these variables.])) +,(tt [x]) in the outer gexp thanks to the renaming of these variables +in the residual program.])) (section :title [Implementation] :ident "implementation" @@ -475,16 +483,16 @@ generation (⇝).] :start ";;!begin-gexp-expansion" :stop ";;!end-gexp-expansion"))) - (p [As can be seen from the example above, gexps are + (p [As can be seen from the examples above, gexps are first-class Scheme values: a variable can be bound to a gexp, and gexps can be passed around like any other value. The implementation consists of two parts: a syntactic layer that turns ,(tt [#~]) forms into code that instantiates gexp records, and run-time support -procedures to serialize gexps and to “lower” their inputs.]) - (p [Scheme is extensible through macros, so ,(tt [gexp]) is a -“hygienic” ,(tt [syntax-case]) macro ,(ref :bib +functions to serialize gexps and to ,(emph [lower]) their inputs.]) + (p [Scheme is extensible through macros, and ,(tt [gexp]) is a +,(tt [syntax-case]) macro ,(ref :bib 'dybvig1992:syntax-case); ,(tt [#~]) and ,(tt [#$]) are ,(it [reader -macros]) that expand to a ,(tt [gexp]) or ,(tt [ungexp]) sexps. This +macros]) that expand to ,(tt [gexp]) or ,(tt [ungexp]) sexps. This is implemented as a library for GNU,(~)Guile, an R5RS/R6RS Scheme implementation, ,(emph [without any modification to its compiler]). Figure ,(ref :figure "fig-gexp-expansion") shows what our ,(tt [gexp]) @@ -492,20 +500,22 @@ macro expands to. 
In the expanded code, ,(tt [gexp-input]) returns a record representing a dependency, while ,(tt [make-gexp]) returns a record representing the whole gexp. The expanded code defines a function of two arguments, ,(tt [proc]), that returns an sexp; the -sexp is simply the body of the gexp with these two arguments inserted +sexp is the body of the gexp with these two arguments inserted at the point where the original ,(tt [ungexp]) forms appeared. -Intenally, ,(tt [gexp->sexp]), the function that converts gexps to +Internally, ,(tt [gexp->sexp]), the function that converts gexps to sexps, calls this two-argument procedure passing it the store file names of ImageMagick and Emacs. This strategy gives us constant-time substitutions.]) - (p [The internal ,(tt [gexp-input]) function returns, for a + (p [The internal ,(tt [gexp-inputs]) function returns, for a given gexp, store, and system type, the derivations that the gexp depends on. In this example, it returns the derivations for ImageMagick and Emacs, as computed by the ,(tt [package-derivation]) function seen earlier. Gexps can be nested, as in ,(tt [#~#$#~(string-append #$emacs "/bin/emacs")]). The input list returned by ,(tt [gexp-inputs]) for the outermost gexp is the sum of -the inputs of outermost gexp and the inputs nested gexps.]) +the inputs of the outermost gexp and the inputs nested gexps. Likewise, +,(tt [gexp-outputs]) returns the outputs declared in a gexp and in +nested gexps.]) (p [The ,(tt [gexp]) macro performs several passes on its body: ,(enumerate @@ -520,12 +530,15 @@ the literature, identifiers must be generated in a ,(emph [deterministic]) fashion: if they were not, we would produce different derivations at each run, which in turn would trigger full rebuilds of the package graph. 
Thus, instead of relying on ,(tt [gensym]) and -,(tt [generate-temporaries]), we generate identifiers using a hash for -the input expression as a stem, along with lexical nesting level of -the identifer.]) +,(tt [generate-temporaries]), we generate identifiers as a function of +the hash of +the input expression and of the lexical nesting level of +the identifier—these are the two components we can see in the generated +identifiers of Figure ,(ref +:figure "fig-gexp-hygiene").]) (item [The second pass ,(emph [collects the escape forms]) (,(tt [ungexp]) variants) in the input source. The list of escape forms is -needed to construct the list of inputs recorded in the ,(tt [<gexp>]) +needed to construct the list of inputs stored in the gexp record, and to construct the formal argument list of the gexp’s code generation function shown in Figure ,(ref :figure "fig-gexp-expansion").]) @@ -558,8 +571,8 @@ by ,(tt [gexp->sexp]) when it encounters instances of the relevant type in a gexp that is being processed.]) (p [Gexp compilers can also have an associated ,(emph [expander]), which specifies how objects should be “rendered” in the -final sexp. The default expander simply produces the store file name -that corresponds to the output of the derivation. For example, +residual sexp. The default expander simply produces the store file name +of the derivation output. For example, assuming the variable ,(tt [emacs]) is bound to a package object, ,(tt [#~(string-append #$emacs "/bin/emacs")]) expands to ,(tt [(string-append "/gnu/store/…-emacs-25.2" "/bin/emacs")]), as we have @@ -576,19 +589,19 @@ when generating the sexp. We can now write gexps like:] (!latex "\\\\[0.3cm]\n") [This is convenient in situations where we do not want or cannot impose -a build-side ,(tt [string-append]) code.])) +a ,(tt [string-append]) call in staged code.])) (section :title [Extensions] (figure - :legend [Specifying importing modules in a gexp.] + :legend [Specifying imported modules in a gexp.] 
:ident "fig-gexp-modules" (prog :line #f (source :language guix :file "code/gexp-modules.scm"))) (p [,(bold [Modules.]) One of the reasons for using the same -language uniformly is the ability to reuse Guile modules among in +language uniformly is the ability to reuse Guile modules in several contexts. Since builds are performed in an isolated environment, Scheme modules that are needed must be explicitly ,(emph [imported]) in that environment; in other words, the modules must be @@ -598,7 +611,7 @@ objects embed information about the modules they need; the ,(tt modules to import in the gexps that appear in its body. The example in Figure ,(ref :figure "fig-gexp-modules") creates a gexp that requires the ,(tt [(guix build utils)]) module and the modules it -depends on in its execution environment. The source of these module +depends on in its execution environment. The source of these modules is taken from the user’s search path and added to the store when ,(tt [gexp->derivation]) is called.]) (p [Note that, to actually bring the module in scope, we @@ -629,36 +642,40 @@ background image to a suitable format, which resembles that of Figure expression that converts the image should use the ,(emph [native]) ImageMagick, not the target ImageMagick, which it would not be able to run anyway. Thus, we write ,(tt [#+imagemagick]) rather than ,(tt -[#$imagemagick]).]))) +[#$imagemagick]). “Nativeness” propagates to all the values beneath +,(tt [#+]).]))) (chapter :title [Experience] :ident "experience" - (p [Guix is used in production by individuals and organizations. -This section reports on our experience using gexps in Guix.]) + (p [Guix and GuixSD are used in production by individuals and +organizations to deploy software on laptops, servers, and clusters. +Introducing a new core mechanism in such a project can be both fruitful +and challenging. 
This section reports on our experience using gexps in +Guix.]) (section :title [Package Build Procedures] (p [As explained earlier, gexps appeared quite recently in the history of Guix. Package definitions like that of Figure ,(ref -:figure "fig-package-def") relied on the previous ad-hoc staging -mechanism. This can be seen in the use of labels in the ,(tt +:figure "fig-package-def") rely on the previous ad-hoc staging +mechanism, as can be seen in the use of labels in the ,(tt [inputs]) field of definitions. Guix today includes more than 5,500 packages, which still use this old, pre-gexp style. We are -considering a migration to the new style but given the size of the +considering a migration to a new style but given the size of the repository, this is a challenging task and we must make sure every use case is correctly addressed in the new model.]) - (p [In theory, labels are no longer needed with the use of -gexps since one can now use a ,(tt [#$]) escape when they need to -refer to the absolute file name of an inputs. The indirection that + (p [In theory, labels are no longer needed with +gexps since one can now use a ,(tt [#$]) escape to +refer to the absolute file name of an input in ,(tt [arguments]). The indirection that labels introduced had one benefit though: one could create a package -variant with a different input, and ,(tt [(assoc-ref %build-inputs …]) +variant with a different ,(tt [inputs]) field, and ,(tt [(assoc-ref %build-inputs …)]) calls in build-side code would automatically resolve to the new -package. If we instead allow for direct use of ,(tt [#$]) in package +dependencies. If we instead allow for direct use of ,(tt [#$]) in package ,(tt [arguments]), those will be unaffected by changes in ,(tt -[inputs]), which would break this particular use case. It remains to +[inputs]). 
It remains to be seen how we can allow ,(tt [#$]) forms while not sacrificing this -flexibility.)])) +flexibility.])) (section :title [System Services] @@ -682,7 +699,7 @@ the kernel Linux.]) :stop ";;!end-initrd"))) (p [The initrd is a small file system image that the kernel -Linux mounts as its initial file system. It then runs the ,(tt +Linux mounts as its initial root file system. It then runs the ,(tt [/init]) program therein; this program is responsible for mounting the real root file system and for loading any drivers needed to achieve that. If the file system is encrypted, this is also the place where a @@ -691,7 +708,7 @@ is a Scheme program that we generate based on the OS configuration, using gexps. Figure ,(ref :figure "fig-initrd") illustrates the creation of an initrd. Here ,(tt [expression->initrd]) returns a derivation that builds an initrd containing the given gexp as the ,(tt -[/init]) program. The staged program in this examples calls the ,(tt +[/init]) program. The staged program in this example calls the ,(tt [boot-system]) function from the ,(tt [(gnu build linux-boot)]) module. The initrd is automatically populated with Guile and its dependencies, the closure of the ,(tt [(gnu build linux-boot)]) @@ -733,9 +750,9 @@ case is the operating system’s run-time environment.])) (p [GuixSD comes with a set of ,(emph [whole-system tests]). Each of them takes an ,(tt [operating-system]) definition, which defines the OS configuration, instantiates it in a virtual machine (VM), and -verifies that system running in a VM matches some of the settings. The +verifies that the system running in the VM matches some of the settings. The guest OS is instrumented with a Scheme interpreter that evaluates -expressions sent by the host OS (we call it “marionette”).]) +expressions sent by the host OS—we call it “marionette”.]) (p [Whole-system tests are derivations whose build programs are gexps that resemble that of Figure ,(ref :figure "fig-system-test"). 
The build program passes ,(tt [run]), the script to spawn the VM, to the @@ -755,11 +772,11 @@ in ,(numref :text [Section] :ident "implementation"), follows the well-documented approach to the problem ,(ref :bib '(rhiger2012:hygienic kiselyov2008:metascheme)). Rhiger’s implementation handles a single binding construct (,(tt [lambda])) and -MetaScheme handles a couple more constructs, but of course, ours had -to deal with many more binding constructs: R6RS defines around ten +MetaScheme handles a couple more constructs, but ours has +to deal with more binding constructs: R6RS defines around ten binding constructs (including binding constructs for syntactic keywords such as ,(tt [let-syntax])), and Guile adds a couple more.]) - (p [Fundamentally, this is all about identifying binding + (p [Hygiene in multi-stage programs relies on identifying binding constructs. This turns out to be hard to achieve in Scheme because macros can define ,(emph [new]) bindings constructs. Our ,(symbol "alpha")-renaming pass is oblivious to those so it will @@ -775,7 +792,8 @@ Guile variant used to evaluate “host-side” code. How we could hook into Guile’s macro expander, based on ,(tt [psyntax]) ,(ref :bib 'dybvig1992:syntax-case), is still an open question. To our knowledge, this problem of hygienic staging of a language with macros -has not been addressed in literature.]) +has not been addressed in literature outside of work on macro expanders +,(ref :bib 'dybvig1992:syntax-case).]) (p [On top of that, ,(tt [gexp]) must track the ,(emph [quotation level]) of several types of quotation: ,(tt [gexp]), ,(tt [quote]), ,(tt [quasiquote]), and ,(tt [syntax]) (though our @@ -794,9 +812,12 @@ specify ,(emph [which modules should be in scope]), which could be useful in some situations. Part of the reason is that in Guile ,(tt [use-modules]) clauses must appear at the top level, and thus they cannot be used in a gexp that ends up being inserted in a -non-top-level position. 
Scoped ,(tt [use-modules]) clauses would help -to some extent, but there are still open questions open question -regarding potential name clashes.]) +non-top-level position. Macro expanders know the modules in scope at +macro-definition points so they can replace free variables in residual +code with fully-qualified references to variables inside the modules +in scope at the macro definition point. How to achieve something +similar with gexp, which lack the big picture that a macro expander has, +remains an open question.]) (p [,(bold [Cross-stage debugging.]) ,(tt [gexp->derivation]) emits build programs as sexps in a file in ,(tt [/gnu/store]), using Scheme ,(tt [write]), which writes the whole sexp as one line. When @@ -810,8 +831,8 @@ feature was available in Scheme, it would be unsuitable: moving the source code where a gexp appears would lead to a different derivation, in turn triggering a rebuild of everything that depends on it, which is undesirable. Instead we would need a way to pass source code -mapping information ,(emph [off-band]), in a way that does not affect -the derivation that is produced. We are still investigating ways to +mapping information ,(emph [out-of-band]), in a way that does not affect +the derivation that is produced. We are investigating ways to achieve that.])) (chapter :title [Related Work] @@ -830,19 +851,18 @@ derivation, the Nix interpreter records this dependency in the string context and substitutes the reference with the output file name of the derivation.]) (p [Because Nix views this generated code as mere strings, it -does provide any guarantee on the generated code (notably syntactic +does not provide any guarantee on the generated code (notably syntactic correctness). 
The string interpolation syntax (,(tt [${])…,(tt [}]) sequences), often clashes with the target’s language syntax (e.g., Bash uses dollar-brace syntax to reference variables), which can lead -to subtle errors and contrain developers to resort to non-trivial +to subtle errors and constrain developers to resort to non-trivial escaping syntax. The “code-as-string” paradigm also has other side -effects: comments and whitespace in those strings is preserved, which -means those can trigger a rebuild of the derivation, which is +effects: comments and whitespace in those strings is preserved, and +changing those triggers a rebuild of the derivation, which is inconvenient.]) (p [Code staging in Scheme has been studied in the context of -,(emph [macros]). Dybvig’s work ,(ref :bib 'dybvig1992:syntax-case) -introduced “hygienic” macros in Scheme—i.e., macros that generate -well-scoped code, without unintended capture of variables—which later +,(emph [hygienic macros])—i.e., macros that generate +well-scoped code, without unintended capture of variables ,(ref :bib '(kohlbecker1986:hygienic dybvig1992:syntax-case))—which later made it into the Sixth Report on Scheme (R6RS). MacroML achieves something similar in the context of ML, which is statically-typed ,(ref :bib 'ganz2001:macroml). Both tools allow users to define new @@ -891,7 +911,8 @@ G-expressions, support for tilde forms is built in the Hop compiler, and tilde forms are not first-class objects. Hop comes with useful multi-stage debugging facilities not found in Guix, such as the ability to display cross-stage stack traces with correct source -location information.]) +location information. 
It also has a way to express modules in scope for
+staged code.])
 ;; See refs at https://www.researchgate.net/publication/2632322_Writing_Hygienic_Macros_in_Scheme_with_Syntax-Case
diff --git a/doc/gpce-2017/staging.sbib b/doc/gpce-2017/staging.sbib
index c12b605..d4e0fd0 100644
--- a/doc/gpce-2017/staging.sbib
+++ b/doc/gpce-2017/staging.sbib
@@ -8,7 +8,7 @@
 (url "http://www.cs.indiana.edu/~dyb/pubs/tr356.pdf"))
 (inproceedings kohlbecker1986:hygienic
- (author "Kohlbecker, Eugene and Friedman, Daniel P. and Felleisen, Matthias and Duba, Bruce")
+ (author "Eugene Kohlbecker, Daniel P. Friedman, Matthias Felleisen, and Bruce Duba")
 (title "Hygienic Macro Expansion")
 (booktitle "Proceedings of the 1986 ACM Conference on LISP and Functional Programming")
 (series "LFP '86")
@@ -95,7 +95,7 @@ Evaluation and Semantics-Based Program Manipulation (PEPM 1999)")
 (url "http://repository.readscheme.org/ftp/papers/pepm99/bawden.pdf"))
 (inproceedings rhiger2012:hygienic
- (author "Rhiger, Morten")
+ (author "Morten Rhiger")
 (title "Hygienic Quasiquotation in Scheme")
 (booktitle "Proceedings of the 2012 Annual Workshop on Scheme and Functional Programming")
 (series "Scheme '12")
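
Note on the renamed identifiers in the gexp-expansion.scm hunk: the patch text says identifiers are generated as a function of the hash of the input expression and of the lexical nesting level, rather than with gensym. The sketch below only illustrates that naming shape in plain Scheme. The helper renamed-identifier is hypothetical and is not Guix's implementation; the stems "1bd8" and "4f05" are copied from the patch and passed in as strings, since the patch does not show how they are computed.

```scheme
;; Hypothetical helper (an assumption, not part of Guix): build an identifier
;; from the original name, a stem standing in for the hash of the enclosing
;; expression, and the lexical nesting level.
(define (renamed-identifier name stem level)
  (string->symbol
   (string-append (symbol->string name) "-" stem "-"
                  (number->string level))))

;; The same inputs always yield the same symbol, unlike gensym, so the
;; generated code does not change from one run to the next.
(define outer (renamed-identifier 'x "1bd8" 0))
(define inner (renamed-identifier 'x "4f05" 0))

;; Residual code assembled with ordinary quasiquotation; the two bindings
;; named x in the source end up with distinct names.
(define residual
  `(let ((,outer 2))
     (let ((,inner 40))
       (+ ,inner ,outer))))

(write residual)
(newline)
```

Because the names depend only on their inputs, regenerating the code yields an identical S-expression, which is the property the patch text relies on to avoid spurious rebuilds.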