Re: hackage importer

Ludovic Courtès Sat, 02 May 2015 05:49:27 -0700

Federico Beffa <be...@ieee.org> skribis:

> Please find attached a patch reorganizing the code as you suggest.


Woow, neat!  Impressive work.  I think this is a great improvement.

I have a bunch of stylistic comments below, and some open questions as
well.

> From bc8cdab1e322a25002a3d9cf33eddd856c8a81d8 Mon Sep 17 00:00:00 2001
> From: Federico Beffa <be...@fbengineering.ch>
> Date: Sun, 26 Apr 2015 11:22:29 +0200
> Subject: [PATCH] import: hackage: Refactor parsing code and add new option.
>
> * guix/import/cabal.scm: New file.
>
> * guix/import/hackage.scm: Update to use the new Cabal parsing module.
>
> * tests/hackage.scm: Update tests for private functions.
>
> * guix/scripts/import/hackage.scm: Add new '--cabal-environment' option.
>
> * doc/guix.texi: ... and document it.
>
> * Makefile.am (MODULES): Add 'guix/import/cabal.scm',
>   'guix/import/hackage.scm' and 'guix/scripts/import/hackage.scm'.
>   (SCM_TESTS): Add 'tests/hackage.scm'.

No newlines between entries.


[...]

> +;; This record stores the state information needed during parsing of Cabal
> +;; files.
> +(define-record-type  <cabal-parse-state>
> +  (make-cabal-parse-state lines minimum-indent indents conditionals
> +                          true-group? true-group false-group
> +                          true-group?-stack true-group-stack 
> false-group-stack)


[...]

> +            (make-cabal-parse-state lines -1 '() '() #t '() '() '() '() 
> '())))

I’m not claiming this must done now, but it may improve readability to
use ‘define-record-type*’.  That way, with appropriate field default
values, one could write something like:

  (cabal-parse-state
    (lines lines))

That would also allow the use of ‘inherit’, which is slightly less
verbose than ‘set-fields’.  WDYT?

Besides, could you add comments to explain the meaning of the various
fields?  I’m particularly curious about ‘true-group?’ & co.  ;-)

> +(define (parse-cabal result)
> +  "Parse a Cabal file and append its content to RESULT (a list).  Return the
> +updated result as a monadic value in the state monad."
> +  (mlet* %state-monad ((state (current-state)))
> +    (match state
> +      (($ <cabal-parse-state> lines minimum-indent indents conditionals
> +                              true-group? true-group false-group
> +                              true-group?-stack true-group-stack
> +                              false-group-stack)
> +       (let*-values
> +           (((current-indent line)
> +             (if (null? lines)
> +                 (values 0 "")
> +                 (line-indentation+rest (first lines))))
> +            ((next-line-indent next-line)
> +             (if (or (null? lines) (null? (cdr lines)))
> +                 (values 0 "")
> +                 (line-indentation+rest (second lines))))
> +            ((key-value-rx-result) (has-key? line))
> +            ((end-of-file?) (null? lines))
> +            ((is-simple-key-value?) (and (= next-line-indent current-indent)
> +                                         key-value-rx-result))
> +            ((is-multi-line-key-value?) (and (> next-line-indent 
> current-indent)
> +                                             key-value-rx-result))
> +            ((key) (and=> key-value-rx-result
> +                          (lambda (rx-res)
> +                            (string-downcase (match:substring rx-res 1)))))
> +            ((value) (and=> key-value-rx-result (cut match:substring <> 2))))
> +         (cond
> +          (end-of-file? (return (reverse result)))
> +          (is-simple-key-value?
> +           (>>= (state-add-entry (list key `(,value)) result (cdr lines))
> +                parse-cabal))
> +          (is-multi-line-key-value?
> +           (let*-values 
> +               (((value-lst lines)
> +                 (multi-line-value (cdr lines)
> +                                   (if (string-null? value) '() `(,value)))))
> +             (>>= (state-add-entry (list key value-lst) result lines)
> +                  parse-cabal)))
> +          (else ; it's a section
> +           (let* ((section-header (string-tokenize (string-downcase line)))
> +                  (section-type (string->symbol (first section-header)))
> +                  (section-name (if (> (length section-header) 1)
> +                                     (second section-header)
> +                                     "")))
> +             (mbegin %current-monad
> +               (set-current-state 
> +                (set-fields state
> +                            ((cabal-parse-state-minimum-indent) 
> current-indent)
> +                            ((cabal-parse-state-lines) (cdr lines))))
> +               (>>=
> +                (>>= (parse-cabal-section '())
> +                     (lambda (section-contents)
> +                       (mlet* %state-monad ((state (current-state)))
> +                         (mbegin %current-monad
> +                           (set-current-state
> +                            (set-fields state
> +                                        ((cabal-parse-state-minimum-indent) 
> -1)))
> +                           (return 
> +                            (cons (append
> +                                   (if (string-null? section-name)
> +                                       (list 'section section-type)
> +                                       (list 'section section-type 
> section-name))
> +                                   (list section-contents))
> +                                  result))))))
> +                parse-cabal))))))))))

This procedure is intimidating.  I think this is partly due to its
length, to the big let-values, the long identifiers, the many local
variables, nested binds, etc.

Would it be possible to create auxiliary procedures that would help?
I’m thinking of procedures that take a <cabal-parse-state> and return
the necessary info, like ‘cabal-parse-state-indentation’,
‘cabal-parse-state-key’, ‘cabal-parse-state-multiline?’,
‘cabal-parse-state-eof?’, etc.  WDYT?

Also, please try hard to avoid car and cdr and use ‘match’ instead.

(BTW it’s a good idea to use the state monad here!)

> +(define (parse-cabal-section result)
> +  "Parse a section of a cabal file and append its content to RESULT (a list).
> +Return the updated result as a value in the state monad."
> +  (mlet* %state-monad ((state (current-state)))
> +    (match state
> +      (($ <cabal-parse-state> lines minimum-indent indents conditionals
> +                              true-group? true-group false-group
> +                              true-group?-stack true-group-stack
> +                              false-group-stack)
> +       (let*-values
> +           (((current-indent line)
> +             (if (null? lines)
> +                 (values 0 "")
> +                 (line-indentation+rest (first lines))))
> +            ((next-line-indent next-line)
> +             (if (or (null? lines) (null? (cdr lines)))
> +                 (values 0 "")
> +                 (line-indentation+rest (second lines))))
> +            ((key-value-rx-result) (has-key? line))
> +            ((end-of-section?) (or (<= current-indent minimum-indent)
> +                                   (null? lines)))
> +            ;; If this is the last line of the section, then it can't be the
> +            ;; start of a conditional or an 'else'.
> +            ((last-line-of-section?) (<= next-line-indent minimum-indent))
> +            ((is-simple-key-value?) (or (and (= next-line-indent 
> current-indent)
> +                                             key-value-rx-result)
> +                                        (and (pair? conditionals)
> +                                             (= next-line-indent (first 
> indents))
> +                                             (string-prefix? "else" 
> next-line))))
> +            ((is-multi-line-key-value?) (and (> next-line-indent 
> current-indent)
> +                                             key-value-rx-result))
> +            ((end-of-cond?)
> +             (and (pair? conditionals)
> +                  (or (and (= next-line-indent (first indents))
> +                           (not (string-prefix? "else" next-line)))
> +                      (< next-line-indent (first indents)))))
> +            ((is-else?) (and (pair? conditionals)
> +                             (= current-indent (first indents))
> +                             (string-prefix? "else" line)))
> +            ((condition) (cabal-conditional-line->sexp line))
> +            ((key) (and=> key-value-rx-result
> +                          (lambda (rx-res)
> +                            (string-downcase (match:substring rx-res 1)))))
> +            ((value) (and=> key-value-rx-result
> +                            (cut match:substring <> 2))))
> +         (cond
> +          (end-of-section?
> +           (if (pair? indents)
> +               (state-reduce-indentation (1- (length indents)) #f result 
> lines)
> +               (return result)))
> +          (last-line-of-section?
> +           (if (pair? indents)
> +               (state-reduce-indentation
> +                (1- (length indents)) (list key `(,value)) result (cdr 
> lines))
> +               (mbegin %current-monad
> +                 (set-current-state 
> +                  (set-fields state ((cabal-parse-state-lines) (cdr lines))))
> +                 (return (cons (list key `(,value)) result)))))
> +          (is-simple-key-value?
> +           (>>= (state-add-entry (list key `(,value)) result (cdr lines))
> +                parse-cabal-section))
> +          (is-multi-line-key-value?
> +           (let*-values
> +               ;; VALUE-LST is the full multi-line value and LINES are the
> +               ;; remaining lines to be parsed (from the line following the
> +               ;; multi-line value).  We need to check if we are at the end 
> of
> +               ;; a conditional or at the end of the section.
> +               (((value-lst lines)
> +                 (multi-line-value (cdr lines)
> +                                   (if (string-null? value) '() `(,value))))
> +                ((ind line) (if (null? lines)
> +                                (values 0 "")
> +                                (line-indentation+rest (first lines))))
> +                ((end-of-cond?) (and (pair? conditionals)
> +                                     (or (and (= ind (first indents))
> +                                              (not (string-prefix? "else" 
> line)))
> +                                         (< ind (first indents)))))
> +                ;; If IND is not in INDENTS, assume that we are at the end of
> +                ;; the section.
> +                ((idx) (or (and=>
> +                            (list-index (cut = ind <>) indents)
> +                            (cut + <> (if (string-prefix? "else" line) -1 
> 0)))
> +                           (1- (length indents)))))
> +             (if end-of-cond?
> +                 (>>= (state-reduce-indentation idx (list key value-lst)
> +                                                result lines)
> +                      parse-cabal-section)
> +                 (>>= (state-add-entry (list key value-lst) result lines)
> +                      parse-cabal-section))))
> +          (end-of-cond?
> +           (let ((idx (+ (list-index (cut = next-line-indent <>) indents)
> +                         (if (string-prefix? "else" next-line) -1 0))))
> +             (>>= (state-reduce-indentation idx (list key `(,value)) result
> +                                         (if (pair? lines) (cdr lines) '()))
> +                  parse-cabal-section)))
> +          (is-else?
> +           (mbegin %current-monad
> +             (set-current-state 
> +              (set-fields state
> +                          ((cabal-parse-state-lines) (cdr lines))
> +                          ((cabal-parse-state-true-group?) #f)))
> +             (parse-cabal-section result)))
> +          (condition
> +           (mbegin %current-monad
> +             (state-add-conditional condition current-indent)
> +             (parse-cabal-section result)))))))))

This one is also very intimidating and it seems to duplicate some of the
code of the previous one, so maybe the propose <cabal-state> procedures
will help here as well.

> +(define (state-reduce-indentation index entry result lines)

s/reduce/decrease/

> +  "Given RESULT, if ENTRY is not #f, add it as appropriate and return the
> +updated result as a value in the state monad.  Update the state according to
> +the reduction of the indentation level specified by INDEX, an index of an

s/reduction/decrease/

> +entry in the 'indentations' field of the state.

Could you explain what RESULT and ENTRY are?  Also, it seems to do two
different things: compute a value function of RESULT and ENTRY, and
update the current indentation.  Should it be two separate procedures?

> As an example, if there are
> +two nested conditional levels, the first starting at indentation 2 and the
> +second at indentation 4, then the 'indentations' field of state is '(4 2) and
> +an INDEX value of 0 means that the second conditional is finished.  Set the
> +remaining lines to be parsed to LINES."
> +  (lambda (state)
> +    (match state
> +      (($ <cabal-parse-state> _ minimum-indent indents conditionals
> +                              true-group? true-group false-group
> +                              true-group?-stack true-group-stack
> +                              false-group-stack)
> +       ;; The suffix '-d' stays for 'drop'.
> +       (let*-values (((inds-d inds) (split-at indents (1+ index)))
> +                     ((conds-d conds) (split-at conditionals (1+ index)))
> +                     ((t-g?-s-d t-g?-s)
> +                      (if (> (length true-group?-stack) index)
> +                          (split-at true-group?-stack (1+ index))
> +                          (values true-group?-stack '())))
> +                     ((t-g-s-d t-g-s)
> +                      (if (> (length true-group-stack) index)
> +                          (split-at true-group-stack (1+ index))
> +                          (values true-group-stack '())))
> +                     ((f-g-s-d f-g-s)
> +                      (if (> (length false-group-stack) index)
> +                          (split-at false-group-stack (1+ index))
> +                          (values false-group-stack '())))
> +                     ((t-g?)
> +                      (if (> (length true-group?-stack) index) 
> +                          (last t-g?-s-d) #t))
> +                     ((t-g) (if (and true-group? entry)
> +                                (cons entry true-group)
> +                                true-group))
> +                     ((f-g) (if (or true-group? (not entry))
> +                                false-group
> +                                (cons entry false-group)))
> +                     ((res) result))
> +         (let reduce-by-one ((conds-d conds-d) (t-g t-g) (f-g f-g) (res res)
> +                             (t-g?-s-d t-g?-s-d) (t-g-s-d t-g-s-d) 
> +                             (f-g-s-d f-g-s-d))

This is somewhat scary ;-) but I’m not sure how to improve it.

> +           (values '*unspecified*

Did you mean *unspecified*, without the quote, which evaluates to *the*
unspecified value?

> +(define-record-type <cabal-package>
> +  (make-cabal-package name version license home-page source-repository
> +                      synopsis description
> +                      executables lib test-suites
> +                      flags eval-environment)
> +  cabal-package?
> +  (name   cabal-package-name)
> +  (version cabal-package-version)
> +  (license cabal-package-license)
> +  (home-page cabal-package-home-page)
> +  (source-repository cabal-package-source-repository)
> +  (synopsis cabal-package-synopsis)
> +  (description cabal-package-description)
> +  (executables cabal-package-executables)
> +  (lib cabal-package-library) ; 'library' is a Scheme keyword

There are no keyboards in Scheme.  :-)


[...]

> +  (define (impl haskell)
> +    (let* ((haskell-implementation (or (assoc-ref env "impl") "ghc"))
> +           (impl-rx-result-with-version
> +            (string-match "([a-zA-Z0-9_]+)-([0-9.]+)" 
> haskell-implementation))
> +           (impl-name (or (and=> impl-rx-result-with-version
> +                                 (cut match:substring <> 1))
> +                          haskell-implementation))
> +           (impl-version (and=> impl-rx-result-with-version
> +                                (cut match:substring <> 2)))
> +           (cabal-rx-result-with-version
> +            (string-match "([a-zA-Z0-9_-]+) *([<>=]+) *([0-9.]+) *" haskell))
> +           (cabal-rx-result-without-version 
> +            (string-match "([a-zA-Z0-9_-]+)" haskell))
> +           (cabal-impl-name (or (and=> cabal-rx-result-with-version
> +                                       (cut match:substring <> 1))
> +                                (match:substring 
> +                                 cabal-rx-result-without-version 1)))
> +           (cabal-impl-version (and=> cabal-rx-result-with-version
> +                                      (cut match:substring <> 3)))
> +           (cabal-impl-operator (and=> cabal-rx-result-with-version
> +                                       (cut match:substring <> 2)))
> +           (comparison (and=> cabal-impl-operator
> +                              (cut string-append "string" <>))))

Again I feel we need one or more auxiliary procedures and/or data types
here to simplify this part (fewer local variables), as well as shorter
identifiers.  WDYT?

> --- a/tests/hackage.scm
> +++ b/tests/hackage.scm
> @@ -63,16 +63,13 @@ executable cabal
>  ")
>  

The existing tests here are fine, but they are more like integration
tests (they test the whole pipeline.)  Maybe it would be nice to
directly exercise ‘read-cabal’ and ‘eval-cabal’ individually?

Thanks for all of this!

Ludo’.

Re: hackage importer

Reply via email to