Federico Beffa <be...@ieee.org> skribis: > Please find attached a patch reorganizing the code as you suggest.
Woow, neat! Impressive work. I think this is a great improvement. I have a bunch of stylistic comments below, and some open questions as well. > From bc8cdab1e322a25002a3d9cf33eddd856c8a81d8 Mon Sep 17 00:00:00 2001 > From: Federico Beffa <be...@fbengineering.ch> > Date: Sun, 26 Apr 2015 11:22:29 +0200 > Subject: [PATCH] import: hackage: Refactor parsing code and add new option. > > * guix/import/cabal.scm: New file. > > * guix/import/hackage.scm: Update to use the new Cabal parsing module. > > * tests/hackage.scm: Update tests for private functions. > > * guix/scripts/import/hackage.scm: Add new '--cabal-environment' option. > > * doc/guix.texi: ... and document it. > > * Makefile.am (MODULES): Add 'guix/import/cabal.scm', > 'guix/import/hackage.scm' and 'guix/scripts/import/hackage.scm'. > (SCM_TESTS): Add 'tests/hackage.scm'. No newlines between entries. [...] > +;; This record stores the state information needed during parsing of Cabal > +;; files. > +(define-record-type <cabal-parse-state> > + (make-cabal-parse-state lines minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack > false-group-stack) [...] > + (make-cabal-parse-state lines -1 '() '() #t '() '() '() '() > '()))) I’m not claiming this must done now, but it may improve readability to use ‘define-record-type*’. That way, with appropriate field default values, one could write something like: (cabal-parse-state (lines lines)) That would also allow the use of ‘inherit’, which is slightly less verbose than ‘set-fields’. WDYT? Besides, could you add comments to explain the meaning of the various fields? I’m particularly curious about ‘true-group?’ & co. ;-) > +(define (parse-cabal result) > + "Parse a Cabal file and append its content to RESULT (a list). Return the > +updated result as a monadic value in the state monad." > + (mlet* %state-monad ((state (current-state))) > + (match state > + (($ <cabal-parse-state> lines minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack > + false-group-stack) > + (let*-values > + (((current-indent line) > + (if (null? lines) > + (values 0 "") > + (line-indentation+rest (first lines)))) > + ((next-line-indent next-line) > + (if (or (null? lines) (null? (cdr lines))) > + (values 0 "") > + (line-indentation+rest (second lines)))) > + ((key-value-rx-result) (has-key? line)) > + ((end-of-file?) (null? lines)) > + ((is-simple-key-value?) (and (= next-line-indent current-indent) > + key-value-rx-result)) > + ((is-multi-line-key-value?) (and (> next-line-indent > current-indent) > + key-value-rx-result)) > + ((key) (and=> key-value-rx-result > + (lambda (rx-res) > + (string-downcase (match:substring rx-res 1))))) > + ((value) (and=> key-value-rx-result (cut match:substring <> 2)))) > + (cond > + (end-of-file? (return (reverse result))) > + (is-simple-key-value? > + (>>= (state-add-entry (list key `(,value)) result (cdr lines)) > + parse-cabal)) > + (is-multi-line-key-value? > + (let*-values > + (((value-lst lines) > + (multi-line-value (cdr lines) > + (if (string-null? value) '() `(,value))))) > + (>>= (state-add-entry (list key value-lst) result lines) > + parse-cabal))) > + (else ; it's a section > + (let* ((section-header (string-tokenize (string-downcase line))) > + (section-type (string->symbol (first section-header))) > + (section-name (if (> (length section-header) 1) > + (second section-header) > + ""))) > + (mbegin %current-monad > + (set-current-state > + (set-fields state > + ((cabal-parse-state-minimum-indent) > current-indent) > + ((cabal-parse-state-lines) (cdr lines)))) > + (>>= > + (>>= (parse-cabal-section '()) > + (lambda (section-contents) > + (mlet* %state-monad ((state (current-state))) > + (mbegin %current-monad > + (set-current-state > + (set-fields state > + ((cabal-parse-state-minimum-indent) > -1))) > + (return > + (cons (append > + (if (string-null? section-name) > + (list 'section section-type) > + (list 'section section-type > section-name)) > + (list section-contents)) > + result)))))) > + parse-cabal)))))))))) This procedure is intimidating. I think this is partly due to its length, to the big let-values, the long identifiers, the many local variables, nested binds, etc. Would it be possible to create auxiliary procedures that would help? I’m thinking of procedures that take a <cabal-parse-state> and return the necessary info, like ‘cabal-parse-state-indentation’, ‘cabal-parse-state-key’, ‘cabal-parse-state-multiline?’, ‘cabal-parse-state-eof?’, etc. WDYT? Also, please try hard to avoid car and cdr and use ‘match’ instead. (BTW it’s a good idea to use the state monad here!) > +(define (parse-cabal-section result) > + "Parse a section of a cabal file and append its content to RESULT (a list). > +Return the updated result as a value in the state monad." > + (mlet* %state-monad ((state (current-state))) > + (match state > + (($ <cabal-parse-state> lines minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack > + false-group-stack) > + (let*-values > + (((current-indent line) > + (if (null? lines) > + (values 0 "") > + (line-indentation+rest (first lines)))) > + ((next-line-indent next-line) > + (if (or (null? lines) (null? (cdr lines))) > + (values 0 "") > + (line-indentation+rest (second lines)))) > + ((key-value-rx-result) (has-key? line)) > + ((end-of-section?) (or (<= current-indent minimum-indent) > + (null? lines))) > + ;; If this is the last line of the section, then it can't be the > + ;; start of a conditional or an 'else'. > + ((last-line-of-section?) (<= next-line-indent minimum-indent)) > + ((is-simple-key-value?) (or (and (= next-line-indent > current-indent) > + key-value-rx-result) > + (and (pair? conditionals) > + (= next-line-indent (first > indents)) > + (string-prefix? "else" > next-line)))) > + ((is-multi-line-key-value?) (and (> next-line-indent > current-indent) > + key-value-rx-result)) > + ((end-of-cond?) > + (and (pair? conditionals) > + (or (and (= next-line-indent (first indents)) > + (not (string-prefix? "else" next-line))) > + (< next-line-indent (first indents))))) > + ((is-else?) (and (pair? conditionals) > + (= current-indent (first indents)) > + (string-prefix? "else" line))) > + ((condition) (cabal-conditional-line->sexp line)) > + ((key) (and=> key-value-rx-result > + (lambda (rx-res) > + (string-downcase (match:substring rx-res 1))))) > + ((value) (and=> key-value-rx-result > + (cut match:substring <> 2)))) > + (cond > + (end-of-section? > + (if (pair? indents) > + (state-reduce-indentation (1- (length indents)) #f result > lines) > + (return result))) > + (last-line-of-section? > + (if (pair? indents) > + (state-reduce-indentation > + (1- (length indents)) (list key `(,value)) result (cdr > lines)) > + (mbegin %current-monad > + (set-current-state > + (set-fields state ((cabal-parse-state-lines) (cdr lines)))) > + (return (cons (list key `(,value)) result))))) > + (is-simple-key-value? > + (>>= (state-add-entry (list key `(,value)) result (cdr lines)) > + parse-cabal-section)) > + (is-multi-line-key-value? > + (let*-values > + ;; VALUE-LST is the full multi-line value and LINES are the > + ;; remaining lines to be parsed (from the line following the > + ;; multi-line value). We need to check if we are at the end > of > + ;; a conditional or at the end of the section. > + (((value-lst lines) > + (multi-line-value (cdr lines) > + (if (string-null? value) '() `(,value)))) > + ((ind line) (if (null? lines) > + (values 0 "") > + (line-indentation+rest (first lines)))) > + ((end-of-cond?) (and (pair? conditionals) > + (or (and (= ind (first indents)) > + (not (string-prefix? "else" > line))) > + (< ind (first indents))))) > + ;; If IND is not in INDENTS, assume that we are at the end of > + ;; the section. > + ((idx) (or (and=> > + (list-index (cut = ind <>) indents) > + (cut + <> (if (string-prefix? "else" line) -1 > 0))) > + (1- (length indents))))) > + (if end-of-cond? > + (>>= (state-reduce-indentation idx (list key value-lst) > + result lines) > + parse-cabal-section) > + (>>= (state-add-entry (list key value-lst) result lines) > + parse-cabal-section)))) > + (end-of-cond? > + (let ((idx (+ (list-index (cut = next-line-indent <>) indents) > + (if (string-prefix? "else" next-line) -1 0)))) > + (>>= (state-reduce-indentation idx (list key `(,value)) result > + (if (pair? lines) (cdr lines) '())) > + parse-cabal-section))) > + (is-else? > + (mbegin %current-monad > + (set-current-state > + (set-fields state > + ((cabal-parse-state-lines) (cdr lines)) > + ((cabal-parse-state-true-group?) #f))) > + (parse-cabal-section result))) > + (condition > + (mbegin %current-monad > + (state-add-conditional condition current-indent) > + (parse-cabal-section result))))))))) This one is also very intimidating and it seems to duplicate some of the code of the previous one, so maybe the propose <cabal-state> procedures will help here as well. > +(define (state-reduce-indentation index entry result lines) s/reduce/decrease/ > + "Given RESULT, if ENTRY is not #f, add it as appropriate and return the > +updated result as a value in the state monad. Update the state according to > +the reduction of the indentation level specified by INDEX, an index of an s/reduction/decrease/ > +entry in the 'indentations' field of the state. Could you explain what RESULT and ENTRY are? Also, it seems to do two different things: compute a value function of RESULT and ENTRY, and update the current indentation. Should it be two separate procedures? > As an example, if there are > +two nested conditional levels, the first starting at indentation 2 and the > +second at indentation 4, then the 'indentations' field of state is '(4 2) and > +an INDEX value of 0 means that the second conditional is finished. Set the > +remaining lines to be parsed to LINES." > + (lambda (state) > + (match state > + (($ <cabal-parse-state> _ minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack > + false-group-stack) > + ;; The suffix '-d' stays for 'drop'. > + (let*-values (((inds-d inds) (split-at indents (1+ index))) > + ((conds-d conds) (split-at conditionals (1+ index))) > + ((t-g?-s-d t-g?-s) > + (if (> (length true-group?-stack) index) > + (split-at true-group?-stack (1+ index)) > + (values true-group?-stack '()))) > + ((t-g-s-d t-g-s) > + (if (> (length true-group-stack) index) > + (split-at true-group-stack (1+ index)) > + (values true-group-stack '()))) > + ((f-g-s-d f-g-s) > + (if (> (length false-group-stack) index) > + (split-at false-group-stack (1+ index)) > + (values false-group-stack '()))) > + ((t-g?) > + (if (> (length true-group?-stack) index) > + (last t-g?-s-d) #t)) > + ((t-g) (if (and true-group? entry) > + (cons entry true-group) > + true-group)) > + ((f-g) (if (or true-group? (not entry)) > + false-group > + (cons entry false-group))) > + ((res) result)) > + (let reduce-by-one ((conds-d conds-d) (t-g t-g) (f-g f-g) (res res) > + (t-g?-s-d t-g?-s-d) (t-g-s-d t-g-s-d) > + (f-g-s-d f-g-s-d)) This is somewhat scary ;-) but I’m not sure how to improve it. > + (values '*unspecified* Did you mean *unspecified*, without the quote, which evaluates to *the* unspecified value? > +(define-record-type <cabal-package> > + (make-cabal-package name version license home-page source-repository > + synopsis description > + executables lib test-suites > + flags eval-environment) > + cabal-package? > + (name cabal-package-name) > + (version cabal-package-version) > + (license cabal-package-license) > + (home-page cabal-package-home-page) > + (source-repository cabal-package-source-repository) > + (synopsis cabal-package-synopsis) > + (description cabal-package-description) > + (executables cabal-package-executables) > + (lib cabal-package-library) ; 'library' is a Scheme keyword There are no keyboards in Scheme. :-) [...] > + (define (impl haskell) > + (let* ((haskell-implementation (or (assoc-ref env "impl") "ghc")) > + (impl-rx-result-with-version > + (string-match "([a-zA-Z0-9_]+)-([0-9.]+)" > haskell-implementation)) > + (impl-name (or (and=> impl-rx-result-with-version > + (cut match:substring <> 1)) > + haskell-implementation)) > + (impl-version (and=> impl-rx-result-with-version > + (cut match:substring <> 2))) > + (cabal-rx-result-with-version > + (string-match "([a-zA-Z0-9_-]+) *([<>=]+) *([0-9.]+) *" haskell)) > + (cabal-rx-result-without-version > + (string-match "([a-zA-Z0-9_-]+)" haskell)) > + (cabal-impl-name (or (and=> cabal-rx-result-with-version > + (cut match:substring <> 1)) > + (match:substring > + cabal-rx-result-without-version 1))) > + (cabal-impl-version (and=> cabal-rx-result-with-version > + (cut match:substring <> 3))) > + (cabal-impl-operator (and=> cabal-rx-result-with-version > + (cut match:substring <> 2))) > + (comparison (and=> cabal-impl-operator > + (cut string-append "string" <>)))) Again I feel we need one or more auxiliary procedures and/or data types here to simplify this part (fewer local variables), as well as shorter identifiers. WDYT? > --- a/tests/hackage.scm > +++ b/tests/hackage.scm > @@ -63,16 +63,13 @@ executable cabal > ") > The existing tests here are fine, but they are more like integration tests (they test the whole pipeline.) Maybe it would be nice to directly exercise ‘read-cabal’ and ‘eval-cabal’ individually? Thanks for all of this! Ludo’.