Knuth's literate programming "tangle" function in Clojure

Tim Daly Sat, 25 Dec 2010 22:02:48 -0800

;  0 AUTHOR and LICENSE
;  1 ABSTRACT and USE CASES
;  2 THE LATEX SUPPORT CODE
;  3 IMPORTS
;  4 THE TANGLE COMMAND
;  5 SAY
;  6 READ-FILE
;  7 ISCHUNK
;  8 HASHCHUNKS
;  9 EXPAND
; 10 TANGLE



;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 0 AUTHOR and LICENSE

;;; Timothy Daly (d...@axiom-developer.org)
;;; License: Public Domain

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 1 ABSTRACT and USE CASES

;;; Don Knuth has defined literate programming as a combination of
;;; documentation and source code in a single file. The TeX language
;;; is documented this way in books. Knuth defined two functions
;;;    tangle -> extract the source code from a literate file
;;;    weave  -> extract the latex from a literate file

;;; This seems unnecessarily complex. Latex is a full programming
;;; language and is capable of defining "environments" that can
;;; handle code directly in Latex. Here we define the correct environment
;;; macros. Thus, the "weave" function is not needed.

;;; If this "tangle" function were added to Clojure then Clojure could
;;; read literate files in Latex format and extract the code. We create
;;; the necessary "tangle" function here.



;;; This program will extract the source code from a literate file.

;;; A literate lisp file contains a mixture of latex and lisp source code.
;;; The file is intended to be in standard latex format. In order to
;;; delimit code chunks we define a latex "chunk" environment.

;;; A latex format file defines a new environment so that code chunks
;;; can be delimited by \begin{chunk}{name} .... \end{chunk} blocks.
;;; This is supported by the latex code in section 2.

;;; So a trivial example of a literate latex file might look like
;;; (ignore the prefix semicolons; they are there for lisp)

; this is a file that is in a literate
; form it has a chunk called
; \begin{chunk}{first chunk}
; THIS IS THE FIRST CHUNK
; \end{chunk}
; and this is a second chunk
; \begin{chunk}{second chunk}
; THIS IS THE SECOND CHUNK
; \end{chunk}
; and this is more in the first chunk
; \begin{chunk}{first chunk}
; \getchunk{second chunk}
; THIS IS MORE IN THE FIRST CHUNK
; \end{chunk}
; \begin{chunk}{all}
; \getchunk{first chunk}
; \getchunk{second chunk}
; \end{chunk}
; and that's it

;;; From a file called "testcase" that contains the above text
;;; we want to extract the chunk named "second chunk". We do this with:

; (tangle "testcase" "second chunk")

; which yields:

; THIS IS THE SECOND CHUNK

;;; From the same file we might extract the chunk named "first chunk".
;;; Notice that this has the second chunk embedded recursively inside.
;;; So we execute:

; (tangle "testcase" "first chunk")

; which yields:

; THIS IS THE FIRST CHUNK
; THIS IS THE SECOND CHUNK
; THIS IS MORE IN THE FIRST CHUNK

;;; There is a third chunk called "all" which will extract both chunks:

; (tangle "testcase" "all")

; which yields

; THIS IS THE FIRST CHUNK
; THIS IS THE SECOND CHUNK
; THIS IS MORE IN THE FIRST CHUNK
; THIS IS THE SECOND CHUNK

;;; The tangle function takes a third argument which is the name of
;;; an output file. Thus, you can write the same results to a file with:

; (tangle "testcase" "all" "outputfile")

;;; It is also worth noting that all chunks with the same name will be
;;; merged into one chunk so it is possible to split chunks into multiple
;;; parts and have them extracted as one. That is,

; \begin{chunk}{a partial chunk}
; part 1 of the partial chunk
; \end{chunk}
; not part of the chunk
; \begin{chunk}{a partial chunk}
; part 2 of the partial chunk
; \end{chunk}

;;; These will be combined on output as a single chunk. Thus

; (tangle "testmerge" "a partial chunk")

; will yield

; part 1 of the partial chunk
; part 2 of the partial chunk


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 2 THE LATEX SUPPORT CODE

;;; The verbatim package quotes everything within its grasp and is used to
;;; hide and quote the source code during latex formatting. The verbatim
;;; environment is built in but the package form lets us use it in our
;;; chunk environment and it lets us change the font.
;;;
;;; \usepackage{verbatim}
;;;
;;; Make the verbatim font smaller
;;; Note that we have to temporarily change the '@' to be just a character
;;; because the \verbatim@font name uses it as a character
;;;
;;; \chardef\atcode=\catcode`\@
;;; \catcode`\@=11
;;; \renewcommand{\verbatim@font}{\ttfamily\small}
;;; \catcode`\@=\atcode

;;; This declares a new environment named ``chunk'' which has one
;;; argument that is the name of the chunk. All code needs to live
;;; between the \begin{chunk}{name} and the \end{chunk}
;;; The ``name'' is used to define the chunk.
;;; Reuse of the same chunk name later concatenates the chunks

;;; For those of you who can't read latex this says:
;;; Make a new environment named chunk with one argument
;;; The first block is the code for the \begin{chunk}{name}
;;; The second block is the code for the \end{chunk}
;;; The % is the latex comment character

;;; We have two alternate markers, a lightweight one using dashes
;;; and a heavyweight one using the \begin and \end syntax
;;; You can choose either one by changing the comment char in column 1

;;; \newenvironment{chunk}[1]{%   we need the chunkname as an argument
;;; {\ }\newline\noindent%                    make sure we are in column 1
;;; %{\small $\backslash{}$begin\{chunk\}\{{\bf #1}\}}% alternate begin mark
;;; \hbox{\hskip 2.0cm}{\bf --- #1 ---}%      mark the beginning
;;; \verbatim}%                               say exactly what we see
;;; {\endverbatim%                            process \end{chunk}
;;; \par{}%                                   we add a newline
;;; \noindent{}%                              start in column 1
;;; \hbox{\hskip 2.0cm}{\bf ----------}%      mark the end
;;; %$\backslash{}$end\{chunk\}%              alternate end mark (commented)
;;; \par%                                     and a newline
;;; \normalsize\noindent}%                    and return to the document

;;; This declares the place where we want to expand a chunk
;;; Technically we don't need this because a getchunk must always
;;; be properly nested within a chunk and will be verbatim.

;;; \providecommand{\getchunk}[1]{%
;;; \noindent%
;;; {\small $\backslash{}$begin\{chunk\}\{{\bf #1}\}}}% mark the reference

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 3 IMPORTS

(import [java.io BufferedReader FileReader BufferedWriter FileWriter])
(import [java.lang String])

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 4 THE TANGLE COMMAND

;;;
;;; The tangle command does all of the work of extracting code.
;;;
;;; In latex form the code blocks are delimited by
;;;     \begin{chunk}{name}
;;;     ... (code for name)...
;;;     \end{chunk}
;;;
;;; and referenced by \getchunk{name} which gets replaced by the code

;;; There are several ways to invoke the tangle function.
;;;
;;; The first argument is always the file from which to extract code
;;;
;;; The second argument is the name of the chunk to extract
;;;        (tangle "clweb.pamphlet" "name")
;;;
;;; The standard chunk name is ``*'' but any name can be used.
;;;
;;; The third argument is the name of an output file:
;;;  (tangle "clweb.pamphlet" "clweb.chunk" "clweb.spadfile")

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 5 SAY

;;; This function writes a line to the output file or, if where is nil,
;;; to the console

(defn say [where what]
 (if where
  (do (.write where what) (.write where "\n"))
  (println what)))
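
;;; A minimal usage sketch of the idea behind say, assuming a
;;; java.io.StringWriter stands in for the output file. The names
;;; say-sketch, buffer, captured and printed are hypothetical and are
;;; not part of the original program.

```clojure
(import [java.io StringWriter])

;; Hypothetical copy of say, repeated here so the sketch runs on its own.
(defn say-sketch [where what]
  (if where
    (do (.write where what) (.write where "\n"))
    (println what)))

;; With a writer, each call appends the text followed by a newline.
(def buffer (StringWriter.))
(say-sketch buffer "first line")
(say-sketch buffer "second line")
(def captured (str buffer))

;; With nil, the text goes to *out*; with-out-str captures it here.
(def printed (with-out-str (say-sketch nil "to the console")))
```

;;; Passing a writer collects the lines in that writer, while passing
;;; nil sends them to the console.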

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 6 READ-FILE

;;; Here we return a lazy sequence that will fetch lines as we need them
;;; from the file.

(defn read-file
 "Return a lazy sequence of the lines in the file named streamname"
 [streamname]
 (let [stream (BufferedReader. (FileReader. streamname))]
  (line-seq stream)))
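
;;; The lazy sequence above keeps its reader open until the lines have
;;; been consumed, and nothing closes it afterwards. A possible
;;; alternative, sketched here under the assumption that the whole file
;;; fits in memory, reads every line inside with-open so the reader is
;;; closed before returning. The names read-lines-eagerly, sample-file
;;; and sample-lines are hypothetical.

```clojure
(require '[clojure.java.io :as io])
(import [java.io File])

(defn read-lines-eagerly
  "Hypothetical alternative to read-file: return all lines as a vector
   and close the reader before returning."
  [filename]
  (with-open [rdr (io/reader filename)]
    ;; vec forces the lazy line-seq while the reader is still open
    (vec (line-seq rdr))))

;; Demonstration on a temporary file with made-up contents.
(def sample-file (File/createTempFile "read-lines" ".txt"))
(.deleteOnExit sample-file)
(spit sample-file "line one\nline two\n")
(def sample-lines (read-lines-eagerly sample-file))
```

;;; The trade-off is memory: the lazy version can stream a large file,
;;; while the eager version holds every line at once.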

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 7 ISCHUNK

;;; There is a built-in assumption (in the ischunk function)
;;; that the chunk markers occur on separate lines and that the
;;; indentation of the chunk reference has no meaning. Every line is
;;; trimmed, so the leading indentation of ordinary lines is dropped
;;; as well.
;;;
;;; ischunk recognizes chunk names in latex convention
;;;
;;; There are 3 cases to recognize:
;;;  \begin{chunk}{thechunkname}  ==> 'define thechunkname
;;;  \end{chunk}                  ==> 'end nil
;;;  \getchunk{thechunkname}      ==> 'refer and the whole trimmed line
;;;                                   (expand extracts thechunkname)
;;; Any other line                ==> nil and the trimmed line

;;; The regex pattern #"^\\begin\{chunk\}\{.*\}$" matches
;;; \begin{chunk}{anything here}

;;; The regex pattern #"^\\end\{chunk\}$" matches
;;; \end{chunk}

;;; The regex pattern #"^\\getchunk\{.*\}$" matches
;;; \getchunk{anything here}

(defn ischunk
 "Find chunks delimited by latex syntax"
 [line]
 (let [ begin   #"^\\begin\{chunk\}\{.*\}$"
        end     #"^\\end\{chunk\}$"
        get     #"^\\getchunk\{.*\}$"
        trimmed (.trim line) ]
  (cond
   (re-find begin trimmed)
     (list 'define (apply str (butlast (drop 14 trimmed))))
   (re-find end trimmed)
     (list 'end nil)
   (re-find get trimmed)
     (list 'refer trimmed)
   :else
     (list nil trimmed))))
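
;;; The three cases can also be recognized with capture groups instead
;;; of counting characters. This is only a sketch of that alternative,
;;; not the method used above: classify-line is a hypothetical name, and
;;; unlike ischunk it returns the chunk name (not the whole line) for a
;;; \getchunk reference.

```clojure
(defn classify-line
  "Hypothetical variant of ischunk that extracts chunk names with
   regex capture groups. Returns a [kind value] pair."
  [line]
  (let [trimmed (.trim ^String line)]
    (if-let [[_ chunk] (re-matches #"\\begin\{chunk\}\{(.*)\}" trimmed)]
      ['define chunk]
      (if (re-matches #"\\end\{chunk\}" trimmed)
        ['end nil]
        (if-let [[_ chunk] (re-matches #"\\getchunk\{(.*)\}" trimmed)]
          ['refer chunk]
          ;; anything else is ordinary text or program text
          [nil trimmed])))))

;; One call per case; the doubled backslash is Clojure string syntax.
(def case-define (classify-line "  \\begin{chunk}{first chunk}"))
(def case-end    (classify-line "\\end{chunk}"))
(def case-refer  (classify-line "\\getchunk{second chunk}"))
(def case-text   (classify-line "  (+ 1 2)"))
```

;;; re-matches must match the whole trimmed line, so the ^ and $ anchors
;;; used with re-find above are not needed in this variant.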


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 8 HASHCHUNKS

;;; hashchunks gathers the chunks and puts them in the hash table
;;;
;;; if we find the chunk syntax and it is a
;;;   define ==> remember the chunkname and start gathering lines
;;;   end    ==> stop gathering
;;;   otherwise ==> if we are gathering, push the line onto the stack
;;;                 stored in the hash table under the chunkname

;;; a hash table entry is a single list of lines in reverse (stack)
;;; order, such as
;;; ("6" "5" "4" "3" "2" "1")
;;; reuse of the same chunkname pushes further lines onto the same
;;; list, which is how chunks with the same name are merged.

;;; Calls to ischunk can have 4 results (define, end, refer, nil) where
;;;   define ==> we found a \begin{chunk}{...}
;;;   end    ==> we found a \end{chunk}
;;;   refer  ==> we found a \getchunk{...}
;;;   nil    ==> ordinary text or program text
;;;
;;; gather is initially false, implying that we are not gathering code.
;;; gather is true if we are gathering a chunk

(defn hashchunks
 "Gather all of the chunks and put them into a hash table"
 [lines]
 (loop [ line      lines
         gather    false
         hash      (hash-map)
         chunkname "" ]
  (if (not (empty? line))
   (let [[key value] (ischunk (first line))]
    (condp = key
     'define
       (recur (rest line) true hash value)
     'end
       (recur (rest line) false hash chunkname)
     'refer
       (if gather
         (recur (rest line) gather
           (assoc hash chunkname (conj (get hash chunkname) value))
           chunkname)
         (recur (rest line) gather hash chunkname))
     nil
       (if gather
         (recur (rest line) gather
           (assoc hash chunkname (conj (get hash chunkname) value))
           chunkname)
         (recur (rest line) gather hash chunkname))))
   hash)))
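
;;; The gathering pass can be sketched on an in-memory sequence of
;;; lines, here with reduce instead of loop/recur. This is an
;;; illustrative variant under that assumption, not the function above;
;;; gather-chunks, merge-lines and merge-table are hypothetical names.

```clojure
(defn gather-chunks
  "Hypothetical reduce-based variant of hashchunks. Returns a map from
   chunk name to a list of trimmed lines, most recent line first."
  [lines]
  (:table
   (reduce
    (fn [{:keys [current] :as state} line]
      (let [trimmed (.trim ^String line)]
        (cond
          ;; \begin{chunk}{name}: the name starts after 14 characters
          (re-matches #"\\begin\{chunk\}\{.*\}" trimmed)
            (assoc state :current (subs trimmed 14 (dec (count trimmed))))
          ;; \end{chunk}: stop gathering
          (re-matches #"\\end\{chunk\}" trimmed)
            (assoc state :current nil)
          ;; inside a chunk: push the line onto that chunk's stack
          current
            (update-in state [:table current] conj trimmed)
          ;; outside any chunk: ignore the line
          :else state)))
    {:table {} :current nil}
    lines)))

;; The "a partial chunk" example from section 1, as a vector of lines.
(def merge-lines
  ["\\begin{chunk}{a partial chunk}"
   "part 1 of the partial chunk"
   "\\end{chunk}"
   "not part of the chunk"
   "\\begin{chunk}{a partial chunk}"
   "part 2 of the partial chunk"
   "\\end{chunk}"])

(def merge-table (gather-chunks merge-lines))
```

;;; Both parts land under the one key "a partial chunk", with part 2
;;; ahead of part 1 because conj pushes onto the front of a list.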


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 9 EXPAND

;;; expand will recursively expand chunks in the hash table
;;;
;;; latex chunk names are just the chunkname itself e.g. chunkname

;;; a hash table key is the chunk name and the value is a reverse
;;; list of all of the lines in that chunk.
;;; To process the chunk we reverse the list and process the lines
;;; in their original order

;;; if a chunk name reference is encountered in a line we call expand
;;; recursively to expand the inner chunkname.

(defn expand
 "Recursively expand latex getchunk tags"
 [chunkname where table]
 (let [chunk (reverse (get table chunkname))]
  (when chunk
   (loop [lines chunk]
    (when (not (empty? lines))
     (let [line (first lines)]
      (let [[key value] (ischunk line)]
       (if (= key 'refer)
        (do
         (expand (apply str (butlast (drop 10 value))) where table)
         (recur (rest lines)))
        (do (say where line)
         (recur (rest lines)))))))))))
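
;;; The recursive expansion can be checked without any file I/O by
;;; returning the lines instead of writing them. This is a sketch under
;;; the assumption that the table holds trimmed lines in reverse order,
;;; as hashchunks stores them; expand-lines, testcase-table and
;;; all-lines are hypothetical names.

```clojure
(defn expand-lines
  "Hypothetical pure variant of expand: return the expanded lines of
   chunkname as a sequence instead of writing them out."
  [chunkname table]
  (mapcat
   (fn [line]
     (if-let [[_ inner] (re-matches #"\\getchunk\{(.*)\}" line)]
       (expand-lines inner table)   ; a reference: splice in that chunk
       [line]))                     ; ordinary text: keep the line
   (reverse (get table chunkname))))

;; The table that the "testcase" file from section 1 would produce.
(def testcase-table
  {"first chunk"  '("THIS IS MORE IN THE FIRST CHUNK"
                    "\\getchunk{second chunk}"
                    "THIS IS THE FIRST CHUNK")
   "second chunk" '("THIS IS THE SECOND CHUNK")
   "all"          '("\\getchunk{second chunk}"
                    "\\getchunk{first chunk}")})

(def all-lines (vec (expand-lines "all" testcase-table)))
```

;;; Like expand, this sketch has no guard against a chunk that refers
;;; to itself, and an unknown chunk name expands to nothing.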


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; 10 TANGLE

;;; We expand all of the lines in the file that are surrounded by the
;;; requested chunk name. These chunk names are looked up in the hash
;;; table built by hashchunks, given the input filename.
;;; Then we recursively expand the ``topchunk'' to the output stream.

(defn tangle
 "Extract the source code from a pamphlet file, optional file output"
 ([filename topchunk] (tangle filename topchunk nil))
 ([filename topchunk file]
  (if (string? file)
   (with-open [where (BufferedWriter. (FileWriter. file))]
    (expand topchunk where (hashchunks (read-file filename))))
   (expand topchunk nil (hashchunks (read-file filename))))))
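
;;; The two passes can be tried end to end on a temporary file. What
;;; follows is a compact, self-contained sketch rather than the program
;;; above: gather-sketch, expand-sketch, tangle-sketch and the temporary
;;; file are hypothetical, and the result is returned as a vector of
;;; lines instead of being written out.

```clojure
(require '[clojure.java.io :as io])
(import [java.io File])

(defn gather-sketch
  "Pass 1: map each chunk name to its trimmed lines, newest first."
  [lines]
  (loop [lines (seq lines), current nil, table {}]
    (if-let [[line & more] lines]
      (let [trimmed (.trim ^String line)]
        (cond
          (re-matches #"\\begin\{chunk\}\{.*\}" trimmed)
            (recur more (subs trimmed 14 (dec (count trimmed))) table)
          (re-matches #"\\end\{chunk\}" trimmed)
            (recur more nil table)
          current
            (recur more current (update-in table [current] conj trimmed))
          :else
            (recur more current table)))
      table)))

(defn expand-sketch
  "Pass 2: expand chunkname, following \\getchunk references."
  [chunkname table]
  (mapcat (fn [line]
            (if-let [[_ inner] (re-matches #"\\getchunk\{(.*)\}" line)]
              (expand-sketch inner table)
              [line]))
          (reverse (get table chunkname))))

(defn tangle-sketch
  "Read file, gather its chunks, and return the lines of topchunk."
  [file topchunk]
  (with-open [rdr (io/reader file)]
    ;; vec forces all the work while the reader is still open
    (vec (expand-sketch topchunk (gather-sketch (line-seq rdr))))))

;; Part of the "testcase" text from section 1, in a temporary file.
(def testcase (File/createTempFile "testcase" ".tex"))
(.deleteOnExit testcase)
(spit testcase
      (str "this is a file that is in a literate form\n"
           "\\begin{chunk}{first chunk}\n"
           "THIS IS THE FIRST CHUNK\n"
           "\\end{chunk}\n"
           "\\begin{chunk}{second chunk}\n"
           "THIS IS THE SECOND CHUNK\n"
           "\\end{chunk}\n"
           "\\begin{chunk}{first chunk}\n"
           "\\getchunk{second chunk}\n"
           "THIS IS MORE IN THE FIRST CHUNK\n"
           "\\end{chunk}\n"))

(def first-chunk-lines (tangle-sketch testcase "first chunk"))
```

;;; The lines come back in the same order that section 1 shows for
;;; (tangle "testcase" "first chunk").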

