Re: Literate Executables

Tim Daly Sun, 04 Dec 2022 00:15:25 -0800

In fact, making your program 'literate' (aka a book) is easy.

The first steps (1)-(6) are only done once to get your program in book form.
Note that this works with any programming language. I normally use Lisp
but I'll illustrate it with C code.

The last step (7) creates a PDF and also executes your code.
It just requires typing 'make', which you do any time you
want to see the book and run the code.

So now you're in a "hot loop", writing explanations and changing your
literate code, typing 'make' to remake the book and rerun the code while
continuing to create a literate program in a PDF.

To explain in detail, assume you have "hello.c" and want it literate.

(1) convert your program to a latex file. Copy hello.c to hello.tex.

Starting with your hello.c:

====================================
#include <stdio.h>
int main() {
  printf("hello\n");
  return 0;
}
=====================================

At the command line type:
=====================================
cp hello.c hello.tex
=====================================

(2) wrap your code into latex using macros (attached below)

A latex style file called chunk.sty (attached below) defines the macro pair
\begin{chunk} and \end{chunk}, which wraps anything in a verbatim-style block.
\begin{chunk} takes one argument, which can be any word or sentence, as in
\begin{chunk}{int main}. So you walk your source file(s), inserting these
macros around every function / struct / define, etc. For hello.c, the copy
you made, hello.tex, ends up containing your C code in latex blocks.
This is tedious but trivial.
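
If you want a head start on the wrapping, here is a minimal sketch in C of a
helper that wraps a whole file as one chunk. The name wrap_chunk is made up
for this example and is not part of the attachments; you would still split
the result into smaller chunks by hand.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` to `out`,
   wrapped as a single chunk called `name`.
   Returns 0 on success, -1 if either stream reported an error. */
int wrap_chunk(FILE *in, FILE *out, const char *name) {
  int c;
  int last = '\n';
  fprintf(out, "\\begin{chunk}{%s}\n", name);
  while ((c = fgetc(in)) != EOF) {
    fputc(c, out);
    last = c;
  }
  /* \end{chunk} has to start on its own line */
  if (last != '\n') { fputc('\n', out); }
  fprintf(out, "\\end{chunk}\n");
  return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

Calling wrap_chunk(stdin, stdout, "main") from a small main would turn it
into a filter, which is probably the handiest way to use it.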

Your hello.tex file looks like (the chunk name can be anything)
======================================================
\begin{chunk}{includes}
#include <stdio.h>
\end{chunk}

\begin{chunk}{main}
int main() {
  printf("hello\n");
  return 0;
}
\end{chunk}
========================================================

(3) automate the extraction from your hello.tex to hello.c

Now you'd like to make it so your program can be extracted easily.
Make a new chunk, using any name, that collects your other chunks. The
third macro, called getchunk, marks where a named chunk gets inserted inline.

At the end of hello.tex add a new chunk:
===================================================
\begin{chunk}{all}
\getchunk{includes}
\getchunk{main}
\end{chunk}
====================================================
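
To see what expanding a chunk means, here is a minimal sketch in C that works
on an in-memory string instead of a file. It is a simplified illustration,
not the attached tanglec: expand_chunk and append are hypothetical names, and
the fixed-size buffers are assumptions made to keep it short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append n bytes of s to out and keep it terminated. */
void append(char *out, size_t cap, size_t *used, const char *s, size_t n) {
  assert(*used + n < cap);  /* sketch only: caller supplies a big enough buffer */
  memcpy(out + *used, s, n);
  *used += n;
  out[*used] = '\0';
}

/* Copy the body of every chunk called `name` in `doc` to `out`,
   recursing whenever a body line is \getchunk{othername}. */
void expand_chunk(const char *doc, const char *name,
                  char *out, size_t cap, size_t *used) {
  char begin[128];
  size_t beginlen;
  int inside = 0;
  const char *line = doc;

  snprintf(begin, sizeof begin, "\\begin{chunk}{%s}", name);
  beginlen = strlen(begin);

  while (*line != '\0') {
    const char *eol = strchr(line, '\n');
    size_t len = eol ? (size_t)(eol - line) : strlen(line);

    if (!inside) {
      /* outside the chunk: only a matching \begin line matters */
      if (len >= beginlen && strncmp(line, begin, beginlen) == 0) { inside = 1; }
    } else if (strncmp(line, "\\end{chunk}", 11) == 0) {
      inside = 0;
    } else if (strncmp(line, "\\getchunk{", 10) == 0 && line[len-1] == '}') {
      char inner[128];
      size_t n = len - 11;  /* drop the 10-char prefix and the closing brace */
      assert(n < sizeof inner);
      memcpy(inner, line + 10, n);
      inner[n] = '\0';
      expand_chunk(doc, inner, out, cap, used);
    } else {
      append(out, cap, used, line, len);
      append(out, cap, used, "\n", 1);
    }
    line = eol ? eol + 1 : line + len;
  }
}
```

Asking this sketch for the chunk "all" on the hello.tex text above produces
the includes chunk followed by the main chunk, which is the original program.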

(4) check that you can recreate the original hello.c from hello.tex

The second tool is a program called 'tanglec' which, given the name of a
chunk, will extract it to stdout. So we ask for the whole program, put it
into extracted.c, and check that it is byte-for-byte identical to hello.c:
(Note: tanglec is attached below. Build it with: gcc -o tanglec tanglec.c)

at the command line type:
===========================================
./tanglec hello.tex all >extracted.c
diff -Naur extracted.c hello.c
===========================================
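
If diff is not handy, the same byte-for-byte check can be sketched in C.
first_difference is a hypothetical helper, not one of the attachments, and
the return codes are just a convention chosen for this example.

```c
#include <stdio.h>

/* Hypothetical helper: compare two files byte by byte.
   Returns -1 if they are identical, -2 if either file cannot be opened,
   otherwise the offset of the first byte where they differ
   (a shorter file differs at its end). */
long first_difference(const char *path_a, const char *path_b) {
  FILE *a = fopen(path_a, "rb");
  FILE *b = fopen(path_b, "rb");
  long offset = 0;
  long result = -1;

  if (a == NULL || b == NULL) {
    result = -2;
  } else {
    for (;;) {
      int ca = fgetc(a);
      int cb = fgetc(b);
      if (ca != cb) { result = offset; break; }
      if (ca == EOF) { break; }  /* both ended together: identical */
      offset++;
    }
  }
  if (a != NULL) { fclose(a); }
  if (b != NULL) { fclose(b); }
  return result;
}
```

A result of -1 from first_difference("extracted.c", "hello.c") corresponds to
diff printing nothing.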

(5) make hello.tex into a PDF. Just add some trivial latex header/footer.
For example add a header: (Note: chunk.sty is attached below)

So now your hello.tex looks like:
============================================

\documentclass[dvipdfm]{book}
\usepackage{chunk}
\begin{document}
\title{A Literate Hello}
\author{Timothy Daly}
\maketitle

\chapter{We need includes}
\section{This is the usual one for printf}
\begin{chunk}{includes}
#include <stdio.h>
\end{chunk}

\chapter{This is where the magic happens}
\begin{chunk}{main}
int main() {
  printf("hello\n");
  return 0;
}
\end{chunk}
\begin{chunk}{all}
\getchunk{includes}
\getchunk{main}
\end{chunk}

\end{document}

=================================================

(6) create a trivial makefile (note that indented lines require TABS, not
spaces)

in a file called 'makefile'
===================================================
doit:
	latex hello.tex
	dvipdfm hello.dvi
	evince hello.pdf &
	./tanglec hello.tex all >hello.c
	gcc -o hello hello.c
	./hello
==================================================

(7) build the book and run the code

At the command line type:
===============================================
make
================================================

WIN! Now you write all your explanation and your code in the latex file.
Then type 'make' which recreates the book and runs the code. The
explanation and the code are always in sync. Now your code is a book.

Now you can add pictures, URLs, a table of contents, an index,
cross references, a bibliography, and even a spiffy new cover!
Oh, and someone can actually read your explanation of your code.

Make Knuth happy! Win a Pulitzer prize!

Tim

On Sat, Dec 3, 2022 at 8:20 AM Tim Daly <axiom...@gmail.com> wrote:

> Java code has the pseudo-ability to re-generate the original source
> through decompilation. Unfortunately given the size of any executable
> it would be years worth of work to reverse-engineer the understanding
> without explanation. I struggle to even understand the traceback of
> any Java failure :-)
>
> Open source, by definition, already has the source available.
> Only the URL of the github repository and the hash number corresponding
> to the current executable is needed.
>
> I haven't worked for IBM since 1995, the year IBM Research eliminated
> the math department, including me. My ground-breaking work in Artificial
> Intelligence has yet to be cited by anyone so their decision was probably
> wise.
>
> Tim
>
>
> On Sat, Dec 3, 2022 at 1:01 AM Terence Kelly <tpke...@eecs.umich.edu>
> wrote:
>
>>
>> Hi Tim,
>>
>> Your observations seem sound.  Keep in mind, however, that we're not
>> confronted with an either/or choice.  The chicken/egg aspect of literate
>> executables means that, in your context, we can arrange for the PDF to
>> generate code *and* for the code to generate PDF.  Cyclic dependency
>> graphs take some getting used to, but one can learn to love them.
>>
>> If literate execution isn't right for Axiom, perhaps it can benefit other
>> IBM open source projects.  If you circulate the paper among your
>> colleagues I'd be interested to see if they find useful applications.
>>
>> _Queue_ readers are remarkably creative and routinely find uses for Drill
>> Bits ideas that I never anticipated.  I wonder what your colleagues will
>> come up with.
>>
>> Thanks.
>>
>> -- Terence
>>
>>
>> On Fri, 2 Dec 2022, Tim Daly wrote:
>>
>> > ...
>> >
>> > The above considerations leads me to the conclusion that the PDF is the
>> > thing that generates code rather than the code generating the PDF.
>>
>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

// set this to 2 (line traces) or 3 (chunk markers) for further information
#define DEBUG 0

/* forward reference for the C compiler */
int getchunk(char *chunkname);

/* a memory mapped buffer copy of the file */
char *buffer;
int bufsize;

/* return the length of the next line */
int nextline(int i) {
  int j;
  if (i >= bufsize) return(-1);
  for (j=0; ((i+j < bufsize) && (buffer[i+j] != '\n')); j++);
  return(j);
}

/* output the line we need */
int printline(int i, int length) {
  int j;
  for (j=0; j<length; j++) { putchar(buffer[i+j]); }
  printf("\n");
  return(0);
}

/* handle begin{chunk}{chunkname}        */
/* is this chunk name we are looking for? &&
   does the line start with \begin{chunk}? &&
   is the next character a \{ &&
   is the last character after the chunkname a \}
*/
int foundchunk(int i, char *chunkname) {
  if ((strncmp(&buffer[i+14],chunkname,strlen(chunkname)) == 0) &&
      (strncmp(&buffer[i],"\\begin{chunk}",13) == 0) &&
      (buffer[i+13] == '{') &&
      (buffer[i+14+strlen(chunkname)] == '}')) {
    if (DEBUG==3) { printf("foundchunk(%s)\n",chunkname); }
    return(1); 
  }
  return(0);
}

/* handle end{chunk}   */
/* is it really an end? */
int foundEnd(int i, char* chunkname) {
  if ((buffer[i] == '\\') && 
      (strncmp(&buffer[i+1],"end{chunk}",10) == 0)) {
    if (DEBUG==3) { printf("foundEnd(%s)\n",chunkname); }
    return(1); 
  }
  return(0);
}

/* handle getchunk{chunkname} */
/* is this line a getchunk?    */
int foundGetchunk(int i, int linelen) {
  int len;
  if (strncmp(&buffer[i],"\\getchunk{",10) == 0) {
    for(len=0; ((len < linelen) && (buffer[i+len] != '}')); len++);
    return(len-10);
  }
  return(0);
}

/* Somebody did a getchunk and we need a copy of the name */
/* malloc string storage for a copy of the getchunk name  */
char *getChunkname(int k, int getlen) {
  char *result = (char *)malloc(getlen+1);
  strncpy(result,&buffer[k+10],getlen);
  result[getlen]='\0';
  return(result);
}
  
/* print lines in this chunk, possibly recursing into getchunk */
int printchunk(int i, int chunklinelen, char *chunkname) {
  int k;
  int linelen;
  char *getname;
  int getlen = 0;
  if (DEBUG==3) { printf("===   \\start{%s}   ===\n",chunkname); }
  for (k=i+chunklinelen+1; ((linelen=nextline(k)) != -1); ) {
    if ((getlen=foundGetchunk(k,linelen)) > 0) {
      /* expand the referenced chunk in place, then skip the getchunk line */
      getname = getChunkname(k,getlen);
      getchunk(getname);
      free(getname);
      k=k+getlen+12;   /* 10 for \getchunk{, 1 for }, 1 for the newline */
    } else if ((linelen >= 11) && (foundEnd(k,chunkname) == 1)) {
      if (DEBUG==3) { printf("===   \\end{%s}   ===\n",chunkname); }
      return(k+12);    /* 11 for \end{chunk}, 1 for the newline */
    } else {
      if (DEBUG==2) { 
        printf("======== printchunk else %d %d\n",k,linelen); 
      }
      printline(k,linelen);
      k=k+linelen+1;
    }
  }
  if (DEBUG==2) {
     printf("=================\\out{%s} %d\n",chunkname,k); 
  }
  return(k);
}

/* find the named chunk and call printchunk on it */
int getchunk(char *chunkname) {
  int i;
  int linelen;
  int chunklen = strlen(chunkname);
  if (DEBUG==3) { printf("getchunk(%s)\n",chunkname); }
  for (i=0; ((linelen=nextline(i)) != -1); ) {
    if (DEBUG==2) { 
      printf("----"); printline(i,linelen); printf("----\n"); 
    }
    if ((linelen >= chunklen+15) && (foundchunk(i,chunkname) == 1)) {
      if (DEBUG==2) {
         fprintf(stderr,"=================\\getchunk(%s)\n",chunkname); 
      }
      i=printchunk(i,linelen,chunkname);
    } else {
      i=i+linelen+1;
    }
  }
  if (DEBUG==2) { 
    fprintf(stderr,"=================getchunk returned=%d\n",i); 
  }
  return(i);
}

/* memory map the input file into the global buffer and get the chunk */
int main(int argc, char *argv[]) {
  int fd;
  struct stat filestat;
  if ((argc == 1) || (argc > 3)) { 
    fprintf(stderr,"Usage: tanglec filename chunkname\n");
    exit(-1);
  }
  fd = open(argv[1],O_RDONLY);
  if (fd == -1) {
    perror("Error opening file for reading");
    exit(-2);
  }
  if (fstat(fd,&filestat) < 0) {
    perror("Error getting input file size");
    exit(-3);
  }
  bufsize = (int)filestat.st_size;
  buffer = mmap(0,filestat.st_size,PROT_READ,MAP_SHARED,fd,0);
  if (buffer == MAP_FAILED) {
    close(fd);
    perror("Error reading the file");
    exit(-4);
  }
  if (argc == 2) {
    getchunk("*");
  } else {
    getchunk(argv[2]);
  }
  close(fd);
  return(0);
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% This defines the TeX support for Axiom.

%% Latex Chunk support
%% This is the chunk environment that replaces the use of web-like tools
%%
%% \begin{verbatim}
%% To use the command you would write
%%    \begin{chunk}{some random string}
%%    random code to be verbatim formatted
%%    \end{chunk}
%% 
%%  This version prints 
%%                     --- some random string ---
%%    random code to be verbatim formatted
%%                     --------------------------
%% \end{verbatim}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% The verbatim package quotes everything within its grasp and is used to
%%% hide and quote the source code during latex formatting. The verbatim
%%% environment is built in but the package form lets us use it in our
%%% chunk environment and it lets us change the font.
%%%

\usepackage{verbatim}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% 
%%% Make the verbatim font smaller
%%% Note that we have to temporarily change the '@' to be just a character
%%% because the \verbatim@font name uses it as a character
%%%

\chardef\atcode=\catcode`\@
\catcode`\@=11
\renewcommand{\verbatim@font}{\ttfamily\small}
\catcode`\@=\atcode

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% This declares a new environment named ``chunk'' which has one
%%% argument that is the name of the chunk. All code needs to live
%%% between the \begin{chunk}{name} and the \end{chunk}
%%% The ``name'' is used to define the chunk.
%%% Reuse of the same chunk name later concatenates the chunks

%%% For those of you who can't read latex this says:
%%% Make a new environment named chunk with one argument
%%% The first block is the code for the \begin{chunk}{name}
%%% The second block is the code for the \end{chunk}
%%% The % is the latex comment character

%%% We have two alternate markers, a lightweight one using dashes
%%% and a heavyweight one using the \begin and \end syntax
%%% You can choose either one by changing the comment char in column 1
 
\newenvironment{chunk}[1]{%   we need the chunkname as an argument
{\ }\newline\noindent%                    make sure we are in column 1
%{\small $\backslash{}$begin\{chunk\}\{{\bf #1}\}}% alternate begin mark
\hbox{\hskip 2.0cm}{\bf --- #1 ---}%      mark the beginning
\verbatim}%                               say exactly what we see
{\endverbatim%                            process \end{chunk}
\par{}%                                   we add a newline
\noindent{}%                              start in column 1
\hbox{\hskip 2.0cm}{\bf ----------}%      mark the end
%$\backslash{}$end\{chunk\}%              alternate end mark (commented)
\par%                                     and a newline
\normalsize\noindent}%                    and return to the document

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% This declares the place where we want to expand a chunk

\providecommand{\getchunk}[1]{%
\noindent%
{\small $\backslash{}$begin\{chunk\}\{{\bf #1}\}}}% mark the reference
