On Wed, 2007-10-17 at 21:15 -0700, Xah Lee wrote:
> Elisp Tutorial: HTML Syntax Coloring Code Block
>
> Xah Lee, 2007-10
>
> This page shows an example of writing an emacs lisp function that
> processes a block of text to syntax color it by HTML tags. If you don't
> know elisp, first take a gander at Emacs Lisp Basics.
>
> HTML version with color and links is at:
> http://xahlee.org/emacs/elisp_htmlize.html
>
> ---------------------------------------
> THE PROBLEM
>
> SUMMARY
>
> I want to write an elisp function such that, when invoked, the block of
> text the cursor is on will have various HTML style tags wrapped
> around it. This is for the purpose of publishing programming language
> code in HTML on the web.
>
> DETAIL
>
> I write a lot of computer programming tutorials for several computer
> languages. For example: Perl and Python tutorial, Java tutorial, Emacs
> Lisp tutorial, Javascript tutorial. In these tutorials, often there
> are code snippets. This code needs to be syntax colored in HTML.
>
> For example, here's an elisp code snippet:
>
> (if (< 3 2) (message "yes") )
>
> Here's what i actually want as raw HTML:
>
> (<span class="keyword">if</span> (&lt; 3 2) (message <span
> class="string">"yes"</span>) )
>
> which should look like this in a web browser:
>
> (if (< 3 2) (message "yes") )
>
> There is an emacs package that turns syntax-colored text in emacs into
> HTML form. This is extremely nice. The package is called htmlize.el
> and is written (1997,...,2006) by Hrvoje Niksic, available at
> http://fly.srk.fer.hr/~hniksic/emacs/htmlize.el.
>
> This program provides you with a few new emacs commands. Primarily, it
> has htmlize-region, htmlize-buffer, htmlize-file. The region and
> buffer commands will output HTML code in a new buffer, and the
> htmlize-file version will take an input file name and output into a
> file.
>
> When i need to include a code snippet in my tutorial, typically, i
> write the code in a separate file (e.g.
> “temp.java”, “temp.py”), run it to make sure the code is correct
> (compile, if necessary), then copy the file into the HTML tutorial
> page, inside a «pre» block. In this scheme, the best way for me to
> utilize the htmlize.el program is to use the “htmlize-buffer” command
> on my temp.java, then copy the htmlized output and paste that into my
> HTML tutorial file inside a «pre» block.
> Since many of my tutorials were written haphazardly over the years
> before i saw the need for syntax coloration, most exist inside «pre»
> tags already without a temp code file. So, in most cases, what i do is
> select the text inside the «pre» tag, paste it into a temp buffer and
> invoke the right mode for the language (so the text will be fontified
> correctly), then do htmlize-buffer, then copy the html output, then
> paste back to replace the selected text.
>
> This process is tedious. A tutorial page will have several code
> blocks. For each, i will need to select text, create a buffer, switch
> mode, do htmlize, select again, switch buffer, then paste. Many of the
> steps are not pure push-button operations but involve eyeballing.
> There are a few hundred such pages.
>
> It would be better if i could place the cursor on a code block in an
> existing HTML page, then press a button, and have emacs magically
> replace the code block with a htmlized version colorized for the code
> block's language. We proceed to write this function.
>
> ---------------------------------------
> SOLUTION
>
> For an elisp expert who knows how fontification works in emacs, the
> solution would be writing elisp code that maps the fontification info
> of emacs's strings into html tags. This is exactly what htmlize.el
> does. Since it is already written, an elisp expert might find the
> essential code in htmlize.el. (The code is licensed under the GPL.)
>
> Unfortunately, my lisp experience isn't so great.
> I spent maybe 30 minutes looking in htmlize.el in the hope of finding
> a function, something like htmlize-str, that is the essence, but
> wasn't successful. I figured it would actually be faster if i took the
> dumb and inefficient approach, by writing elisp code that extracts the
> output from htmlize-buffer. Here's the outline of the plan of my
> function:
>
> 1. Grab the text inside a <pre class="«lang»">...</pre> tag.
> 2. Create a new buffer. Paste the code in.
> 3. Make the new buffer «lang» mode (and fontify it).
> 4. Call htmlize-buffer.
> 5. Grab the (htmlized) text inside the «pre» tag in the output buffer
>    that htmlize created.
> 6. Kill the htmlize buffer and my temp buffer.
> 7. Delete the original text, paste in the new text.
>
> To achieve the above, i decided on 2 steps. A: Write a function
> “htmlize-string” that takes a string and mode name, and returns the
> htmlized string. B: Write a function “htmlize-block” that does the
> steps of grabbing text and pasting, and calls “htmlize-string” for the
> actual htmlization.
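
Steps 2, 3 and 6 of the outline above (create a buffer, set its mode, clean up afterwards) can also be written with with-temp-buffer, a standard Emacs macro that kills the temporary buffer on exit. What follows is only a minimal sketch, not code from the tutorial or from htmlize.el: the name my-call-in-mode-buffer and its FN argument are made up for illustration, and the htmlize call itself is left to whatever FN does.

```elisp
;; Hypothetical helper for illustration; not part of htmlize.el.
(defun my-call-in-mode-buffer (code mode-name fn)
  "Insert CODE into a temporary buffer, enable MODE-NAME, and call FN there.
MODE-NAME is a string such as \"emacs-lisp-mode\".  FN takes no
arguments and its return value is returned.  The temporary buffer
is killed when `with-temp-buffer' exits, even if FN signals an error."
  ;; `intern-soft' maps the name to an existing symbol, or nil if none.
  (let ((mode (intern-soft mode-name)))
    (unless (fboundp mode)
      (error "Unknown mode: %s" mode-name))
    (with-temp-buffer
      (insert code)
      ;; A major mode is turned on by calling the function named after it.
      (funcall mode)
      (funcall fn))))

;; Usage: report which mode the code ended up in, plus the buffer text.
(my-call-in-mode-buffer "(if (< 3 2) (message \"yes\"))"
                        "emacs-lisp-mode"
                        (lambda () (list major-mode (buffer-string))))
```

Because the caller's buffer is never switched away from, there is nothing to switch back to afterwards, and no fixed scratch buffer name is needed.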
>
> Here's the code of my htmlize-string function:
>
> (defun htmlize-string (ccode mn)
>   "Take string ccode and return htmlized code, using mode mn.
> This function requires htmlize.el by Hrvoje Niksic, 2006."
>   (let (cur-buf temp-buf temp-buf2 x1 x2 resultS)
>     (setq cur-buf (buffer-name))
>     (setq temp-buf "xout-weewee")
>     (setq temp-buf2 "*html*") ;; the buffer that htmlize-buffer creates
>
>     ; put the code in a new buffer, set the mode
>     (switch-to-buffer temp-buf)
>     (insert ccode)
>     (funcall (intern mn))
>
>     (htmlize-buffer temp-buf)
>     (kill-buffer temp-buf)
>     (switch-to-buffer temp-buf2)
>
>     ; extract the core code
>     (setq x1 (re-search-forward "<pre>"))
>     (setq x1 (+ x1 1)) ; skip the character right after <pre>
>     (re-search-forward "</pre>")
>     (setq x2 (re-search-backward "</pre>"))
>     (setq resultS (buffer-substring-no-properties x1 x2))
>     (kill-buffer temp-buf2)
>
>     (switch-to-buffer cur-buf)
>     resultS
>     )
>   )
>
> The major part in this code is knowing how to create, switch, and kill
> buffers. Then, how to set a mode. Lastly, how to grab text in a
> buffer.
>
> The current buffer's name is given by “buffer-name”. Creating or
> switching to a buffer is done by “switch-to-buffer”. Killing a buffer
> is “kill-buffer”. To activate a mode, the code is
> “(funcall (intern my-mode-name))”. I don't know why this is so in
> detail, but it is interesting to know.
>
> Grabbing the text is done by locating the desired beginning and ending
> locations using the re-search functions, and
> buffer-substring-no-properties for actually extracting the string.
>
> Here, note the “no-properties” in “buffer-substring-no-properties”.
> Emacs's strings can contain information called properties, which is
> essentially the fontification information.
>
> Reference: Elisp Manual: Buffers.
>
> Reference: Elisp Manual: Text-Properties.
>
> Here's the code of my htmlize-block function:
>
> (defun htmlize-block ()
> "Replace the region enclosed by <pre> tag to htmlized code.
> For example, if the cursor is somewhere inside the tag:
>
> <pre class=\"code\">
> codeXYZ...
> </pre>
>
> after calling, the “codeXYZ...” block of text will be htmlized.
> That is, wrapped with many <span> tags.
>
> The opening tag must be of the form <pre class=\"lang-str\">.
> The “lang-str” determines what emacs mode is used to colorize
> the code.
> This function requires htmlize.el by Hrvoje Niksic."
>
>   (interactive)
>   (let (mycode tag-begin styclass code-begin code-end tag-end mymode)
>     (progn
>       (setq tag-begin (re-search-backward "<pre class=\"\\([A-Za-z-]+\\)\""))
>       (setq styclass (match-string 1))
>       (setq code-begin (re-search-forward ">"))
>       (re-search-forward "</pre>")
>       (setq code-end (re-search-backward "<"))
>       (setq tag-end (re-search-forward "</pre>"))
>       (setq mycode (buffer-substring-no-properties code-begin code-end))
>       )
>     (cond
>      ((equal styclass "elisp") (setq mymode "emacs-lisp-mode"))
>      ((equal styclass "perl") (setq mymode "cperl-mode"))
>      ((equal styclass "python") (setq mymode "python-mode"))
>      ((equal styclass "java") (setq mymode "java-mode"))
>      ((equal styclass "html") (setq mymode "html-mode"))
>      ((equal styclass "haskell") (setq mymode "haskell-mode"))
>      )
>     (save-excursion
>       (delete-region code-begin code-end)
>       (goto-char code-begin)
>       (insert (htmlize-string mycode mymode))
>       )
>     )
>   )
>
> The steps of this function are to grab the text inside a «pre» block,
> call htmlize-string, then insert the result, replacing the text.
>
> Originally, i wrote the code to grab the text inside plain
> “<pre>...</pre>” tags, then use some heuristics to determine what
> language it is, then call htmlize-string with the mode-name passed to
> it. However, since my html pages already have the language information
> in the form of “<pre class="«lang»">...</pre>” (for CSS reasons), i
> now search text by that form, and use the “lang” part to determine a
> mode.
>
> Emacs is beautiful.
>
> Postscript:
>
> The story given above is slightly simplified.
> For example, when i began my language notes and commentaries, they
> were not planned to be some systematic or sizable tutorial. As the
> pages grew, more quality was added in the editorial process. So, plain
> un-colored code inside «pre» started to have “language comment”
> strings colorized (e.g. “<span class="cmt">#...</span>”), by using
> simple elisp code that wraps a tag on them, and this function is
> mapped to a shortcut key for easy execution. As pages and languages
> grew, i found that colorizing comments wasn't enough, and i started to
> look for a syntax-coloring html solution. There are solutions in Perl,
> Python, PHP, but I find the emacs solution best suits my needs, in
> particular because it integrates with emacs's interactive nature, and
> my writing work is done in an accumulative, editorial process.
>
> In the beginning i used htmlize-region and htmlize-buffer as they are
> for new code. Note that this is still a laborious process. Gradually i
> needed to colorize my old code. The problem is that many blocks
> already contain my own «span class="cmt"» tags, and strings common in
> computer languages such as “<=” have already been transformed into the
> required html encoding “&lt;=”. So, the elisp code would first
> “un-htmlize” these in my htmlize-block code. But once all my existing
> code had been newly colorized this way, the part of the code that
> transforms strings for un-htmlizing was no longer necessary, so it was
> taken out of htmlize-block, which returned to a cleaner state. Also,
> htmlize-block went thru many revisions over the year. At some point in
> the recent past, i had one code wrapper for each language. For
> example, i had htmlize-me-perl, htmlize-me-python, htmlize-me-java,
> etc. The need for unification into a single coherent wrapper code
> didn't materialize.
> In general, it is my experience, in particular in writing elisp
> customization for emacs, that tweaking code periodically thru the year
> is practical, because it adapts to the constant changes of
> requirements, environment, work process. For example, eventually i
> might write my own htmlize.el, if i happen to need more flexibility,
> or if my elisp experience grows enough to make the job relatively
> easy.
>
> Also note: a whole-sale solution is to write a program, in say,
> Python, that processes html files and replaces the proper sections
> with htmlized strings. This is perhaps more efficient if all the
> existing html files are in some uniform format. However, i need to
> work on my tutorials on a case-by-case basis, in part because some
> pages contain multiple languages or contain pseudo-code that i do not
> wish colorized. (For example, some pages contain code in the
> Mathematica language. Mathematica code is normally done in
> Mathematica's mathematical typesetting capable “front-end” IDE called
> “Notebook” and is not “syntax-colored” as such.)
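
Since the pages are edited block by block, the "find the block under the cursor and replace it" part of the quoted htmlize-block can be kept separate from the htmlization itself. Here is a rough sketch under stated assumptions, not the tutorial's code: the names my-pre-class-modes, my-pre-block-at-point and my-replace-pre-block are invented for illustration, the class-to-mode table copies only a few entries from the cond above, and the FN argument stands in for whatever does the actual coloring.

```elisp
;; Hypothetical helpers for illustration; the names are not from htmlize.el.
(defvar my-pre-class-modes
  '(("elisp" . emacs-lisp-mode)
    ("perl" . cperl-mode)
    ("python" . python-mode)
    ("java" . java-mode))
  "Alist mapping a <pre> class name to a major mode symbol.")

(defun my-pre-block-at-point ()
  "Return (LANG BEGIN END) for the <pre class=\"LANG\"> block around point.
BEGIN and END delimit the text between the opening and closing tags.
Point must be after the opening tag; it is left where it was."
  (save-excursion
    (re-search-backward "<pre class=\"\\([A-Za-z-]+\\)\"[^>]*>")
    (let ((lang (match-string-no-properties 1))
          (begin (match-end 0)))
      (goto-char begin)
      (search-forward "</pre>")
      (list lang begin (match-beginning 0)))))

(defun my-replace-pre-block (fn)
  "Replace the code in the <pre> block around point with the result of FN.
FN is called with the code string and the mode symbol found in
`my-pre-class-modes' (nil when the class is not listed)."
  (let* ((block (my-pre-block-at-point))
         (lang (nth 0 block))
         (begin (nth 1 block))
         (end (nth 2 block))
         (code (buffer-substring-no-properties begin end))
         (mode (cdr (assoc lang my-pre-class-modes))))
    (save-excursion
      (goto-char begin)
      (delete-region begin end)
      (insert (funcall fn code mode)))))

;; Usage, with point inside a block: upcase the code as a stand-in
;; for real htmlization.
;; (my-replace-pre-block (lambda (code _mode) (upcase code)))
```

Keeping the lookup in an alist means adding a language is a one-line change, and an unlisted class reaches FN as nil instead of failing inside the search code.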

+1 ;; BTW, what is G2/1.0? Is that an Emacs-like editor?

--
Byung-Hee HWANG <[EMAIL PROTECTED]>
InZealBomb, Kyungpook National University, KOREA

"Godfather, Godfather, save me from death, I beg of you."
    -- Genco Abbandando, "Chapter 1", page 46