For a given language or file format, OpenGrok needs the following 
streams of terms (Lucene fields, to be exact):

A. Human-readable words (Lucene terms):
     For example, in ASCII files these are words; in an ELF executable 
file these are the words in the symbol and string tables.

B. Definitions:
     For example, in a C file these are function definitions and 
variable declarations; in Makefiles these are make targets. Finding 
definitions is mostly done by Ctags, except for a few file types such 
as Java class files.

C. Symbols:
     Program symbols, a.k.a. identifiers, ignoring comments and string 
literals.

Apart from these, OpenGrok also needs:
1. to generate the HTML cross reference.
2. to identify file extensions and magic numbers (the first few 
characters that identify the file type).
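
To make point 2 concrete, here is a rough sketch of detection by file 
extension first and magic number second. FileTypeGuesser and its type 
names ("lisp", "elf", ...) are made up for illustration and are not 
OpenGrok classes or API; only the ELF and Java class file magic bytes 
are real.

```java
import java.util.Locale;

// Hypothetical helper, not part of OpenGrok: guesses a file type from
// the file name suffix first, then from the leading "magic" bytes.
class FileTypeGuesser {
    // ELF executables start with 0x7F 'E' 'L' 'F'.
    private static final byte[] ELF_MAGIC = {0x7F, 'E', 'L', 'F'};
    // Java class files start with 0xCAFEBABE.
    private static final byte[] CLASS_MAGIC =
        {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    // head holds the first few bytes of the file (may be empty).
    static String guess(String fileName, byte[] head) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".lisp") || lower.endsWith(".el")) return "lisp";
        if (lower.endsWith(".java")) return "java";
        if (startsWith(head, ELF_MAGIC)) return "elf";
        if (startsWith(head, CLASS_MAGIC)) return "javaclass";
        return "plain"; // fall back to plain text handling
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For instance, FileTypeGuesser.guess("a.out", head) returns "elf" when 
head begins with the four ELF magic bytes, whatever the file is called.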

Have a look at the Analysis section of
http://opensolaris.org/os/project/opengrok/manual/internals/
which shows how a file travels through OpenGrok's analysis stage.

For your language you may need to implement A, B or C, and 1 and 2, 
depending on how OpenGrok already handles them. Tasks like A and B are 
common to most languages, and OpenGrok already has analyzers that do 
them.

For A: if your file is in plain text (ASCII or Unicode), then you don't 
have to write anything extra for this.

For B: if your language is recognized by ctags, or if it is easy to add 
regular expressions to the ctags config to make it recognize your 
language's definitions, then skip this part.

For C: you may have to write your own lexer that just filters program 
identifiers (and ignores comments, strings, etc.).
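
To give an idea of what such a filter does, here is a hand-written 
sketch for a C-like language. It is only an illustration: SymbolFilter 
is a made-up class, not OpenGrok code, and it is plain Java rather than 
a JFlex specification. It also does not drop language keywords.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not OpenGrok code: collects identifiers from
// C-like source while skipping comments, string/char literals, numbers.
class SymbolFilter {
    static List<String> symbols(String src) {
        List<String> out = new ArrayList<>();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip to end of line.
                while (i < n && src.charAt(i) != '\n') i++;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip past the closing "*/" (or to EOF).
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '"' || c == '\'') {
                // String or char literal: skip to the matching quote.
                i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++; // skip escaped char
                    i++;
                }
                i++; // step over the closing quote
            } else if (Character.isDigit(c)) {
                // Number such as 42 or 0x1F: skip the whole run so that
                // "x1F" is not mistaken for an identifier.
                while (i < n && isIdentPart(src.charAt(i))) i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < n && isIdentPart(src.charAt(i))) i++;
                out.add(src.substring(start, i));
            } else {
                i++; // operators, punctuation, whitespace
            }
        }
        return out;
    }

    private static boolean isIdentPart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

A real analyzer would emit these as a token stream for Lucene rather 
than a list, but the filtering logic is the part you have to write.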

For 1: you may have to write your own lexer that prints HTML for a 
given source file.
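
As a rough idea of the HTML side, here is a small sketch that escapes 
the source, wraps line comments in a span and turns each identifier 
into a link. XrefSketch, the "c" CSS class and the "/s?refs=" link 
shape are assumptions made up for this example, not what OpenGrok 
actually emits.

```java
// Hypothetical sketch, not OpenGrok code: renders C-like source as HTML
// with identifiers linked and "//" comments wrapped in a span.
class XrefSketch {
    static String toHtml(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: emit it escaped, without any links.
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.append("<span class=\"c\">")
                   .append(escape(src.substring(i, end)))
                   .append("</span>");
                i = end;
            } else if (isDigit(c)) {
                // Number: copy the whole run so "0x1F" is not linked.
                int start = i;
                while (i < n && isIdentPart(src.charAt(i))) i++;
                out.append(src, start, i);
            } else if (isIdentStart(c)) {
                int start = i;
                while (i < n && isIdentPart(src.charAt(i))) i++;
                String id = src.substring(start, i);
                // Assumed link shape, for illustration only.
                out.append("<a href=\"/s?refs=").append(id).append("\">")
                   .append(id).append("</a>");
            } else {
                out.append(escape(String.valueOf(c)));
                i++;
            }
        }
        return out.toString();
    }

    static String escape(String s) {
        // '&' must be replaced first so later entities are not re-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isIdentStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isIdentPart(char c) {
        return isIdentStart(c) || isDigit(c);
    }
}
```

String literals and block comments are left out here to keep it short; 
they would be handled the same way as the line comment branch.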

The best approach is to copy the closest existing example and start 
from there. Java and Lisp are good examples:
http://src.opensolaris.org/source/xref/opengrok/trunk/src/org/opensolaris/opengrok/analysis/lisp/

There are two JFlex files: one extracts symbols, the other generates 
HTML. LispAnalyzer.java ties everything together.

I'll turn this email into a "HOW-TO Add support for a new language" 
document on the OpenGrok site.

-Chandan

