For a given language or file format, OpenGrok needs the following streams of
terms (or, to be exact, Lucene fields):
A. Human-readable words (Lucene terms):
   E.g. in ASCII files these are the words themselves; in an ELF
   executable they are the words in the symbol and string tables.
B. Definitions:
   E.g. in a C file these are function definitions and variable
   declarations; in Makefiles these are make targets. Finding definitions
   is mostly done by ctags, except for a few file types like Java class files.
C. Symbols:
   Program symbols, a.k.a. identifiers, ignoring comments and string literals.
Apart from these, OpenGrok also needs:
1. to generate an HTML cross-reference.
2. to identify file types by extension and magic number (the first few
bytes that identify the file type).
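To illustrate item 2, here is a minimal sketch of detecting a file type by magic number, falling back to the filename extension. The class and the "plain" fallback name are hypothetical, not OpenGrok's actual API:

```java
import java.util.Arrays;

public class FileTypeGuess {
    // ELF objects begin with the 4-byte magic 0x7F 'E' 'L' 'F'.
    private static final byte[] ELF_MAGIC = {0x7F, 'E', 'L', 'F'};

    // Guess a file type from the magic number in the first few bytes,
    // falling back to the filename extension; "plain" when neither applies.
    public static String guess(String filename, byte[] head) {
        if (head.length >= ELF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, ELF_MAGIC.length), ELF_MAGIC)) {
            return "elf";
        }
        int dot = filename.lastIndexOf('.');
        return (dot >= 0 && dot < filename.length() - 1)
                ? filename.substring(dot + 1).toLowerCase()
                : "plain";
    }
}
```

The magic-number check comes first on purpose: a file named "a.out" is identified as ELF by its contents, not by its extension.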
Have a look at the Analysis section of
http://opensolaris.org/os/project/opengrok/manual/internals/
which shows how a file travels through OpenGrok's analysis stage.
For your language you may need to implement A, B, or C, plus 1 and 2,
depending on how OpenGrok already handles them. Tasks like A and B are
common to most languages, and OpenGrok already has analyzers for them.
For A: if your file is plain text (ASCII or Unicode), then you don't
have to write anything extra.
For B: if your language is recognized by ctags, or if it is easy to add
regular expressions to the ctags configuration to make it recognize your
language's definitions, then skip this part.
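As a sketch of the ctags-configuration route: Exuberant Ctags lets you define a whole language from its options file (e.g. ~/.ctags) with --langdef, --langmap, and --regex options. The language name "mylang", the ".my" extension, and the "def" keyword below are made-up placeholders:

```
--langdef=mylang
--langmap=mylang:.my
--regex-mylang=/^def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/d,definition/
```

With this in place, ctags tags every line starting with "def name" as a definition, and OpenGrok can pick those tags up like any other ctags-supported language.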
For C: you may have to write your own lexer that extracts just the
program identifiers (and ignores comments, strings, etc.).
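In OpenGrok this lexer is normally generated with JFlex, but the idea can be sketched in plain Java. This is a crude regex-based illustration, not the real analyzer; it handles C-style comments and simple string literals only, and does not filter out language keywords:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SymbolFilter {
    // Strip block comments, line comments, and string literals,
    // then collect the remaining identifier tokens in order.
    public static List<String> symbols(String src) {
        String stripped = src
            .replaceAll("(?s)/\\*.*?\\*/", " ")          // /* block comments */
            .replaceAll("//[^\n]*", " ")                 // // line comments
            .replaceAll("\"(\\\\.|[^\"\\\\])*\"", " ");  // "string literals"
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*").matcher(stripped);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```

A real JFlex-generated lexer does this in one pass with proper states for comments and strings, which is both faster and correct on the corner cases the regexes above miss.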
For 1: you may have to write your own lexer that prints HTML
for a given source file.
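The core of such a lexer is: HTML-escape every character, and wrap each identifier in a link. A rough Java sketch follows; the "/s?defs=" URL pattern is illustrative, not OpenGrok's exact link scheme, and a real xref lexer also colors comments, strings, and keywords:

```java
public class XrefSketch {
    // Emit one source line as HTML: identifiers become search links,
    // everything else is HTML-escaped and copied through.
    public static String xrefLine(String line) {
        StringBuilder out = new StringBuilder();
        int i = 0, n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (Character.isJavaIdentifierStart(c)) {
                int j = i;
                while (j < n && Character.isJavaIdentifierPart(line.charAt(j))) {
                    j++;
                }
                String id = line.substring(i, j);
                out.append("<a href=\"/s?defs=").append(id).append("\">")
                   .append(id).append("</a>");
                i = j;
            } else {
                switch (c) {
                    case '&': out.append("&amp;"); break;
                    case '<': out.append("&lt;"); break;
                    case '>': out.append("&gt;"); break;
                    default:  out.append(c);
                }
                i++;
            }
        }
        return out.toString();
    }
}
```

Tokenizing before escaping (rather than running a regex over already-escaped text) matters: it keeps entities like &lt; from being mistaken for identifiers.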
The best approach is to copy the closest existing example and start from
there; Java and Lisp are good examples:
http://src.opensolaris.org/source/xref/opengrok/trunk/src/org/opensolaris/opengrok/analysis/lisp/
There are two JFlex files: one extracts symbols, the other generates HTML.
LispAnalyzer.java ties everything together.
I'll turn this email into a "HOW-TO: Add support for a new language"
document on the OpenGrok site.
-Chandan