On Feb 27, 2007, at 6:24 PM, David Balmain wrote:

How about just using Doxygen? I don't have much experience with it but
I'm pretty sure there would be a way to tag particular functions that
are public so that when you generate the documentation you can
generate only the public methods.

I don't know it well either, but I'm sure you're right and it will allow us to put in a public/non-public tag.

It would be even better if we could export at least some of the documentation -- particularly method descriptions. I'd really like to be able to synch up the Perl binding docs by running a script rather than via copy-and-paste.

Of course you could also have public and private include files.

Hmm, can you elaborate? I'd basically given up hope that we'd be able to maintain tight control over symbol export, and was expecting to define the API via documentation only.

I'm thinking we need shared
documentation.  XML, maybe?  Then each binding would require an
appropriate XML-to-whatever translation utility.

I'm not entirely sure I'm on the same wavelength as you today. By
'whatever' do you mean the specific languages documentation format?

Yes, that was what I was thinking. But perhaps not quite so ambitious as may have come across.

If
that is the case then I don't see this working as the Ruby API for
Lucy will probably be quite different to the PHP API.

If we're reasonably careful about how we word things, many method descriptions could be reused across all bindings. And one of the things about the naming convention we've settled on for method invocations is that you can derive either lowerCamelCase or separated_by_underscores method names with a simple transform:

   Sim_Length_Norm => lengthNorm
   Sim_Length_Norm => length_norm
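
A minimal sketch of that transform in Python, assuming every symbol has the form Nick_Word_Word as in the example above. The function names here are hypothetical and not part of any Lucy code:

```python
def split_method_symbol(symbol):
    """Split 'Sim_Length_Norm' into ('Sim', ['Length', 'Norm'])."""
    nick, *words = symbol.split("_")
    if not words:
        raise ValueError(f"expected Nick_Method_Name, got {symbol!r}")
    return nick, words


def to_lower_camel(symbol):
    """Drop the class nick and join the remaining words as lowerCamelCase."""
    _, words = split_method_symbol(symbol)
    lowered = [w.lower() for w in words]
    return lowered[0] + "".join(w.capitalize() for w in lowered[1:])


def to_snake(symbol):
    """Drop the class nick and join the remaining words with underscores."""
    _, words = split_method_symbol(symbol)
    return "_".join(w.lower() for w in words)


print(to_lower_camel("Sim_Length_Norm"))  # lengthNorm
print(to_snake("Sim_Length_Norm"))        # length_norm
```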

If we tag every last thing, enough so that we could actually generate, say, both POD and javadoc without intervention, then sure, XML is wayyyy too verbose. Anything would be, really, because language syntaxes are too distinct. But if we set our sights a little lower, and just try to share method names, method descriptions, and public/non-public access control, that's doable -- and it's a whole lot of savings. (Maybe parameter lists and return values, too, but that's a little harder.)

  <method>
    <name>Sim_Length_Norm</name>
    <acl>public</acl>
    <description>
      Computes the normalization value for a field given the total number
      of terms contained in a field. These values, together with field
      boosts, are stored in an index and multiplied into scores for hits
      on each field by the search code.

      Matches in longer fields are less precise, so implementations of
      this method usually return smaller values when numTokens is large,
      and larger values when numTokens is small.

      Note that these values are computed under IxWriter_Add_Document and
      stored then using Sim_Encode_Norm. Thus they have limited precision,
      and documents must be re-indexed if this method is altered.
    </description>
  </method>

Note the use of "IxWriter_Add_Document" and "Sim_Encode_Norm" within the description. Those method names are identifiable patterns, matchable with this regex:

  # $1 is class nick, $2 is short method name
  /([A-Z][A-Za-z]+)_([A-Z]\w+)/

It's easy to swap out IxWriter_Add_Document for the following POD, which renders as a nicely formatted link:

   L<IndexWriter::add_document|Lucy::Index::IndexWriter/"add_document">
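
Here is a rough sketch of that substitution in Python, using the same regex. The CLASS_NICKS table is an assumption: some lookup from class nick to full class name has to exist somewhere, and only the IxWriter entry is taken from the example above. Symbols with an unknown nick are left untouched.

```python
import re

# Hypothetical lookup: class nick -> (short class name, full package name).
# Only the IxWriter entry comes from the example; a real table would be
# generated from the class definitions.
CLASS_NICKS = {
    "IxWriter": ("IndexWriter", "Lucy::Index::IndexWriter"),
}

# group 1 is the class nick, group 2 is the short method name
METHOD_SYMBOL = re.compile(r"([A-Z][A-Za-z]+)_([A-Z]\w+)")


def pod_link(match):
    """Build a POD L<> link for one matched method symbol."""
    nick, short_name = match.group(1), match.group(2)
    if nick not in CLASS_NICKS:
        return match.group(0)  # unknown nick: leave the text as-is
    class_name, package = CLASS_NICKS[nick]
    method = short_name.lower()  # Add_Document -> add_document
    return f'L<{class_name}::{method}|{package}/"{method}">'


def link_methods(text):
    """Replace every recognized method symbol in text with a POD link."""
    return METHOD_SYMBOL.sub(pod_link, text)


print(link_methods("computed under IxWriter_Add_Document"))
```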

Now, returning to your point about Doxygen... With XML, we'd have to maintain separate files for the documentation, which would suck. So I'm all for using Doxygen, especially if we can rig things up so that the description can be isolated and parsed out reliably.

I might go write an extractor tool which parses our header files and generates intermediate XML. Then bindings authors could write their own final translation utilities in their language of choice, and use as much or as little as they wish.

Hopefully they'd use more rather than less. It's to the user's benefit for various bindings to present reasonably consistent APIs while still being idiomatic, because it makes it easier to apply what you learned about one of them to another.
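
To give a feel for how small a final translation utility could be, here is a sketch in Python that reads method entries shaped like the XML above and emits POD for the public ones only. The enclosing <methods> element, the second (private) entry, and the shortened descriptions are all made up for illustration. The intermediate format itself is still undecided.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate XML. Only the <method>/<name>/<acl>/<description>
# shape comes from the example earlier in this message.
SAMPLE = """
<methods>
  <method>
    <name>Sim_Length_Norm</name>
    <acl>public</acl>
    <description>
      Computes the normalization value for a field.
    </description>
  </method>
  <method>
    <name>Sim_Internal_Helper</name>
    <acl>private</acl>
    <description>Not part of the public API.</description>
  </method>
</methods>
"""


def public_methods_to_pod(xml_text):
    """Return POD sections for every method whose <acl> is 'public'."""
    root = ET.fromstring(xml_text.strip())
    sections = []
    for method in root.iter("method"):
        if method.findtext("acl", default="").strip() != "public":
            continue
        symbol = method.findtext("name", default="").strip()
        # Drop the class nick and lowercase the rest: length_norm
        binding_name = symbol.split("_", 1)[1].lower()
        # Collapse the whitespace left over from XML indentation
        description = " ".join(method.findtext("description", default="").split())
        sections.append(f"=head2 {binding_name}\n\n{description}\n")
    return "\n".join(sections)


print(public_methods_to_pod(SAMPLE))
```

A Ruby or PHP binding would swap the last few lines for its own naming rule and doc format, and could ignore any elements it has no use for.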

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

