> 1. Extract snippets from the various types of source files: XML, Java,
> text
> 

I feel that this is mostly complete, but I'm open to new suggestions.

> 2. Convert these snippets to an XML form that is easily indexable with
> Lucene, generating Lucene "fields" for all important pieces of
> information: snippet key, snippet type, title, etc.
> 

This needs a little work. If I'm not mistaken, this corresponds to the
"single snippet" page you had in the refdoc prototype, and the generated
snippet documents don't yet contain enough information.
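To pin down what "enough information" might mean, here is a rough sketch of a single-snippet document with one element per field we'd want Lucene to index separately (key, type, title) plus the snippet body. The element names (snippet, field, body) and the field names are my assumptions, not the prototype's actual schema:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: element names (snippet, field, body) and field names
// (key, type, title) are assumptions, not the prototype's real schema.
class SnippetXml {

    // Escape XML special characters so field values can't break the markup.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // One <field> element per piece of information to be indexed as its own field.
    static String toXml(Map<String, String> fields, String body) {
        StringBuilder xml = new StringBuilder("<snippet>\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("  <field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue())).append("</field>\n");
        }
        xml.append("  <body>").append(escape(body)).append("</body>\n</snippet>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("key", "FileGenerator");
        fields.put("type", "java");
        fields.put("title", "FileGenerator usage & parameters");
        System.out.print(toXml(fields, "if (a < b) { ... }"));
    }
}
```

The flat name/value layout is just one option; it keeps the mapping to Lucene fields trivial, at the cost of less descriptive element names.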

> 2b. Also generate "navigation documents" which Lucene will use to find
> all snippets. This is shown in the prototype already.
> 

This seems mostly done, though I wonder whether some of the generated
links will work as-is for indexing. For example, one set of "a" tags has
href="[EMAIL PROTECTED]" or something like href="snippet_31". Can the
crawler/indexer sort those out?
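For the relative ones, a crawler typically resolves each href against the URL of the page it was found on, so href="snippet_31" should be fine as long as the snippet pages live next to the navigation page. A minimal sketch using java.net.URI; the base URL is hypothetical, and dropping non-http(s) links and fragments is my assumption about what an indexer would want:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Sketch of how a crawler turns an href into an absolute URL to fetch.
// The base URL used in main() is hypothetical.
class LinkResolver {

    // Returns the absolute URL to follow, or empty if the href can't be parsed
    // or doesn't resolve to an http(s) document.
    static Optional<String> resolve(String baseUrl, String href) {
        try {
            URI resolved = new URI(baseUrl).resolve(new URI(href));
            String scheme = resolved.getScheme();
            if (scheme == null || !(scheme.equals("http") || scheme.equals("https"))) {
                return Optional.empty();
            }
            // Drop the fragment: "page#a" and "page#b" are the same document to an indexer.
            URI noFragment = new URI(scheme, resolved.getSchemeSpecificPart(), null);
            return Optional.of(noFragment.toString());
        } catch (URISyntaxException e) {
            return Optional.empty(); // malformed href: skip it rather than fail the crawl
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8888/doktor/nav.html";
        System.out.println(resolve(base, "snippet_31"));
        System.out.println(resolve(base, "#snippet_31"));
        System.out.println(resolve(base, "mailto:someone@example.org"));
    }
}
```

An href that isn't a valid URI at all would simply be skipped here, so anything like that in the generated pages is worth fixing at the source.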

> 3. Crawl and index the generated XML documents with Lucene, at first
> using the Lucene block out of the box, I assume. Some manual work (like
> starting the index creation from a URL) is OK at this stage; we're
> trying to demonstrate the full chain before implementing everything.
> 

In the works. I might write some Java code for indexing and searching
soon, but I'll keep it skeletal until I feel good about it.
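As a skeleton for the crawl half, something like the following breadth-first walk from a single start URL is what I have in mind. To keep it runnable without a server, the link graph is passed in as a map; in the real chain each page would be fetched and its hrefs extracted, and the "index here" point would hand the document to Lucene. Class and method names are placeholders:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical skeleton of the crawl step: breadth-first from one start URL,
// visiting each reachable page exactly once. The in-memory link map stands in
// for fetching pages and extracting their hrefs.
class NavCrawler {

    static List<String> crawl(String startUrl, Map<String, List<String>> links) {
        List<String> toIndex = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            toIndex.add(url); // real pipeline: index the document here
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.add(next)) { // false for already-seen URLs, so link cycles terminate
                    queue.add(next);
                }
            }
        }
        return toIndex;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "nav.html", List.of("snippet_1", "snippet_2"),
            "snippet_1", List.of("nav.html"),
            "snippet_2", List.of("snippet_1"));
        System.out.println(crawl("nav.html", links)); // [nav.html, snippet_1, snippet_2]
    }
}
```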

> 4. Create the required Lucene queries to put together snippets coming
> from different source files but having the same key (e.g. all
> "FileGenerator" snippets). I might need to add @doktor stuff to
> existing code and samples so that you can see better how this should
> work.
> 

Future work.
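When I get there, I expect the core of it to be "one query per key". In Lucene that would presumably be a TermQuery on a "key" field; to keep this sketch runnable without Lucene on the classpath, an in-memory list stands in for the index. The Snippet fields are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: collecting all snippets that share a key, with an
// in-memory list standing in for the Lucene index.
class SnippetsByKey {

    // Minimal stand-in for an indexed snippet; field names are assumptions.
    static final class Snippet {
        final String key, type, source;
        Snippet(String key, String type, String source) {
            this.key = key; this.type = type; this.source = source;
        }
        @Override public String toString() { return type + ":" + source; }
    }

    // Equivalent of running the query key:<key> once.
    static List<Snippet> withKey(List<Snippet> all, String key) {
        return all.stream().filter(s -> s.key.equals(key)).collect(Collectors.toList());
    }

    // Equivalent of running that query for every distinct key; TreeMap sorts the keys.
    static Map<String, List<Snippet>> groupByKey(List<Snippet> all) {
        return all.stream().collect(
            Collectors.groupingBy(s -> s.key, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Snippet> all = List.of(
            new Snippet("FileGenerator", "java", "FileGenerator.java"),
            new Snippet("FileGenerator", "xml", "sitemap.xmap"),
            new Snippet("Sitemap", "text", "sitemap.txt"));
        System.out.println(groupByKey(all));
    }
}
```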

> 5. Transform the results of these queries into XML documents in a
> publication-neutral format, where one document contains all the info
> and code excerpts provided by snippets having the same key.

Should we also keep the option of running a user-supplied query and
dynamically publishing a document from its results?
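If the assembly step only takes "a title plus a list of excerpts", it wouldn't care whether the excerpts came from a fixed per-key query or from a user's ad hoc query, so both cases could share it. A rough sketch; the element names (document, excerpt) are my assumptions about the publication-neutral format:

```java
import java.util.List;

// Hypothetical sketch of step 5: wrap the excerpts returned by one query in a
// single publication-neutral XML document. Element names are assumptions.
class TopicDocument {

    // Replace "&" first so the entities produced by later replacements aren't double-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // title: heading for the document; excerpts: snippet texts in query-result order.
    static String build(String title, List<String> excerpts) {
        StringBuilder xml = new StringBuilder();
        xml.append("<document title=\"").append(escape(title)).append("\">\n");
        for (String e : excerpts) {
            xml.append("  <excerpt>").append(escape(e)).append("</excerpt>\n");
        }
        return xml.append("</document>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(build("FileGenerator",
            List.of("<map:generate type=\"file\"/>", "new FileGenerator()")));
    }
}
```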

That matches roughly what I have in my notes. Thanks for walking
through it. I reached many of those conclusions while working through
the prototype, but on some of them the details were fuzzy. At the time
I was also stuck on where to go after the TODOs, but I've now found a
direction to keep moving in.

Thanks,
Robert
