> 1. Extract snippets from the various types of source files: XML, java,
> text
>
I feel that this is mostly complete, but I'm open to new suggestions.

> 2. Convert these snippets to an XML form that is easily indexable with
> Lucene, generating Lucene "fields" for all important pieces of
> information: snippet key, snippet type, title, etc.
>
This needs a little work. If I'm not mistaken, these correspond to the "single snippet" pages you had in the refdoc prototype, and currently they don't contain enough information.

> 2b. Also generate "navigation documents" which Lucene will use to find
> all snippets. This is shown in the prototype already.
>
This seems mostly done, though I wonder whether some of the generated links will work as-is for indexing. For example, one set of "a" tags has href="[EMAIL PROTECTED]", or something like href="snippet_31". Can the crawler/indexer sort that out?

> 3. Crawl and index the generated XML documents with Lucene, at first
> using the Lucene block out of the box, I assume. Some manual work (like
> starting the index creation from a URL) is ok at this stage, we're
> trying to demonstrate the full chain before implementing everything.
>
In the works. I might write some Java code for indexing and searching soon, but I'll keep it skeletal until I feel good about it.

> 4. Create the required Lucene queries to put together snippets coming
> from different source files but having the same key (e.g. all
> "FileGenerator" snippets). I might need to add @doktor stuff to
> existing code and samples so that you can see better how this should
> work.
>
Future work.

> 5. Transform the results of these queries to XML documents in a
> publication-neutral format, where one document contains all the info
> and code excerpts provided by snippets having the same key.
>
Should we also keep the ability for a user to run their own query and have a document published dynamically from its results?

That sounds about like what I have in my notes. Thanks for walking through it.
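For step 2, here is a minimal sketch of what a field-oriented "single snippet" document could look like. It assumes a hypothetical `Snippet` record (key, type, title, source, body), and the `<snippet>`/`<field name="...">` layout is made up for illustration; it is not the prototype's actual format. The idea is simply that every piece of information the indexer should turn into a Lucene field gets its own element, and code excerpts are escaped so they survive as text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SnippetXml {

    /** Hypothetical snippet model; the real extractor may carry other fields. */
    public record Snippet(String key, String type, String title, String source, String body) {}

    /** Escapes the XML special characters (ampersand first) so code excerpts stay plain text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Emits one flat element per field, so an indexer can map each element to one Lucene field. */
    public static String toXml(Snippet s) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps field order stable
        fields.put("key", s.key());
        fields.put("type", s.type());
        fields.put("title", s.title());
        fields.put("source", s.source());
        fields.put("body", s.body());

        StringBuilder out = new StringBuilder("<snippet>\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            out.append("  <field name=\"").append(f.getKey()).append("\">")
               .append(escape(f.getValue())).append("</field>\n");
        }
        return out.append("</snippet>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data, for illustration only.
        Snippet s = new Snippet("FileGenerator", "java", "Reading a file",
                "FileGenerator.java", "if (in != null && size > 0) { ... }");
        System.out.print(toXml(s));
    }
}
```

Keeping the document flat (no nesting beyond one level) is a deliberate choice here: it makes the element-to-field mapping trivial, at the cost of losing any structure inside the snippet body.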
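On the 2b question about links: a crawler typically resolves each href against the URL of the page it appeared on before fetching it, so a relative link like `snippet_31` should be fine as long as something actually exists at the resolved location. The sketch below shows that resolution with `java.net.URI`; the page URL is hypothetical. It also shows why fragment-only links such as `#snippet_31` deserve a look: they resolve back to the same page, so a crawler would not treat them as separate documents. It does not settle what happens with the `[EMAIL PROTECTED]` links, since that depends on what those hrefs really contain.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

public class SnippetLinks {

    /** Resolves an href against the URL of the page containing it, as a crawler would.
     *  Returns null when the href is not a syntactically valid URI reference. */
    public static String resolve(String pageUrl, String href) {
        try {
            return new URI(pageUrl).resolve(new URI(href)).toString();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /** Drops the fragment: "page.html#snippet_31" and "page.html" are the same fetched document. */
    public static String withoutFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        String page = "http://localhost:8888/refdoc/nav/index.html"; // hypothetical navigation page
        for (String href : List.of("snippet_31", "#snippet_31", "../all.html")) {
            String abs = resolve(page, href);
            System.out.println(href + " -> " + abs + " (fetches " + withoutFragment(abs) + ")");
        }
    }
}
```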
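For step 3, a skeletal illustration of the idea behind field-based lookup might help before wiring up Lucene itself. This is a toy inverted index, not Lucene's API; `ToyFieldIndex`, `add` and `search` are hypothetical names. It maps each (field, value) pair to the ids of the documents containing it, which is roughly what an exact-match query on an untokenized `key` field would do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index from (field, exact value) to document ids. Not Lucene; illustration only. */
public class ToyFieldIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    // field name -> exact value -> ids of documents having that value (sorted)
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    /** Stores a document (field name -> value) and indexes every field as a single term. */
    public int add(Map<String, String> fields) {
        int id = docs.size();
        docs.add(fields);
        fields.forEach((field, value) ->
                postings.computeIfAbsent(field, f -> new HashMap<>())
                        .computeIfAbsent(value, v -> new TreeSet<>())
                        .add(id));
        return id;
    }

    /** Exact-match lookup on one field; returns matching documents in insertion order. */
    public List<Map<String, String>> search(String field, String value) {
        Set<Integer> ids = postings.getOrDefault(field, Map.of()).getOrDefault(value, Set.of());
        List<Map<String, String>> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(Map.of("key", "FileGenerator", "type", "java"));
        index.add(Map.of("key", "FileGenerator", "type", "xml"));
        index.add(Map.of("key", "Other", "type", "text"));
        System.out.println(index.search("key", "FileGenerator").size() + " FileGenerator snippets");
    }
}
```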
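Steps 4 and 5 could be sketched as grouping query hits by key and merging each group into one document. The hypothetical `Hit` record stands in for whatever a Lucene query would return, and the `<document>`/`<section>` layout is an assumed publication-neutral format, not an agreed one.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SnippetAssembler {

    /** Hypothetical shape of one search hit: its key, origin, snippet type and content. */
    public record Hit(String key, String source, String type, String body) {}

    /** Escapes XML special characters (ampersand first). */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Groups hits by key; keys are sorted, hits within a key keep their original order. */
    public static Map<String, List<Hit>> groupByKey(List<Hit> hits) {
        return hits.stream()
                .collect(Collectors.groupingBy(Hit::key, TreeMap::new, Collectors.toList()));
    }

    /** Merges all snippets sharing one key into a single document, one section per source. */
    public static String assemble(String key, List<Hit> hits) {
        StringBuilder out = new StringBuilder();
        out.append("<document key=\"").append(escape(key)).append("\">\n");
        for (Hit h : hits) {
            out.append("  <section source=\"").append(escape(h.source()))
               .append("\" type=\"").append(escape(h.type())).append("\">")
               .append(escape(h.body())).append("</section>\n");
        }
        return out.append("</document>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical hits from two different source files sharing the key "FileGenerator".
        List<Hit> hits = List.of(
                new Hit("FileGenerator", "FileGenerator.java", "java", "public class FileGenerator { ... }"),
                new Hit("Other", "notes.txt", "text", "unrelated"),
                new Hit("FileGenerator", "sitemap.xmap", "xml", "<map:generate src=\"hello.xml\"/>"));
        groupByKey(hits).forEach((key, group) -> System.out.print(assemble(key, group)));
    }
}
```

The same `assemble` step could also cover the user-defined query case raised under step 5: run the user's query instead of a key lookup and publish whatever comes back.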
I came to many of those conclusions while working through the prototype, but on some of them I wasn't sure of the details. At the time I was also stuck on where to go from the TODOs, but I've found a direction to keep moving in.

Thanks,
Robert