Hi all,

Andrew already forwarded one of my mails to the list, so you might know what I am looking for by now. A few more words of clarification:

What we are doing is writing a personal document management tool based on Lucene and our visualization techniques. Actually I should say "what we have done": the only problem is that indexing is still a big hack. Our plan for doing it right was pretty much what Andrew described on his website, and by now I have found the LARM descriptions here and there. In a way this framework is bigger than what we are aiming for (we care only about scenario 1.1 - File System Indexer, in terms of the LARM documentation), but we would be happy to try to collaborate in the effort.

Here is the scenario: we are two experienced Java developers trying to get our demonstrator up and going in about a week. The query frontend is good enough by now, just the indexer is crusty. We want a notion of file filtering and were thinking along the lines of mapping java.io.FileFilters onto some generic document indexer interface. The UI should offer some means of creating a list of these mappings, where the first hit wins, probably with some notion of bouncing: if the file filter says to try an indexer, the indexer should still be able to throw an exception, causing the mappings further down the list to be tried. We haven't decided yet if we want to push or pull the indexed information (i.e. if the indexers write to the index themselves or if the management code asks them for some defaults and extras stored in Properties). We want implementations of this interface for at least: HTML, DOC, PDF, TXT; others that would be good are: XLS, PPT, PS(.GZ), XML (incl. RDF, SVG), TeX, SX* (the OOo files). Another cool feature would be querying external meta-data sources.
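
To make the mapping idea a bit more concrete, here is a minimal sketch of how the list could work. Nothing in it is settled: the names DocumentIndexer, IndexingException, Mapping and MappingChain are made up for illustration, and it assumes the "pull" variant, where an indexer hands its fields back as Properties instead of writing to the index itself.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical indexer interface ("pull" variant): returns the extracted fields. */
interface DocumentIndexer {
    Properties index(File file) throws IndexingException;
}

/** Hypothetical exception an indexer throws to bounce a file to the next mapping. */
class IndexingException extends Exception {
    IndexingException(String message) {
        super(message);
    }
}

/** One entry of the ordered list: a file filter and the indexer to try on a match. */
class Mapping {
    final FileFilter filter;
    final DocumentIndexer indexer;

    Mapping(FileFilter filter, DocumentIndexer indexer) {
        this.filter = filter;
        this.indexer = indexer;
    }
}

/** Ordered list of mappings: the first hit wins, a failing indexer bounces. */
class MappingChain {
    private final List<Mapping> mappings = new ArrayList<>();

    void add(FileFilter filter, DocumentIndexer indexer) {
        mappings.add(new Mapping(filter, indexer));
    }

    /**
     * Returns the fields from the first mapping whose filter accepts the file
     * and whose indexer succeeds, or null if no mapping could handle the file.
     */
    Properties index(File file) {
        for (Mapping mapping : mappings) {
            if (!mapping.filter.accept(file)) {
                continue; // filter says no: try the next mapping
            }
            try {
                return mapping.indexer.index(file);
            } catch (IndexingException e) {
                // bounce: the indexer gave up, so fall through to the next mapping
            }
        }
        return null;
    }
}
```

With this shape, a catch-all plain-text mapping at the end of the list would pick up any file that a more specific indexer (say, a broken HTML parser) bounced. A "push" variant would look the same except that index() would take some writer object and return nothing.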

The result will be open sourced (BSD-style, as part of http://www.tockit.org). If there is interest in collaboration we will be happy to contribute the indexing parts directly into some Lucene repository. Most likely we will not spend much more time than next week on the project, since it is only a demonstrator for us. But we are happy to try to make parts of our code more reusable for other people -- in the hope that we might be able to use whatever your LARM turns into in case we get back to it one day. If you have concrete ideas please tell us, so we can adjust our designs.

For those of you who are curious by now (I hope you don't mind the plug): there are cvsbuilds available which should run on any JRE 1.4+ installation. Grab the "Docco...." file from http://www.itee.uq.edu.au/~pbecker/ToscanaJ/cvsbuilds and feel free to send me any complaints if you don't like it :-)

Regards,
  Peter Becker




