Great, I'll add some of this to the proposal and keep the rest saved off for later. (These are also great ideas, but things like *additional filters/crawlers* are something I want to enable, not necessarily include in every possible combination.)
Be assured DOC and XLS filters are foremost on my mind ;-) (I started this while trying to figure out how to hook POI to Lucene).

On Thu, 2002-02-07 at 15:03, Mark Tucker wrote:
> I like what you included in your proposal and suggest doing all that (over time) and taking the following into consideration:
>
> Indexers/Crawlers
>
>   General Settings
>     SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
>     IndexerTimeout - kill this crawler thread after long period of inactivity
>     IncludeFilter - include only items matching filter
>     ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
>     MaxItems - stops indexing after x items
>     MaxMegs - stops indexing after x MB of data
>
>   File System Indexer
>     URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
>
>   Web Indexer
>     HTTPUser
>     HTTPPassword
>     HTTPUserAgent
>     ProxyServer
>     ProxyUser
>     ProxyPassword
>     HTTPSCertificate
>     HTTPSPrivateKey
>
>   Other Possible Indexers
>     Microsoft Exchange 5.5/2000
>     Lotus Notes
>     Newsgroup (NNTP)
>     Documentum
>     ODBC/OLEDB
>     XML - index single XML that represents multiple documents
>
>
> Document Factory
>
>   General
>     The minimum properties for each document should be:
>       URL
>       Title
>       Abstract
>       Full Text
>       Score
>
>   HTML
>     Support for META tags including Dublin Core syntax
>
>   Other Possible Document Factories
>     Office Docs - DOC, XLS, PPT
>     PDF
>
>
> Thanks for the great proposal.
>
> Mark Tucker
>
>
> -----Original Message-----
> From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, February 07, 2002 5:35 AM
> To: Lucene Developers List
> Subject: Proposal for Lucene
>
>
> Hi All,
>
> This is just a few thoughts about Lucene. Please send me your feedback,
> critiques and thoughts.
>
> If you folks would take a look:
>
> http://www.trilug.org/~acoliver/luceneplan.html
>
> if you'd like to submit patches:
>
> http://www.trilug.org/~acoliver/luceneplan.xml
>
> Once I've gotten feedback from the developer community I'll send this to
> the user community as well.
>
> Thanks,
>
> Andy
> --
> www.superlinksoftware.com
> www.sourceforge.net/projects/poi - port of Excel format to java
> http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
>   - fix java generics!
>
>
> The avalanche has already started. It is too late for the pebbles to
> vote.
> -Ambassador Kosh
>
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

--
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
  - fix java generics!

The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh
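
As a rough illustration of the General Settings in Mark's list, the following Java sketch shows one way the include/exclude filters and the item/size limits might fit together. The setting names come from the list; the class names `CrawlerSettings` and `CrawlerSettingsDemo`, the field types, the millisecond units, and the use of regular expressions for the filters are all assumptions made for illustration. None of this is part of the proposal or of Lucene itself.

```java
import java.util.regex.Pattern;

// Hypothetical holder for the "General Settings" in Mark's list.
// Names follow the list; types, units and defaults are assumptions.
final class CrawlerSettings {
    long sleeptimeBetweenCallsMillis = 0;  // assumed unit: milliseconds
    long indexerTimeoutMillis = 0;         // assumed: 0 means "no timeout"
    Pattern includeFilter = null;          // null = include everything
    Pattern excludeFilter = null;          // null = exclude nothing
    int maxItems = Integer.MAX_VALUE;      // stop after this many items
    long maxMegs = Long.MAX_VALUE;         // stop after this many MB

    // An item is accepted when it matches IncludeFilter (if set)
    // and does not match ExcludeFilter (if set).
    boolean accepts(String item) {
        if (includeFilter != null && !includeFilter.matcher(item).find()) {
            return false;
        }
        if (excludeFilter != null && excludeFilter.matcher(item).find()) {
            return false;
        }
        return true;
    }

    // True once either MaxItems or MaxMegs has been reached.
    boolean limitReached(int itemsIndexed, long bytesIndexed) {
        long megsIndexed = bytesIndexed / (1024L * 1024L);
        return itemsIndexed >= maxItems || megsIndexed >= maxMegs;
    }
}

// Small demo: filter a fixed list of paths and stop at MaxItems.
final class CrawlerSettingsDemo {
    public static void main(String[] args) {
        CrawlerSettings settings = new CrawlerSettings();
        settings.includeFilter = Pattern.compile("\\.(html?|doc|xls)$");
        settings.excludeFilter = Pattern.compile("/private/");
        settings.maxItems = 2;

        String[] candidates = {
            "/docs/a.html", "/private/b.doc", "/docs/c.png", "/docs/d.xls", "/docs/e.doc"
        };
        int indexed = 0;
        for (String path : candidates) {
            if (settings.limitReached(indexed, 0L)) {
                break; // MaxItems reached, stop crawling
            }
            if (settings.accepts(path)) {
                System.out.println("would index: " + path);
                indexed++;
            }
        }
    }
}
```

The sleep and timeout settings are only carried as fields here; a real crawler thread would have to honour them in its fetch loop, and the choice of regular expressions over, say, glob patterns is open.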