Great, I'll add some of this to the proposal and keep the rest saved
for later. (These are also great ideas, but things like *additional
filters/crawlers* are something I want to enable without necessarily
including every possible combination.)

Be assured DOC and XLS filters are foremost on my mind ;-) (I started
this while trying to figure out how to hook POI to Lucene).


On Thu, 2002-02-07 at 15:03, Mark Tucker wrote:
> I like what you included in your proposal and suggest doing all that (over time) and
> taking the following into consideration:
> 
> Indexers/Crawlers
> 
>       General Settings
>               SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
>               IndexerTimeout - kill this crawler thread after long period of inactivity
>               IncludeFilter - include only items matching filter
>               ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
>               MaxItems - stops indexing after x items
>               MaxMegs - stops indexing after x MB of data
> 
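The IncludeFilter/ExcludeFilter and MaxItems/MaxMegs settings above could
be sketched roughly as follows. This is only an illustration: the class
name CrawlerSettings, its method names, and the choice to treat filters as
regular expressions are all assumptions, not anything that exists in
Lucene or in the proposal.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for the general crawler settings (not a Lucene class). */
final class CrawlerSettings {
    final Pattern includeFilter; // null means "include everything"
    final Pattern excludeFilter; // null means "exclude nothing"
    final int maxItems;          // MaxItems; <= 0 means no limit
    final long maxBytes;         // MaxMegs converted to bytes; <= 0 means no limit

    // Assumption: filters are regular expressions matched anywhere in the item name.
    CrawlerSettings(String include, String exclude, int maxItems, int maxMegs) {
        this.includeFilter = include == null ? null : Pattern.compile(include);
        this.excludeFilter = exclude == null ? null : Pattern.compile(exclude);
        this.maxItems = maxItems;
        this.maxBytes = maxMegs * 1024L * 1024L;
    }

    /** An item is indexed only if it matches IncludeFilter and does not match ExcludeFilter. */
    boolean accepts(String item) {
        if (includeFilter != null && !includeFilter.matcher(item).find()) {
            return false;
        }
        if (excludeFilter != null && excludeFilter.matcher(item).find()) {
            return false;
        }
        return true;
    }

    /** True once either the MaxItems or the MaxMegs limit has been reached. */
    boolean limitReached(int itemsIndexed, long bytesIndexed) {
        if (maxItems > 0 && itemsIndexed >= maxItems) {
            return true;
        }
        return maxBytes > 0 && bytesIndexed >= maxBytes;
    }
}
```

A crawler loop would call accepts() before fetching each item and
limitReached() after indexing it. SleeptimeBetweenCalls and IndexerTimeout
are left out here because they depend on how crawler threads are managed.
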
>       File System Indexer
>               URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
>               
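A minimal sketch of what URLReplacePrefix might do, assuming it is a plain
prefix substitution on the crawled path. The class UrlPrefixMapper and the
host name used in the example are hypothetical; URL-escaping and
case-insensitive path matching are deliberately ignored.

```java
/** Hypothetical helper that maps a crawled file path to the URL exposed in the index. */
final class UrlPrefixMapper {
    private final String pathPrefix; // e.g. "c:\"
    private final String urlPrefix;  // e.g. "http://example.invalid/docs/"

    UrlPrefixMapper(String pathPrefix, String urlPrefix) {
        this.pathPrefix = pathPrefix;
        this.urlPrefix = urlPrefix;
    }

    /** Replaces the crawled path prefix with the URL prefix and normalizes separators. */
    String toUrl(String path) {
        if (!path.startsWith(pathPrefix)) {
            throw new IllegalArgumentException("Path is not under " + pathPrefix + ": " + path);
        }
        // Strip the file-system prefix, then turn backslashes into URL-style slashes.
        String rest = path.substring(pathPrefix.length()).replace('\\', '/');
        return urlPrefix + rest;
    }
}
```

With a path prefix of c:\ and a URL prefix of http://example.invalid/docs/,
the file c:\reports\q1.doc would be exposed as
http://example.invalid/docs/reports/q1.doc.
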
>       Web Indexer
>               HTTPUser
>               HTTPPassword
>               HTTPUserAgent
>               ProxyServer
>               ProxyUser
>               ProxyPassword
>               HTTPSCertificate
>               HTTPSPrivateKey
> 
>       Other Possible Indexers
>               Microsoft Exchange 5.5/2000
>               Lotus Notes
>               Newsgroup (NNTP)
>               Documentum
>               ODBC/OLEDB
>               XML - index single XML that represents multiple documents
> 
> 
> Document Factory              
>       General
>               The minimum properties for each document should be:
>                       URL
>                       Title
>                       Abstract
>                       Full Text
>                       Score
> 
>       HTML
>               Support for META tags including Dublin Core syntax
> 
>       Other Possible Document Factories
>               Office Docs - DOC, XLS, PPT
>               PDF
>               
> 
> Thanks for the great proposal.
> 
> Mark Tucker
>                       
> 
> -----Original Message-----
> From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, February 07, 2002 5:35 AM
> To: Lucene Developers List
> Subject: Proposal for Lucene
> 
> 
> Hi All,
> 
> These are just a few thoughts about Lucene.  Please send me your feedback,
> critiques and thoughts.
> 
> If you folks would take a look:
> 
> http://www.trilug.org/~acoliver/luceneplan.html
> 
> if you'd like to submit patches:
> 
> http://www.trilug.org/~acoliver/luceneplan.xml
> 
> Once I've gotten feedback from the developer community I'll send this to
> the user community as well.
> 
> Thanks,
> 
> Andy
> -- 
> www.superlinksoftware.com
> www.sourceforge.net/projects/poi - port of Excel format to java
> http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
>                       - fix java generics!
> 
> 
> The avalanche has already started. It is too late for the pebbles to
> vote.
> -Ambassador Kosh
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> 
-- 
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
                        - fix java generics!


The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh

