fixed in 3.0.0-svn-224 and 2.8.5-svn-5
2011/1/18 Rolf Schumacher <[email protected]>

> Yes, sounds great, Harry.
>
> The function getAttachmentContent(Attachment) is called whenever setupTask
> is executed.
>
> It would be another functionality to feed Lucene just after attachment gets
> ready, a good idea.
>
> What I meant is to make the text conversion dependent on the MIME type of
> the attachment instead of the filename extensions, however this is not
> really important in the first place.
>
> I would like to go after this immediately, however, due to overload in
> other areas, this will take a while. I will come back asap because
> accumulated knowledge is not only in wiki pages but in attachments as well.
>
> Rolf
>
>
> On 14.01.2011 20:30, Harry Metske wrote:
>
>> making a filter that processes "non plain text" files like the ones you
>> mentioned sounds good.
>> If I understand it correctly it should be called when adding an attachment,
>> it should process the file creating searchable text and hand them off to
>> lucene for indexing right ?
>> please also consider a unit test for it.
>>
>> adding a few more file-types for pure text files is a good quick-win,
>> starting with .mm .htm .xhtml .java .c .cpp .php .asm .sh .properties .kml
>> .gpx .loc
>>
>> anyone else opinions, suggestions ?
>>
>> regards,
>> Harry
>>
>> 2011/1/13 Rolf Schumacher<[email protected]>
>>
>>> ok, Harry, thank you for the link.
>>>
>>> My suggestions, please correct:
>>>
>>> - hard-coding of file types seems to me as not a problem: anything shall
>>>   be searched
>>> - the list is too short, important types such as .doc, .odt, .pdf, .ppt,
>>>   .odp are missing
>>> - am I right here?: If I can provide a filter that makes text out of these
>>>   files it should not be as tough to add them
>>> - we may be better off if we have an attribute with each attachment telling
>>>   its MIME type as far as detectable at attachment time, that way we are
>>>   not as much dependent on correct file extensions
>>>
>>> - a quick suggestion: please add .mm as another xml type. The freemind
>>>   plugin is of great value.
>>>
>>> kind regards
>>>
>>> Rolf
>>>
>>>
>>> On 11.01.2011 18:42, Harry Metske wrote:
>>>
>>>> Rolf,
>>>>
>>>> see the source
>>>>
>>>> https://github.com/apache/jspwiki/blob/jspwiki_2_8_5/src/com/ecyrd/jspwiki/search/LuceneSearchProvider.java#L328
>>>>
>>>> as you can see, currently the filetypes are hardcoded to just 4 types.
>>>> We could make this a configurable option, patches are welcome.
>>>>
>>>> You say "comments given to an Attachment", I assume you mean Change Notes
>>>> entered while uploading an attachment (or saving a normal Wiki Page).
>>>> That is a bit more work I think.
>>>> Being a complete Lucene null, but looking at the code it looks like we
>>>> could add another field (we already index the page author and page name)
>>>> for the Change Note.
>>>>
>>>> regards,
>>>> Harry
>>>>
>>>>
>>>> 2011/1/10 Rolf Schumacher<[email protected]>
>>>>
>>>>> I am using JSPWiki 2.8.4
>>>>>
>>>>> Is it possible to extend a search to attachments to some mime types,
>>>>> e.g. pdf?
>>>>>
>>>>> Is it possible to extend a search to the comments given to an
>>>>> attachment?
>>>>>
>>>>> kind regards
>>>>>
>>>>> Rolf
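
For reference, the "configurable option" idea from the thread could look roughly like the sketch below. This is only an illustration, not the code that went into 3.0.0-svn-224 or 2.8.5-svn-5: the class name SearchableSuffixes and the idea of reading the list from a single configuration string are assumptions made for the example, and the suffixes shown are taken from the list suggested above.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of JSPWiki): decides from the file name
 * suffix whether an attachment should be indexed as plain text.
 */
final class SearchableSuffixes {
    private final Set<String> suffixes = new TreeSet<>();

    /** Parses a comma- or whitespace-separated list such as ".txt, .xml, mm". */
    SearchableSuffixes(String configured) {
        for (String token : configured.split("[,\\s]+")) {
            if (token.isEmpty()) {
                continue; // leading separators yield an empty first token
            }
            String suffix = token.toLowerCase(Locale.ROOT);
            // Accept entries written with or without the leading dot.
            suffixes.add(suffix.startsWith(".") ? suffix : "." + suffix);
        }
    }

    /** Case-insensitive suffix match against the configured list. */
    boolean matches(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : suffixes) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

With a list such as ".txt, .xml, .mm, .htm" a FreeMind map named Plan.MM would be picked up, while a .pdf would still be skipped until a converter for it exists. Matching on the MIME type instead, as Rolf suggests, would need the type to be stored with the attachment at upload time.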
