Got it to work! There were some restrictions in the DASL query that were preventing the PDF result from being returned.
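
For the archives, a fulltext DASL query along the lines Jasha suggested below might look roughly like this. It is only a sketch, not the exact query used here: the selected properties, the `infinity` depth and the search word are assumptions for illustration, and the scope path is simply the binaries folder mentioned in this thread. The body would be sent to the repository with the WebDAV SEARCH method.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a DASL basicsearch request body (sent via the SEARCH method) -->
<d:searchrequest xmlns:d="DAV:">
  <d:basicsearch>
    <!-- *select*: properties to return for each hit (assumed for illustration) -->
    <d:select>
      <d:prop>
        <d:displayname/>
        <d:getcontenttype/>
      </d:prop>
    </d:select>
    <!-- *from*: repository location to search, including subfolders -->
    <d:from>
      <d:scope>
        <d:href>/files/default.preview/binaries</d:href>
        <d:depth>infinity</d:depth>
      </d:scope>
    </d:from>
    <!-- *where*: no property condition, only a fulltext match on indexed content -->
    <d:where>
      <d:contains>mySearchWord</d:contains>
    </d:where>
  </d:basicsearch>
</d:searchrequest>
```

Any extra property conditions in the *where* clause narrow the result set further, so a restriction that the PDF's properties do not satisfy will keep it out of the results even when the fulltext match succeeds.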
Thanks a lot!
mau

On Wed, Dec 16, 2009 at 11:26 AM, Jasha Joachimsthal <[email protected]> wrote:

> 2009/12/16 Maurizio Pillitu <[email protected]>:
> > Thanks guys,
> > I know I was missing some bits of the big picture :)
> >
> > So here's the next question: when I perform a DASL query, I normally
> > *select* some properties *from* some repository location (path) *where* a
> > certain property matches one or more conditions; if I don't have a property
> > to match, how can I define the *where* condition?
> >
> > sounds like a very stupid question .... sorry for that.
>
> There are no stupid questions!
> For fulltext search, you can do <d:contains>mySearchWord</d:contains>
> If you really need properties, you can let the user set them in the
> assets perspective. See [1]
>
> [1] http://wiki.onehippo.com/display/CMS/WebDAV+properties+used+by+Hippo+CMS
>
> > Thx again
> >
> > mau
> >
> > On Wed, Dec 16, 2009 at 11:01 AM, Jeroen Reijn <[email protected]> wrote:
> >
> >> Hi Maurizio,
> >>
> >> as far as I know the PDF extractor as you have configured it now extracts
> >> all content to the Lucene index only and makes sure that the text can be
> >> found and mapped to the PDF document. I don't think Slide has a repository
> >> extractor that can extract the information and store it as a property.
> >>
> >> Regards,
> >>
> >> Jeroen
> >>
> >> Maurizio Pillitu wrote:
> >>
> >>> Hi everyone,
> >>> I'm trying to use the PDFExtractor (using Hippo Repository 1.2.15); I've
> >>> added to my (default) extractors.xml the following:
> >>>
> >>> ....
> >>> <extractor classname="org.apache.slide.extractor.PDFExtractor"
> >>> uri="/files/default.preview/binaries" content-type="application/pdf"/>
> >>> .....
> >>>
> >>> then I dropped a Google Docs generated PDF file (attached) in
> >>> /files/default.preview/binaries (via WebDAV); I see the repository logging
> >>> some interesting bits (attached) as if the extraction process went fine, but
> >>> I can't see the extracted data; I'd have expected a WebDAV property attached
> >>> to the file, but nothing shows up; this is the list of properties related
> >>> to the PDF file (using DAVExplorer)
> >>>
> >>> getlastmodified DAV: Wed, 16 Dec 2009 09:38:35 GMT
> >>> displayname DAV: this_is_my_title.pdf
> >>> modificationdate DAV: 2009-12-16T09:38:35Z
> >>> UID DAV: 96da71317f000001004b0bbb796bcb32
> >>> supportedlock DAV:
> >>> getcontenttype DAV: application/pdf
> >>> getcontentlength DAV: 5078
> >>> resourcetype DAV:
> >>> getcontentlanguage DAV: en
> >>> getetag DAV: ada3fdca64b1fd70a3d7b2ed66b3e68b
> >>> lockdiscovery DAV:
> >>> source DAV:
> >>> creationdate DAV: 2009-12-16T09:38:35Z
> >>>
> >>> I feel like I'm missing something on how the PDFExtractor works; I've looked
> >>> for some documentation or specific configurations, but I couldn't find
> >>> anything interesting.
> >>>
> >>> Any hints?
> >>> TIA
> >>> mau
> >>>
> >>> Kind regards,
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> ********************************************
> >>> Hippocms-dev: Hippo CMS development public mailinglist
> >>>
> >>> Searchable archives can be found at:
> >>> MarkMail: http://hippocms-dev.markmail.org
> >>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
> >>>
> >>> ********************************************
> >
> > --
> > Kind regards,
> > --
> > Maurizio Pillitu - 0031 (0)615655668
> > Opensource Software Engineer
> > Scrum Certified Master - http://www.scrumalliance.org
> > Sourcesense - making sense of Open Source: http://www.sourcesense.com
