Hello Jun,

On Fri, Feb 20, 2009 at 5:21 PM, <[email protected]> wrote:
> Hi Ard,
> To be specific, what I am trying to do is to enhance the search for end
> users. For example, they can go into the Search tab and search for a
> document with a Lucene-like query: "myelement:myvalue". We can index only
> some of the elements in a document (i.e., the "searchable fields"), but it
> would be great if we could index all elements in any document.

Aaaah, so if I understand correctly, you want every separate text-containing
XML element to be indexed in a separate Lucene field... Hmmm, I have to think
about it. Sounds OK to me, though there might be a small performance hit
during indexing. The only problem is that you could not search on those
fields: searching in properties is only possible when they are configured to
be 'text' indexed, and you do not know beforehand which properties you'll
get. Also, I would have to dive into the code to see whether it is
achievable. A starting point would be to see whether, in the
XMLContentExtractor, you can hook a Lucene Doc into the contentHandler, such
that at endElement(...) you could add a Lucene field if it was a text node.
Not trivial, though...

>
> From your previous email, it seems that the solution you are proposing
> is to extract elements of the documents into properties, and then search
> by properties. Sounds like this approach would work if we limit the
> searchable fields to a smaller set. Did I understand it correctly?

Exactly. Basically, we know beforehand from the spec what we have to build,
which queries we have to support, and thus which properties to index...

Regards Ard

>
> Thanks a lot!
> Jun
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ard
> Schrijvers
> Sent: Friday, February 20, 2009 12:00 AM
> To: Hippo CMS development public mailinglist
> Subject: Re: [HippoCMS-dev] Strict property contains
>
> Hello Jun,
>
> On Fri, Feb 20, 2009 at 12:05 AM, <[email protected]> wrote:
>> Hi Ard,
>> This email thread interested me because I have been thinking about
>> customizing the search so I can do search by fields (e.g., search for
>> value abc in tag <myelement/>). To do that, it seems that I have to
>> write an analyzer to index the documents differently. (It's only a very
>> preliminary thought, so please point it out if that doesn't seem
>> realistic.)
>
> You might better explain to me what exactly you want, because this sounds
> like it is already there: you configure an extractor that has an xpath
> into documents (this can be only documents below some path in the repo,
> or all of them). In your case, you could extract <myelement> into the
> property 'myelement'.
>
> Then, if you do nothing in the indexer, you can use d:eq and such in a
> DASL query (see the wiki for DASL), and if you want to search in the
> property for text, you can use s:propsearch (to search only within this
> element).
>
> If you index it as text, you use s:property-contains, and cannot use
> the prop to search with d:eq (equivalence) or to sort on it.
>
>
>> So after reading your email, it seems that I write my own analyzer as a
>> hippo extension, and then configure it in indexer.xml. Is there any way
>> that I can scope the indexer? Say, limit an indexer to a certain
>> directory? Or does it have to be applied to all documents in the
>> directory?
>
> Not during indexing, but this would be done during extracting: with an
> extractor you define which XML part to extract into a property; with an
> analyzer you define *how* to analyze a property.
>
> Take a look in the wiki for extractors and analyzers; it is pretty much
> explained over there.
>
> Ard
>
>> If I switch to my own indexer, what should I pay attention to
>> so it doesn't break existing hippo functionalities dependent on how docs
>> are indexed?
>>
>> Thanks
>> Jun
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Ard
>> Schrijvers
>> Sent: Thursday, February 19, 2009 4:27 AM
>> To: Hippo CMS development public mailinglist
>> Subject: Re: [HippoCMS-dev] Strict property contains
>>
>> Hello Wilson,
>>
>> You are on the right track; you only have to add this to indexer.xml:
>>
>> <property namespace="http://hippo.nl/cms/1.0" name="taxonomie"
>>     type="text"
>>     analyzer="nl.hippo.slide.index.analysis.LowercaseCommaSeparatedAnalyzer"
>> />
>>
>> The type="text" makes sure you can use strict-property-contains for
>> this prop. Also see [1].
>>
>> If you want it to work for existing documents, make sure you delete
>> the index before restarting the repo; this will re-index everything.
>>
>> Regards Ard
>>
>> [1]
>> http://wiki.hippo.nl/display/CMSSNDBX/06.+Using+DASL+Queries#06.UsingDASLQueries-%3CS%3A(not)strictpropertycontains%2F%3E(1.2.8andhigherONLY)
>>
>>
>> On Thu, Feb 19, 2009 at 1:00 PM, Wilson de Paula Pedro Junior
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> Can someone tell me what is wrong?
>>>
>>> I have this dasl query:
>>>
>>> <d:searchrequest xmlns:dav="DAV:" xmlns:d="DAV:"
>>>     xmlns:s="http://jakarta.apache.org/slide/"
>>>     xmlns:h="http://hippo.nl/cms/1.0">
>>>   <d:basicsearch>
>>>     <d:select>
>>>       <d:prop>
>>>         <s:nrHits />
>>>         <h:publicatiedatum />
>>>         <h:taxonomie />
>>>       </d:prop>
>>>     </d:select>
>>>     <d:from>
>>>       <d:scope>
>>>         <d:href>content/nieuws</d:href>
>>>         <d:depth>infinity</d:depth>
>>>       </d:scope>
>>>     </d:from>
>>>     <d:where>
>>>       <d:and>
>>>         <d:not-is-collection />
>>>         <s:strict-property-contains>
>>>           <d:prop>
>>>             <h:taxonomie />
>>>           </d:prop>
>>>           <d:literal>x/y/z</d:literal>
>>>         </s:strict-property-contains>
>>>       </d:and>
>>>     </d:where>
>>>     <d:orderby>
>>>       <d:order>
>>>         <d:prop>
>>>           <h:publicatiedatum />
>>>         </d:prop>
>>>         <d:descending />
>>>       </d:order>
>>>     </d:orderby>
>>>     <d:limit>
>>>       <d:nresults>10</d:nresults>
>>>       <s:offset>0</s:offset>
>>>     </d:limit>
>>>   </d:basicsearch>
>>> </d:searchrequest>
>>>
>>>
>>> I want to compare the given literal (x/y/z) to the property taxonomie.
>>> This property can have comma-separated strings.
>>> In extractors.xml I used:
>>>
>>> <extractor
>>>     classname="nl.hippo.slide.extractor.HippoMultiValueXMLPropertyExtractor"
>>>     uri="/files/default.preview/content/nieuws" content-type="text/xml">
>>>   <configuration>
>>>     <instruction property="taxonomie"
>>>         namespace="http://hippo.nl/cms/1.0"
>>>         xpath="/root/taxonomie/text()"/>
>>>   </configuration>
>>> </extractor>
>>>
>>> I get an error trying to run this dasl.
>>> In the webdav-search tool I get the following error message:
>>>
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <D:multistatus xmlns:D="DAV:">
>>>   <D:response>
>>>     <D:href>/default</D:href>
>>>     <D:status>HTTP/1.1 400 Bad Request</D:status>
>>>     <D:responsedescription>Factory: Uncomparable expression
>>> 'strict-property-contains' for property 'taxonomie'.</D:responsedescription>
>>>   </D:response>
>>> </D:multistatus>
>>>
>>>
>>> Thanks in advance!
>>>
>>> Wilson
>>> ********************************************
>>> Hippocms-dev: Hippo CMS development public mailinglist
>>>
>>> Searchable archives can be found at:
>>> MarkMail: http://hippocms-dev.markmail.org
>>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
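
For the narrower approach discussed in this thread (extract one known element into a property, then query that property), a minimal sketch could look like the fragments below. It reuses the extractor class, uri and namespace from Wilson's example; whether that multi-value extractor class is the right choice for a single-valued element is an assumption. The property name myelement, the xpath /root/myelement/text() and the literal myvalue are hypothetical placeholders, and the d:where fragment assumes the d: and h: prefixes are declared as in the query earlier in the thread.

```xml
<!-- extractors.xml (sketch): copy the text of <myelement> into the
     property 'myelement'. Class name, uri and namespace are copied from
     the example in this thread; property name and xpath are hypothetical. -->
<extractor
    classname="nl.hippo.slide.extractor.HippoMultiValueXMLPropertyExtractor"
    uri="/files/default.preview/content/nieuws" content-type="text/xml">
  <configuration>
    <instruction property="myelement" namespace="http://hippo.nl/cms/1.0"
                 xpath="/root/myelement/text()"/>
  </configuration>
</extractor>

<!-- DASL where clause (sketch): exact match on the extracted property.
     Per the thread, d:eq is only usable while the property is NOT
     configured as type="text" in indexer.xml. -->
<d:where>
  <d:eq>
    <d:prop><h:myelement/></d:prop>
    <d:literal>myvalue</d:literal>
  </d:eq>
</d:where>
```

If the property is instead declared with type="text" in indexer.xml, as in Ard's answer to Wilson, the thread indicates that the text-style operators apply and d:eq and sorting on that property no longer do.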
