[ https://jira.duraspace.org/browse/DS-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=26989#comment-26989 ]
Christophe Dupriez commented on DS-1121:
----------------------------------------

Dear Jessica,

DSpace (like many information retrieval applications today) uses Apache Lucene and its query syntax. The solution to your problem is to preprocess the titles: remove all punctuation marks (parentheses, colons, quotes and others can cause problems too) and put spaces in their place, so that every word stays separated (Lucene performs word searches). While doing this, you could also remove the words "AND", "OR" and "NOT" from the titles.

Another strategy would be to place the whole (cleaned) title between quotes.

I hope this helps.

Complete info: http://lucene.apache.org/core/3_6_1/queryparsersyntax.html

> Searches interspersed with a minus (separated by spaces) will exclude the following term. (AND NOT)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DS-1121
>                 URL: https://jira.duraspace.org/browse/DS-1121
>             Project: DSpace
>          Issue Type: Bug
>          Components: DSpace API
>    Affects Versions: 1.8.0, 1.8.1
>            Reporter: Peter Dietz
>            Assignee: Peter Dietz
>              Labels: has-patch
>             Fix For: 3.1
>
>         Attachments: DS-1121-hacking-DSQuery-to-strip-interspersed-dashses.patch,
>                      Screen Shot 2012-02-08 at 4.30.51 PM.png,
>                      Screen Shot 2012-02-08 at 4.31.03 PM.png,
>                      Screen Shot 2012-02-08 at 5.57.54 PM.png
>
>
> I have confirmed that this problem exists... Here's the original report.
> --
> In an email to dspace-tech, Jessica Lindholm writes:
>
> Dear all,
> I would like some advice on Boolean operators and their implications for
> search and search results.
>
> Some publications have a minus sign in their titles, e.g. "Chronic intraoral pain
> - assessment of diagnostic methods and prognosis" (http://hdl.handle.net/2043/12563).
>
> This gives us a problem when searching for the item. The '-' coincides with the
> minus operator, which means AND NOT, so we suddenly end up searching for items
> that lack specific words.
> A simple free-text search matching the title above leads to zero hits.
> And users tend to, e.g., copy a title from a reference list into the search box.
>
> Has anyone encountered the same problem? We have looked around and haven't
> found a solution (stopwords, tokens, queries).
>
> We are using AND as the default operator for all searches, i.e. we have
> changed that in config.xml (from the default OR). If we had used OR as the
> default, the problem wouldn't appear, and possibly this opens up a logical
> slip in the system (or by me, which is not unlikely at all). Google uses the
> same operator, but handles it differently when it is surrounded by whitespace.
>
> Have we possibly encountered a bug?
>
> We would prefer to change the Boolean operators to capitalised AND NOT, AND,
> etc. instead of -, +, believing this would solve the issue. Is this possible
> (Java okay)?
>
> However, when matching exactly with quotation marks, both searches,
> "Chronic intraoral pain assessment of diagnostic methods and prognosis"
> (minus excluded) and "Chronic intraoral pain - assessment of diagnostic
> methods and prognosis", work fine. So the minus is handled to some extent.
>
> Looking forward to understanding this better,
>
> Jessica Lindholm
>
> PS: Malmö University is using DSpace (1.8.*) XMLUI as our institutional
> repository.

--
This message is automatically generated by JIRA.
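Christophe's first suggestion (replace punctuation with spaces, then drop the operator words) and his alternative (search the cleaned title as one quoted phrase) can be sketched in Java as follows. This is a hypothetical helper, not DSpace's DSQuery code and not the attached patch; the class and method names are made up for illustration. It assumes Java 9+ (for Set.of) and relies on Lucene's classic query syntax treating only the all-caps words AND, OR and NOT as operators, so lower-case "and" is left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of DSpace): pre-processes a pasted title
 * before it is handed to a Lucene query parser.
 */
public final class TitleQueryCleaner {

    // Lucene's classic query syntax only treats these ALL-CAPS words as operators.
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    private TitleQueryCleaner() {
    }

    /** Replaces punctuation/symbol runs with a space and drops AND/OR/NOT tokens. */
    public static String clean(String title) {
        // \p{P} = Unicode punctuation ('-', '(', ':', '"', ...);
        // \p{S} = Unicode symbols ('+', '^', '~', '|', ...).
        String spaced = title.replaceAll("[\\p{P}\\p{S}]+", " ");
        List<String> kept = new ArrayList<>();
        for (String word : spaced.trim().split("\\s+")) {
            // split() on an empty string yields one empty token; skip it.
            if (!word.isEmpty() && !OPERATORS.contains(word)) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    /** Alternative strategy: search the cleaned title as one exact phrase. */
    public static String asPhrase(String title) {
        // clean() has removed every '"', so the phrase cannot be broken open.
        return "\"" + clean(title) + "\"";
    }

    public static void main(String[] args) {
        String title = "Chronic intraoral pain - assessment of diagnostic methods and prognosis";
        System.out.println(clean(title));
        System.out.println(asPhrase(title));
    }
}
```

For the title in the report, clean() returns "Chronic intraoral pain assessment of diagnostic methods and prognosis" and asPhrase() wraps that in double quotes, which is the form of the exact-phrase query Jessica reports as working.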
_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel