Stemming urgently needs to be made configurable in the next version of
DSpace. There are a number of applications, such as the one I am working
on (court judgments), where searching on the exact word is essential. I
will turn stemming off and test the results. I fully agree with Richard
that it should be configurable.
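
A minimal sketch of what a stemming switch would change, assuming a toy
suffix stripper in place of the real Porter stemmer. Nothing here is DSpace
or Lucene code; strip_suffix, analyze, matches and the `stemming` flag are
hypothetical names used only for illustration.

```python
# Toy illustration only: this is NOT the Porter algorithm and not DSpace code.
# All names (strip_suffix, analyze, matches, `stemming`) are hypothetical.

SUFFIXES = ("ness", "ing", "s")  # a tiny, assumed suffix list


def strip_suffix(token: str) -> str:
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str, stemming: bool = True) -> list[str]:
    """Lowercase and split on whitespace; stem only when `stemming` is on."""
    tokens = text.lower().split()
    return [strip_suffix(t) for t in tokens] if stemming else tokens


def matches(query: str, document: str, stemming: bool) -> bool:
    """True when every analyzed query term appears in the analyzed document."""
    doc_terms = set(analyze(document, stemming))
    return all(term in doc_terms for term in analyze(query, stemming))


doc = "the patient is well"
print(matches("wellness", doc, stemming=True))   # query term is reduced to "well"
print(matches("wellness", doc, stemming=False))  # the exact term is required
```

With the flag on, a search for "wellness" also hits documents that only
contain "well"; with it off, only the exact word matches, which is the
behaviour needed for searching judgments. The same analysis has to be
applied at index time and at query time for either setting to work.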

Thanks & regards

Surinder Kumar Gaba
Senior Technical Director
Digital Archiving & Management Group
National Informatics Centre
A-Block, CGO Complex, Lodhi Road
New Delhi-110 003
Tel : 011-24362359 (o)
      : 011-27865224 (r)



>
>
> Message: 1
> Date: Wed, 02 Feb 2011 13:36:24 -0500
> From: "Schumacher, John" <john.schumac...@suny.edu>
> Subject: [Dspace-tech] re Porter Stem Filter
> To: dspace-tech@lists.sourceforge.net
> Message-ID:
>        <1f27f05c7e88284b9f68c5831f3a372c06596...@scenm2.sysadmin.suny.edu>
> Content-Type: text/plain; charset=us-ascii
>
> Hello.
>
> Opinions on this were requested.
>
> I agree completely with Richard Jizba.
>
> John
>
> John Schumacher
> Office of Library and Information Services
> SUNY System Administration
> SUNY Plaza
> Albany, NY 12246
> 518-320-1477 (Note, new number!)
> 518-320-1554 (fax)
> john.schumac...@suny.edu
> SUNY Digital Repository
> http://dspace.sunyconnect.suny.edu/
>
>
> ==== Philosophical Discussion ====
>
> I am a little surprised that the DSpace community thinks stemming like
> that done by the Porter Stemming Algorithm is so important. I have been
> searching bibliographic databases since the early 1980s and teach
> courses to our health sciences students on search techniques. We have
> always appreciated the systems that give us the power to find exactly
> the terms and the combinations we want. Language is just too rich and
> varied for any other approach in my experience. There have been many
> times when I have needed to search for a singular form of a noun vs a
> plural form or vice versa. Using truncation and wildcard operators is
> not rocket science. Lucene has some really powerful search operators,
> but their power is basically nullified by the Stemming operation.
>
> Our DSpace instance isn't aimed primarily at a broad worldwide user
> base, but select groups of students, staff and faculty with rather
> sophisticated information needs. Besides, most of our collection can
> also be discovered through Google. Why duplicate that, when I have the
> option of also creating an alternative search environment that provides
> for sophisticated, analytical searches of scholarly, curricular and
> administrative documents?
>
> You might be surprised at how quickly the people in our Office of
> Medical Education have picked up on the nuances of how and where they
> put metadata, the need for standardized vocabulary in defining lecture
> objectives, and how quickly they figured out what was happening to their
> attempts to search for "wellness" (stemmed to "well"). (It did not
> surprise me!)
>
> I think the distributed community administration available with DSpace
> will really help our faculty and staff take seriously the data (text)
> they put into their collections. Our expertise as "consultants" and
> trainers to the staff in the Office of Medical Education has really made
> them appreciate the expertise of librarians, particularly my reference
> librarians who have very good analytical search skills. Don't sell
> people short -- they can be very sophisticated, which means we need to
> provide them with powerful tools, not heavy-handed interventions (the
> Porter Algorithm).
>
> I'm planning on being at OR11 and would be happy to discuss this over a
> beer.
>
> If anybody is still with me, I would be curious if there is a
> LowerCaseFilter that would permit the retention of capital 'A's.
> Eliminating 'A's in medical research databases is a problem. Vitamin A
> is the obvious example, but there are many other occurrences of 'A' as
> an important, non-trivial term in a name.
>
> Richard Jizba
> Creighton University
>
>
>
>
>
>
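
On the closing question about keeping capital 'A's: a rough sketch of one
possible approach, assuming the 'A' is lost because tokens are lowercased
and then checked against a stop-word list that contains "a". This is not
an existing Lucene or DSpace filter; STOP_WORDS, analyze_keep_a and the
keep_capital_a flag are hypothetical.

```python
# Hypothetical sketch, not Lucene or DSpace code.
STOP_WORDS = {"a", "an", "the", "of"}  # assumed stop list


def analyze_keep_a(text: str, keep_capital_a: bool = True) -> list[str]:
    """Lowercase tokens and drop stop words, optionally sparing a capital 'A'."""
    kept = []
    for token in text.split():
        # A standalone capital 'A' bypasses both lowercasing and the stop list.
        if keep_capital_a and token == "A":
            kept.append(token)
            continue
        lowered = token.lower()
        if lowered not in STOP_WORDS:
            kept.append(lowered)
    return kept


print(analyze_keep_a("Vitamin A and a placebo"))
print(analyze_keep_a("Vitamin A and a placebo", keep_capital_a=False))
```

The trade-off is that an article at the start of a sentence ("A study
of ...") is kept as well, and queries would need the same treatment so
that a search for "Vitamin A" still finds the term.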