Re: Similarity percentage between two Strings

N. Hira Wed, 03 Sep 2008 13:30:34 -0700

I don't know how much of this is a Lucene problem, but -- as I'm sure you will inevitably hear from others on the list -- it depends on what your definition of "similar" is.


By similar, do you mean:
1.  Identical, except for variations in case (upper/lower)

2.  Allow 1., but also allow prefixes/suffixes (e.g., "FW: " or "... (summary)")

3.  Allow 1., 2. and permit some new terms ... how many?

4.  Allow all of the above and allow some changes to terms using stemming (e.g., "Google releases Chrome" is similar to "Google announces the release of its new Chrome web browser")

....


I'm sure you see where this is going.  So ... how do you define similar?
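
To make the options above concrete, here is a rough sketch in plain Java (no Lucene calls) that covers definitions 1 and 2 and gives a crude score for 3. Everything in it is an assumption on my part: the class name SubjectSimilarity, the list of prefixes to strip, and the choice of Jaccard overlap on word sets as the "percentage". It does no stemming, so it will not handle definition 4.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
class SubjectSimilarity {

    // Assumed set of reply/forward prefixes; extend for your mailbox.
    private static final Pattern PREFIX =
        Pattern.compile("^\\s*((re|fw|fwd)\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    // Definitions 1 and 2: drop leading "Re:"/"FW:" markers and ignore case.
    static String normalize(String subject) {
        String stripped = PREFIX.matcher(subject).replaceFirst("");
        return stripped.trim().toLowerCase(Locale.ROOT);
    }

    // Split the normalized subject into a set of distinct words.
    static Set<String> tokens(String subject) {
        Set<String> out = new HashSet<String>();
        for (String t : normalize(subject).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Definition 3, roughly: Jaccard overlap = |A and B| / |A or B|, in [0, 1].
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // two empty subjects are treated as identical
        }
        Set<String> intersection = new HashSet<String>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<String>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String a = "Google releases Chrome";
        String b = "FW: google releases chrome";
        String c = "Google announces the release of its new Chrome web browser";
        System.out.printf("a vs b: %.0f%%%n", 100 * similarity(a, b));
        System.out.printf("a vs c: %.0f%%%n", 100 * similarity(a, c));
    }
}
```

With this scoring, the two subjects from item 4 share only "google" and "chrome" out of 11 distinct words, so they score low. That is exactly why the definition matters: you would need stemming and a looser threshold to group them.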

Good luck!

-h
----------------------------------------------------------------------
Hira, N.R.
Cognocys, Inc.

On 03-Sep-2008, at 2:52 PM, Thiago Moreira wrote:

    Hey all,
I want to know how much two Strings are similar! The thing is: I'm processing an email box and I want to group all messages that have the subject similar, makes sense?? I looked on the documentation but I didn't find how to accomplish this. It's not necessary add the messages or the subjects on some kind of index. I'm using 2.3.2 version of Lucene.
    Anyone has some idea?

    Thanks in advance.
--
Thiago Moreira
Software Engineer
[EMAIL PROTECTED]
Liferay, Inc.
Enterprise. Open Source. For Life.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



