Srinivas,

I don't know of a ready-made clustering algorithm for this, but you might check out agrep() in the base package and stringMatch() in the MiscPsycho package. Both can help identify similar text strings, and it may be possible to group similar names by applying these functions repeatedly.
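Here is a minimal sketch of the "over and over again" idea. It uses only base R's agrep() and does not use MiscPsycho. The helper group_by_agrep() is hypothetical and not part of any package, and the example names and the file name "names.csv" are made up. The helper takes the first name that has no group yet, gives that group to every ungrouped name that agrep() considers close to it, and repeats. Treat max.distance = 0.1 as only a rough stand-in for "90% matching". agrep() counts allowed edits as a fraction of the pattern's length, rounded up, and it also accepts a match when the pattern appears approximately anywhere inside a longer name.

```r
# Hypothetical helper (not part of any package): greedy grouping of
# near-duplicate strings by calling base R's agrep() once per seed name.
group_by_agrep <- function(x, max.distance = 0.1) {
  group <- rep(NA_integer_, length(x))  # group id for each name
  next_id <- 1L
  for (i in seq_along(x)) {
    if (!is.na(group[i])) next          # already placed in a group
    # Indices of all names that approximately contain x[i]; max.distance
    # is a fraction of nchar(x[i]), rounded up to a whole number of edits.
    hits <- agrep(x[i], x, max.distance = max.distance, ignore.case = TRUE)
    hits <- hits[is.na(group[hits])]    # keep earlier assignments fixed
    group[hits] <- next_id
    next_id <- next_id + 1L
  }
  group
}

# Made-up example names; real data might be read with something like
# student_names <- read.csv("names.csv", header = FALSE,
#                           stringsAsFactors = FALSE)[[1]]
student_names <- c("Srinivas Raghavan", "Srinivas Ragavan",
                   "Edward Merkle", "Edward Merkel")
groups <- group_by_agrep(student_names, max.distance = 0.1)
split(student_names, groups)  # list the names that fell into each group
```

With these four names, the two spellings of each name should share a group id. A larger value such as max.distance = 0.2 gives a looser grouping, roughly the "80%" tier. Because each group is seeded by the first ungrouped name, the result can depend on the order of the input. Calling agrep() once per name may also be slow on 30,000 names.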

Ed

--
Ed Merkle, PhD
Assistant Professor
Dept. of Psychology
Wichita State University
Wichita, KS 67260


Date: Thu, 22 Jan 2009 16:33:03 +0530
From: srinivasa raghavan <srinivasrag...@gmail.com>
Subject: [R] text vector clustering
To: r-help@r-project.org
Message-ID:
        <e45b69190901220303u114028b1k43ef6f3ab7c7c...@mail.gmail.com>

Hi,

I am a new R user, running R 2.8.1 on Windows 2003. I have a CSV file with a
single column containing 30,000 student names. Typos were made when these
names were entered; the actual list of distinct names has fewer than 1,000
entries. However, we don't have that list to use for keyword search.

I am interested in grouping/clustering these names by letter-by-letter
similarity. Is there a text clustering algorithm in R that can group similar
names into segments of exact matches, 90% matches, 80% matches, and so on?

Thanks in advance,

Regards,
Srinivas
Statistical Analyst

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
