Hi,

A substring can occur anywhere in the main string.
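Because the match can be anywhere, the containment test Jim suggests can be sketched with base R. grepl() with fixed = TRUE treats the pattern as a literal string and looks for it at any position. regexpr() reports where the first match starts, or -1 if there is none. The strings below are made up for illustration.

```r
# Made-up strings for illustration
short <- "abab"
long  <- c("ababaaa", "aaabab", "xababx", "abba")

# fixed = TRUE matches 'short' literally (no regex metacharacters),
# at any position within each element of 'long'
grepl(short, long, fixed = TRUE)               # TRUE TRUE TRUE FALSE

# regexpr() gives the start of the first match, or -1 for no match;
# as.integer() drops the match.length and other attributes
as.integer(regexpr(short, long, fixed = TRUE)) # 1 3 2 -1
```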
I think I could use table() and then add the counts together. But I don't know how to access the counts in the result of table(). Another problem is that there may be a hierarchy among the strings. That is, string a might be a substring of b, while b might be a substring of c. So, when checking the strings, I would have to start with the longest string and find all of its substrings, then check the second-longest string, and so on. But I cannot find a way of ordering strings by their length.

Regards,
Dieter

jim holtman wrote:
> How do you determine if one string is a subset of another? Does it
> only match at the beginning, or anywhere? How large is your set of
> strings? Can you use table as you describe and then determine what
> the groupings of subsets are and then just add the numbers together?
> You can use grep/regexpr to determine if one string is a subset of
> another.
>
> On 10/3/07, Dieter Vanderelst <[EMAIL PROTECTED]> wrote:
>> Hi list,
>>
>> I'm currently processing textual data and I would really appreciate some
>> help with one of my problems.
>>
>> I have a set of strings and I want to count how often each of these
>> strings appears in this set.
>>
>> This is not very difficult and can be done as:
>>
>> TB <- table(my_set)
>> plot(TB)
>>
>> However, I also want to collapse across substrings. That is, I want a
>> substring ss of string S to be counted as an occurrence of string S.
>>
>> So, 'abab' should be included in the count of 'ababaaa' and should not
>> be listed as a separate entry in the frequency table.
>>
>> Does anybody have a pointer to a way to do this? I have been checking
>> out the CRAN packages for handling DNA sequences, but this has not
>> really brought me closer to a solution.
>>
>> Thanks,
>> Dieter Vanderelst
>>
>> ------------------------------------------
>> Dieter Vanderelst
>> Eindhoven University of Technology
>> Faculty of Industrial Design
>> Designed Intelligence Group
>> Den Dolech 2
>> 5612 AZ Eindhoven
>> The Netherlands
>> Tel +31 40 247 91 11
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
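A minimal sketch of the collapsing step discussed in this thread. It assumes each substring should be credited to exactly one container: the longest string in the set that contains it and is not itself a substring of another string. Whether a substring shared by several longer strings should instead count once for each of them is a judgment call the thread leaves open. The sketch pulls the labels and counts out of the table() result with names() and as.vector(), puts the longest strings first with order(nchar(...), decreasing = TRUE), and tests containment with grepl(..., fixed = TRUE). The function name collapse_substrings and the example vector are hypothetical.

```r
# Hypothetical example data standing in for the poster's 'my_set'
my_set <- c("abab", "ababaaa", "abab", "xyz", "ab", "ababaaa", "q", "xy")

# Hypothetical helper: fold the count of every string into a longer string
# that contains it. Assumption: each substring is credited to exactly one
# container, namely the longest string that is not itself a substring of
# another string in the set.
collapse_substrings <- function(x) {
  tb <- table(x)
  s <- names(tb)          # the distinct strings
  n <- as.vector(tb)      # their raw counts, as a plain integer vector

  # Longest strings first, so every container comes before its substrings
  ord <- order(nchar(s), decreasing = TRUE)
  s <- s[ord]
  n <- n[ord]

  keep <- rep(TRUE, length(s))
  for (i in seq_along(s)) {
    # Only strings earlier in the list (at least as long) can contain s[i];
    # distinct strings of equal length never contain each other
    for (j in seq_len(i - 1)) {
      if (keep[j] && grepl(s[i], s[j], fixed = TRUE)) {
        n[j] <- n[j] + n[i]   # credit s[i]'s count to its container
        keep[i] <- FALSE      # and drop s[i] from the result
        break
      }
    }
  }
  setNames(n[keep], s[keep])
}

collapse_substrings(my_set)  # returns c(ababaaa = 5L, xyz = 2L, q = 1L)
```

Here 'ab' is inside 'abab', which is inside 'ababaaa', so all three fold into 'ababaaa' (1 + 2 + 2 = 5), matching the a/b/c hierarchy described above. The loop makes up to k^2/2 grepl() calls for k distinct strings, so for a very large set a smarter index may be needed. barplot() on the returned named vector would plot the collapsed counts much as plot(TB) plots the raw ones.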