Re: [R] testing whether two character vectors contain (the same) items in the same order
On 8/6/2015 5:25 AM, Federico Calboli wrote:
[...]

Perhaps

install.packages("stringdist")
help(package = "stringdist")

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
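For readers without stringdist installed, base R already ships a Levenshtein-type distance, utils::adist(). A minimal sketch, assuming x is fixed to the 15 letters printed in the original post (sample() would otherwise make it random) and treating each letter as one character of a string:

```r
# Fix x to the letters printed in the original post instead of sampling.
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Every element is a single letter, so collapsing each vector into one
# string loses no information.
sx <- paste0(x, collapse = "")
sy <- paste0(y, collapse = "")

# adist() returns a matrix of generalized Levenshtein distances
# (unit-cost insertions, deletions and substitutions); take the one cell.
lev <- adist(sx, sy)[1, 1]
lev
```

Plain Levenshtein charges two edits for the swapped pair and two for the rotated last three letters, so this should come out as 4. Other methods offered by stringdist (for example optimal string alignment, which counts an adjacent swap as a single edit) may give a different number for the same pair.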
Re: [R] testing whether two character vectors contain (the same) items in the same order
And I probably should have included this link: http://journal.r-project.org/archive/2014-1/loo.pdf

On 8/8/2015 12:50 PM, Robert Baer wrote:
[...]
Re: [R] testing whether two character vectors contain (the same) items in the same order
On 7 Aug 2015, at 01:59, Bert Gunter bgunter.4...@gmail.com wrote:
[...]

Bert,

as this is R-help and not Cross Validated, I am looking for a precanned function that would test whether the order of characters in two character vectors comes from the same (noisy) process. I would thus expect you to say something along the lines of:

function X uses method Y to do something like that
function W uses method Z to do something like that
…

look into those, figure out exactly what you are testing and use the most appropriate function. The whys and wherefores are for me to deal with; I just want to know whether someone has built a function that does, or seems to do, what I asked for. As I said, this is R-help, and I seek help for R use.

I do concede that my original question might have left many wondering, but I guess my reply to Boris would have cleared any doubts. I am therefore puzzled by the great deal of confusion on your part in understanding the purpose of my question and, in general, of this list.

Best wishes
F
Re: [R] testing whether two character vectors contain (the same) items in the same order
On Aug 7, 2015, at 12:22 AM, Federico Calboli wrote:
[...]

library(sos)  # findFn() is provided by the sos package
findFn("levenshtein")
found 57 matches; retrieving 3 pages
2 3
Downloaded 44 links in 17 packages.

stringdist::stringdist( paste0(x, collapse=""), paste0(letters[y], collapse="") )
[1] 30

--
HTH;
David.
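The 30 in David's stringdist() call is probably not a measure of how far y is from x. If y is the character vector from the original post, letters[y] indexes the unnamed vector letters by name, which returns NA for every element; paste0() then turns each NA into the two characters "NA", and a 30-character string of N and A shares no character with the 15 lowercase letters of x. A base-R check of that reading, with adist() standing in for stringdist (Levenshtein-type distances both reduce to the longer length when nothing matches):

```r
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# letters carries no names, so character subscripts match nothing: all NA.
bad <- letters[y]
all(is.na(bad))

# paste0() renders each NA as "NA": 15 elements x 2 characters = 30.
s_bad <- paste0(bad, collapse = "")
nchar(s_bad)

# No character of s_bad occurs in the lowercase string built from x, so
# the edit distance is simply the length of the longer string.
s_x <- paste0(x, collapse = "")
adist(s_x, s_bad)[1, 1]

# Comparing y itself, which is already a vector of letters, is presumably
# what was intended, e.g.
# stringdist::stringdist(paste0(x, collapse = ""), paste0(y, collapse = ""))
```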
[R] testing whether two character vectors contain (the same) items in the same order
Hi All,

Let’s assume I have a vector of letters drawn only once from the alphabet:

x = sample(letters, 15, replace = FALSE)
x
[1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
y = x[c(1:7, 9:8, 10:12, 14, 15, 13)]

I would now like to test how good a match y is for x. Obviously I can transform the letters into numbers and use a rank test, but I was left wondering whether this is the only solution and whether there are more appropriate solutions already implemented in R (I am not going to reinvent the wheel if I can avoid it).

BW
F

--
Federico Calboli
Ecological Genetics Research Unit
Department of Biosciences
PO Box 65 (Biocenter 3, Viikinkaari 1)
FIN-00014 University of Helsinki
Finland
federico.calb...@helsinki.fi
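The rank-test route mentioned in the post can be sketched in base R: match() turns y into the positions its letters occupy in x, and Kendall's tau then measures how well those positions preserve x's order. This assumes x is fixed to the printed letters; per ?cor.test, method = "kendall" computes an exact p-value when there are fewer than 50 pairs and no ties.

```r
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Position of each element of y within x; an identical order gives 1:15.
pos <- match(y, x)
pos

# One-sided test: is y's order positively associated with x's order?
kt <- cor.test(seq_along(x), pos, method = "kendall", alternative = "greater")
kt$estimate   # Kendall's tau
kt$p.value

# Discordant pairs (the Kendall distance): pairs listed in opposite order.
disc <- outer(pos, pos, ">")
sum(disc[upper.tri(disc)])
```

Here the swapped pair and the letter moved to the end give 3 discordant pairs out of choose(15, 2) = 105, so tau = 1 - 2 * 3 / 105 = 99/105.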
Re: [R] testing whether two character vectors contain (the same) items in the same order
Boris: You may be right, but it seems like ESP to me, based on the OP's non-description of "likelihood of coming from the same noisy process".

My response would be: seek local statistical help, as your replies indicate a good deal of statistical confusion.

Cheers,
Bert

On Thursday, August 6, 2015, Boris Steipe boris.ste...@utoronto.ca wrote:
[...]
Re: [R] testing whether two character vectors contain (the same) items in the same order
Define "goodness of match".

For exact matches, see ?"==", all.equal, etc.

Bert

On Thursday, August 6, 2015, Federico Calboli federico.calb...@helsinki.fi wrote:
[...]

--
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll
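Bert's pointers cover the exact case. For vectors like x and y, a few base-R checks separate "same items" from "same order" (x fixed to the printed letters for reproducibility):

```r
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

identical(x, y)          # FALSE: the order differs
setequal(x, y)           # TRUE: same items, order ignored
which(x != y)            # positions where the two orders disagree
isTRUE(all.equal(x, y))  # FALSE; all.equal() itself returns a description
```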
Re: [R] testing whether two character vectors contain (the same) items in the same order
You are looking for what is known as the Cayley distance between vectors, an edit distance that allows only transpositions. RSeek mentions PerMallows (https://cran.r-project.org/web/packages/PerMallows/PerMallows.pdf) and Rankcluster (https://cran.r-project.org/web/packages/Rankcluster/Rankcluster.pdf) as packages that support work with Cayley distances.

It seems to me that distCayley() in Rankcluster does what you want. From the examples:

library(Rankcluster)
x=1:5
y=c(2,3,1,4,5)
distCayley(x,y)
8

Cheers,
Boris

On Aug 6, 2015, at 9:51 AM, Federico Calboli federico.calb...@helsinki.fi wrote:
[...]
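The Cayley distance can also be computed without a package: for two orderings of the same n distinct items it equals n minus the number of cycles in the permutation that maps one onto the other. A minimal base-R sketch; cayley_distance() is a hypothetical helper, not the Rankcluster or PerMallows API.

```r
# Hypothetical helper: minimum number of transpositions (swaps of any two
# elements, adjacent or not) needed to turn ordering `a` into ordering `b`.
cayley_distance <- function(a, b) {
  stopifnot(length(a) == length(b), setequal(a, b), !anyDuplicated(a))
  p <- match(b, a)       # b[i] sits at position p[i] in a
  n <- length(p)
  seen <- logical(n)
  cycles <- 0L
  for (start in seq_len(n)) {
    if (!seen[start]) {
      # Walk one cycle of the permutation, marking its members.
      cycles <- cycles + 1L
      j <- start
      while (!seen[j]) {
        seen[j] <- TRUE
        j <- p[j]
      }
    }
  }
  n - cycles
}

x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

cayley_distance(1:5, c(2, 3, 1, 4, 5))
cayley_distance(x, y)
```

By this count the pair in the Rankcluster example (1:5 against c(2,3,1,4,5), a single 3-cycle) is two transpositions apart, and y is three transpositions from x: one for the swapped pair and two for the rotated last three letters.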
Re: [R] testing whether two character vectors contain (the same) items in the same order
On 6 Aug 2015, at 15:40, Bert Gunter bgunter.4...@gmail.com wrote:
[...]

Fair point. I would define it as a number that tells me how likely it is that the same (noisy) process produced both lists.

BW
F
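One way to turn "how good a match" into a probability-like number, sketched in base R under a strong assumption: the null hypothesis that y is an unrelated random reordering of x. The Monte Carlo p-value below is the share of random shuffles at least as close to x as y is, with closeness measured by the Kendall distance (the count of pairs in opposite order). kendall_distance() is a hypothetical helper, and the Cayley or Levenshtein distances from this thread could be swapped in. This is not the likelihood that one noisy process produced both lists; that would need an explicit model of the noise, such as the Mallows models the PerMallows package is built around.

```r
# Hypothetical helper: number of pairs that b lists in the opposite order to a.
kendall_distance <- function(a, b) {
  r <- match(b, a)
  m <- outer(r, r, ">")
  sum(m[upper.tri(m)])
}

x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]
observed <- kendall_distance(x, y)

# Null distribution: distances of uniformly random shuffles of x.
set.seed(1)
B <- 9999
null_d <- replicate(B, kendall_distance(x, sample(x)))

# Share of shuffles at least as close as y, with the usual +1 correction
# so the estimate is never exactly zero.
p_value <- (sum(null_d <= observed) + 1) / (B + 1)
p_value
```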