Re: [R] String comparison, trailing blanks make a difference.
On Fri, Jul 18, 2014 at 11:17 AM, John McKown john.archie.mck...@gmail.com wrote:

> Well, this was a shock to me. And I don't really see any documentation about it, but perhaps I just can't see it.
>
>     "abc" == "abc "
>     [1] FALSE
>
> I guess that I thought of strings in R like I do in some other languages where the shorter value is padded with blanks to the length of the longer value, then compared. I.e. that trailing blanks didn't matter.
>
> The best solution that I have found is to use the str_trim() function from the stringr package to remove all the trailing blanks after I get the data from the SQL database. I cannot change the SQL schema to make the column a varchar instead of a char column. It is a vendor DB. And I don't know an ANSI SQL standard way to remove trailing blanks in the SELECT command. PostgreSQL has a trim(trailing ' ' from column), but MS-SQL upchucks on that syntax.

Well, here I am - talking to myself ... again.

My problem was, of course, of my own making. I am getting my data via RODBC from MS-SQL Server. I was basically doing a SELECT * FROM TABLE. I normally use PostgreSQL, not MS-SQL, and I tend to use the TEXT data type instead of CHAR or VARCHAR. So when I do the SELECT, I get back my data without trailing blanks.

Well, the data I am reading now is created by a software vendor. I guess in order to be database independent, the vendor designed his tables to have only fixed-length CHAR and INT values in them. The fixed-length CHAR values are, naturally, padded on the right with blanks. Of course, now that I understand this (weird as it is to me), I know to use a SELECT which specifically lists the columns that I want _and_ does a TRIM() on them to remove trailing blanks. This will reduce the size, in bytes, of my data.frame and make it easier to use the comparison operators.

Given how the vendor saves the data, I am quite surprised that they didn't use SQLite. The tables are simple. There are no stored procedures, no VIEWs, no use of SCHEMAs to make subsets. Basically they just want a simple data store, with the ability to do _simple_ joins. SQLite seems, to me, to be a better fit than requiring the user to have a full-blown RDBMS such as MS-SQL or Oracle.

Well, thanks for the whack on the head to wake me up and make me really look at my data.

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha!
John McKown

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
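For readers who would rather do the trimming on the R side than in the SELECT, here is a minimal sketch. The data frame `df` and its column names are hypothetical stand-ins for what a query against fixed-length CHAR columns might return; nothing here is taken from the vendor schema discussed above.

```r
# Hypothetical stand-in for a query result with blank-padded CHAR columns.
df <- data.frame(
  id   = c(1L, 2L),
  code = c("abc   ", "de    "),
  stringsAsFactors = FALSE
)

# Remove trailing blanks only (leading blanks are left alone).
trim_trailing <- function(x) sub(" +$", "", x)

# Apply the trim to every character column, leaving other types untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], trim_trailing)

df$code == c("abc", "de")
```

In R 3.2.0 and later, `trimws(x, which = "right")` from base R can replace the hand-written `sub()` call; it strips trailing whitespace in general, not only blanks.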
Re: [R] String comparison, trailing blanks make a difference.
If you have Unicode strings, you may need to do even more, because there are often multiple ways of representing the same glyph. I made a little demo at http://rpubs.com/hadley/unicode-normalisation, since any Unicode characters are likely to get mangled by email.

Hadley

On Fri, Jul 18, 2014 at 11:32 AM, William Dunlap wdun...@tibco.com wrote:

>     "abc" == "abc "
>     [1] FALSE
>
> R does no interpretation of strings when doing comparisons so you do have to do your own canonicalization. That may involve removing trailing, leading, or all white space or punctuation, converting to lower or upper case, mapping nicknames to official names, trimming to a fixed number of characters, etc.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Jul 18, 2014 at 9:17 AM, John McKown john.archie.mck...@gmail.com wrote:
>
> > Well, this was a shock to me. And I don't really see any documentation about it, but perhaps I just can't see it.
> >
> >     "abc" == "abc "
> >     [1] FALSE
> >
> > I guess that I thought of strings in R like I do in some other languages where the shorter value is padded with blanks to the length of the longer value, then compared. I.e. that trailing blanks didn't matter.
> >
> > The best solution that I have found is to use the str_trim() function from the stringr package to remove all the trailing blanks after I get the data from the SQL database. I cannot change the SQL schema to make the column a varchar instead of a char column. It is a vendor DB. And I don't know an ANSI SQL standard way to remove trailing blanks in the SELECT command. PostgreSQL has a trim(trailing ' ' from column), but MS-SQL upchucks on that syntax.
--
http://had.co.nz/
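The point about multiple representations of the same glyph can be illustrated with a short sketch. This is not the linked demo; it is an independent example that assumes the stringi package is installed for the normalization step, and it uses `\u` escapes so the characters survive plain-text transport.

```r
# Two ways of writing the same visible character:
composed   <- "\u00e9"    # a single precomposed code point
decomposed <- "e\u0301"   # letter e followed by a combining acute accent

# The strings differ at the code-point level, so == treats them as different.
same_raw <- composed == decomposed
lengths  <- c(nchar(composed), nchar(decomposed))

# Normalizing both sides to NFC first makes the comparison succeed.
# stringi is an assumption here; base R has no normalization function.
if (requireNamespace("stringi", quietly = TRUE)) {
  same_nfc <- stringi::stri_trans_nfc(composed) ==
    stringi::stri_trans_nfc(decomposed)
}
```

Whichever normalization form is chosen, the important design point is to apply the same one to both sides before comparing.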
[R] String comparison, trailing blanks make a difference.
Well, this was a shock to me. And I don't really see any documentation about it, but perhaps I just can't see it.

    > "abc" == "abc "
    [1] FALSE

I guess that I thought of strings in R like I do in some other languages where the shorter value is padded with blanks to the length of the longer value, then compared. I.e. that trailing blanks didn't matter.

The best solution that I have found is to use the str_trim() function from the stringr package to remove all the trailing blanks after I get the data from the SQL database. I cannot change the SQL schema to make the column a varchar instead of a char column. It is a vendor DB. And I don't know an ANSI SQL standard way to remove trailing blanks in the SELECT command. PostgreSQL has a trim(trailing ' ' from column), but MS-SQL upchucks on that syntax.

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha!
John McKown
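The pad-then-compare behaviour described above is not what `==` does in R, but it can be emulated. The sketch below is only an illustration: `padded_equal` is a hypothetical helper name, and it assumes plain ASCII strings with no NA values.

```r
# Hypothetical helper: compare strings as if the shorter one were padded
# on the right with blanks to the length of the longer one.
# Assumes ASCII input and no NA values.
padded_equal <- function(x, y) {
  w <- max(nchar(x), nchar(y))
  # format() left-justifies character data and pads with blanks to width w.
  format(x, width = w) == format(y, width = w)
}

padded_equal("abc", "abc ")   # trailing blank ignored
padded_equal("abc", " abc")   # leading blank still matters
```

Trimming the data once when it is read is usually simpler than wrapping every comparison, so this is mainly useful for seeing what the padded semantics would mean.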
Re: [R] String comparison, trailing blanks make a difference.
>     "abc" == "abc "
>     [1] FALSE

R does no interpretation of strings when doing comparisons so you do have to do your own canonicalization. That may involve removing trailing, leading, or all white space or punctuation, converting to lower or upper case, mapping nicknames to official names, trimming to a fixed number of characters, etc.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jul 18, 2014 at 9:17 AM, John McKown john.archie.mck...@gmail.com wrote:

> Well, this was a shock to me. And I don't really see any documentation about it, but perhaps I just can't see it.
>
>     "abc" == "abc "
>     [1] FALSE
>
> I guess that I thought of strings in R like I do in some other languages where the shorter value is padded with blanks to the length of the longer value, then compared. I.e. that trailing blanks didn't matter.
>
> The best solution that I have found is to use the str_trim() function from the stringr package to remove all the trailing blanks after I get the data from the SQL database. I cannot change the SQL schema to make the column a varchar instead of a char column. It is a vendor DB. And I don't know an ANSI SQL standard way to remove trailing blanks in the SELECT command. PostgreSQL has a trim(trailing ' ' from column), but MS-SQL upchucks on that syntax.
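As a rough illustration of the kind of canonicalization listed above, here is a base-R sketch. The function names and the nickname table are hypothetical, and which steps are appropriate depends entirely on the data.

```r
# Hypothetical canonicalization: lower-case, drop punctuation,
# collapse runs of white space, and strip leading/trailing blanks.
canonicalize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("[[:space:]]+", " ", x)
  gsub("^ | $", "", x)
}

# Hypothetical nickname table mapping nicknames to official names.
official <- c(bill = "william", bob = "robert")

map_nickname <- function(x) {
  hit <- x %in% names(official)
  x[hit] <- unname(official[x[hit]])
  x
}

canonicalize("  Hello, World! ") == canonicalize("hello world")
map_nickname(canonicalize(c(" Bill ", "John")))
```

Applying the same function to both sides of every comparison (or once at import time) keeps the rules in one place.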
Re: [R] String comparison, trailing blanks make a difference.
Hi John,

On 07/18/2014 09:17 AM, John McKown wrote:

> Well, this was a shock to me. And I don't really see any documentation about it, but perhaps I just can't see it.
>
>     "abc" == "abc "
>     [1] FALSE
>
> I guess that I thought of strings in R like I do in some other languages where the shorter value is padded with blanks to the length of the longer value, then compared. I.e. that trailing blanks didn't matter.

The shock to me is to learn that some programming languages consider strings "abc" and "abc " to be the same. Please name them so I can stay away from them ;-)

Thanks,
H.

> The best solution that I have found is to use the str_trim() function from the stringr package to remove all the trailing blanks after I get the data from the SQL database. I cannot change the SQL schema to make the column a varchar instead of a char column. It is a vendor DB. And I don't know an ANSI SQL standard way to remove trailing blanks in the SELECT command. PostgreSQL has a trim(trailing ' ' from column), but MS-SQL upchucks on that syntax.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319