Re: [R] String comparison, trailing blanks make a difference.

2014-07-19 Thread John McKown
On Fri, Jul 18, 2014 at 11:17 AM, John McKown
john.archie.mck...@gmail.com wrote:
 Well, this was a shock to me. And I don't really see any documentation
 about it, but perhaps I just can't see it.

> "abc" == "abc "
 [1] FALSE

 I guess that I thought of strings in R like I do in some other
 languages where the shorter value is padded with blanks to the length
 of the longer value, then compared. I.e. that trailing blanks didn't
 matter.

 The best solution that I have found is to use the str_trim() function
 from the stringr package to remove all the trailing blanks after I get the
 data from the SQL data base. I cannot change the SQL schema to make
 the column a varchar instead of a char column. It is a vendor DB. And
 I don't know an ANSI SQL standard way to remove trailing blanks in the
 SELECT command. PostgreSQL has trim(trailing ' ' from column), but
 MS-SQL upchucks on that syntax.


Well, here I am - talking to myself ... again.

My problem was, of course, of my own making. I am getting my data
via RODBC from MS-SQL Server. I was basically doing a SELECT * FROM
TABLE. I normally use PostgreSQL, not MS-SQL, and I tend to use the
TEXT data type instead of CHAR or VARCHAR. So when I do the SELECT,
I get back my data without trailing blanks. Well, the data I am
reading now is created by a software vendor. I guess in order to be
database independent, the vendor designed his tables to have only
fixed length CHAR, and INT values in it. The fixed length CHAR values
are, naturally, padded on the right with blanks. Of course, now that I
understand this (weird as it is to me), I know to use a SELECT which
specifically lists the columns that I want _and_ does a TRIM() on them
to remove trailing blanks. This will reduce the size, in bytes, in my
data.frame and make it easier to use the comparison operators. Given
how the vendor saves the data, I am quite surprised that they didn't
use SQLite. The tables are simple. There are no stored procedures,
no VIEWs, no use of SCHEMAs to make subsets. Basically they just want
a simple data store, with the ability to do _simple_ joins. SQLite
seems, to me, to be a better fit than requiring the user to have a
full blown RDBMS such as MS-SQL or Oracle.
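
A minimal sketch of the "list the columns and trim them" approach, written in R. It assumes SQL Server's RTRIM() function, since older SQL Server releases do not accept TRIM(). The helper trimmed_select() and the table and column names are hypothetical; the resulting string would be handed to RODBC on a real connection.

```r
# Build a SELECT that right-trims each CHAR column on the server side.
# Table and column names below are hypothetical placeholders.
trimmed_select <- function(table, char_cols, other_cols = character()) {
  # One "RTRIM(col) AS col" term per fixed-length CHAR column
  trimmed <- sprintf("RTRIM(%s) AS %s", char_cols, char_cols)
  sprintf("SELECT %s FROM %s",
          paste(c(trimmed, other_cols), collapse = ", "),
          table)
}

query <- trimmed_select("vendor_table", c("name", "status"), "id")
print(query)

# With an open RODBC channel `ch` (an assumption), something like:
# df <- RODBC::sqlQuery(ch, query, stringsAsFactors = FALSE)
```

Trimming on the server keeps the padding out of the data.frame entirely, so no cleanup is needed on the R side.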

Well, thanks for the whack on the head to wake me up and make me
really look at my data.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] String comparison, trailing blanks make a difference.

2014-07-19 Thread Hadley Wickham
If you have unicode strings, you may need to do even more because
there are often multiple ways of representing the same glyph. I made a
little demo at http://rpubs.com/hadley/unicode-normalisation, since
any unicode characters are likely to get mangled by email.
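
A small illustration of the point, not taken from the linked demo: the same visible glyph built two ways compares as different in R. The variable names are illustrative, and the normalization step at the end only runs if the stringi package happens to be installed.

```r
# Two encodings of the same visible glyph (e with an acute accent).
composed   <- "\u00e9"    # single code point U+00E9
decomposed <- "e\u0301"   # "e" followed by combining acute accent U+0301

# Count code points in each form, then compare the strings as stored.
n_composed   <- length(utf8ToInt(composed))
n_decomposed <- length(utf8ToInt(decomposed))
same <- identical(composed, decomposed)
print(c(n_composed, n_decomposed))
print(same)

# If stringi is available, NFC normalization maps both forms to one string.
if (requireNamespace("stringi", quietly = TRUE)) {
  print(stringi::stri_trans_nfc(composed) == stringi::stri_trans_nfc(decomposed))
}
```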

Hadley

On Fri, Jul 18, 2014 at 11:32 AM, William Dunlap wdun...@tibco.com wrote:
 > "abc" == "abc "
 [1] FALSE

 R does no interpretation of strings when doing comparisons so you do
 have to do your own canonicalization.  That may involve removing
 trailing, leading, or all white space or punctuation, converting to
 lower or upper case, mapping nicknames to official names, trimming to
 a fixed number of characters, etc.

 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com





-- 
http://had.co.nz/



[R] String comparison, trailing blanks make a difference.

2014-07-18 Thread John McKown
Well, this was a shock to me. And I don't really see any documentation
about it, but perhaps I just can't see it.

> "abc" == "abc "
[1] FALSE

I guess that I thought of strings in R like I do in some other
languages where the shorter value is padded with blanks to the length
of the longer value, then compared. I.e. that trailing blanks didn't
matter.

The best solution that I have found is to use the str_trim() function
from the stringr package to remove all the trailing blanks after I get the
data from the SQL data base. I cannot change the SQL schema to make
the column a varchar instead of a char column. It is a vendor DB. And
I don't know an ANSI SQL standard way to remove trailing blanks in the
SELECT command. PostgreSQL has trim(trailing ' ' from column), but
MS-SQL upchucks on that syntax.
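
A base-R sketch of the behaviour and the workaround described above. The helper rtrim() is a hypothetical stand-in; stringr's str_trim(x, side = "right") should give the same result for plain blanks.

```r
x <- "abc"
y <- "abc "                      # same text plus one trailing blank

raw_equal <- x == y              # compares the strings exactly as stored

# Hypothetical helper: drop trailing blanks and tabs only.
rtrim <- function(s) sub("[ \t]+$", "", s)
trimmed_equal <- rtrim(x) == rtrim(y)

print(raw_equal)
print(trimmed_equal)
```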

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown



Re: [R] String comparison, trailing blanks make a difference.

2014-07-18 Thread William Dunlap
 > "abc" == "abc "
 [1] FALSE

R does no interpretation of strings when doing comparisons so you do
have to do your own canonicalization.  That may involve removing
trailing, leading, or all white space or punctuation, converting to
lower or upper case, mapping nicknames to official names, trimming to
a fixed number of characters, etc.
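
One way such a canonicalization step might look in base R. This is only a sketch: canonicalize() is a hypothetical helper, and which of these steps are appropriate depends entirely on the data being compared.

```r
# Hypothetical helper that applies several of the steps listed above.
canonicalize <- function(s, width = NULL) {
  s <- gsub("^[[:space:]]+|[[:space:]]+$", "", s)  # leading/trailing white space
  s <- gsub("[[:punct:]]", "", s)                   # punctuation
  s <- tolower(s)                                   # case
  if (!is.null(width)) s <- substr(s, 1, width)     # fixed number of characters
  s
}

keys  <- c("  Abc ", "ABC.", "abc")
canon <- canonicalize(keys)
print(canon)
print(canonicalize("Abcdef  ", width = 3))
```

Comparing canonicalize(a) == canonicalize(b) rather than a == b keeps the original values untouched while making the comparison tolerant of the chosen differences.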

Bill Dunlap
TIBCO Software
wdunlap tibco.com





Re: [R] String comparison, trailing blanks make a difference.

2014-07-18 Thread Hervé Pagès

Hi John,

On 07/18/2014 09:17 AM, John McKown wrote:

Well, this was a shock to me. And I don't really see any documentation
about it, but perhaps I just can't see it.


> "abc" == "abc "

[1] FALSE

I guess that I thought of strings in R like I do in some other
languages where the shorter value is padded with blanks to the length
of the longer value, then compared. I.e. that trailing blanks didn't
matter.


The shock to me is to learn that some programming languages consider
strings "abc" and "abc " to be the same. Please name them so I can stay
away from them ;-)

Thanks,
H.



The best solution that I have found is to use the str_trim() function
from the stringr package to remove all the trailing blanks after I get the
data from the SQL data base. I cannot change the SQL schema to make
the column a varchar instead of a char column. It is a vendor DB. And
I don't know an ANSI SQL standard way to remove trailing blanks in the
SELECT command. PostgreSQL has trim(trailing ' ' from column), but
MS-SQL upchucks on that syntax.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
