Re: [Rd] Improving string concatenation

2015-06-24 Thread Gökçen Eraslan


On 2015-06-17 20:24, Joshua Bradley wrote:

How would this new '+' deal with factors, as paste does or as the current
'+' does?  Would number+string and string+number cause errors (as in
current '+' in R and python) or coerce both to strings (as in current
R:paste and in perl's '+').


I had posted this sample code previously to demonstrate how string
concatenation could be implemented

"+" = function(x,y) {
  if(is.character(x) && is.character(y)) {
    return(paste0(x , y))
  } else {
    .Primitive("+")(x,y)
  }}



%+% might have been another option, possibly a more backward-compatible 
one. paste0 - %+% pair also resembles outer - %o% and match - %in% 
pairs.
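
A minimal sketch of the %+% idea, assuming the operator is defined by the user (base R does not provide it, and the name is only the one suggested above). Because it is a new infix operator, it leaves the built-in '+' untouched:

```r
# Hypothetical user-defined infix operator; not part of base R.
`%+%` <- function(x, y) paste0(x, y)

greeting <- "Hello, " %+% "world"
# Non-character operands are coerced to character, exactly as in paste0().
ids <- "ID" %+% (1:3)

print(greeting)
print(ids)
```

Since it simply forwards to paste0, it also inherits paste0's recycling rules for zero-length arguments.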


My 2 cents.

PS: I don't agree that the subject is rather incomplete or just not true.



so it would only happen if both objects were characters, otherwise you
should expect the same behavior as before with all other classes. This
would be backwards compatible as well since string+string was never
supported before and therefore no one would have previously working code
that could break.

Josh Bradley


Goekcen.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Improving string concatenation

2015-06-18 Thread Hervé Pagès

Hi Joshua,

On 06/17/2015 11:24 AM, Joshua Bradley wrote:

How would this new '+' deal with factors, as paste does or as the current
'+' does?  Would number+string and string+number cause errors (as in
current '+' in R and python) or coerce both to strings (as in current
R:paste and in perl's '+').


I had posted this sample code previously to demonstrate how string
concatenation could be implemented

"+" = function(x,y) {
  if(is.character(x) && is.character(y)) {
    return(paste0(x , y))
  } else {
    .Primitive("+")(x,y)
  }}


so it would only happen if both objects were characters,


Problem with this is that it's inconsistent with other binary operators
that will first coerce the non-character operand to character if the
other operand is a character.
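
The coercion referred to here can be seen with the comparison operators. The sketch below is only an illustration (the variable names are made up); it contrasts that behavior with what arithmetic '+' currently does:

```r
# Comparison operators coerce the non-character operand to character.
eq <- 1 == "1"     # evaluated as "1" == "1"
lt <- 10 < "9"     # evaluated as the string comparison "10" < "9"

# Arithmetic does not coerce; it signals an error instead.
msg <- tryCatch(1 + "1", error = function(e) conditionMessage(e))

print(eq)
print(lt)
print(msg)
```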

H.


otherwise you
should expect the same behavior as before with all other classes. This
would be backwards compatible as well since string+string was never
supported before and therefore no one would have previously working code
that could break.

Josh Bradley




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Improving string concatenation

2015-06-18 Thread MacQueen, Don
At the risk of unnecessarily (annoyingly?) prolonging a conversation that
has died down...

I don't think I've seen the sep or collapse arguments to paste mentioned
as aspects to consider. I don't see any way in which this version of '+'
could offer those arguments. Hence I would consider this version of '+' to
be just a convenience function, i.e., a function that, for convenience,
implements a special case of a more general function. It would not be a
different type of concatenation, nor would it improve the current methods
of string concatenation.

There is precedent in R for convenience functions. Indeed, I consider
paste0 to be a convenience function for paste with sep=''. read.csv and
several others are convenience functions that implement special cases of
read.table. 
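
As a small illustration of the point about sep and collapse (the variable names are made up for the example), these are the things a binary '+' has no place to express, and paste0 is exactly paste with sep = "":

```r
parts <- c("a", "b", "c")

# sep controls what goes between the arguments, element by element.
with_sep <- paste("x", parts, sep = "-")

# collapse joins the resulting vector into a single string.
collapsed <- paste(parts, collapse = "+")

# paste0 is the sep = "" special case of paste.
same <- identical(paste0("x", parts), paste("x", parts, sep = ""))

print(with_sep)
print(collapsed)
print(same)
```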

Viewed that way, I see no intrinsic conceptual impediment to introducing a
version of '+' that does string concatenation. Of course, those who did
the work would have to decide how it would handle recycling and other
issues that have been raised.

However, whether or not it would be a good idea to do so, or worth the
effort, is not clear.

I've never felt that ... it would be nice if R did something the same way
as language X ... is by itself a strong argument for introducing a new
function or capability. Speaking as a long-time user, I wouldn't ask R
core to spend time on it. Would I use it if it were available? Possibly
over time I might migrate toward using it in simple situations.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062







Re: [Rd] Improving string concatenation

2015-06-17 Thread William Dunlap
if '+' and paste don't change their behavior with respect to
factors but you encourage people to use '+' instead of paste
then you will run into problems with data.frame columns because
many people don't notice whether a character-like column is
character or factor.  With paste() this is not a problem but with '+'
it is.  I think it is good not to make people worry about this much.
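
A short sketch of the factor problem, using a made-up data frame. paste0 works on the labels, while '+' on a factor goes through the existing factor method and yields NA with a warning; a character-only '+' would not change that, because is.character() is FALSE for a factor:

```r
# Hypothetical data; stringsAsFactors = TRUE mimics the old default.
df <- data.frame(id = c("a", "b"), stringsAsFactors = TRUE)

pasted <- paste0(df$id, "_1")          # uses the factor labels
added <- suppressWarnings(df$id + 1)   # '+' is not meaningful for factors
is_chr <- is.character(df$id)          # the proposed check would not match

print(pasted)
print(added)
print(is_chr)
```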

As for the recycling issue, consider calls involving NULL arguments,
  > f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
  > f(1)
  [1] "1 test failed"
  > f(0)
  [1] "0 tests failed"
If paste0 followed the same recycling rules as + then f(1) would return
character(0).  There is a fair bit of code like that on CRAN.

Consider using sprintf() to get the sort of recycling rules that + uses
  > sprintf("%s is %d", c("One","Two"), numeric(0))
  character(0)
  > sprintf("%s is %d", c("One","Two"), 17)
  [1] "One is 17" "Two is 17"
  > sprintf("%s is %d", c("One","Two"), 26:27)
  [1] "One is 26" "Two is 27"



Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] Improving string concatenation

2015-06-17 Thread Hervé Pagès

Hi Bill,

On 06/17/2015 12:36 PM, William Dunlap wrote:

if '+' and paste don't change their behavior with respect to
factors but you encourage people to use '+' instead of paste
then you will run into problems with data.frame columns because
many people don't notice whether a character-like column is
character or factor.  With paste() this is not a problem but with '+'
it is.  I think it is good not to make people worry about this much.

As for the recycling issue, consider calls involving NULL arguments,
> f <- function(n)paste0(n, " test", if(n!=1)"s", " failed")
> f(1)
   [1] "1 test failed"
> f(0)
   [1] "0 tests failed"
If paste0 followed the same recycling rules as + then f(1) would return
character(0).  There is a fair bit of code like that on CRAN.


OTOH a very common use case is to use paste (or paste0) to add a given
prefix (or suffix) to a bunch of strings:

  paste0("ID", x)  # buggy! (won't do the right thing if length(x) is 0)

This is like adding something to 'x' so it's conceptually no different
from doing:

  x + 5

which does the right thing when 'x' is a numeric(0).

Anyway, I don't think anybody suggested to change the recycling rules
of paste() or paste0() (which would of course break some existing code
that relies on it, but that's a very generic statement right?), only
to adopt the recycling rules of `+` and other binary arithmetic and
comparison operators if `+` was used to concatenate strings.
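
To make the difference concrete, here is a sketch of both behaviors on a zero-length input. The helper concat_strict is hypothetical; it is not anything proposed in the thread, and it only approximates the arithmetic rule that a zero-length operand gives a zero-length result:

```r
x <- character(0)

prefixed <- paste0("ID", x)   # zero-length argument is treated as ""
shifted <- numeric(0) + 5     # arithmetic gives a zero-length result

# Hypothetical helper: concatenate, but return character(0) as soon as
# either operand is zero-length, as the arithmetic operators would.
concat_strict <- function(x, y) {
  if (length(x) == 0L || length(y) == 0L) return(character(0))
  paste0(x, y)
}

strict <- concat_strict("ID", x)

print(prefixed)
print(shifted)
print(strict)
```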

Cheers,
H.





Re: [Rd] Improving string concatenation

2015-06-17 Thread Joshua Bradley
 How would this new '+' deal with factors, as paste does or as the current
'+'
 does?  Would number+string and string+number cause errors (as in current
 '+' in R and python) or coerce both to strings (as in current R:paste and
in perl's '+').


I had posted this sample code previously to demonstrate how string
concatenation could be implemented

"+" = function(x,y) {
  if(is.character(x) && is.character(y)) {
    return(paste0(x , y))
  } else {
    .Primitive("+")(x,y)
  }}


so it would only happen if both objects were characters, otherwise you
should expect the same behavior as before with all other classes. This
would be backwards compatible as well since string+string was never
supported before and therefore no one would have previously working code
that could break.
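
The behavior described above can be tried without masking '+' for a whole session. This sketch wraps the same definition in local() so the override exists only inside that block; the list names are made up for the example:

```r
results <- local({
  # Same logic as the sample code, visible only inside this block.
  `+` <- function(x, y) {
    if (is.character(x) && is.character(y)) {
      paste0(x, y)
    } else {
      .Primitive("+")(x, y)
    }
  }
  list(
    strings = "foo" + "bar",   # both character: concatenated
    numbers = 1 + 2,           # falls through to the primitive
    mixed = tryCatch("foo" + 1, error = function(e) conditionMessage(e))
  )
})

print(results)
```

A caveat with this two-argument sketch is that it probably does not handle unary plus (a call with only x), so it should be read as a demonstration rather than a drop-in replacement.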

Josh Bradley



Re: [Rd] Improving string concatenation

2015-06-17 Thread Michael Lawrence
Just to clarify, primitive (C-level) generics do not support dispatch
on basic classes (like character). This is for performance (no need to
consider dispatch on non-objects) and for sanity (in general,
redefining fundamental behaviors is dangerous). It is of course
possible to define a + method with a signature containing a class
not in the set of basic classes.
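
A minimal sketch of that last point, assuming a made-up S4 class named Str that extends character. The '+' method is defined for the new class, not for plain character vectors:

```r
library(methods)

# Hypothetical class; "Str" is not part of R or of any package named here.
setClass("Str", contains = "character")

# Dispatch works because "Str" is not one of the basic classes.
setMethod("+", signature("Str", "Str"), function(e1, e2) {
  new("Str", paste0(e1@.Data, e2@.Data))
})

joined <- new("Str", "foo") + new("Str", "bar")
print(joined@.Data)
```

The cost is that every string has to be wrapped in the class first, so string literals do not benefit.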



Re: [Rd] Improving string concatenation

2015-06-17 Thread Gábor Csárdi
On Wed, Jun 17, 2015 at 9:04 AM, Michael Lawrence
lawrence.mich...@gene.com wrote:
 Just to clarify, primitive (C-level) generics do not support dispatch
 on basic classes (like character). This is for performance (no need to
 consider dispatch on non-objects) and for sanity (in general,
 redefining fundamental behaviors is dangerous). It is of course
 possible to define a + method with a signature containing a class
 not in the set of basic classes.

I see, thanks for pointing this out.

Still, I see this as a technicality. The current + clearly detects
if it gets a non-numeric argument, because it gives an error message
for it. So in this case it could just check if both sides are
characters, and if that's true, concatenate them. So there is no
performance loss at all.

This is obviously not as clean as a dispatch, but I think it is still
better than requiring people to add classes to their strings,
especially if the strings are literals.

Btw. for some motivation, here is a (surely incomplete) list of
languages with '+' as the string concatenation operator:

ALGOL 68, BASIC, C++, C#, Cobra, Pascal, Object Pascal, Eiffel, Go,
JavaScript, Java, Python, Turing, Ruby, Windows PowerShell,
Objective-C, F#, Scala, Ya.

and there are a lot of others that have a different operator for it:

Haskell, Erlang, Ada, AppleScript, COBOL (for literals only), Curl,
Seed7, VHDL, Visual Basic, Excel, FreeBASIC, Perl, PHP, Maple, Icon,
Standard SQL, PL/I, Rexx, Mathematica, Lua, Smalltalk, OCaml, Standard
ML, F#, rc, Fortran.

Source: 
https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(strings)

Yes, even Fortran has one, and in C, I can simply write "literal1"
"literal2" and they'll be concatenated. It is only for literals, but
still very useful.

Best,
Gabor



Re: [Rd] Improving string concatenation

2015-06-17 Thread William Dunlap
 ... adding the ability to concat
 strings with '+' would be a relatively simple addition (no pun intended)
to
 the code base I believe. With a lot of other languages supporting this
kind
 of concatenation, this is what surprised me most when first learning R.

Wow!  R has a lot of surprising features and I would have thought
this would be quite a way down the list.

How would this new '+' deal with factors, as paste does or as the current
'+'
does?  Would number+string and string+number cause errors (as in current
'+' in R and python) or coerce both to strings (as in current R:paste and
in perl's '+').

Having '+' work on all types of data can let improperly imported data
get further into the system before triggering an error.  I see lots of
errors
reported on this list that are due to read.table interpreting text as
character
strings instead of the numbers that the user expected.  Detecting that
error as early as possible is good.



Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: [Rd] Improving string concatenation

2015-06-17 Thread Gábor Csárdi
On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap wdun...@tibco.com wrote:
 ... adding the ability to concat
 strings with '+' would be a relatively simple addition (no pun intended)
 to
 the code base I believe. With a lot of other languages supporting this
 kind
 of concatenation, this is what surprised me most when first learning R.

 Wow!  R has a lot of surprising features and I would have thought
 this would be quite a way down the list.

Well, it is hard to guess what users and people in general find
surprising. As '+' is used for string concatenation in essentially all
major scripting (and many other) languages, personally I am not
surprised that this is surprising for people. :)

 How would this new '+' deal with factors, as paste does or as the current
 '+'
 does?

The same as before. It would not change the behavior for other
classes, only basic characters.

 Would number+string and string+number cause errors (as in current
 '+' in R and python) or coerce both to strings (as in current R:paste and
 in perl's '+').

Would cause errors, exactly as it does right now.

 Having '+' work on all types of data can let improperly imported data
 get further into the system before triggering an error.

Nobody is asking for this. Only characters, not all types of data.

 I see lots of
 errors
 reported on this list that are due to read.table interpreting text as
 character
 strings instead of the numbers that the user expected.  Detecting that
 error as early as possible is good.

Isn't that a problem with read.table then? Detecting it there would be
the earliest possible, no?

Gabor

[...]



Re: [Rd] Improving string concatenation

2015-06-16 Thread Joshua Bradley
Bad choice of words I'm afraid. What I'm ultimately pushing for is a
feature request. To allow string concatenation with '+' by default. Sure I
can write my own string addition function (like the example I posted
previously) but I use it so often that I end up putting it in every script
I write.

It is ultimately a matter of readability and syntactic sugar I guess. As an
example, I work in the bioinformatics domain and write R scripts for
pipelines with calls to various programs that require a lot of parameters
to be set/varied. Seeing paste everywhere detracts from reading the code
(in my opinion).

This may not be a very strong argument, but to give a bit more objective
reason, I claim it's more readable/intuitive because other big languages
have also picked up this convention (C++, java, javascript, python, etc.).


Josh Bradley
Graduate Student
University of Maryland



Re: [Rd] Improving string concatenation

2015-06-16 Thread Joshua Bradley
One of the posters on the SO post I linked to previously suggested this
but if '+' were made to be S4 compliant, then adding the ability to concat
strings with '+' would be a relatively simple addition (no pun intended) to
the code base I believe. With a lot of other languages supporting this kind
of concatenation, this is what surprised me most when first learning R.

This is where my (lack of) experience in R starts to show and why I brought
up the question about performance. I'm wondering how bad performance would
be affected by making '+' (or all arithmetic operators in general) S4
compliant.

Josh Bradley

On Tue, Jun 16, 2015 at 8:35 PM, Gábor Csárdi csardi.ga...@gmail.com
wrote:

 On Tue, Jun 16, 2015 at 8:24 PM, Hervé Pagès hpa...@fredhutch.org wrote:
 [...]
 
  If I was to override `+` to concatenate strings, I would make it stick
  to the recycling scheme used by arithmetic and comparison operators
  (which is the most sensible of all IMO).

 Yeah, I agree, paste's recycling rules are sometimes painful. This
 could be fixed with a nice new '+' concatenation operator, too. :)

 Gabor

  H.

 [...]




Re: [Rd] Improving string concatenation

2015-06-16 Thread Gábor Csárdi
On Tue, Jun 16, 2015 at 10:30 PM, Joshua Bradley jgbradl...@gmail.com wrote:
 One of the posters on the SO post I linked to previously suggested this but
 if '+' were made to be S4 compliant, then adding the ability to concat
 strings with '+' would be a relatively simple addition (no pun intended) to
 the code base I believe. With a lot of other languages supporting this kind
 of concatenation, this is what surprised me most when first learning R.

 This is where my (lack of) experience in R starts to show and why I brought
 up the question about performance. I'm wondering how bad performance would
 be affected by making '+' (or all arithmetic operators in general) S4
 compliant.

I don't know much about S4, but '+' is already generic, so
implementation would be
easy I guess. Also, since it is already generic, I don't think this
would affect performance at all. (But FIXME please.)

The reason why it is not implemented is not because it is difficult.

Gabor

 Josh Bradley



Re: [Rd] Improving string concatenation

2015-06-16 Thread Gabriel Becker
On Jun 16, 2015 3:44 PM, Joshua Bradley jgbradl...@gmail.com wrote:

 Hi, first time poster here. During my time using R, I have always found
 string concatenation to be (what I feel is) unnecessarily complicated by
 requiring the use of the paste() or similar commands.

I don't follow. In what sense is paste complicated to use? Not in the sense
of its actual behavior, since what you propose below has identical
behavior. So is your objection simply the number of characters one must
type?

I would argue that having a separate verb makes code much more readable,
particularly at a quick glance. I know a character will come out of paste
no matter what goes in. That is not without value from a code maintenance
perspective. IMHO.
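
A quick sketch of that guarantee, with arbitrary example inputs: whatever goes into paste, a character vector comes out, whereas the type of a '+' result depends on its operands.

```r
# Arbitrary inputs of different types, including NULL and a factor.
inputs <- list(1L, 2.5, TRUE, NULL, factor("a"), as.Date("2015-06-16"))
always_character <- vapply(inputs, function(v) is.character(paste(v)),
                           logical(1))

# By contrast, the result type of '+' varies with the operands.
int_sum <- typeof(1L + 1L)
dbl_sum <- typeof(1L + 0.5)

print(always_character)
print(c(int_sum, dbl_sum))
```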

~G



 When searching for how to concatenate strings in R, several top search
 results show answers that say to write your own function or override the
 '+' operator.

 Sample code like the following from this
 
http://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r

 page

 "+" = function(x,y) {
   if(is.character(x) && is.character(y)) {
     return(paste(x , y, sep=""))
   } else {
     .Primitive("+")(x,y)
   }}



 An old (2005) post
 https://stat.ethz.ch/pipermail/r-help/2005-February/066709.html on r-help
 mentioned possible performance reasons as to why this type of string
 concatenation is not supported out of the box but did not go into detail.
 Can someone explain why such a basic task as this must be handled by
 paste() instead of just using the '+' operator directly? Would performance
 degrade much today if the '+' form of string concatenation were added into
 R by default?



 Josh Bradley



Re: [Rd] Improving string concatenation

2015-06-16 Thread Gábor Csárdi
On Tue, Jun 16, 2015 at 6:32 PM, Joshua Bradley jgbradl...@gmail.com wrote:
[...]
 An old (2005) post
 https://stat.ethz.ch/pipermail/r-help/2005-February/066709.html on r-help
 mentioned possible performance reasons as to why this type of string
 concatenation is not supported out of the box but did not go into detail.
 Can someone explain why such a basic task as this must be handled by
 paste() instead of just using the '+' operator directly?

Well, R-core's reason was in that email thread, quoting:

The issue is that only coercion between numeric
(broad sense, including complex) types is supported for the arithmetical
operators, presumably to avoid the ambiguity of things like

x <- 123.45
y <- as.character(1)
x + y

Should that be 124.45 or "123.451"?  One of the difficulties of any
dispatch on two arguments is how to do the best matching on two classes,
especially with symmetric operators like +.  Internally R favours simple
fast rules.
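
The two readings in the quoted example can be spelled out explicitly. This is
only a sketch of the ambiguity; the variable names numeric_reading and
string_reading are mine, not from the original post.

```r
x <- 123.45
y <- as.character(1)

# Reading 1: coerce the string to a number, then add.
numeric_reading <- x + as.numeric(y)

# Reading 2: coerce the number to a string, then concatenate.
string_reading <- paste0(x, y)

numeric_reading  # 124.45
string_reading   # "123.451"
```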

Personally, I am not really convinced by this, because what currently
happens is this:

"1" + 1
# Error in "1" + 1 : non-numeric argument to binary operator
1 + "1"
# Error in 1 + "1" : non-numeric argument to binary operator

which is perfectly fine behavior, and it could stay the same with a
'+' string concatenation operator, i.e.:
- if both arguments are characters, call paste(),
- otherwise go on and do whatever is being done right now.
In other words, coercion to string is not important in the '+' operator.
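
The two rules above can be sketched at the R level as follows. This masks `+`
in the current environment only; it is an illustration under that assumption,
not how a change inside R itself would be written. The explicit handling of
unary plus is my addition, since a two-argument wrapper would otherwise break
expressions like +5.

```r
# Sketch only: masks `+` in the current environment rather than changing R.
`+` <- function(e1, e2) {
  # Unary plus (e.g. +5) has no second operand; defer to the primitive.
  if (missing(e2)) return(.Primitive("+")(e1))
  if (is.character(e1) && is.character(e2)) {
    paste0(e1, e2)              # both character: concatenate
  } else {
    .Primitive("+")(e1, e2)     # anything else: unchanged behaviour
  }
}

"a" + "b"    # "ab"
1 + 2        # 3

# Mixed operands still fail, as they do today.
msg <- tryCatch("1" + 1, error = function(e) conditionMessage(e))
msg
```

Calling rm("+") afterwards removes the mask and restores the built-in
operator.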

 Would performance
 degrade much today if the '+' form of string concatenation were added into
 R by default?

Personally, I highly doubt it, but I don't have a benchmark to back this up.
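
For anyone who wants numbers, a rough way to measure it might look like the
sketch below. plus_concat() is a hypothetical R-level stand-in for the
proposed operator, so it very likely overstates the cost compared to a check
done inside the C implementation of '+'. Timings vary by machine, so none are
shown.

```r
# Hypothetical stand-in for the proposed operator; named plus_concat
# here so the real `+` is left alone.
plus_concat <- function(e1, e2) {
  if (is.character(e1) && is.character(e2)) {
    paste0(e1, e2)
  } else {
    .Primitive("+")(e1, e2)
  }
}

n <- 1e5
a <- runif(n)
b <- runif(n)

# Vectorised addition: one call, so the extra type check is paid once.
t_builtin <- system.time(r1 <- a + b)
t_wrapped <- system.time(r2 <- plus_concat(a, b))

# Many scalar calls: per-call overhead of the R-level wrapper shows up here.
t_builtin_loop <- system.time(for (i in 1:1e4) a[i] + b[i])
t_wrapped_loop <- system.time(for (i in 1:1e4) plus_concat(a[i], b[i]))

identical(r1, r2)
c(builtin      = t_builtin[["elapsed"]],
  wrapped      = t_wrapped[["elapsed"]],
  builtin_loop = t_builtin_loop[["elapsed"]],
  wrapped_loop = t_wrapped_loop[["elapsed"]])
```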

Gabor

[...]



Re: [Rd] Improving string concatenation

2015-06-16 Thread Hervé Pagès

Hi Joshua,

On 06/16/2015 03:32 PM, Joshua Bradley wrote:

Hi, first time poster here. During my time using R, I have always found
string concatenation to be (what I feel is) unnecessarily complicated by
requiring the use of the paste() or similar commands.


When searching for how to concatenate strings in R, several top search
results show answers that say to write your own function or override the
'+' operator.

Sample code like the following from this
http://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r
page

"+" = function(x,y) {
    if(is.character(x) && is.character(y)) {
        return(paste(x , y, sep=""))
    } else {
        .Primitive("+")(x,y)
    }}


Note that paste0() is a more convenient and more efficient way to
concatenate strings:

  paste0(x, y)  # no need to specify 'sep', no separator is inserted
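
A small check that the two spellings agree, using made-up example vectors:

```r
x <- c("a", "b")
y <- c("1", "2")

# Same result either way; paste0() simply omits the separator argument.
identical(paste(x, y, sep = ""), paste0(x, y))   # TRUE
paste0(x, y)                                      # "a1" "b2"
```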

Related to this, one thing that has always bothered me is the
different/inconsistent recycling schemes used by different binary
operations in R:

> 1:3 + integer(0)
integer(0)

> c("a", "b", "c") >= character(0)
logical(0)

> paste0(c("a", "b", "c"), character(0))
[1] "a" "b" "c"

> mapply(paste0, c("a", "b", "c"), character(0))
Error in mapply(paste0, c("a", "b", "c"), character(0)) :
  zero-length inputs cannot be mixed with those of non-zero length

If I was to override `+` to concatenate strings, I would make it stick
to the recycling scheme used by arithmetic and comparison operators
(which is the most sensible of all IMO).
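
As a rough sketch of what that could look like: the helper below (the name
concat is hypothetical) concatenates strings but follows the recycling rules
of the arithmetic operators, i.e. a zero-length operand gives a zero-length
result, and a warning is raised when the longer length is not a multiple of
the shorter one.

```r
# Hypothetical helper (name is an assumption): concatenation that follows
# the recycling rules of arithmetic operators instead of paste0()'s.
concat <- function(x, y) {
  stopifnot(is.character(x), is.character(y))
  nx <- length(x)
  ny <- length(y)
  # Arithmetic rule: a zero-length operand gives a zero-length result.
  if (nx == 0L || ny == 0L) return(character(0))
  # Arithmetic rule: warn when the longer length is not a multiple
  # of the shorter one.
  if (max(nx, ny) %% min(nx, ny) != 0L) {
    warning("longer object length is not a multiple of shorter object length")
  }
  paste0(x, y)
}

concat(c("a", "b", "c"), character(0))   # character(0)
concat(c("a", "b", "c"), "x")            # "ax" "bx" "cx"
```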

H.





An old (2005) post
https://stat.ethz.ch/pipermail/r-help/2005-February/066709.html on r-help
mentioned possible performance reasons as to why this type of string
concatenation is not supported out of the box but did not go into detail.
Can someone explain why such a basic task as this must be handled by
paste() instead of just using the '+' operator directly? Would performance
degrade much today if the '+' form of string concatenation were added into
R by default?



Josh Bradley




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Improving string concatenation

2015-06-16 Thread Gábor Csárdi
On Tue, Jun 16, 2015 at 8:24 PM, Hervé Pagès hpa...@fredhutch.org wrote:
[...]

 If I was to override `+` to concatenate strings, I would make it stick
 to the recycling scheme used by arithmetic and comparison operators
 (which is the most sensible of all IMO).

Yeah, I agree, paste's recycling rules are sometimes painful. This
could be fixed with a nice new '+' concatenation operator, too. :)

Gabor

 H.

[...]
