Re: [Rd] Improving string concatenation
On 2015-06-17 20:24, Joshua Bradley wrote:
>> How would this new '+' deal with factors, as paste does or as the
>> current '+' does? Would number+string and string+number cause errors
>> (as in current '+' in R and python) or coerce both to strings (as in
>> current R:paste and in perl's '+').
>
> I had posted this sample code previously to demonstrate how string
> concatenation could be implemented
>
>     "+" = function(x, y) {
>         if (is.character(x) && is.character(y)) {
>             return(paste0(x, y))
>         } else {
>             .Primitive("+")(x, y)
>         }
>     }

%+% might have been another option, possibly a more backward-compatible
one. The paste0 / %+% pair also resembles the outer / %o% and
match / %in% pairs.

My 2 cents.

PS: I don't agree that the subject is rather incomplete or just not true.

> so it would only happen if both objects were characters, otherwise you
> should expect the same behavior as before with all other classes. This
> would be backwards compatible as well since string+string was never
> supported before and therefore no one would have previously working
> code that could break.
>
> Josh Bradley

Goekcen.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
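The %+% alternative mentioned above can be sketched in a couple of lines. This is only an illustration: `%+%` is a hypothetical user-defined operator, not something base R provides, and it simply inherits paste0()'s coercion and recycling rules.

```r
# Hypothetical infix operator (not part of base R).
# It delegates to paste0(), so paste0's coercion and recycling rules apply.
`%+%` <- function(x, y) {
    paste0(x, y)
}

"foo" %+% "bar"        # "foobar"
c("a", "b") %+% 1:2    # "a1" "b2"
```

Because it is a new operator, it cannot change the meaning of any existing call to '+'.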
Re: [Rd] Improving string concatenation
Hi Joshua,

On 06/17/2015 11:24 AM, Joshua Bradley wrote:
>> [...]
>
> I had posted this sample code previously to demonstrate how string
> concatenation could be implemented
>
>     "+" = function(x, y) {
>         if (is.character(x) && is.character(y)) {
>             return(paste0(x, y))
>         } else {
>             .Primitive("+")(x, y)
>         }
>     }
>
> so it would only happen if both objects were characters,

Problem with this is that it's inconsistent with other binary operators
that will first coerce the non-character operand to character if the
other operand is a character.

H.

> otherwise you should expect the same behavior as before with all other
> classes. This would be backwards compatible as well since
> string+string was never supported before and therefore no one would
> have previously working code that could break.
>
> Josh Bradley

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
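The inconsistency pointed out above can be seen with a comparison operator. The snippet below is a small sketch of standard R behavior; the variable names `cmp` and `msg` are only for illustration.

```r
# Comparison operators coerce the non-character operand to character first.
cmp <- 1 == "1"    # TRUE: 1 becomes "1" before the comparison

# Arithmetic '+' does no such coercion: a character operand is an error.
# A '+' that concatenates only when both sides are character would keep
# this error for mixed operands, unlike '=='.
msg <- tryCatch(1 + "1", error = function(e) conditionMessage(e))

cmp
msg
```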
Re: [Rd] Improving string concatenation
At the risk of unnecessarily (annoyingly?) prolonging a conversation
that has died down...

I don't think I've seen the sep or collapse arguments to paste
mentioned as aspects to consider. I don't see any way in which this
version of '+' could offer those arguments. Hence I would consider this
version of '+' to be just a convenience function, i.e., a function
that, for convenience, implements a special case of a more general
function. It would not be a different type of concatenation, nor would
it improve the current methods of string concatenation.

There is precedent in R for convenience functions. Indeed, I consider
paste0 to be a convenience function for paste with sep=''. read.csv and
several others are convenience functions that implement special cases
of read.table.

Viewed that way, I see no intrinsic conceptual impediment to
introducing a version of '+' that does string concatenation. Of course,
those who did the work would have to decide how it would handle
recycling and other issues that have been raised.

However, whether or not it would be a good idea to do so, or worth the
effort, is not clear. I've never felt that "it would be nice if R did
something the same way as language X" is by itself a strong argument
for introducing a new function or capability.

Speaking as a long-time user, I wouldn't ask R core to spend time on
it. Would I use it if it were available? Possibly over time I might
migrate toward using it in simple situations.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 6/17/15, 12:36 PM, William Dunlap wrote:
> [...]
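The "convenience function" view can be illustrated with a short sketch using only base R. The variable names are illustrative; the point is that sep and collapse have no natural place in a two-operand operator.

```r
x <- c("a", "b", "c")

# paste0() is the sep = "" special case of paste().
same_result <- identical(paste0(x, 1:3), paste(x, 1:3, sep = ""))

# Arguments that a binary operator such as '+' has no slot for:
with_sep  <- paste(x, 1:3, sep = "-")   # "a-1" "b-2" "c-3"
collapsed <- paste(x, collapse = "+")   # "a+b+c"

same_result
with_sep
collapsed
```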
Re: [Rd] Improving string concatenation
if '+' and paste don't change their behavior with respect to factors
but you encourage people to use '+' instead of paste then you will run
into problems with data.frame columns because many people don't notice
whether a character-like column is character or factor. With paste()
this is not a problem but with '+' it is. I think it is good not to
make people worry about this much.

As for the recycling issue, consider calls involving NULL arguments,

    > f <- function(n) paste0(n, " test", if (n != 1) "s", " failed")
    > f(1)
    [1] "1 test failed"
    > f(0)
    [1] "0 tests failed"

If paste0 followed the same recycling rules as + then f(1) would return
character(0). There is a fair bit of code like that on CRAN.

Consider using sprintf() to get the sort of recycling rules that + uses

    > sprintf("%s is %d", c("One", "Two"), numeric(0))
    character(0)
    > sprintf("%s is %d", c("One", "Two"), 17)
    [1] "One is 17" "Two is 17"
    > sprintf("%s is %d", c("One", "Two"), 26:27)
    [1] "One is 26" "Two is 27"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 17, 2015 at 9:56 AM, Gábor Csárdi wrote:
> [...]
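The factor concern can be made concrete with a small sketch. It assumes a data.frame column that was created as a factor (stringsAsFactors = TRUE is set explicitly here); the names `df`, `pasted` and `added` are illustrative.

```r
# A character-like column that is actually a factor.
df <- data.frame(id = c("x", "y"), stringsAsFactors = TRUE)
is.factor(df$id)                     # TRUE

# paste0() uses the factor labels, so the distinction goes unnoticed.
pasted <- paste0(df$id, "_1")        # "x_1" "y_1"

# '+' on a factor is not meaningful: it warns and returns NA values.
added <- suppressWarnings(df$id + 1)

pasted
added
```

If '+' concatenated character vectors only, the same expression would behave differently depending on whether the column was read in as character or as factor.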
Re: [Rd] Improving string concatenation
Hi Bill,

On 06/17/2015 12:36 PM, William Dunlap wrote:
> [...]
> As for the recycling issue, consider calls involving NULL arguments,
>
>     > f <- function(n) paste0(n, " test", if (n != 1) "s", " failed")
>     > f(1)
>     [1] "1 test failed"
>     > f(0)
>     [1] "0 tests failed"
>
> If paste0 followed the same recycling rules as + then f(1) would
> return character(0). There is a fair bit of code like that on CRAN.

OTOH a very common use case is to use paste (or paste0) to add a given
prefix (or suffix) to a bunch of strings:

    paste0("ID", x)  # buggy! (won't do the right thing if length(x) is 0)

This is like adding something to 'x' so it's conceptually no different
from doing:

    x + 5

which does the right thing when 'x' is a numeric(0).

Anyway, I don't think anybody suggested to change the recycling rules
of paste() or paste0() (which would of course break some existing code
that relies on it, but that's a very generic statement right?), only to
adopt the recycling rules of `+` and other binary arithmetic and
comparison operators if `+` was used to concatenate strings.

Cheers,
H.

> Consider using sprintf() to get the sort of recycling rules that +
> uses
> [...]

--
Hervé Pagès
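The prefix use case and the two recycling schemes can be compared side by side. The helper `concat0` below is hypothetical (it is not in base R); it is a minimal sketch of "paste0 with arithmetic-style recycling", assuming that any zero-length operand should give a zero-length result.

```r
# Hypothetical helper: paste0() semantics, except that a zero-length
# operand yields a zero-length result, as arithmetic operators do.
concat0 <- function(x, y) {
    if (length(x) == 0L || length(y) == 0L) {
        return(character(0))
    }
    paste0(x, y)
}

x <- character(0)
paste0("ID", x)       # "ID": the zero-length operand is treated as ""
concat0("ID", x)      # character(0)
numeric(0) + 5        # numeric(0): the arithmetic rule being mimicked
concat0("ID", 1:2)    # "ID1" "ID2": unchanged for ordinary input
```

Later releases of R (4.0.1 and newer) added a recycle0 argument to paste() and paste0() that addresses the same zero-length case.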
Re: [Rd] Improving string concatenation
> How would this new '+' deal with factors, as paste does or as the
> current '+' does? Would number+string and string+number cause errors
> (as in current '+' in R and python) or coerce both to strings (as in
> current R:paste and in perl's '+').

I had posted this sample code previously to demonstrate how string
concatenation could be implemented

    "+" = function(x, y) {
        if (is.character(x) && is.character(y)) {
            # both operands are character: concatenate
            return(paste0(x, y))
        } else {
            # anything else: fall back to the built-in '+'
            .Primitive("+")(x, y)
        }
    }

so it would only happen if both objects were characters, otherwise you
should expect the same behavior as before with all other classes. This
would be backwards compatible as well since string+string was never
supported before and therefore no one would have previously working
code that could break.

Josh Bradley
Re: [Rd] Improving string concatenation
Just to clarify, primitive (C-level) generics do not support dispatch
on basic classes (like character). This is for performance (no need to
consider dispatch on non-objects) and for sanity (in general,
redefining fundamental behaviors is dangerous). It is of course
possible to define a + method with a signature containing a class not
in the set of basic classes.

On Tue, Jun 16, 2015 at 7:30 PM, Joshua Bradley wrote:
> [...]
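A '+' method on a class outside the set of basic classes could look like the sketch below. The class name "Str" is hypothetical and exists only for this illustration; plain character vectors are unaffected by the method.

```r
library(methods)

# Hypothetical S4 class that wraps a character vector.
setClass("Str", contains = "character")

# '+' method dispatched only when both operands are "Str" objects.
setMethod("+", signature("Str", "Str"), function(e1, e2) {
    new("Str", paste0(e1@.Data, e2@.Data))
})

a <- new("Str", "foo")
b <- new("Str", "bar")
joined <- a + b
joined@.Data    # "foobar"
```

The cost is the one raised later in the thread: strings, including literals, have to be wrapped in the class before '+' can be used on them.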
Re: [Rd] Improving string concatenation
On Wed, Jun 17, 2015 at 9:04 AM, Michael Lawrence wrote:
> Just to clarify, primitive (C-level) generics do not support dispatch
> on basic classes (like character). This is for performance (no need
> to consider dispatch on non-objects) and for sanity (in general,
> redefining fundamental behaviors is dangerous). It is of course
> possible to define a + method with a signature containing a class not
> in the set of basic classes.

I see, thanks for pointing this out. Still, I see this as a
technicality. The current + clearly detects if it gets a non-numeric
argument, because it gives an error message for it. So in this case it
could just check if both sides are characters, and if that's true,
concatenate them. So there is no performance loss at all.

This is obviously not as clean as a dispatch, but I think it is still
better than requiring people to add classes to their strings,
especially if the strings are literals.

Btw. for some motivation, here is a (surely incomplete) list of
languages with '+' as the string concatenation operator:

    ALGOL 68, BASIC, C++, C#, Cobra, Pascal, Object Pascal, Eiffel, Go,
    JavaScript, Java, Python, Turing, Ruby, Windows PowerShell,
    Objective-C, F#, Scala, Ya.

and there are a lot of others that have a different operator for it:

    Haskell, Erlang, Ada, AppleScript, COBOL (for literals only), Curl,
    Seed7, VHDL, Visual Basic, Excel, FreeBASIC, Perl, PHP, Maple,
    Icon, Standard SQL, PL/I, Rexx, Mathematica, Lua, Smalltalk, OCaml,
    Standard ML, F#, rc, Fortran.

Source:
https://en.wikipedia.org/wiki/Comparison_of_programming_languages_(strings)

Yes, even Fortran has one, and in C, I can simply write
"literal1" "literal2" and they'll be concatenated. It is only for
literals, but still very useful.

Best,
Gabor
Re: [Rd] Improving string concatenation
> ... adding the ability to concat strings with '+' would be a
> relatively simple addition (no pun intended) to the code base I
> believe. With a lot of other languages supporting this kind of
> concatenation, this is what surprised me most when first learning R.

Wow! R has a lot of surprising features and I would have thought this
would be quite a way down the list.

How would this new '+' deal with factors, as paste does or as the
current '+' does? Would number+string and string+number cause errors
(as in current '+' in R and python) or coerce both to strings (as in
current R:paste and in perl's '+').

Having '+' work on all types of data can let improperly imported data
get further into the system before triggering an error. I see lots of
errors reported on this list that are due to read.table interpreting
text as character strings instead of the numbers that the user
expected. Detecting that error as early as possible is good.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Jun 16, 2015 at 10:25 PM, Joshua Bradley wrote:
> [...]
Re: [Rd] Improving string concatenation
On Wed, Jun 17, 2015 at 12:45 PM, William Dunlap wrote:
>> ... adding the ability to concat strings with '+' would be a
>> relatively simple addition (no pun intended) to the code base I
>> believe. With a lot of other languages supporting this kind of
>> concatenation, this is what surprised me most when first learning R.
>
> Wow! R has a lot of surprising features and I would have thought this
> would be quite a way down the list.

Well, it is hard to guess what users and people in general find
surprising. As '+' is used for string concatenation in essentially all
major scripting (and many other) languages, personally I am not
surprised that this is surprising for people. :)

> How would this new '+' deal with factors, as paste does or as the
> current '+' does?

The same as before. It would not change the behavior for other classes,
only basic characters.

> Would number+string and string+number cause errors (as in current '+'
> in R and python) or coerce both to strings (as in current R:paste and
> in perl's '+').

Would cause errors, exactly as it does right now.

> Having '+' work on all types of data can let improperly imported data
> get further into the system before triggering an error.

Nobody is asking for this. Only characters, not all types of data.

> I see lots of errors reported on this list that are due to read.table
> interpreting text as character strings instead of the numbers that
> the user expected. Detecting that error as early as possible is good.

Isn't that a problem with read.table then? Detecting it there would be
the earliest possible, no?

Gabor

[...]
Re: [Rd] Improving string concatenation
Bad choice of words I'm afraid. What I'm ultimately pushing for is a
feature request: to allow string concatenation with '+' by default.
Sure I can write my own string addition function (like the example I
posted previously) but I use it so often that I end up putting it in
every script I write.

It is ultimately a matter of readability and syntactic sugar I guess.
As an example, I work in the bioinformatics domain and write R scripts
for pipelines with calls to various programs that require a lot of
parameters to be set/varied. Seeing paste everywhere detracts from
reading the code (in my opinion).

This may not be a very strong argument, but to give a bit more
objective reason, I claim it's more readable/intuitive because other
big languages have also picked up this convention (C++, java,
javascript, python, etc.).

Josh Bradley
Graduate Student
University of Maryland

On Tue, Jun 16, 2015 at 11:00 PM, Gabriel Becker wrote:
> [...]
Re: [Rd] Improving string concatenation
One of the posters on the SO post I linked to previously suggested this
but if '+' were made to be S4 compliant, then adding the ability to
concat strings with '+' would be a relatively simple addition (no pun
intended) to the code base I believe. With a lot of other languages
supporting this kind of concatenation, this is what surprised me most
when first learning R.

This is where my (lack of) experience in R starts to show and why I
brought up the question about performance. I'm wondering how badly
performance would be affected by making '+' (or all arithmetic
operators in general) S4 compliant.

Josh Bradley

On Tue, Jun 16, 2015 at 8:35 PM, Gábor Csárdi wrote:
> On Tue, Jun 16, 2015 at 8:24 PM, Hervé Pagès wrote:
> [...]
>> If I was to override `+` to concatenate strings, I would make it
>> stick to the recycling scheme used by arithmetic and comparison
>> operators (which is the most sensible of all IMO).
>
> Yeah, I agree, paste's recycling rules are sometimes painful. This
> could be fixed with a nice new '+' concatenation operator, too. :)
>
> Gabor
Re: [Rd] Improving string concatenation
On Tue, Jun 16, 2015 at 10:30 PM, Joshua Bradley wrote:
> One of the poster's on the SO post I linked to previously suggested
> this but if '+' were made to be S4 compliant, then adding the ability
> to concat strings with '+' would be a relatively simple addition (no
> pun intended) to the code base I believe. With a lot of other
> languages supporting this kind of concatenation, this is what
> surprised me most when first learning R.
>
> This is where my (lack of) experience in R starts to show and why I
> brought up the question about performance. I'm wondering how bad
> performance would be effected by making '+' (or all arithmetic
> operators in general) S4 compliant.

I don't know much about S4, but '+' is already generic, so
implementation would be easy I guess. Also, since it is already
generic, I don't think this would affect performance at all. (But FIXME
please.)

The reason why it is not implemented is not because it is difficult.

Gabor

> Josh Bradley
Re: [Rd] Improving string concatenation
On Jun 16, 2015 3:44 PM, Joshua Bradley wrote:
> Hi, first time poster here. During my time using R, I have always
> found string concatenation to be (what I feel is) unnecessarily
> complicated by requiring the use of the paste() or similar commands.

I don't follow. In what sense is paste complicated to use? Not in the
sense of its actual behavior, since what you propose below has
identical behavior. So is your objection simply the number of
characters one must type? I would argue that having a separate verb
makes code much more readable, particularly at a quick glance. I know a
character will come out of paste no matter what goes in. That is not
without value from a code maintenance perspective. IMHO.

~G

> When searching for how to concatenate strings in R, several top
> search results show answers that say to write your own function or
> override the '+' operator.
>
> Sample code like the following from this
> http://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r
> page
>
>     "+" = function(x, y) {
>         if (is.character(x) && is.character(y)) {
>             return(paste(x, y, sep = ""))
>         } else {
>             .Primitive("+")(x, y)
>         }
>     }
>
> An old (2005) post
> https://stat.ethz.ch/pipermail/r-help/2005-February/066709.html
> on r-help mentioned possible performance reasons as to why this type
> of string concatenation is not supported out of the box but did not
> go into detail. Can someone explain why such a basic task as this
> must be handled by paste() instead of just using the '+' operator
> directly? Would performance degrade much today if the '+' form of
> string concatenation were added into R by default?
>
> Josh Bradley
Re: [Rd] Improving string concatenation
On Tue, Jun 16, 2015 at 6:32 PM, Joshua Bradley wrote:
[...]
> An old (2005) post
> https://stat.ethz.ch/pipermail/r-help/2005-February/066709.html
> on r-help mentioned possible performance reasons as to why this type
> of string concatenation is not supported out of the box but did not
> go into detail. Can someone explain why such a basic task as this
> must be handled by paste() instead of just using the '+' operator
> directly?

Well, R-core's reason was in that email thread, quoting:

    The issue is that only coercion between numeric (broad sense,
    including complex) types is supported for the arithmetical
    operators, presumably to avoid the ambiguity of things like

        x <- 123.45
        y <- as.character(1)
        x + y

    Should that be 124.45 or 123.451? One of the difficulties of any
    dispatch on two arguments is how to do the best matching on two
    classes, especially with symmetric operators like +. Internally R
    favours simple fast rules.

Personally, I am not really convinced by this, because what currently
happens is this:

    "1" + "1"
    # Error in "1" + "1" : non-numeric argument to binary operator
    1 + "1"
    # Error in 1 + "1" : non-numeric argument to binary operator

which is perfectly fine behavior, and it could stay the same with a '+'
string concatenation operator, i.e.:

- if both arguments are characters, call paste(),
- otherwise go on and do whatever is being done right now.

In other words, coercion to string is not important in the '+'
operator.

> Would performance degrade much today if the '+' form of string
> concatenation were added into R by default?

Personally, I highly doubt it, but I don't have a benchmark to back
this up.

Gabor

[...]
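The ambiguity described in the R-core quote can be spelled out by making each reading explicit. This is just a worked illustration using base R; the names `as_number` and `as_string` are illustrative.

```r
x <- 123.45
y <- as.character(1)

# Reading 1: coerce the string to a number, then add.
as_number <- x + as.numeric(y)   # 124.45

# Reading 2: coerce the number to a string, then concatenate.
as_string <- paste0(x, y)        # "123.451"

as_number
as_string
```

Under the rule proposed in the reply (concatenate only when both operands are character), neither coercion happens and x + y remains an error.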
Re: [Rd] Improving string concatenation
Hi Joshua,

On 06/16/2015 03:32 PM, Joshua Bradley wrote:

> Hi, first time poster here. During my time using R, I have always found
> string concatenation to be (what I feel is) unnecessarily complicated by
> requiring the use of the paste() or similar commands.
>
> When searching for how to concatenate strings in R, several top search
> results show answers that say to write your own function or override the
> '+' operator. Sample code like the following from this
> http://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r
> page
>
>   "+" = function(x,y) {
>     if(is.character(x) && is.character(y)) {
>       return(paste(x , y, sep=""))
>     } else {
>       .Primitive("+")(x,y)
>     }
>   }

Note that paste0() is a more convenient and more efficient way to concatenate strings:

  paste0(x, y)  # no need to specify 'sep', no separator is inserted

Related to this, one thing that has always bothered me is the different/inconsistent recycling schemes used by different binary operations in R:

  > 1:3 + integer(0)
  integer(0)

  > c("a", "b", "c") >= character(0)
  logical(0)

  > paste0(c("a", "b", "c"), character(0))
  [1] "a" "b" "c"

  > mapply(paste0, c("a", "b", "c"), character(0))
  Error in mapply(paste0, c("a", "b", "c"), character(0)) :
    zero-length inputs cannot be mixed with those of non-zero length

If I were to override `+` to concatenate strings, I would make it stick to the recycling scheme used by arithmetic and comparison operators (which is the most sensible of all IMO).

H.

> An old (2005) post
> https://stat.ethz.ch/pipermail/r-help/2005-February/066709.html
> on r-help mentioned possible performance reasons as to why this type of
> string concatenation is not supported out of the box but did not go into
> detail. Can someone explain why such a basic task as this must be handled
> by paste() instead of just using the '+' operator directly? Would
> performance degrade much today if the '+' form of string concatenation
> were added into R by default?
>
> Josh Bradley

--
Hervé Pagès
Fred Hutchinson Cancer Research Center
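The recycling difference described above, and one way a concatenation operator could follow the arithmetic scheme instead, might look like the sketch below. The operator name %+% and the helper variable names are assumptions made for illustration; the sketch only handles the zero-length case and does not reproduce the warning that arithmetic gives when the longer length is not a multiple of the shorter.

```r
# Hypothetical %+% operator: concatenates like paste0(), but a zero-length
# operand yields a zero-length result, as with arithmetic operators.
"%+%" <- function(x, y) {
  if (length(x) == 0L || length(y) == 0L) {
    return(character(0))
  }
  paste0(x, y)
}

letters3 <- c("a", "b", "c")

arith_len  <- length(1:3 + integer(0))               # 0: arithmetic scheme
paste0_len <- length(paste0(letters3, character(0))) # 3: paste0 treats it as ""
custom_len <- length(letters3 %+% character(0))      # 0: follows arithmetic

# Ordinary recycling of a length-one operand is unchanged.
recycled <- letters3 %+% "x"
print(recycled)
```

Newer versions of R (4.0.1 and later, as far as I know) also accept a recycle0 argument in paste() and paste0() that addresses the zero-length case directly.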
Re: [Rd] Improving string concatenation
On Tue, Jun 16, 2015 at 8:24 PM, Hervé Pagès hpa...@fredhutch.org wrote:
[...]
> If I was to override `+` to concatenate strings, I would make it stick to
> the recycling scheme used by arithmetic and comparison operators (which
> is the most sensible of all IMO).

Yeah, I agree, paste's recycling rules are sometimes painful. This could be fixed with a nice new '+' concatenation operator, too. :)

Gabor

> H.
[...]