Hi,

I would really like to have a way to split long string literals across multiple lines in R.

Currently, if a string literal spans multiple lines, there is no way to inhibit the introduction of newline characters:

> "aaa
+ bbb"
[1] "aaa\nbbb"


If a line ends with a backslash, the backslash is simply ignored and the newline is kept:

> "aaa\
+ bbb"
[1] "aaa\nbbb"


We could use this fact to implement string splitting in a fairly backward-compatible way: since a trailing backslash currently has no effect, it is presumably rare in existing code. The attached patch makes the parser ignore a newline character that directly follows a backslash:

> "aaa\
+ bbb"
[1] "aaabbb"


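I would also prefer that leading blanks (spaces and tabs) on the continuation line be ignored, to allow for proper indentation. A blank that should be kept at the start of the continuation line can be escaped as "\ ", as in the second example: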

>   "aaa \
+    bbb"
[1] "aaa bbb"

>   "aaa\
+    \ bbb"
[1] "aaa bbb"

This is also implemented by this patch.
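
The intended rule can be sketched outside the parser as a small standalone C function. This is only an illustration under stated assumptions, not the code from the patch: the name splice_continuations is hypothetical, it works on the raw text between the quotes, and of all the escape sequences it handles only "\ " (the escaped blank).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: applies the proposed
 * continuation rule to the raw text between the quotes.
 * out must have room for strlen(in) + 1 bytes. */
static void splice_continuations(const char *in, char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (in[i] != '\0') {
        if (in[i] == '\\' && in[i + 1] == '\n') {
            /* backslash-newline: drop both characters ... */
            i += 2;
            /* ... and skip leading blanks on the continuation line */
            while (in[i] == ' ' || in[i] == '\t')
                i++;
        } else if (in[i] == '\\' && in[i + 1] == ' ') {
            /* escaped blank: keep exactly one space */
            out[o++] = ' ';
            i += 2;
        } else {
            /* everything else, including a bare newline, is copied */
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
}
```

Called on the three inputs shown above, it yields "aaa bbb" for both indented examples and leaves a string with a bare (unescaped) newline unchanged.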


An alternative approach could be to have something like

("aaa "
"bbb")

or

("aaa ",
"bbb")

be interpreted as "aaa bbb".

I don't know the ins and outs of R's parser (so please review the attached patch very carefully), but I guess this alternative would be more work to implement.
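
For comparison, the first variant (adjacent literals without a comma) is what C already does: the compiler joins adjacent string literals into one. A minimal illustration in C, not R code and not part of the patch; the function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are concatenated by the C compiler,
 * so this returns the single string "aaa bbb". */
static const char *joined_literal(void)
{
    return "aaa "
           "bbb";
}

/* Hypothetical check that the two literals were joined. */
static void check_joined_literal(void)
{
    assert(strcmp(joined_literal(), "aaa bbb") == 0);
}
```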


What do you think? Is anybody else missing this feature?

Regards,
Andreas
Index: src/main/gram.c
===================================================================
--- src/main/gram.c	(Revision 72789)
+++ src/main/gram.c	(Arbeitskopie)
@@ -4646,10 +4646,17 @@
     int wcnt = 0;
     ucs_t wcs[10001];
     Rboolean oct_or_hex = FALSE, use_wcs = FALSE, currtext_truncated = FALSE;
+    Rboolean backslash_was_newline = FALSE, ignore_next_blank = FALSE;
 
     CTEXT_PUSH(c);
     while ((c = xxgetc()) != R_EOF && c != quote) {
 	CTEXT_PUSH(c);
+	if (ignore_next_blank) {
+	    if (c == ' ' || c == '\t')
+	        continue;
+	    else
+	        ignore_next_blank = FALSE;
+	} 
 	if (c == '\n') {
 	    xxungetc(c); CTEXT_POP();
 	    /* Fix suggested by Mark Bravington to allow multiline strings
@@ -4657,6 +4664,7 @@
 	     * return ERROR;
 	     */
 	    c = '\\';
+	    backslash_was_newline = TRUE;
 	}
 	if (c == '\\') {
 	    c = xxgetc(); CTEXT_PUSH(c);
@@ -4815,8 +4823,14 @@
 		case '\'':
 		case '`':
 		case ' ':
+		    break;
 		case '\n':
-		    break;
+		    if (backslash_was_newline) {
+		        backslash_was_newline = FALSE;
+		        break;
+		    }
+		    ignore_next_blank = TRUE;
+		    continue;
 		default:
 		    *ct = '\0';
 		    errorcall(R_NilValue, _("'\\%c' is an unrecognized escape in character string starting \"%s\""), c, currtext);
Index: src/main/gram.y
===================================================================
--- src/main/gram.y	(Revision 72789)
+++ src/main/gram.y	(Arbeitskopie)
@@ -2308,10 +2308,17 @@
     int wcnt = 0;
     ucs_t wcs[10001];
     Rboolean oct_or_hex = FALSE, use_wcs = FALSE, currtext_truncated = FALSE;
+    Rboolean backslash_was_newline = FALSE, ignore_next_blank = FALSE;
 
     CTEXT_PUSH(c);
     while ((c = xxgetc()) != R_EOF && c != quote) {
 	CTEXT_PUSH(c);
+	if (ignore_next_blank) {
+	    if (c == ' ' || c == '\t')
+	        continue;
+	    else
+	        ignore_next_blank = FALSE;
+	} 
 	if (c == '\n') {
 	    xxungetc(c); CTEXT_POP();
 	    /* Fix suggested by Mark Bravington to allow multiline strings
@@ -2319,6 +2326,7 @@
 	     * return ERROR;
 	     */
 	    c = '\\';
+	    backslash_was_newline = TRUE;
 	}
 	if (c == '\\') {
 	    c = xxgetc(); CTEXT_PUSH(c);
@@ -2477,8 +2485,14 @@
 		case '\'':
 		case '`':
 		case ' ':
+		    break;
 		case '\n':
-		    break;
+		    if (backslash_was_newline) {
+		        backslash_was_newline = FALSE;
+		        break;
+		    }
+		    ignore_next_blank = TRUE;
+		    continue;
 		default:
 		    *ct = '\0';
 		    errorcall(R_NilValue, _("'\\%c' is an unrecognized escape in character string starting \"%s\""), c, currtext);
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
