Hi,
I would really like to have a way to split long string literals across
multiple lines in R.
Currently, if a string literal spans multiple lines, there is no way to
inhibit the introduction of newline characters:
> "aaa
+ bbb"
[1] "aaa\nbbb"
If a line ends with a backslash, the backslash is silently ignored and the newline is kept:
> "aaa\
+ bbb"
[1] "aaa\nbbb"
We could use this fact to implement string splitting in a fairly
backward-compatible way: existing code is unlikely to contain such
trailing backslashes, since they currently have no effect. The attached
patch makes the parser drop a newline character that directly follows a
backslash:
> "aaa\
+ bbb"
[1] "aaabbb"
Personally, I would also prefer leading blanks (spaces and tabs) on the
continuation line to be ignored, so that the code can be indented
properly; a blank that should be kept can be escaped with a backslash:
> "aaa \
+ bbb"
[1] "aaa bbb"
> "aaa\
+ \ bbb"
[1] "aaa bbb"
This is also implemented by this patch.
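To make the intended rule easier to review in isolation, here is a minimal standalone sketch of it in C. It is not the parser code from the patch: the function name join_continued and its buffer-based interface are made up for illustration, and all escapes other than backslash-newline and backslash-space are simply copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of R: copies the body of a string literal
 * from src to dst while applying the proposed continuation rule.
 *   backslash + newline : dropped, and the spaces/tabs that follow are skipped
 *   backslash + space   : kept as one literal space
 *   bare newline        : kept as a newline (current behaviour)
 * Every other character, including other escapes, is copied unchanged.
 * dst must have room for at least strlen(src) + 1 bytes. */
void join_continued(const char *src, char *dst)
{
    int skip_blanks = 0;  /* set right after a backslash-newline */

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char c = *src++;

        if (skip_blanks) {
            if (c == ' ' || c == '\t')
                continue;           /* leading blank on continuation line */
            skip_blanks = 0;
        }
        if (c == '\\' && *src == '\n') {
            src++;                  /* drop the newline as well */
            skip_blanks = 1;
            continue;
        }
        if (c == '\\' && *src == ' ') {
            src++;                  /* escaped blank survives as a space */
            *dst++ = ' ';
            continue;
        }
        *dst++ = c;
    }
    *dst = '\0';
}
```

Under these assumptions, the input aaa, backslash, newline, four spaces, bbb comes out as "aaabbb", while a bare newline between aaa and bbb is left in place.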
An alternative approach could be to have something like
("aaa "
"bbb")
or
("aaa ",
"bbb")
be interpreted as "aaa bbb".
I don't know the ins and outs of R's parser (so please review the
attached patch very carefully), but I guess this would be more work to
implement.
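For comparison, C already behaves like the first alternative: adjacent string literals are concatenated at compile time. The small sketch below only illustrates that existing C behaviour; the function names are made up, and it says nothing about how R would have to implement it.

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are joined by the C compiler, so the result
 * contains neither a newline nor the indentation of the second line. */
const char *joined_literal(void)
{
    return "aaa "
           "bbb";
}

/* Illustrative self-check: the two pieces form a single string. */
void check_joined_literal(void)
{
    assert(strcmp(joined_literal(), "aaa bbb") == 0);
}
```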
What do you think? Is there anybody else who is missing this feature in
the first place?
Regards,
Andreas
Index: src/main/gram.c
===================================================================
--- src/main/gram.c (revision 72789)
+++ src/main/gram.c (working copy)
@@ -4646,10 +4646,17 @@
int wcnt = 0;
ucs_t wcs[10001];
Rboolean oct_or_hex = FALSE, use_wcs = FALSE, currtext_truncated = FALSE;
+ Rboolean backslash_was_newline = FALSE, ignore_next_blank = FALSE;
CTEXT_PUSH(c);
while ((c = xxgetc()) != R_EOF && c != quote) {
CTEXT_PUSH(c);
+ if (ignore_next_blank) {
+ if (c == ' ' || c == '\t')
+ continue;
+ else
+ ignore_next_blank = FALSE;
+ }
if (c == '\n') {
xxungetc(c); CTEXT_POP();
/* Fix suggested by Mark Bravington to allow multiline strings
@@ -4657,6 +4664,7 @@
* return ERROR;
*/
c = '\\';
+ backslash_was_newline = TRUE;
}
if (c == '\\') {
c = xxgetc(); CTEXT_PUSH(c);
@@ -4815,8 +4823,14 @@
case '\'':
case '`':
case ' ':
+ break;
case '\n':
- break;
+ if (backslash_was_newline) {
+ backslash_was_newline = FALSE;
+ break;
+ }
+ ignore_next_blank = TRUE;
+ continue;
default:
*ct = '\0';
errorcall(R_NilValue, _("'\\%c' is an unrecognized escape in character string starting \"%s\""), c, currtext);
Index: src/main/gram.y
===================================================================
--- src/main/gram.y (revision 72789)
+++ src/main/gram.y (working copy)
@@ -2308,10 +2308,17 @@
int wcnt = 0;
ucs_t wcs[10001];
Rboolean oct_or_hex = FALSE, use_wcs = FALSE, currtext_truncated = FALSE;
+ Rboolean backslash_was_newline = FALSE, ignore_next_blank = FALSE;
CTEXT_PUSH(c);
while ((c = xxgetc()) != R_EOF && c != quote) {
CTEXT_PUSH(c);
+ if (ignore_next_blank) {
+ if (c == ' ' || c == '\t')
+ continue;
+ else
+ ignore_next_blank = FALSE;
+ }
if (c == '\n') {
xxungetc(c); CTEXT_POP();
/* Fix suggested by Mark Bravington to allow multiline strings
@@ -2319,6 +2326,7 @@
* return ERROR;
*/
c = '\\';
+ backslash_was_newline = TRUE;
}
if (c == '\\') {
c = xxgetc(); CTEXT_PUSH(c);
@@ -2477,8 +2485,14 @@
case '\'':
case '`':
case ' ':
+ break;
case '\n':
- break;
+ if (backslash_was_newline) {
+ backslash_was_newline = FALSE;
+ break;
+ }
+ ignore_next_blank = TRUE;
+ continue;
default:
*ct = '\0';
errorcall(R_NilValue, _("'\\%c' is an unrecognized escape in character string starting \"%s\""), c, currtext);
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel