Many languages let a trailing backslash (“\”) continue an expression onto the next line, and I’ve often wished for this in R, particularly so that I could put `else` on its own line at top level. It would also allow infix operators, such as the new pipe operator `|>`, to be aligned at the start of each line, which I would heartily endorse.
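To make the current behavior concrete, here is a minimal sketch using only base R's `parse()`. The helper name `parses` is made up for illustration, and `%>%` stands in for `|>` only so that the snippet also parses on R versions that predate the native pipe (parsing `%>%` does not require magrittr to be loaded).

```r
# Hypothetical helper: TRUE if `code` parses as valid R, FALSE on a syntax error.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# A binary operator at the start of a line: the first line is already a
# complete expression, so the parser rejects the leading operator.
parses("x\n  %>% sqrt()")                  # FALSE

# Wrapping in parentheses keeps the expression open across the line break.
parses("(x\n  %>% sqrt())")                # TRUE

# `else` on its own line at top level: the `if` is complete after line one.
parses("if (TRUE) 1\nelse 2")              # FALSE

# Inside braces the parser keeps reading, so the same layout is accepted.
parses("{\n  if (TRUE) 1\n  else 2\n}")    # TRUE
```

Wrapping in parentheses or braces is the usual workaround today; a trailing continuation character would make the first and third layouts expressible without them.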
On Wed, Dec 9, 2020 at 3:58 PM Ben Bolker <bbol...@gmail.com> wrote:

> Definitely support the idea that if this kind of trickery is going to
> happen that it be confined to some particular IDE/environment or some
> particular submission protocol. I don't want it to happen in my ESS
> session please ... I'd rather deal with the parentheses.
>
> On 12/9/20 3:45 PM, Timothy Goodman wrote:
> > Regarding special treatment for |>, isn't it getting special treatment
> > anyway, because it's implemented as a syntax transformation from
> > x |> f(y) to f(x, y), rather than as an operator?
> >
> > That said, the point about wanting a block of code submitted line-by-line
> > to work the same as a block of code submitted all at once is a fair one.
> > Maybe the better solution would be if there were a way to say "Submit the
> > selected code as a single expression, ignoring line-breaks". Then I could
> > run any number of lines with pipes at the start and no special character
> > at the end, and have it treated as a single pipeline. I suppose that'd
> > need to be a feature offered by the environment (RStudio's RNotebooks in
> > my case). I could wrap my pipelines in parentheses (to make the "pipes at
> > start of line" syntax valid R code), and then could use the hypothetical
> > "submit selected code ignoring line-breaks" feature when running just the
> > first part of the pipeline -- i.e., selecting full lines, but starting
> > after the opening paren so as not to need to insert a closing paren.
> >
> > - Tim
> >
> > On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <murdoch.dun...@gmail.com>
> > wrote:
> >
> >> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> >>> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> >>> command in the Notebook environment I'm using) I certainly *would*
> >>> expect R to treat it as a complete statement.
> >>>
> >>> But what I'm talking about is a different case, where I highlight a
> >>> multi-line statement in my notebook:
> >>>
> >>>     my_data_frame1
> >>>         |> filter(some_conditions_1)
> >>>
> >>> and then press Ctrl+Enter.
> >>
> >> I don't think I'd like it if parsing changed between passing one line at
> >> a time and passing a block of lines. I'd like to be able to highlight a
> >> few lines and pass those, then type one, then highlight some more and
> >> pass those: and have it act as though I just passed the whole combined
> >> block, or typed everything one line at a time.
> >>
> >>> Or, I suppose the equivalent would be to run
> >>> an R script containing those two lines of code, or to run a multi-line
> >>> statement like that from the console (which in RStudio I can do by
> >>> pressing Shift+Enter between the lines.)
> >>>
> >>> In those cases, R could either (1) Give an error message [the current
> >>> behavior], or (2) understand that the first line is meant to be piped to
> >>> the second. The second option would be significantly more useful, and
> >>> is almost certainly what the user intended.
> >>>
> >>> (For what it's worth, there are some languages, such as Javascript, that
> >>> consider the first token of the next line when determining if the
> >>> previous line was complete. JavaScript's rules around this are overly
> >>> complicated, but a rule like "a pipe following a line break is treated
> >>> as continuing the previous line" would be much simpler. And while it
> >>> might be objectionable to treat the operator %>% different from other
> >>> operators, the addition of |>, which isn't truly an operator at all,
> >>> seems like the right time to consider it.)
> >>
> >> I think this would be hard to implement with R's current parser, but
> >> possible. I think it could be done by distinguishing between EOL
> >> markers within a block of text and "end of block" marks.
> >> If it applied
> >> only to the |> operator it would be *really* ugly.
> >>
> >> My strongest objection to it is the one at the top, though. If I have a
> >> block of lines sitting in my editor that I just finished executing, with
> >> the cursor pointing at the next line, I'd like to know that it didn't
> >> matter whether the lines were passed one at a time, as a block, or some
> >> combination of those.
> >>
> >> Duncan Murdoch
> >>
> >>>
> >>> -Tim
> >>>
> >>> On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <murdoch.dun...@gmail.com>
> >>> wrote:
> >>>
> >>>     The requirement for operators at the end of the line comes from the
> >>>     interactive nature of R. If you type
> >>>
> >>>         my_data_frame_1
> >>>
> >>>     how could R know that you are not done, and are planning to type the
> >>>     rest of the expression
> >>>
> >>>         %>% filter(some_conditions_1)
> >>>         ...
> >>>
> >>>     before it should consider the expression complete? The way languages
> >>>     like C do this is by requiring a statement terminator at the end. You
> >>>     can also do it by wrapping the entire thing in parentheses ().
> >>>
> >>>     However, be careful: Don't use braces: they don't work. And parens
> >>>     have the side effect of removing invisibility from the result (which
> >>>     is a design flaw or bonus, depending on your point of view). So I
> >>>     actually wouldn't advise this workaround.
> >>>
> >>>     Duncan Murdoch
> >>>
> >>>
> >>>     On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> >>> > Hi,
> >>> >
> >>> > I'm a data scientist who routinely uses R in my day-to-day work, for
> >>> > tasks such as cleaning and transforming data, exploratory data
> >>> > analysis, etc. This includes frequent use of the pipe operator from
> >>> > the magrittr and dplyr libraries, %>%. So, I was pleased to hear about
> >>> > the recent work on a native pipe operator, |>.
> >>> >
> >>> > This seems like a good time to bring up the main pain point I encounter
> >>> > when using pipes in R, and some suggestions on what could be done about
> >>> > it. The issue is that the pipe operator can't be placed at the start
> >>> > of a line of code (except in parentheses). That's no different than
> >>> > any binary operator in R, but I find it's a source of difficulty for
> >>> > the pipe because of how pipes are often used.
> >>> >
> >>> > [I'm assuming here that my usage is fairly typical of a lot of users;
> >>> > at any rate, I don't think I'm *too* unusual.]
> >>> >
> >>> > === Why this is a problem ===
> >>> >
> >>> > It's very common (for me, and I suspect for many users of dplyr) to
> >>> > write multi-step pipelines and put each step on its own line for
> >>> > readability. Something like this:
> >>> >
> >>> >   ### Example 1 ###
> >>> >   my_data_frame_1 %>%
> >>> >       filter(some_conditions_1) %>%
> >>> >       inner_join(my_data_frame_2, by = some_columns_1) %>%
> >>> >       group_by(some_columns_2) %>%
> >>> >       summarize(some_aggregate_functions_1) %>%
> >>> >       filter(some_conditions_2) %>%
> >>> >       left_join(my_data_frame_3, by = some_columns_3) %>%
> >>> >       group_by(some_columns_4) %>%
> >>> >       summarize(some_aggregate_functions_2) %>%
> >>> >       arrange(some_columns_5)
> >>> >
> >>> > [I guess some might consider this an overly long pipeline; for me it's
> >>> > pretty typical. I *could* split it up by assigning intermediate
> >>> > results to variables, but much of the value I get from the pipe is
> >>> > that it lets my code communicate which results are temporary, and
> >>> > which will be used again later. Assigning variables for single-use
> >>> > results would remove that expressiveness.]
> >>> >
> >>> > I would prefer (for reasons I'll explain) to be able to write the above
> >>> > example like this, which isn't valid R:
> >>> >
> >>> >   ### Example 2 (not valid R) ###
> >>> >   my_data_frame_1
> >>> >       %>% filter(some_conditions_1)
> >>> >       %>% inner_join(my_data_frame_2, by = some_columns_1)
> >>> >       %>% group_by(some_columns_2)
> >>> >       %>% summarize(some_aggregate_functions_1)
> >>> >       %>% filter(some_conditions_2)
> >>> >       %>% left_join(my_data_frame_3, by = some_columns_3)
> >>> >       %>% group_by(some_columns_4)
> >>> >       %>% summarize(some_aggregate_functions_2)
> >>> >       %>% arrange(some_columns_5)
> >>> >
> >>> > One (minor) advantage is obvious: It lets you easily line up the pipes,
> >>> > which means that you can see at a glance that the whole block is a
> >>> > single pipeline, and you'd immediately notice if you inadvertently
> >>> > omitted a pipe, which otherwise can lead to confusing output. [It's
> >>> > also aesthetically pleasing, especially when %>% is replaced with |>,
> >>> > but that's subjective.]
> >>> >
> >>> > But the bigger issue happens when I want to re-run just *part* of the
> >>> > pipeline. I do this often when debugging: if the output of the
> >>> > pipeline seems wrong, I re-run the first few steps and check the
> >>> > output, then include a little more and re-run again, etc., until I
> >>> > locate my mistake. Working in an interactive notebook environment,
> >>> > this involves using the cursor to select just the part of the code I
> >>> > want to re-run.
> >>> >
> >>> > It's fast and easy to select *entire* lines of code, but unfortunately
> >>> > with the pipes placed at the end of the line I must instead select
> >>> > everything *except* the last three characters of the line (the last
> >>> > two characters for the new pipe).
> >>> > Then when I want to re-run the same partial pipeline with the next
> >>> > line of code included, I can't just press SHIFT+Down to select it as I
> >>> > otherwise would, but instead must move the cursor horizontally to a
> >>> > position three characters before the end of *that* line (which is
> >>> > generally different due to varying line lengths). And so forth each
> >>> > time I want to include an additional line.
> >>> >
> >>> > Moreover, with the staggered positions of the pipes at the end of each
> >>> > line, it's very easy to accidentally select the final pipe on a line,
> >>> > and then sit there for a moment wondering if the environment has
> >>> > stopped responding before realizing it's just waiting for further
> >>> > input (i.e., for the right-hand side). These small delays and
> >>> > disruptions add up over the course of a day.
> >>> >
> >>> > This desire to select and re-run the first part of a pipeline is also
> >>> > the reason why it doesn't suffice to achieve syntax like my "Example 2"
> >>> > by wrapping the entire pipeline in parentheses. That's of no use if I
> >>> > want to re-run a selection that doesn't include the final close-paren.
> >>> >
> >>> > === Possible Solutions ===
> >>> >
> >>> > I can think of two, but maybe there are others. The first would make
> >>> > "Example 2" into valid code, and the second would allow you to run a
> >>> > selection that included a trailing pipe.
> >>> >
> >>> > Solution 1: Add a special case to how R is parsed, so if the first
> >>> > (non-whitespace) token after an end-line is a pipe, that pipe gets
> >>> > moved to before the end-line.
> >>> >   - Argument for: This lets you write code like example 2, which
> >>> >     addresses the pain point around re-running part of a pipeline, and
> >>> >     has advantages for readability.
> >>> >     Also, since starting a line with a pipe operator is currently
> >>> >     invalid, the change wouldn't break any working code.
> >>> >   - Argument against: It would make the behavior of %>% inconsistent
> >>> >     with that of other binary operators in R. (However, this objection
> >>> >     might not apply to the new pipe, |>, which I understand is being
> >>> >     implemented as a syntax transformation rather than a binary
> >>> >     operator.)
> >>> >
> >>> > Solution 2: Ignore the pipe operator if it occurs as the final token
> >>> > of the code being executed.
> >>> >   - Argument for: This would mean the user could select and re-run the
> >>> >     first few lines of a longer pipeline (selecting *entire* lines),
> >>> >     avoiding the difficulties described above.
> >>> >   - Argument against: This means that %>% would be valid even if it
> >>> >     occurred without a right-hand side, which is inconsistent with
> >>> >     other operators in R. (But, as above, this objection might not
> >>> >     apply to |>.) Also, this solution still doesn't enable the syntax
> >>> >     of "Example 2", with its readability benefit.
> >>> >
> >>> > Thanks for reading this and considering it.
> >>> >
> >>> > - Tim Goodman
> >>> >
> >>> > ______________________________________________
> >>> > R-devel@r-project.org mailing list
> >>> > https://stat.ethz.ch/mailman/listinfo/r-devel

--
"Whereas true religion and good morals are the only solid foundations of
public liberty and happiness . . . it is hereby earnestly recommended to the
several States to take the most effectual measures for the encouragement
thereof." Continental Congress, 1778