Re: [Rd] the pipe |> and line breaks in pipelines

Ben Bolker Wed, 09 Dec 2020 12:58:28 -0800

Definitely support the idea that if this kind of trickery is going to happen that it be confined to some particular IDE/environment or some particular submission protocol. I don't want it to happen in my ESS session please ... I'd rather deal with the parentheses.


On 12/9/20 3:45 PM, Timothy Goodman wrote:

Regarding special treatment for |>, isn't it getting special treatment
anyway, because it's implemented as a syntax transformation from x |> f(y)
to f(x, y), rather than as an operator?
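
To illustrate the syntax-transformation point: in an R version that includes the native pipe (4.1.0 or later, which is an assumption about the reader's setup), quoting a piped call suggests that the parser has already rewritten it before anything is evaluated. The names x, y and f below are placeholders and are never evaluated.

```r
# Requires R >= 4.1.0, where |> is part of the grammar.
piped  <- quote(x |> f(y))   # parsed but not evaluated
direct <- quote(f(x, y))

identical(piped, direct)     # TRUE: no call to a `|>` function remains
deparse(piped)               # "f(x, y)"

# For comparison, a %...% operator stays in the parse tree as a call.
magrittr_style <- quote(x %>% f(y))
as.character(magrittr_style[[1]])   # "%>%"
```

The last two lines only parse the %>% expression, so magrittr does not need to be installed for this check.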


That said, the point about wanting a block of code submitted line-by-line
to work the same as a block of code submitted all at once is a fair one.
Maybe the better solution would be if there were a way to say "Submit the
selected code as a single expression, ignoring line-breaks".  Then I could
run any number of lines with pipes at the start and no special character at
the end, and have it treated as a single pipeline.  I suppose that'd need
to be a feature offered by the environment (RStudio's R Notebooks in my
case).  I could wrap my pipelines in parentheses (to make the "pipes at
start of line" syntax valid R code), and then could use the hypothetical
"submit selected code ignoring line-breaks" feature when running just the
first part of the pipeline -- i.e., selecting full lines, but starting
after the opening paren so as not to need to insert a closing paren.
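
A rough sketch of what such a "submit ignoring line-breaks" feature might do on the environment side. The helper name parse_ignoring_line_breaks is hypothetical; it is not an RStudio or base R feature. It assumes the selection is meant to be a single expression and contains no comments or multi-line strings, since joining lines would change the meaning of either.

```r
# Hypothetical helper: join the selected lines, then parse the result.
parse_ignoring_line_breaks <- function(code) {
  # Replace every run of line breaks with a space so the parser sees one line.
  one_line <- gsub("[\r\n]+", " ", code)
  parse(text = one_line, keep.source = FALSE)
}

# A selection with pipes at the start of each line (not valid R as written).
selection <- paste(
  "my_data_frame_1",
  "  %>% filter(some_conditions_1)",
  "  %>% arrange(some_columns_5)",
  sep = "\n"
)

exprs <- parse_ignoring_line_breaks(selection)
length(exprs)   # 1: the whole selection became a single pipeline
```

The example uses %>% only because it parses in any R version; the same text-level joining would apply to |> where the native pipe is available.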

- Tim

On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <murdoch.dun...@gmail.com>
wrote:

On 09/12/2020 2:33 p.m., Timothy Goodman wrote:

If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
command in the Notebook environment I'm using) I certainly *would*
expect R to treat it as a complete statement.

But what I'm talking about is a different case, where I highlight a
multi-line statement in my notebook:

      my_data_frame1
          |> filter(some_conditions_1)

and then press Ctrl+Enter.


I don't think I'd like it if parsing changed between passing one line at
a time and passing a block of lines.  I'd like to be able to highlight a
few lines and pass those, then type one, then highlight some more and
pass those:  and have it act as though I just passed the whole combined
block, or typed everything one line at a time.


Or, I suppose the equivalent would be to run an R script containing those
two lines of code, or to run a multi-line statement like that from the
console (which in RStudio I can do by pressing Shift+Enter between the
lines.)

In those cases, R could either (1) Give an error message [the current
behavior], or (2) understand that the first line is meant to be piped to
the second.  The second option would be significantly more useful, and
is almost certainly what the user intended.

(For what it's worth, there are some languages, such as JavaScript, that
consider the first token of the next line when determining if the
previous line was complete.  JavaScript's rules around this are overly
complicated, but a rule like "a pipe following a line break is treated
as continuing the previous line" would be much simpler.  And while it
might be objectionable to treat the operator %>% differently from other
operators, the addition of |>, which isn't truly an operator at all,
seems like the right time to consider it.)


I think this would be hard to implement with R's current parser, but
possible.  I think it could be done by distinguishing between EOL
markers within a block of text and "end of block" marks.  If it applied
only to the |> operator it would be *really* ugly.

My strongest objection to it is the one at the top, though.  If I have a
block of lines sitting in my editor that I just finished executing, with
the cursor pointing at the next line, I'd like to know that it didn't
matter whether the lines were passed one at a time, as a block, or some
combination of those.

Duncan Murdoch


-Tim

On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

     The requirement for operators at the end of the line comes from the
     interactive nature of R.  If you type

           my_data_frame_1

     how could R know that you are not done, and are planning to type the
     rest of the expression

             %>% filter(some_conditions_1)
             ...

     before it should consider the expression complete?  The way languages
     like C do this is by requiring a statement terminator at the end.  You
     can also do it by wrapping the entire thing in parentheses ().

     However, be careful: Don't use braces:  they don't work.  And parens
     have the side effect of removing invisibility from the result (which is
     a design flaw or bonus, depending on your point of view).  So I
     actually wouldn't advise this workaround.
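
A small check of the points above, written as a sketch: the helper parses() is introduced here only for illustration, and the data frame and condition names are the placeholders from this thread. Nothing is evaluated except the two invisible() calls at the end.

```r
# Illustrative helper: TRUE if the text parses, FALSE on a syntax error.
parses <- function(code) {
  tryCatch({
    parse(text = code, keep.source = FALSE)
    TRUE
  }, error = function(e) FALSE)
}

bare   <- "my_data_frame_1\n  %>% filter(some_conditions_1)"
parens <- paste0("(", bare, ")")
braces <- paste0("{", bare, "}")

parses(bare)     # FALSE: the first line is already a complete expression
parses(parens)   # TRUE:  line breaks inside ( ) do not end the expression
parses(braces)   # FALSE: inside { } a line break still ends the expression

# The side effect of parentheses: an invisible result becomes visible.
withVisible(invisible(1))$visible     # FALSE
withVisible((invisible(1)))$visible   # TRUE
```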

     Duncan Murdoch


     On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
      > Hi,
      >
      > I'm a data scientist who routinely uses R in my day-to-day work,
      > for tasks such as cleaning and transforming data, exploratory data
      > analysis, etc.  This includes frequent use of the pipe operator from the
      > magrittr and dplyr libraries, %>%.  So, I was pleased to hear about the
      > recent work on a native pipe operator, |>.
      >
      > This seems like a good time to bring up the main pain point I
      > encounter when using pipes in R, and some suggestions on what could be
      > done about it.  The issue is that the pipe operator can't be placed at the
      > start of a line of code (except in parentheses).  That's no different than
      > any binary operator in R, but I find it's a source of difficulty for the
      > pipe because of how pipes are often used.
      >
      > [I'm assuming here that my usage is fairly typical of a lot of
      > users; at any rate, I don't think I'm *too* unusual.]
      >
      > === Why this is a problem ===
      >
      > It's very common (for me, and I suspect for many users of dplyr)
      > to write multi-step pipelines and put each step on its own line for
      > readability.  Something like this:
      >
      >    ### Example 1 ###
      >    my_data_frame_1 %>%
      >      filter(some_conditions_1) %>%
      >      inner_join(my_data_frame_2, by = some_columns_1) %>%
      >      group_by(some_columns_2) %>%
      >      summarize(some_aggregate_functions_1) %>%
      >      filter(some_conditions_2) %>%
      >      left_join(my_data_frame_3, by = some_columns_3) %>%
      >      group_by(some_columns_4) %>%
      >      summarize(some_aggregate_functions_2) %>%
      >      arrange(some_columns_5)
      >
      > [I guess some might consider this an overly long pipeline; for me
      > it's pretty typical.  I *could* split it up by assigning intermediate
      > results to variables, but much of the value I get from the pipe is that it
      > lets my code communicate which results are temporary, and which will be
      > used again later.  Assigning variables for single-use results would remove
      > that expressiveness.]
      >
      > I would prefer (for reasons I'll explain) to be able to write the
      > above example like this, which isn't valid R:
      >
      >    ### Example 2 (not valid R) ###
      >    my_data_frame_1
      >      %>% filter(some_conditions_1)
      >      %>% inner_join(my_data_frame_2, by = some_columns_1)
      >      %>% group_by(some_columns_2)
      >      %>% summarize(some_aggregate_functions_1)
      >      %>% filter(some_conditions_2)
      >      %>% left_join(my_data_frame_3, by = some_columns_3)
      >      %>% group_by(some_columns_4)
      >      %>% summarize(some_aggregate_functions_2)
      >      %>% arrange(some_columns_5)
      >
      > One (minor) advantage is obvious: It lets you easily line up the
      > pipes, which means that you can see at a glance that the whole block is a
      > single pipeline, and you'd immediately notice if you inadvertently omitted
      > a pipe, which otherwise can lead to confusing output.  [It's also
      > aesthetically pleasing, especially when %>% is replaced with |>, but
      > that's subjective.]
      >
      > But the bigger issue happens when I want to re-run just *part* of the
      > pipeline.  I do this often when debugging: if the output of the pipeline
      > seems wrong, I re-run the first few steps and check the output, then
      > include a little more and re-run again, etc., until I locate my mistake.
      > Working in an interactive notebook environment, this involves using the
      > cursor to select just the part of the code I want to re-run.
      >
      > It's fast and easy to select *entire* lines of code, but
      > unfortunately with the pipes placed at the end of the line I must instead
      > select everything *except* the last three characters of the line (the last
      > two characters for the new pipe).  Then when I want to re-run the same
      > partial pipeline with the next line of code included, I can't just press
      > SHIFT+Down to select it as I otherwise would, but instead must move the
      > cursor horizontally to a position three characters before the end of
      > *that* line (which is generally different due to varying line lengths).
      > And so forth each time I want to include an additional line.
      >
      > Moreover, with the staggered positions of the pipes at the end of
      > each line, it's very easy to accidentally select the final pipe on a line,
      > and then sit there for a moment wondering if the environment has stopped
      > responding before realizing it's just waiting for further input (i.e., for
      > the right-hand side).  These small delays and disruptions add up over the
      > course of a day.
      >
      > This desire to select and re-run the first part of a pipeline is
      > also the reason why it doesn't suffice to achieve syntax like my
      > "Example 2" by wrapping the entire pipeline in parentheses.  That's of no
      > use if I want to re-run a selection that doesn't include the final
      > close-paren.
      >
      > === Possible Solutions ===
      >
      > I can think of two, but maybe there are others.  The first would make
      > "Example 2" into valid code, and the second would allow you to run a
      > selection that included a trailing pipe.
      >
      >    Solution 1: Add a special case to how R is parsed, so if the first
      > (non-whitespace) token after an end-line is a pipe, that pipe gets moved to
      > before the end-line.
      >      - Argument for: This lets you write code like example 2, which
      > addresses the pain point around re-running part of a pipeline, and has
      > advantages for readability.  Also, since starting a line with a pipe
      > operator is currently invalid, the change wouldn't break any working code.
      >      - Argument against: It would make the behavior of %>% inconsistent with
      > that of other binary operators in R.  (However, this objection might not
      > apply to the new pipe, |>, which I understand is being implemented as a
      > syntax transformation rather than a binary operator.)
      >
      >    Solution 2: Ignore the pipe operator if it occurs as the final
      > token of the code being executed.
      >      - Argument for: This would mean the user could select and re-run the
      > first few lines of a longer pipeline (selecting *entire* lines), avoiding
      > the difficulties described above.
      >      - Argument against: This means that %>% would be valid even if it
      > occurred without a right-hand side, which is inconsistent with other
      > operators in R.  (But, as above, this objection might not apply to |>.)
      > Also, this solution still doesn't enable the syntax of "Example 2", with
      > its readability benefit.
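
The two solutions above can be approximated as text preprocessing done before the code reaches the parser. This is only a sketch of the idea, not how R's parser would implement either proposal: the function names move_leading_pipes and drop_trailing_pipe are hypothetical, and plain regular expressions like these would also rewrite pipes that appear inside strings or comments.

```r
# Solution 1 (sketch): a pipe that starts a line is moved to the end of the
# previous line, so the line break no longer ends the expression.
move_leading_pipes <- function(code) {
  gsub("\\s*\\n\\s*(%>%|\\|>)", " \\1", code, perl = TRUE)
}

# Solution 2 (sketch): a pipe that is the final token of the submitted code
# is dropped, so a selection of whole lines can end with a pipe.
drop_trailing_pipe <- function(code) {
  sub("\\s*(%>%|\\|>)\\s*$", "", code, perl = TRUE)
}

example_2 <- "my_data_frame_1\n  %>% filter(some_conditions_1)\n  %>% arrange(some_columns_5)"
fixed_1   <- move_leading_pipes(example_2)
length(parse(text = fixed_1))   # 1

partial  <- "my_data_frame_1 %>%\n  filter(some_conditions_1) %>%"
fixed_2  <- drop_trailing_pipe(partial)
length(parse(text = fixed_2))   # 1
```

A front end could apply either function to the selected text just before submitting it, which would keep the change out of the language itself.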
      >
      > Thanks for reading this and considering it.
      >
      > - Tim Goodman
      >
      >       [[alternative HTML version deleted]]
      >
      > ______________________________________________
      > R-devel@r-project.org mailing list
      > https://stat.ethz.ch/mailman/listinfo/r-devel
      >


