Re: [Rd] \>

avi.e.gross Sat, 29 Jun 2024 20:09:44 -0700

I suggest there is actually quite a lot to know about piping, albeit you can 
use it fine while knowing little.

For those who can happily write complex lines of code containing nested 
function calls and never have to explain it to anyone, feel free. I can do that, 
and sometimes months later it takes me ten minutes to figure out what I did, and 
then I check to see if I got it right!

But for people who are used to features vaguely similar in other languages, 
pipes are a great way to visualize data and process flow as they show a sort of 
sequence.

No, they are not at all the same as a UNIX pipe but that is not a bad model as 
it lets you write shell scripts that do one conceptual step at a time and pass 
along data to the input of another program that processes it further and passes 
it along until you reach some goal.

Many languages, such as those built on some variation of object-oriented 
design, have a sort of pipeline that can look like:

a.method_a(args).method_b(args)

And in some languages, that can be spread across multiple lines to look a bit 
more like a pipeline. This too is an inexact analogy as what really happens is 
that the underlying object can return perhaps another object when you call a 
method and then you can call a method in that object and so on. This can make 
it limited in some ways or quite powerful.

The many versions that have been created of an R pipe can be variations on many 
themes. As an example, you could take the multiple lines in a pipeline and 
rearrange them to look like the nested code with function calls as arguments in 
other functions and then evaluate it. It would, in effect, be a sort of 
syntactic sugar that makes it easier for SOME programmers.

But the topic now shifts to debugging and indeed, the underlying implementation 
of a pipeline can affect how one debugs.

The simplest case is trivial to debug. No visible pipes:

Temp1 <- f1(x, args)
Temp2 <- f2(Temp1,  args)
Result <- f3(Temp2, args)
rm(Temp1, Temp2)

So one form of piping does something like this under the table:

For code like:
x PIPED f1(args) PIPED f2(args) PIPED f3(args) -> Result

It simply does something like this:

. <- x
. <- f1(., args)
.  <- f2(.,  args)
Result <- f3(., args)

The variable "." just gets re-used repeatedly. But as this code swap is done 
outside normal view, can a debugger follow it? And "." keeps changing. As a 
nice feature, some implementations check where you place ".": if it appears as 
an argument past the beginning, as in f3(args, ., more_args), the data is piped 
into that position rather than into the first argument. That helps with the many 
functions that want the data second or third or ...
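
To make that concrete, here is a minimal sketch of the re-used "." idea in plain 
R. The bodies of f1, f2 and f3 are hypothetical stand-ins I made up for 
illustration, and this only mimics what such a pipe might do under the table; it 
is not how any particular package is actually implemented.

```r
# Hypothetical stand-ins for the functions in the pipeline.
f1 <- function(d, n) d + n
f2 <- function(d, n) d * n
f3 <- function(n, d) d - n   # wants the data as its SECOND argument

x <- 1:3

# Nested form: read from the inside out.
nested <- f3(1, f2(f1(x, 10), 2))

# Re-used "." form: one step per line, data placed wherever "." appears.
. <- x
. <- f1(., 10)
. <- f2(., 2)
Result <- f3(1, .)

# Both forms compute the same thing.
stopifnot(identical(nested, Result))
print(Result)
```

Run line by line, each intermediate value of "." can be printed before the next 
step overwrites it, which is exactly what a hidden rewrite does not let you do.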

There are other implementations possible that allow syntactic sugar without 
necessarily being run as shown. I am not sure how the native pipe that was 
added is implemented but it seems quite a bit faster than many other 
implementations and has some quirks such as requiring all functions to include 
parentheses, even if empty like piping to head(), and the way to do some things 
using anonymous functions is a tad annoying.
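
A small sketch of those quirks, assuming R 4.1 or later (needed for both |> and 
the \(v) shorthand). Quoting a pipeline also suggests that the parser rewrites 
it into an ordinary nested call before anything is evaluated, which may be part 
of why it is fast, though I would not lean on that as the full story.

```r
# Requires R >= 4.1 for the |> operator and the \(v) lambda shorthand.
x <- c(3, 1, 2)

# Quoting shows the pipeline has already become a nested call.
piped  <- quote(x |> sort() |> head(2))
nested <- quote(head(sort(x), 2))
stopifnot(identical(piped, nested))

# Parentheses are required on the right-hand side, even when empty:
# "x |> sort" is a syntax error, "x |> sort()" is fine.
a <- x |> sort() |> head(2)

# An anonymous function must be wrapped in parentheses and then called.
b <- x |> (\(v) v[v > 1])()

print(a)
print(b)
```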

I think the focus for many people is the HUMAN who is programming and sees a 
logical way to describe what they want without much ambiguity. Of course, if 
you want to keep playing with your code, don't use pipes except perhaps when it 
is pretty much done.

An analogy to consider is another variant of piping used by ggplot where "+" is 
overloaded and:

ggplot(args) +
  geom_point(args) +
  geom_line(args) +
  xlab(args) +
  theme_bw() +
  coord_flip() +
  ...

Is a common way of writing a fairly complex set of operations. But what is 
being piped there is a growing object that each step modifies and at the end, 
the object is rendered into a graph based on whatever complex contents it 
contains. And, yes, that can be painful to debug and a simple option is:

P <- ggplot(args)
P <- P + geom_point(args)
P <- P + geom_line(args)
...
print(P)

Being able to declare incremental changes and layers to a graph this way is 
more intuitive to some. Not using a pipelined approach allows you to comment 
out parts easily, such as not making it black/white sometimes, albeit you can 
as easily comment out the other version.

What some people need to understand is that adding pipes of any of the 
varieties has never taken away the ability to write the code in other ways. It is not in 
any way required. And for some people, it aligns better with how they can 
reason. Yet, if you need lots of debugging in your programs, writing them 
differently may be a better idea, at least until it is debugged.

I have written code for my clients with quite elegant pipelines as well as 
functions like the dplyr mutate() that allow me to do many things in one 
function call, and formatted it beautifully with varying levels of indentation 
so you can see at a glance where things line up. Parts of the code are nested 
function calls and when it all leads to a ggplot structure like above, it can 
be a tad hard for many people to appreciate what it is doing. But then, I get 
some requests to change things, add or subtract features, allow some parts to 
be commented/documented close to where the code does things, or allow 
parameters to be set next to where they are called. What I sometimes do is go 
back to the linear style of code above where each new section does mostly one 
thing with a comment before it and a setting of changeable parameters like 
colors that the customer can tune. The code can get much longer but can be 
absorbed step by step, and unless we remove variables no longer needed, can 
have some performance issues if it is processing lots of data! LOL!

There is plenty more to know, but unless you have to read other people's code 
and modify it, it may be optional.

-----Original Message-----
From: R-devel <r-devel-boun...@r-project.org> On Behalf Of Spencer Graves
Sent: Saturday, June 29, 2024 6:57 PM
To: Duncan Murdoch <murdoch.dun...@gmail.com>; Rui Barradas 
<ruipbarra...@sapo.pt>; r-devel <r-devel@r-project.org>
Subject: Re: [Rd] \>

Hi, Duncan:

On 6/29/24 17:24, Duncan Murdoch wrote:
> 
>>       Yes. I'm not yet facile with "|>", but I'm learning.
>>
>>
>>       Spencer Graves
> 
> There's very little to know.  This:
> 
>       x |> f() |> g()
> 
> is just a different way of writing
> 
>      g(f(x))
> 
> If f() or g() have extra arguments, just add them afterwards:
> 
>      x |> f(a = 1) |> g(b = 2)
> 
> is just
> 
>      g(f(x, a = 1), b = 2)

          Agreed. If I understand correctly, the supporters of the former think 
it's easier to highlight and execute a subset of the earlier character 
string, e.g., "x |> f(a = 1)" than the corresponding subset of the 
latter, "f(x, a = 1)". I remain unconvinced.

          For debugging, I prefer the following:

          fx1 <- f(x, a = 1)
          g(fx1, b=2)

          Yes, "fx1" occupies storage space that the other two do not. If you 
are writing code for an 8086, the difference is important. However, for 
my work, ease of debugging is important, which is why I prefer, "fx1 <- 
f(x, a = 1); g(fx1, b=2)".

          Thanks, again, for the reply.
          Spencer Graves

> 
> This isn't quite true of the magrittr pipe, but it is exactly true of 
> the base pipe.
> 
> Duncan Murdoch
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

