Gabe,

It's the current behavior of paste() that is a major source of bugs:

  ## Add "rs" prefix to SNP ids and collapse them in a
  ## comma-separated string.
  collapse_snp_ids <- function(snp_ids)
      paste("rs", snp_ids, sep="", collapse=",")

  snp_groups <- list(
    group1=c(55, 22, 200),
    group2=integer(0),
    group3=c(99, 550)
  )

  vapply(snp_groups, collapse_snp_ids, character(1))
  #            group1            group2            group3
  # "rs55,rs22,rs200"              "rs"      "rs99,rs550"

This has hit me so many times!
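
Until paste() handles this case itself, the usual workaround is an explicit zero-length guard. A minimal sketch follows; the name collapse_snp_ids_guarded is made up for illustration, and returning "" for an empty group is an assumption about what the caller wants:

```r
## Hypothetical guarded variant of collapse_snp_ids(): an empty input
## yields "" instead of the bare "rs" prefix.
collapse_snp_ids_guarded <- function(snp_ids) {
    if (length(snp_ids) == 0L)
        return("")
    paste("rs", snp_ids, sep = "", collapse = ",")
}

snp_groups <- list(
    group1 = c(55, 22, 200),
    group2 = integer(0),
    group3 = c(99, 550)
)

res <- vapply(snp_groups, collapse_snp_ids_guarded, character(1))
## res[["group1"]] is "rs55,rs22,rs200"
## res[["group2"]] is ""
## res[["group3"]] is "rs99,rs550"
```

The guard works in any R version, but it has to be repeated in every helper of this kind, which is why it is so easy to forget.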

Now with 'recycle0=TRUE', we finally have the opportunity to make it do the right thing. Let's not miss that opportunity.
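
For illustration, the result I have in mind can be spelled out with two explicit calls, so that 'recycle0' only touches the 'sep' step and 'collapse' behaves as it always has. This is only a sketch, assuming an R version whose paste() accepts 'recycle0'; collapse_snp_ids_two_step is a hypothetical name:

```r
## Step 1: the 'sep' operation, with zero-length recycling enabled.
## Step 2: the 'collapse' operation, which always returns one string.
collapse_snp_ids_two_step <- function(snp_ids) {
    prefixed <- paste("rs", snp_ids, sep = "", recycle0 = TRUE)
    paste(prefixed, collapse = ",")
}

empty_group <- collapse_snp_ids_two_step(integer(0))
full_group  <- collapse_snp_ids_two_step(c(99, 550))
## empty_group is "" because step 1 produced character(0)
## full_group is "rs99,rs550"
```

Because the two steps are separate calls, this does not depend on how a single paste() call resolves 'collapse' together with 'recycle0'.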

Cheers,
H.


On 5/22/20 11:26, Gabriel Becker wrote:
I understand that this is consistent but it also strikes me as an enormous 'gotcha' of a magnitude that 'we' are trying to avoid/smooth over at this point in user-facing R space.

For the record I'm not suggesting it should return something other than "", and in particular I'm not arguing that any call to paste /that does not return an error/ with non-NULL collapse should return a character vector of length one.

Rather I'm pointing out that it could (perhaps should, imo) simply be an error, which is also consistent, in the strict sense, with previous behavior in that it is the developer simply declining to extend the recycle0 argument to the full parameter space (there is no rule that says we must do so, arguments whose use is incompatible with other arguments can be reasonable and called for).
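
To make the "simply be an error" option concrete, here is a rough sketch. strict_paste is a hypothetical wrapper, not anything in base R, and it assumes an R version whose paste() accepts 'recycle0':

```r
## Hypothetical wrapper: refuse the combination of a non-NULL 'collapse'
## with 'recycle0 = TRUE' instead of picking a result for it.
strict_paste <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) {
    if (isTRUE(recycle0) && !is.null(collapse))
        stop("'recycle0 = TRUE' cannot be combined with a non-NULL 'collapse'")
    paste(..., sep = sep, collapse = collapse, recycle0 = recycle0)
}

ok <- strict_paste(c("a", "b"), collapse = ",")   # "a,b"
outcome <- tryCatch(
    strict_paste(c("a", "b"), NULL, collapse = ",", recycle0 = TRUE),
    error = function(e) "error"
)
## outcome is "error"
```

All other argument combinations pass straight through to paste() unchanged.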

I don't feel super strongly that returning "" in this and similar cases is horrible and should never happen, but I'd bet dollars to donuts that to the extent that behavior occurs it will be a disproportionately major source of bugs, and I think that's at least worth considering in addition to pure consistency.

~G

On Fri, May 22, 2020 at 9:50 AM William Dunlap <wdun...@tibco.com> wrote:

    I agree with Herve: processing collapse happens last, so
    collapse=non-NULL always leads to a single character string being
    returned, the same as paste(collapse="").  See the altPaste function
    I posted yesterday.

    Bill Dunlap
    TIBCO Software
    wdunlap tibco.com
    


    On Fri, May 22, 2020 at 9:12 AM Hervé Pagès <hpa...@fredhutch.org> wrote:

        I think that

             paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
                   recycle0 = TRUE)

        should just return an empty string, and I don't see why it needs to
        emit a warning or raise an error. To me it does exactly what the
        user is asking for, which is to change how the 3 arguments are
        recycled **before** the 'sep' operation.

        The 'recycle0' argument has no business in the 'collapse' operation
        (which comes after the 'sep' operation): this operation still
        behaves like it always has.

        That's all there is to it.

        H.


        On 5/22/20 03:00, Gabriel Becker wrote:
         > Hi Martin et al,
         >
         >
         >
         > On Thu, May 21, 2020 at 9:42 AM Martin Maechler
         > <maech...@stat.math.ethz.ch> wrote:
         >
         >      >>>>> Hervé Pagès
         >      >>>>>     on Fri, 15 May 2020 13:44:28 -0700 writes:
         >
         >          > There is still the situation where **both** 'sep' and
         >          > 'collapse' are specified:
         >
         >          >> paste(integer(0), "nth", sep="", collapse=",")
         >          > [1] "nth"
         >
         >          > In that case 'recycle0' should **not** be ignored, i.e.
         >
         >          > paste(integer(0), "nth", sep="", collapse=",", recycle0=TRUE)
         >
         >          > should return the empty string (and not character(0) like it
         >          > does at the moment).
         >
         >          > In other words, 'recycle0' should only control the first
         >          > operation (the operation controlled by 'sep'). Which makes
         >          > plenty of sense: the 1st operation is binary (or n-ary) while
         >          > the collapse operation is unary. There is no concept of
         >          > recycling in the context of unary operations.
         >
         >     Interesting, ..., and sounding somewhat convincing.
         >
         >          > On 5/15/20 11:25, Gabriel Becker wrote:
         >          >> Hi all,
         >          >>
         >          >> This makes sense to me, but I would think that recycle0 and
         >          >> collapse should actually be incompatible and paste should
         >          >> throw an error if recycle0 were TRUE and collapse were
         >          >> declared in the same call. I don't think the value of
         >          >> recycle0 should be silently ignored if it is actively
         >          >> specified.
         >          >>
         >          >> ~G
         >
         >     Just to summarize what I think we should know and agree (or be
         >     "disproven") and where this comes from ...
         >
         >     1) recycle0 is a new R 4.0.0 option in paste() / paste0() which by
         >         default (recycle0 = FALSE) should (and *does* AFAIK) not change
         >         anything, hence  paste() / paste0() behave completely
         >         back-compatible if recycle0 is kept to FALSE.
         >
         >     2) recycle0 = TRUE is meant to give different behavior, notably
         >         0-length arguments (among '...') should result in 0-length
         >         results.
         >
         >         The above does not specify what this means in detail, see 3)
         >
         >     3) The current R 4.0.0 implementation (for which I'm primarily
         >         responsible) and help(paste)  are in accordance.
         >         Notably the help page (Arguments -> 'recycle0' ; Details 1st
         >         para ; Examples) says and shows how the 4.0.0 implementation
         >         has been meant to work.
         >
         >     4) Several provenly smart members of the R community argue that
         >         both the implementation and the documentation of
         >         'recycle0 = TRUE'  should be changed to be more logical /
         >         coherent / sensical ..
         >
         >     Is the above all correct in your view?
         >
         >     Assuming yes,  I read basically two proposals, both agreeing
         >     that  recycle0 = TRUE  should only ever apply to the action of
         >     'sep' but not the action of 'collapse'.
         >
         >     1) Bill and Hervé (I think) propose that 'recycle0' should have
         >         no effect whenever  'collapse = <string>'
         >
         >     2) Gabe proposes that 'collapse = <string>' and
         >         'recycle0 = TRUE' should be declared incompatible and error.
         >         If going in that direction, I could also see them to give a
         >         warning (and continue as if recycle0 = FALSE).
         >
         >
         > Herve makes a good point about when sep and collapse are both set.
         > That said, if the user explicitly sets recycle0, personally, I don't
         > think it should be silently ignored under any configuration of other
         > arguments.
         >
         > If all of the arguments are to go into effect, the question then
         > becomes one of ordering, I think.
         >
         > Consider
         >
         >     paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
         >           recycle0 = TRUE)
         >
         > Currently that returns character(0), because the logic is
         > essentially (in pseudo-code)
         >
         >     collapse(paste(c("a", "b"), NULL, c("c", "d"), sep = " ",
         >     recycle0=TRUE), collapse = ",", recycle0=TRUE)
         >
         >       -> collapse(character(0), collapse = ",", recycle0=TRUE)
         >
         >     -> character(0)
         >
         > Now Bill Dunlap argued, fairly convincingly I think, that
         > paste(..., collapse=<string>) should /always/ return a character
         > vector of length exactly one. With recycle0, though,  it will return
         > "" via the progression
         >
         >     paste(c("a", "b"), NULL, c("c", "d"), sep = " ", collapse = ",",
         >           recycle0 = TRUE)
         >
         >       -> collapse(character(0), collapse = ",")
         >
         >     -> ""
         >
         >
         > because recycle0 is still applied to the sep-based operation which
         > occurs before collapse, thus leaving a vector of length 0 to
         > collapse.
         >
         > That is consistent but seems unlikely to be what the user wanted,
         > imho. I think if it does this there should be at least a warning
         > when paste collapses to "" this way, if it is allowed at all (i.e.
         > if mixing collapse=<string> and recycle0=TRUE is not simply made an
         > error).
         >
         > I would like to hear others' thoughts as well though. @Pages, Herve
         > <hpa...@fredhutch.org> @William Dunlap <wdun...@tibco.com> is ""
         > what you envision as the desired and useful behavior there?
         >
         > Best,
         > ~G
         >
         >
         >
         >     I have not yet made my mind up but would tend to agree to "you
         >     guys", but I think that other R Core members should chime in, too.
         >
         >     Martin
         >
         >          >> On Fri, May 15, 2020 at 11:05 AM Hervé Pagès
         >          >> <hpa...@fredhutch.org> wrote:
         >          >>
         >          >> Totally agree with that.
         >          >>
         >          >> H.
         >          >>
         >          >> On 5/15/20 10:34, William Dunlap via R-devel wrote:
         >          >> > I agree: paste(collapse="something", ...) should always
         >          >> > return a single character string, regardless of the value
         >          >> > of recycle0.  This would be similar to when there are no
         >          >> > non-NULL arguments to paste; collapse="." gives a single
         >          >> > empty string and collapse=NULL gives a zero long character
         >          >> > vector.
         >          >> >> paste()
         >          >> > character(0)
         >          >> >> paste(collapse=", ")
         >          >> > [1] ""
         >          >> >
         >          >> > Bill Dunlap
         >          >> > TIBCO Software
         >          >> > wdunlap tibco.com
        
         >          >> >
         >          >> >
         >          >> > On Thu, Apr 30, 2020 at 9:56 PM suharto_anggono--- via
         >          >> > R-devel <r-devel@r-project.org> wrote:
         >          >> >
         >          >> >> Without 'collapse', 'paste' pastes (concatenates) its
         >          >> >> arguments elementwise (separated by 'sep', " " by
         >          >> >> default). New in R devel and R patched, specifying
         >          >> >> recycle0 = TRUE makes mixing zero-length and
         >          >> >> nonzero-length arguments result in length zero. The
         >          >> >> result of paste(n, "th", sep = "", recycle0 = TRUE)
         >          >> >> always has the same length as 'n'. Previously, the
         >          >> >> result was still as long as the longest argument, with
         >          >> >> the zero-length argument treated like "". If all of the
         >          >> >> arguments have length zero, 'recycle0' doesn't matter.
         >          >> >>
         >          >> >> As far as I understand, 'paste' with 'collapse' as a
         >          >> >> character string is supposed to put together elements of
         >          >> >> a vector into a single character string. I think
         >          >> >> 'recycle0' shouldn't change it.
         >          >> >>
         >          >> >> In current R devel and R patched, paste(character(0),
         >          >> >> collapse = "", recycle0 = TRUE) is character(0). I think
         >          >> >> it should be "", like paste(character(0), collapse="").
         >          >> >>
         >          >> >> paste(c("4", "5"), "th", sep = "", collapse = ", ", recycle0 = TRUE)
         >          >> >> is
         >          >> >> "4th, 5th".
         >          >> >> paste(c("4"     ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
         >          >> >> is
         >          >> >> "4th".
         >          >> >> I think
         >          >> >> paste(c(        ), "th", sep = "", collapse = ", ", recycle0 = TRUE)
         >          >> >> should be
         >          >> >> "",
         >          >> >> not character(0).
         >          >> >>
         >          >> >> ______________________________________________
         >          >> >> R-devel@r-project.org mailing list
         >          >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
         >          >> >>
         >          >> >
         >          >> >       [[alternative HTML version deleted]]
         >          >> >
         >          >> > ______________________________________________
         >          >> > R-devel@r-project.org mailing list
         >          >> > https://stat.ethz.ch/mailman/listinfo/r-devel
         >          >> >
         >          >>
         >          >> --
         >          >> Hervé Pagès
         >          >>
         >          >> Program in Computational Biology
         >          >> Division of Public Health Sciences
         >          >> Fred Hutchinson Cancer Research Center
         >          >> 1100 Fairview Ave. N, M1-B514
         >          >> P.O. Box 19024
         >          >> Seattle, WA 98109-1024
         >          >>
         >          >> E-mail: hpa...@fredhutch.org
         >          >> Phone:  (206) 667-5791
         >          >> Fax:    (206) 667-1319
         >          >>
         >          >> ______________________________________________
         >          >> R-devel@r-project.org mailing list
         >          >> https://stat.ethz.ch/mailman/listinfo/r-devel
         >          >>
         >
         >          > --
         >          > Hervé Pagès
         >
         >          > Program in Computational Biology
         >          > Division of Public Health Sciences
         >          > Fred Hutchinson Cancer Research Center
         >          > 1100 Fairview Ave. N, M1-B514
         >          > P.O. Box 19024
         >          > Seattle, WA 98109-1024
         >
         >          > E-mail: hpa...@fredhutch.org
         >          > Phone:  (206) 667-5791
         >          > Fax:    (206) 667-1319
         >
         >          > ______________________________________________
         >          > R-devel@r-project.org mailing list
         >          > https://stat.ethz.ch/mailman/listinfo/r-devel
         >

-- Hervé Pagès

        Program in Computational Biology
        Division of Public Health Sciences
        Fred Hutchinson Cancer Research Center
        1100 Fairview Ave. N, M1-B514
        P.O. Box 19024
        Seattle, WA 98109-1024

        E-mail: hpa...@fredhutch.org
        Phone:  (206) 667-5791
        Fax:    (206) 667-1319


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
