Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Jim Lemon
I suppose that it is far too late to offer such a suggestion, but it
seems to me that the problem is in some measure the mechanism of
inheritance.

First, the tibble (although the name is incomprehensible, why not
something like "data.blob") is superior to the bog standard R
data.frame.

This may not be a good metaphor, but consider the problem of including
tigers in the mixed martial arts competitions. Tigers are much better
than the average (or perhaps all) MMA fighters at damaging their
opponents. However, they change the whole game. All of the usual
techniques are out the window if one encounters a tiger.

Suppose the tibble (or data.blob) did not inherit from the data.frame,
but had a different path of inheritance. Like the evolutionary
development of Felidae and Hominoidea, it would branch way back around
Mammalia. Then it would not fool the referees into letting it into the
MMA competition. If one wanted to use the improved functionality, it
would not be necessary to consider whether this thing that said it was
a data frame had too much hair and retractable claws. I can't say
whether this is an effective suggestion or even a good one, but I
thought it was worthwhile making.

Jim

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread J C Nash
Duncan's observation is correct. The background work to the standards
I worked on was a big effort, and the content was a lot smaller than R,
though possibly similar in scope to dealing with the current question.
The "voting" was also very late in the process, after the proposals
were developed, discussed and written, so more a confirmation of a
decision than a vote to do some work.

On the other hand, I do think such effort has to be made from time to
time. On this particular matter I don't feel well-suited. However, the
collective body of material that is R is mostly a result of those of us
who are willing to put out the effort, particularly R-core members.

JN

On 2017-09-26 07:00 PM, Duncan Murdoch wrote:
> On 26/09/2017 4:52 PM, Jens Oehlschlägel wrote:
>>
>> On 26.09.2017 15:37, Hadley Wickham wrote:
>>> I decided to make [.tibble type-stable (i.e. always return a data
>>> frame) because this behaviour causes substantial problems in real data
>>> analysis code. I did it understanding that it would cause some package
>>> developers frustration, but I think it's better for a handful of
>>> package maintainers to be frustrated than hundreds of users creating
>>> dangerous code.
>>>
>>> Hadley
>>>
>>
>> If that is right -- and I tend to believe it is right -- this change had
>> better been done in R core and not on package level. I think the root of
>> this evil is design inconsistencies of the language together with the
>> lack of removing these inconsistencies. The longer we hesitated, the
>> more packages such a change could break. The lack of addressing issues
>> in R core drives people to try to solve issues on package level. But now
>> we have two conflicting standards, i.e. a fork-within-the-language: Am I
>> a member of the tidyverse or not? Am I writing a package for the
>> tidyverse or for standard-R or for both. With a fork-of-the-language we
>> would at least have a majority vote for one of the two and only the
>> fitter would survive. But with a fork-within-the-language 'R' gets more
>> and more complex, and working with it more and more difficult. There is
>> not only the tidyverse, also the Rcppverse and I don't know how many
>> other verses. If there is no extinction of inconsistencies in R, not
>> sufficient evolution in R, but lots of evolution in Julia, evolution
>> will extinct R together with all its foobarverses in favor of Julia (or
>> Python). May be that's a good thing.
>>
>> I think tibble should respect drop=TRUE and respect the work of all
>> package authors who wrote defensive code and explicitly passed drop=
>> instead of relying on the (wrong) default. Again: better would be a
>> long-term clean-up roadmap of R itself and one simple standard called
>> 'data.frame'. Instead of forking or betting on any particular
>> foobarverse: why not have direct democratic votes about certain critical
>> features of such a long-term roadmap in such a big community?
> 
> 
> I think R Core would not be interested in a vote, because you'd be voting to
> give them work to do, and that's really rude.
>
> What would have a better chance of success would be for someone to write a
> short article describing the proposal in detail, and listing all changes to
> CRAN and Bioconductor packages that would be necessary to implement it.
> That's a lot of work!  Do you have time to do it?
> 
> Duncan Murdoch
> 


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Duncan Murdoch

On 26/09/2017 4:52 PM, Jens Oehlschlägel wrote:
>
> On 26.09.2017 15:37, Hadley Wickham wrote:
>> I decided to make [.tibble type-stable (i.e. always return a data
>> frame) because this behaviour causes substantial problems in real data
>> analysis code. I did it understanding that it would cause some package
>> developers frustration, but I think it's better for a handful of
>> package maintainers to be frustrated than hundreds of users creating
>> dangerous code.
>>
>> Hadley
>
> If that is right -- and I tend to believe it is right -- this change had
> better been done in R core and not on package level. I think the root of
> this evil is design inconsistencies of the language together with the
> lack of removing these inconsistencies. The longer we hesitated, the
> more packages such a change could break. The lack of addressing issues
> in R core drives people to try to solve issues on package level. But now
> we have two conflicting standards, i.e. a fork-within-the-language: Am I
> a member of the tidyverse or not? Am I writing a package for the
> tidyverse or for standard-R or for both. With a fork-of-the-language we
> would at least have a majority vote for one of the two and only the
> fitter would survive. But with a fork-within-the-language 'R' gets more
> and more complex, and working with it more and more difficult. There is
> not only the tidyverse, also the Rcppverse and I don't know how many
> other verses. If there is no extinction of inconsistencies in R, not
> sufficient evolution in R, but lots of evolution in Julia, evolution
> will extinct R together with all its foobarverses in favor of Julia (or
> Python). Maybe that's a good thing.
>
> I think tibble should respect drop=TRUE and respect the work of all
> package authors who wrote defensive code and explicitly passed drop=
> instead of relying on the (wrong) default. Again: better would be a
> long-term clean-up roadmap of R itself and one simple standard called
> 'data.frame'. Instead of forking or betting on any particular
> foobarverse: why not have direct democratic votes about certain critical
> features of such a long-term roadmap in such a big community?



I think R Core would not be interested in a vote, because you'd be 
voting to give them work to do, and that's really rude.


What would have a better chance of success would be for someone to write 
a short article describing the proposal in detail, and listing all 
changes to CRAN and Bioconductor packages that would be necessary to 
implement it.  That's a lot of work!  Do you have time to do it?


Duncan Murdoch


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Dirk Eddelbuettel

On 26 September 2017 at 22:52, Jens Oehlschlägel wrote:
| also the Rcppverse

Not really, in the context of this thread.

Rcpp does not impose or suggest a particular way of doing things at the R
level.  Rcpp, really, is mostly about making it a little easier to interface
with C/C++ level code from R (and again is not required or imposed and people
still write C accessing packages without).  And there really is no
"Rcppverse" though I (and at least some others) like the prefix in the
package names.

Agree with the rest of your post though.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread J C Nash
Having been around a while and part of several programming language and
other standards (see ISO 6373:1984 and IEEE 754-1985), I prefer some
democracy at the level of getting a standard. Though perhaps at the design
level I can agree with Hadley. However, we're now at the stage of needing
to clean up R and actually get rid of some serious annoyances, in which I
would include my own contributions that appear in optim(), namely the
Nelder-Mead, BFGS and CG options for which there are replacements.

In the tibble/data-frame issue, it would appear there could be a resolution
with some decision making at the R-core level, and whether that is democratic
or ad-hoc, it needs to happen.

JN


On 2017-09-26 05:08 PM, Hadley Wickham wrote:

> 
> I'm not sure that democracy works for programming language design.
> 
> Hadley
>



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
> If that is right -- and I tend to believe it is right -- this change had
> better been done in R core and not on package level. I think the root of
> this evil is design inconsistencies of the language together with the lack
> of removing these inconsistencies. The longer we hesitated, the more
> packages such a change could break. The lack of addressing issues in R core
> drives people to try to solve issues on package level. But now we have two
> conflicting standards, i.e. a fork-within-the-language: Am I a member of the
> tidyverse or not? Am I writing a package for the tidyverse or for standard-R
> or for both. With a fork-of-the-language we would at least have a majority
> vote for one of the two and only the fitter would survive. But with a
> fork-within-the-language 'R' gets more and more complex, and working with it
> more and more difficult. There is not only the tidyverse, also the Rcppverse
> and I don't know how many other verses. If there is no extinction of
> inconsistencies in R, not sufficient evolution in R, but lots of evolution
> in Julia, evolution will extinct R together with all its foobarverses in
> favor of Julia (or Python). May be that's a good thing.

I think you are making a slippery slope argument, and I'm not sure I
buy it. I am quite aware of the danger of introducing additional
inconsistencies, and do it very selectively, only when I'm convinced
that the pain is worth it.

> I think tibble should respect drop=TRUE and respect the work of all package
> authors who wrote defensive code and explicitly passed drop= instead of
> relying on the (wrong) default

We'll consider for the next major release:
https://github.com/tidyverse/tibble/issues/311

> . Again: better would be a long-term clean-up
> roadmap of R itself and one simple standard called 'data.frame'. Instead of
> forking or betting on any particular foobarverse: why not have direct
> democratic votes about certain critical features of such a long-term roadmap
> in such a big community?

I'm not sure that democracy works for programming language design.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2017 at 12:15 PM, Patrick Perry  wrote:
> Pro ignoring x[,1,drop=TRUE]:
> (1) it forces users to write consistent code for extracting a vector from a
> data frame
>
> Con:
> (1) functions that accept both matrices and data frames might break
> (x[[j]][i] doesn't work for a matrix)

I generally think that it's better to keep matrices and data frame
completely separate, but point taken.

> (2) functions that use the access pattern x[i,j,drop = TRUE] will break

This seems pretty rare, and I don't think anyone has complained about it yet.

I don't love adding support for drop = TRUE because it makes [.tibble
type-unstable, but maybe it's reasonable to do so in order to slightly
improve backward compatibility. I've filed an issue so we consider it
for the next major release:
https://github.com/tidyverse/tibble/issues/311

> Perhaps a bigger issue with tibbles is that they don't let you index with
> row names:
>
>> y <- tibble(x = letters)
>> rownames(y)
>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
> [16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
>> y[rownames(y)[c(1,5,9,15,21)],]
> # A tibble: 5 x 1
>       x
>   <chr>
> 1  <NA>
> 2  <NA>
> 3  <NA>
> 4  <NA>
> 5  <NA>

I'd argue that this is not as big an issue, as I have no
recollection of anyone complaining about it.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Jens Oehlschlägel


On 26.09.2017 15:37, Hadley Wickham wrote:

> I decided to make [.tibble type-stable (i.e. always return a data
> frame) because this behaviour causes substantial problems in real data
> analysis code. I did it understanding that it would cause some package
> developers frustration, but I think it's better for a handful of
> package maintainers to be frustrated than hundreds of users creating
> dangerous code.
>
> Hadley



If that is right -- and I tend to believe it is right -- this change had 
better been done in R core and not on package level. I think the root of 
this evil is design inconsistencies of the language together with the 
lack of removing these inconsistencies. The longer we hesitated, the 
more packages such a change could break. The lack of addressing issues 
in R core drives people to try to solve issues on package level. But now 
we have two conflicting standards, i.e. a fork-within-the-language: Am I 
a member of the tidyverse or not? Am I writing a package for the 
tidyverse or for standard-R or for both. With a fork-of-the-language we 
would at least have a majority vote for one of the two and only the 
fitter would survive. But with a fork-within-the-language 'R' gets more 
and more complex, and working with it more and more difficult. There is 
not only the tidyverse, also the Rcppverse and I don't know how many 
other verses. If there is no extinction of inconsistencies in R, not 
sufficient evolution in R, but lots of evolution in Julia, evolution 
will extinct R together with all its foobarverses in favor of Julia (or 
Python). Maybe that's a good thing.


I think tibble should respect drop=TRUE and respect the work of all 
package authors who wrote defensive code and explicitly passed drop= 
instead of relying on the (wrong) default. Again: better would be a 
long-term clean-up roadmap of R itself and one simple standard called 
'data.frame'. Instead of forking or betting on any particular 
foobarverse: why not have direct democratic votes about certain critical 
features of such a long-term roadmap in such a big community?


Kind regards


Jens Oehlschlägel


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2017 at 10:40 AM, Joris Meys  wrote:
> On Tue, Sep 26, 2017 at 5:33 PM, Hadley Wickham  wrote:
>>
>> > I for one am happy this discussion pops up, because it's a piece of
>> > information I give to my students as well: convert to a data.frame when
>> > you start your analysis just to play safe. And this discussion shows why
>> > that is -for the time being!- good advice. The moment tibbles become the
>> > default data format in R, or some R++, or in Julia for all I care, I'll
>> > be more than happy to burn that drop = FALSE on a stake. But for now we
>> > can't ignore the differences and the potential for conflicts when you
>> > try to use a tibble instead of a data.frame.
>>
>> I think this is sub-optimal advice because most functions do work fine
>> with tibbles.
>
>
> Most. Not all. Either tibbles work exactly like a data.frame, or they don't.
> If they do, I wouldn't give that advice. But they don't.

They work 95% like a data frame. Seems odd to recommend that you
coerce 100% of the time for a <5% of the time problem.

>> It is only a few packages (largely written some time
>> ago) that don't. And typically, if they don't work with tibbles,
>> you'll get a (usually slightly confusing) error message because some
>> function will get a data frame instead of a vector. So as far as I can
>> tell, you only need to as.data.frame() retrospectively, not
>> prospectively. Are you aware of any code that returns an incorrect
>> result (i.e. no error) when given a tibble instead of a data frame?
>
>
> x <- tibble(a = 1:5, b = 5:1)
>
> relcount <- function(x, id){
>   table(x[,id]) / length(x[,id])
> }
> relcount(x, "a")
> relcount(as.data.frame(x), "a")
>
> You're welcome.

Obviously you can contrive an example that fails (why wouldn't you use
nrow() here?). I meant an existing function in a package.

Hadley
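
A minimal base-R sketch of why the relcount() example above yields a wrong result silently instead of an error. It relies only on what the thread states, namely that `[` on a tibble always returns a data frame; that behaviour is emulated here with drop = FALSE on a plain data.frame, so no extra package is needed. The names relcount_framelike and relcount_safe are hypothetical and do not come from any package.

```r
df <- data.frame(a = 1:5, b = 5:1)

# The pattern from the thread, with tibble-like subsetting made explicit:
# x[, id, drop = FALSE] is a one-column data frame, not a vector.
relcount_framelike <- function(x, id) {
  col <- x[, id, drop = FALSE]
  table(col) / length(col)  # length() of a data frame is its column count
}

# Hypothetical rewrite: [[ returns the column itself as a vector.
relcount_safe <- function(x, id) {
  col <- x[[id]]
  table(col) / length(col)
}

wrong <- relcount_framelike(df, "a")  # every count divided by 1
right <- relcount_safe(df, "a")       # every count divided by 5
```

In the first version the counts are divided by the number of columns (1) rather than the number of rows (5), so the "relative" counts do not sum to one. Using nrow(), as suggested in the reply above, would be another way to avoid the problem.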


-- 
http://hadley.nz



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Patrick Perry
Pro ignoring x[,1,drop=TRUE]:
(1) it forces users to write consistent code for extracting a vector 
from a data frame

Con:
(1) functions that accept both matrices and data frames might break 
(x[[j]][i] doesn't work for a matrix)
(2) functions that use the access pattern x[i,j,drop = TRUE] will break

Most of the breakages for Con (2) can be fixed by changing to x[[j]][i], 
but not all of them:

 > x <- data.frame(V=1:26, row.names = letters)
 > x[c("a","e","i","o","u"), "V", drop = TRUE]
[1]  1  5  9 15 21
 > x[["V"]][c("a","e","i","o","u")]
[1] NA NA NA NA NA

To me, the Cons outweigh the Pro, but I understand that the tidyverse 
puts a heavy weight on "one way to do things".

Perhaps a bigger issue with tibbles is that they don't let you index 
with row names:

 > y <- tibble(x = letters)
 > rownames(y)
  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
 [16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
 > y[rownames(y)[c(1,5,9,15,21)],]
# A tibble: 5 x 1
      x
  <chr>
1  <NA>
2  <NA>
3  <NA>
4  <NA>
5  <NA>

If you want to write code that supports both tibbles and data frames, 
then you either have to avoid row names and drop = TRUE, or else you 
have to call `as.data.frame` on the input. This goes the other way, too. 
If you want to write a tidyverse function that also accepts data.frames, 
then you should call as_tibble on the input, otherwise your function 
will break when you index the input like x[,1].
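
One way to sidestep both drop = TRUE and character row indexing is to resolve the row names to positions first and then extract the column with [[. The sketch below is only an illustration in base R; extract_by_rowname is a hypothetical helper name, and it is exercised here on a plain data.frame only. Whether it gives the intended rows for a tibble depends on rownames() returning something meaningful for that object, which is an assumption.

```r
# Hypothetical helper: translate row names to integer positions, then take
# a single column as a plain vector.
extract_by_rowname <- function(x, rows, col) {
  idx <- match(rows, rownames(x))  # NA for names that are not present
  x[[col]][idx]
}

x <- data.frame(V = 1:26, row.names = letters)
vowels <- extract_by_rowname(x, c("a", "e", "i", "o", "u"), "V")
vowels
```

On the data frame from the example above this returns the same values as x[c("a","e","i","o","u"), "V", drop = TRUE].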


Patrick
> Hadley Wickham 
> September 26, 2017 at 11:29 AM
> On Tue, Sep 26, 2017 at 9:22 AM, Patrick Perry  wrote:
>> Would it be possible to change tibbles so that
>>
>> x[,1,drop=TRUE]
>>
>> returns a vector, not a data frame? I certainly find it surprising that
>> tibbles ignore the drop argument. If tibbles respected the drop argument,
>> then package developers could rely on
>>
>> x[,1,drop=FALSE]
>>
>> or
>>
>> x[,1,drop=TRUE]
>>
>> behaving consistently, regardless of whether the argument is a tibble or a
>> data.frame.
>
> They can currently rely on x[[1]] always returning a vector and x[, 1,
> drop = FALSE] always returning a data frame whether x is a tibble or a
> data frame. I personally don't believe that an additional approach
> would help.



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Joris Meys
On Tue, Sep 26, 2017 at 5:33 PM, Hadley Wickham  wrote:

> > I for one am happy this discussion pops up, because it's a piece of
> > information I give to my students as well: convert to a data.frame when
> > you start your analysis just to play safe. And this discussion shows why
> > that is -for the time being!- good advice. The moment tibbles become the
> > default data format in R, or some R++, or in Julia for all I care, I'll
> > be more than happy to burn that drop = FALSE on a stake. But for now we
> > can't ignore the differences and the potential for conflicts when you
> > try to use a tibble instead of a data.frame.
>
> I think this is sub-optimal advice because most functions do work fine
> with tibbles.

Most. Not all. Either tibbles work exactly like a data.frame, or they
don't. If they do, I wouldn't give that advice. But they don't.

> It is only a few packages (largely written some time
> ago) that don't. And typically, if they don't work with tibbles,
> you'll get a (usually slightly confusing) error message because some
> function will get a data frame instead of a vector. So as far as I can
> tell, you only need to as.data.frame() retrospectively, not
> prospectively. Are you aware of any code that returns an incorrect
> result (i.e. no error) when given a tibble instead of a data frame?

x <- tibble(a = 1:5, b = 5:1)

relcount <- function(x, id){
  table(x[,id]) / length(x[,id])
}
relcount(x, "a")
relcount(as.data.frame(x), "a")

You're welcome.


>
> Hadley
>
> --
> http://hadley.nz
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
> I for one am happy this discussion pops up, because it's a piece of
> information I give to my students as well: convert to a data.frame when you
> start your analysis just to play safe. And this discussion shows why that is
> -for the time being!- good advice. The moment tibbles become the default
> data format in R, or some R++, or in Julia for all I care, I'll be more than
> happy to burn that drop = FALSE on a stake. But for now we can't ignore the
> differences and the potential for conflicts when you try to use a tibble
> instead of a data.frame.

I think this is sub-optimal advice because most functions do work fine
with tibbles. It is only a few packages (largely written some time
ago) that don't. And typically, if they don't work with tibbles,
you'll get a (usually slightly confusing) error message because some
function will get a data frame instead of a vector. So as far as I can
tell, you only need to as.data.frame() retrospectively, not
prospectively. Are you aware of any code that returns an incorrect
result (i.e. no error) when given a tibble instead of a data frame?

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2017 at 9:22 AM, Patrick Perry  wrote:
> Would it be possible to change tibbles so that
>
> x[,1,drop=TRUE]
>
> returns a vector, not a data frame? I certainly find it surprising that
> tibbles ignore the drop argument. If tibbles respected the drop argument,
> then package developers could rely on
>
> x[,1,drop=FALSE]
>
> or
>
> x[,1,drop=TRUE]
>
> behaving consistently, regardless of whether the argument is a tibble or a
> data.frame.

They can currently rely on x[[1]] always returning a vector and x[, 1,
drop = FALSE] always returning a data frame whether x is a tibble or a
data frame. I personally don't believe that an additional approach
would help.

> Alternatively, would it be possible to make is.data.frame return FALSE for a
> tibble? Then
> developers that want base-R data.frame behavior can do
>
> if (!is.data.frame(x)) {
> x <- as.data.frame(x)
> }
>

As I've said elsewhere in the thread that would effectively render
tibbles useless because they wouldn't work with many functions.

Hadley
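
A small base-R sketch of the difference between the guarded conversion quoted above and an unconditional as.data.frame(). It uses a made-up subclass, my_frame, as a stand-in for any class that inherits from data.frame; it does not load or reproduce tibble itself, so it only illustrates the class mechanics and not tibble's own methods.

```r
df <- data.frame(a = 1:3)

# Hypothetical subclass standing in for any data.frame subclass.
sub <- structure(df, class = c("my_frame", "data.frame"))

# Guarded conversion: is.data.frame() is TRUE for the subclass, so the
# object passes through with its extra class intact.
guarded <- sub
if (!is.data.frame(guarded)) {
  guarded <- as.data.frame(guarded)
}

# Unconditional conversion: base R's data.frame method strips the classes
# that precede "data.frame".
plain <- as.data.frame(sub)

class(guarded)
class(plain)
```

This is why the guard only helps if is.data.frame() returns FALSE for the subclass, which is the change being asked about above.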

-- 
http://hadley.nz



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Göran Broström



On 2017-09-26 15:37, Hadley Wickham wrote:
> On Tue, Sep 26, 2017 at 2:30 AM, Göran Broström wrote:
>> I am beginning to get complaints from users of my CRAN packages (especially
>> 'eha') to the effect that they get error messages like "Error: Unsupported
>> use of matrix or array for column indexing".
>>
>> It turns out that they are sticking tibbles into functions that expect
>> data frames as input. And I am using the kind of subsetting that Hadley
>> dislikes (eha is an old package, much older than tibbles). It is of course a
>> simple matter to change the code so it handles both data frames and tibbles
>> correctly, but this affects many functions, and it will take some time. And
>> when the next guy introduces 'troubles' as an improvement of 'tibbles', I
>> will have to rewrite the code again.
>
> Changing df[, x] to df[[x]] is not very hard and makes your code
> easier to understand because it more clearly conveys the intent that
> you want a single column.

Couldn't agree more: Not because it works with tibbles, but because it
works with lists. And therefore with data frames. If we trust inheritance.

Göran

>> While I like Hadley's way of doing it, I think it is a mistake to let a
>> tibble also be of class data frame. To me it is a matter of inheritance and
>> backwards compatibility: A tibble should add nice things to a data frame,
>> not change basic behaviour, in order to call itself a data frame.
>>
>> Is it correct to let a tibble be of class "data.frame"?
>
> If it did not inherit from data frame, it would not work with the 99%
> of functions that work with data frames and don't deliberately take
> advantage of the dropping behaviour of [. In other words, it would be
> pointless.
>
> I decided to make [.tibble type-stable (i.e. always return a data
> frame) because this behaviour causes substantial problems in real data
> analysis code. I did it understanding that it would cause some package
> developers frustration, but I think it's better for a handful of
> package maintainers to be frustrated than hundreds of users creating
> dangerous code.
>
> Hadley




Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Joris Meys
On Tue, Sep 26, 2017 at 3:38 PM, Hadley Wickham  wrote:

>
> So we should never try and improve upon legacy behaviour? I don't
> understand what you're arguing for here. If a tibble didn't inherit
> from a data frame, it would be useless.
>
> Hadley
>
> --
> http://hadley.nz
>

I didn't say that. I said a tibble does not react always like a data.frame,
and you of all people know very well that this is by design. I consider
this a good design choice, but it also means one shouldn't expect that all
code that works with a data.frame will also work with a tibble.

By design.

That's what initiated this entire discussion, and that is a correct
assessment. I am not arguing for any change, and said before that it is up
to the package developer to choose whether he/she takes tibbles into
account. It's not because I comment on tibbles or the tidyverse, that it is
meant as a frontal attack on your work. I cannot repeat enough how much I
value that. But the difference should be acknowledged, and imho in the
first place by the user.

I for one am happy this discussion pops up, because it's a piece of
information I give to my students as well: convert to a data.frame when you
start your analysis just to play safe. And this discussion shows why that
is -for the time being!- good advice. The moment tibbles become the
default data format in R, or some R++, or in Julia for all I care, I'll be
more than happy to burn that drop = FALSE on a stake. But for now we can't
ignore the differences and the potential for conflicts when you try to use
a tibble instead of a data.frame.

With respect
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2017 at 8:51 AM, Pedro J. Aphalo wrote:
> What I think is troublesome is that data.frame is part of the definition
> of the R language, and the expectation based on R's normal behaviour is
> that testing with is.data.frame() should be enough to ensure that an
> object can be treated as a data frame. We can think of different
> solutions for use in our packages, but the naive R user will be always
> surprised by the behaviour of tibbles because package 'tibble' breaks
> the expectations of the R language with an exception.
>
> I do not know what could be the best solution... though. Maybe thinking
> of tibbles as a step towards R++ or R 4 or whatever future enhanced
> version of R, in which they will replace data frames completely. Hadley
> is correct in that they are a very significant improvement to R, but the
> problem is the inconsistent behaviour.

There are basically two classes of surprise:

1) You might be surprised that [ sometimes returns a vector and
sometimes returns a data frame.

2) You might be surprised that tibbles behave differently to data
frames for this one method.

I obviously believe that 1) is the worse surprise, but others differ.
This wouldn't normally be a problem but people use tibbles with
packages written by people who don't share my belief that 1) is
surprising and dangerous.

Hadley
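
To make surprise 1) concrete, here is a short base-R sketch on a plain data.frame. The function names pick and pick_stable are hypothetical and exist only for this illustration.

```r
df <- data.frame(a = 1:3, b = 4:6)

# Default subsetting: the return type depends on how many columns are asked for.
pick <- function(d, cols) d[, cols]

# Explicit drop = FALSE: always a data frame, whatever length(cols) is.
pick_stable <- function(d, cols) d[, cols, drop = FALSE]

two <- pick(df, c("a", "b"))        # a data frame with two columns
one <- pick(df, "a")                # an integer vector
one_stable <- pick_stable(df, "a")  # a data frame with one column
```

A caller that passes a column selection of varying length to pick() therefore has to handle two different return types, while pick_stable() behaves the same in both cases.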

-- 
http://hadley.nz



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Pedro J. Aphalo
What I think is troublesome is that data.frame is part of the definition 
of the R language, and the expectation based on R's normal behaviour is 
that testing with is.data.frame() should be enough to ensure that an 
object can be treated as a data frame. We can think of different 
solutions for use in our packages, but the naive R user will be always 
surprised by the behaviour of tibbles because package 'tibble' breaks 
the expectations of the R language with an exception.

I do not know what could be the best solution... though. Maybe thinking 
of tibbles as a step towards R++ or R 4 or whatever future enhanced 
version of R, in which they will replace data frames completely. Hadley 
is correct in that they are a very significant improvement to R, but the 
problem is the inconsistent behaviour.

Pedro.



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2017 at 8:35 AM, Joris Meys  wrote:
>> Where its parent class _sometimes_ returns an atomic vector and
>> _sometimes_ returns a data frame.
>
> Indeed. And a tibble doesn't, so there's a conflict. Nobody said data.frame
> works better than tibble. Actually, we all agree that the legacy behaviour
> sucks. But it exists, and causes conflicts when users expect a tibble to
> behave the same as a data.frame.
>
> It does not.

So we should never try and improve upon legacy behaviour? I don't
understand what you're arguing for here. If a tibble didn't inherit
from a data frame, it would be useless.

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2017 at 2:30 AM, Göran Broström  wrote:
> I am beginning to get complaints from users of my CRAN packages (especially
> 'eha') to the effect that they get error messages like "Error: Unsupported
> use of matrix or array for column indexing".
>
> It turns out that they are sticking tibbles into functions that expect
> data frames as input. And I am using the kind of subsetting that Hadley
> dislikes (eha is an old package, much older than tibbles). It is of course a
> simple matter to change the code so it handles both data frames and tibbles
> correctly, but this affects many functions, and it will take some time. And
> when the next guy introduces 'troubles' as an improvement of 'tibbles', I
> will have to rewrite the code again.

Changing df[, x] to df[[x]] is not very hard and makes your code
easier to understand because it more clearly conveys the intent that
you want a single column.
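
A small sketch of that change, using a made-up data frame and a column name held in a variable (both are assumptions for illustration):

```r
# Hypothetical example data.
df <- data.frame(x = c(1.5, 2.5), y = c(10, 20))
col <- "x"

by_single_bracket <- df[, col]   # a vector only because exactly one column was selected
by_double_bracket <- df[[col]]   # list-style extraction of one column's contents

identical(by_single_bracket, by_double_bracket)  # TRUE for a plain data frame
```

For a plain data frame the two forms agree, so the rewrite does not change results there; the `[[` form simply does not depend on the dropping behaviour of `[`.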

> While I like Hadley's way of doing it, I think it is a mistake to let a
> tibble also be of class data frame. To me it is a matter of inheritance and
> backwards compatibility: A tibble should add nice things to a data frame, not
> change basic behaviour, in order to call itself a data frame.
>
> Is it correct to let a tibble be of class "data.frame"?

If it did not inherit from data frame, it would not work with the 99%
of functions that work with data frames and don't deliberately take
advantage of the dropping behaviour of [. In other words, it would be
pointless.

I decided to make [.tibble type-stable (i.e. always return a data
frame) because this behaviour causes substantial problems in real data
analysis code. I did it understanding that it would cause some package
developers frustration, but I think it's better for a handful of
package maintainers to be frustrated than hundreds of users creating
dangerous code.
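
To sketch the kind of problem meant here, consider a helper that averages user-selected columns. The function names and data below are made up for illustration; only `colMeans()` and the `drop` argument are base R:

```r
# Made-up data for illustration.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Fragile: relies on d[, cols] staying a data frame.
col_means_fragile <- function(d, cols) colMeans(d[, cols])

# Type-stable: drop = FALSE keeps a data frame for any number of columns.
col_means_stable <- function(d, cols) colMeans(d[, cols, drop = FALSE])

two <- col_means_fragile(df, c("a", "b"))  # fine: two columns selected

# With one column, d[, cols] is a bare vector and colMeans() fails.
fragile_one <- tryCatch(col_means_fragile(df, "a"),
                        error = function(e) "error")
stable_one <- col_means_stable(df, "a")
```

The fragile version works in every test that happens to select two or more columns and fails only when a user selects exactly one.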

Hadley

-- 
http://hadley.nz


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Joris Meys
> Where its parent class _sometimes_ returns an atomic vector and
> _sometimes_ returns a data frame.
>
> Hadley

Indeed. And a tibble doesn't, so there's a conflict. Nobody said data.frame
works better than tibble. Actually, we all agree that the legacy behaviour
sucks. But it exists, and causes conflicts when users expect a tibble to
behave the same as a data.frame.

It does not.

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Hadley Wickham
On Tue, Sep 26, 2017 at 8:28 AM, Jeroen Ooms  wrote:
> On Tue, Sep 26, 2017 at 11:56 AM, Gábor Csárdi  wrote:
>>
>> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys  wrote:
>> > I don't like the dropping of dimensions either. That doesn't change the
>> > fact that a tibble reacts different from a data.frame. So tibbles do not
>> > inherit correctly from the class data.frame, and it can thus be argued that
>> > it's against OOP paradigms to pretend tibbles inherit from the class
>> > data.frame.
>>
>> I have yet to see an OOP system in which a subclass cannot override the
>> methods of its superclass. Not only is this in line with OOP paradigms, it
>> is actually one of the essential OOP features.
>
> Not if this compromises type safety. Formal OOP languages enforce that the
> signature matches when you override a method. The fact that R is
> dynamically typed puts this responsibility on the developer. The fact
> that tibble [ returns a data frame where its parent class returns an
> atomic vector violates this principle, resulting in the obvious type
> errors where tibbles are used as data frames.

Where its parent class _sometimes_ returns an atomic vector and
_sometimes_ returns a data frame.

Hadley

-- 
http://hadley.nz


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Jeroen Ooms
On Tue, Sep 26, 2017 at 11:56 AM, Gábor Csárdi  wrote:
>
> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys  wrote:
> > I don't like the dropping of dimensions either. That doesn't change the
> > fact that a tibble reacts different from a data.frame. So tibbles do not
> > inherit correctly from the class data.frame, and it can thus be argued that
> > it's against OOP paradigms to pretend tibbles inherit from the class
> > data.frame.
>
> I have yet to see an OOP system in which a subclass cannot override the
> methods of its superclass. Not only is this in line with OOP paradigms, it
> is actually one of the essential OOP features.

Not if this compromises type safety. Formal OOP languages enforce that the
signature matches when you override a method. The fact that R is
dynamically typed puts this responsibility on the developer. The fact
that tibble [ returns a data frame where its parent class returns an
atomic vector violates this principle, resulting in the obvious type
errors where tibbles are used as data frames.


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Göran Broström

Thanks Gábor,

that is OK. However, if I would like an input tibble to remain a tibble 
(after massaging) in the output, as a courtesy to the user, this will fail. 
I think that it works if I instead treat the input as a list: that's all 
'the tibble way' does (in my case at least).
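
A rough sketch of the list-style approach, under the assumption that the class of the input should survive. The helper `double_column()` and the class name "my_df" are made up for illustration; "my_df" merely stands in for any data.frame subclass:

```r
# Hypothetical helper: modifies one column using list-style access only.
double_column <- function(d, name) {
  d[[name]] <- d[[name]] * 2  # no `[` with a comma, so no dropping involved
  d
}

plain <- data.frame(x = 1:3)
# Stand-in for a data.frame subclass; "my_df" is an invented class name.
sub <- structure(data.frame(x = 1:3), class = c("my_df", "data.frame"))

class(double_column(plain, "x"))  # "data.frame"
class(double_column(sub, "x"))    # "my_df" "data.frame"
```

Because the object is never passed through as.data.frame(), whatever class came in is also what goes out.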


Göran


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread CJ Yetman
The problem is not with a data.frame or a tibble... the problem is when a
package unwittingly converts a data.frame/tibble to a vector, because of
bad defaults in data.frame methods, and then later on expects that vector
to be a vector without explicitly making it a vector or checking if it is a
vector.

On Tue, Sep 26, 2017 at 1:10 PM, Alexandre Courtiol <
alexandre.court...@gmail.com> wrote:

> David is right,
>
> imagine some silly old code such as:
>
> get_a.data.frame <- function(d) if("data.frame" %in% class(d)) d["a" ,]
>
> This line of code giving you the row "a" of a data.frame could be in any
> package.
> No matter how ugly it is, it is technically correct and conforms to the
> original definition of data.frames.
>
> Now you have a data.frame:
>
> foo <- data.frame(x=1:3, row.names = c("a", "b", "c"))
>
> > get_a.data.frame(foo)
> [1] 1
>
> this is expected
>
> > get_a.data.frame(as.matrix(foo))
> [1]
>
> This returns nothing, again it is expected as a matrix is not a data.frame
>
> But here comes the tibble trouble:
>
> > get_a.data.frame(as.tibble(foo))
> # A tibble: 1 x 1
>       x
>   <int>
> 1    NA
>
> And now the old package is broken.
> Also, if we tolerate this, think what would happen if this kind of practice
> were to scale up!
> If anyone can name classes any way they want without respecting the rules
> of inheritance, we will soon be in big trouble, lost among the mutants.
>
> Tibbles are great, and data frames are widely used. Tibbles should not be of
> class data.frame unless they start behaving as data frames do.
>
> Alex
>
> On 26 September 2017 at 12:21, Stefan McKinnon Høj-Edwards 
> wrote:
>
> > There is no benefit. It is a rather cumbersome approach to checking whether
> > something behaves as you expect it to. `as.data.frame` will force it into
> > what you need; if it cannot be forced, then it will fail. That it can be
> > converted to a data.frame is the class designer's responsibility, not
> > yours. So you can use `as.data.frame` on *any* input that you need to
> > behave as a data.frame.
> > Consider a grouped tibble; now you have to test 2 different classes.
> >
> > Kindly,
> > Stefan
> >
> > Stefan McKinnon Høj-Edwards
> > ph.d. Genetics
> > +44 (0)776 231 2464
> > +45 2888 6598
> > Skype: stefan_edwards
> >

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Gábor Csárdi
Yes, basically tibbles violate the substitution principle. A lot of
other packages do, probably base R as well, although it is sometimes
hard to say, because there is no clear object hierarchy.

Let's take a step back, and see how you can check for a data frame argument.

1. Weak check.

is.data.frame(arg)

This essentially means that you trust subclasses of data.frame to
adhere to the substitution principle. While this is nice in theory, a
lot of packages (including both major packages implementing subclasses of
data.frame!) do not always adhere. So this is not really a safe
solution.

Base R does this as well, sometimes, e.g. aggregate.data.frame has:

if (!is.data.frame(x))
x <- as.data.frame(x)

which is essentially equivalent to the weak check, since it leaves
data.frame subclasses untouched.

2. Strong "check".

arg <- as.data.frame(arg)

This is safer, because it does not rely on subclass implementors. It
also has the additional benefit that your code is polymorphic: it
works with any input, as long as it can be converted to a data frame.

Base R also uses this often, e.g. in merge.data.frame:

nx <- nrow(x <- as.data.frame(x))
ny <- nrow(y <- as.data.frame(y))
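
A minimal sketch of the strong "check" in a package function. The function `first_column()` and the class name "my_df" are invented for illustration, and the sketch assumes that as.data.frame() returns a plain data frame for the inputs shown:

```r
# Hypothetical package function using the strong "check".
first_column <- function(arg) {
  arg <- as.data.frame(arg)  # coerce, or fail if conversion is impossible
  arg[, 1]                   # plain data frame indexing from here on
}

from_df     <- first_column(data.frame(a = 1:2, b = 3:4))
from_list   <- first_column(list(a = 1:2, b = 3:4))   # a list is converted first
from_matrix <- first_column(matrix(1:4, nrow = 2))    # so is a matrix

# Stand-in for a data.frame subclass; "my_df" is an invented class name.
sub <- structure(data.frame(a = 1:2), class = c("my_df", "data.frame"))
class(as.data.frame(sub))  # the leading subclass is dropped
```

The same entry point therefore accepts lists and matrices as well as data frames, which is the polymorphism mentioned above.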

Gabor

Disclaimer: I do not represent the tibble authors in any way.

On Tue, Sep 26, 2017 at 11:21 AM, David Hugh-Jones
 wrote:
> These replies seem to be missing the point, which is that old code has to be
> rewritten because tibbles don't behave like data frames.
>
> It is true that subclasses can override behaviour, but there is an implicit
> contract that the same methods should do the same things.
>
> The as.xxx pattern seems weird to me, though I see it a lot. What is the
> point of inheritance if you always have to convert an object upwards before
> you can treat it as a member of the superclass?
>
> I can see this argument will run...
>
> David
>
> On 26 September 2017 at 11:15, Gábor Csárdi  wrote:
>>
>> What is the benefit here, compared to just calling as.data.frame() on it?
>>
>> Gabor
>>
>> On Tue, Sep 26, 2017 at 11:11 AM, Daniel Lüdecke 
>> wrote:
>> > Since tibbles add their class attributes first, you could use:
>> >
>> > tb <- tibble(a = 5)
>> > inherits(tb, "data.frame", which = TRUE) == 1
>> >
>> > if "tb" is a data frame (only), TRUE is returned, for tibble FALSE. You
>> > could then coerce to data frame: as.data.frame(tb)
>> >
>> > -----Original Message-----
>> > From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On
>> > Behalf Of Göran Broström
>> > Sent: Tuesday, 26 September 2017 12:09
>> > To: r-package-devel@r-project.org
>> > Subject: Re: [R-pkg-devel] tibbles are not data frames
>> >
>> >
>> >
>> > On 2017-09-26 11:56, Gábor Csárdi wrote:
>> >> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys 
>> >> wrote:
>> >>> I don't like the dropping of dimensions either. That doesn't change
>> >>> the fact that a tibble reacts different from a data.frame. So tibbles
>> >>> do not inherit correctly from the class data.frame, and it can thus
>> >>> be argued that it's against OOP paradigms to pretend tibbles inherit
>> >>> from the class data.frame.
>> >>
>> >> I have yet to see an OOP system in which a subclass cannot override
>> >> the methods of its superclass. Not only is this in line with OOP
>> >> paradigms, it is actually one of the essential OOP features.
>> >>
>> >> To be more constructive, if you have a function that only works with
>> >> data frame inputs, then it is good practice to check that the supplied
>> >> input is indeed a data frame. This is independent of tibbles.
>> >
>> > It is not. I check input for being a data frame, but tibbles pass that
>> > test. That's the essence of the problem.
>> >
>> >> In practice it seems to me that an easy fix is to just call
>> >> as.data.frame on the input. This should either convert it to a data
>> >> frame, or throw an error.
>> >
>> > Sure, but I still need to rewrite the package.
>> >
>> > Göran
>> >
>> >> For tibbles it
>> >> drops the tbl* classes.
>> >>
>> >> Gabor
>> >>
>> >>> Defensive coding techniques would check if it's a tibble and return
>> >>> an error saying a data.frame is expected. Unless tibbles inherit
>> >>> correctly from data.frame.
>> >>>
>> >>> I have nothing against tibbles. But calling them "data.frame" raises
>> >>> expectations that can't be fulfilled.
>> >>
>> >> [...]

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Göran Broström



On 2017-09-26 14:01, Daniel Lüdecke wrote:
> You wrote:
>
>> The correct and logical way (which I use in 'eha') is to check if input is a
>> data frame, and if not, throw an error.
>
> If you want to check for a data frame (and a data frame only), because you
> don't want to coerce *any* object to data frames, then this would be one way
> to check for df/tibble, and coerce tibbles only. That's what I had in mind...
>
> But as I mentioned before, since simplifying is the most (or even only?)
> relevant point when dealing with tibbles, I have rewritten all parts of my
> packages that used df[, x] indexing and replaced it with df[[x]] or
> df[, x, drop = FALSE]. If a vector is needed, you can use "dplyr::pull()"
> to make sure you get a vector.


One important thing for me (my packages) is to stay out of dependence on 
other packages, as far as it is possible. But I am in the process of 
doing what you suggest: Treat a data frame as the _list_ it is. Back to 
basics!


Göran




Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Pedro J. Aphalo
Hi,

But the point is that

inherits(x, "data.frame", TRUE) == 1

will not distinguish between tibbles and other classes derived from 
data.frame that do respect the original syntax. In most cases you cannot, 
or do not want to, block the use of every class derived from data.frame. 
One would need to use a test that specifically disallows 'tibble', which 
is possible, but inelegant.
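
The point can be sketched without loading any package by building objects that only carry the relevant class vectors. The class name "my_df" and the helper `is_tibble_like()` are invented for illustration; "tbl_df" and "tbl" are the classes a tibble carries in front of "data.frame":

```r
plain <- data.frame(a = 5)
# Objects that merely carry the class vectors; no tibble methods are involved.
tibble_like <- structure(data.frame(a = 5),
                         class = c("tbl_df", "tbl", "data.frame"))
other_sub   <- structure(data.frame(a = 5),
                         class = c("my_df", "data.frame"))

inherits(plain, "data.frame", which = TRUE)        # 1
inherits(tibble_like, "data.frame", which = TRUE)  # 3
inherits(other_sub, "data.frame", which = TRUE)    # 2

# A test that singles out the tibble class only (hypothetical helper).
is_tibble_like <- function(x) inherits(x, "tbl_df")
```

The `== 1` test rejects both derived classes alike, whereas the class-specific test rejects only the tibble-like object.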

However, maybe the main thing is that packages like tibble and magrittr 
are changing the syntax of R. Maybe we are reaching a point where we 
need to define R++, with tibbles REPLACING data frames, etc. Then it 
would be clear that some porting is required, and with only one syntax 
and behaviour in use, no confusion would be created.

Pedro.

On 2017-09-26 14:47, Iñaki Úcar wrote:
> 2017-09-26 13:41 GMT+02:00 Holger Hoefling :
>> Hi Thierry,
>>
>> You write:
>>
>> "If a package requires a data.frame, then it is up to the _user_ to
>> provide a data.frame (and a tibble is not a data.frame). "
>>
>> Actually, as pointed out before, calling
>>
>> is.data.frame
>>
>> on a tibble returns TRUE. So I think that R says - yes, a tibble is a data
>> frame. What would be the point of having a "is.data.frame" function, if you
>> can't trust its answer?
> is.data.frame is just a wrapper for inherits(x, "data.frame"). As
> Daniel pointed out before, inherits(x, "data.frame", TRUE) == 1
> returns TRUE for data frames and FALSE for tibbles.
>
> Iñaki

-- 

Pedro J. Aphalo
University Lecturer, Principal Investigator
(Office 4417, Biocenter 3, Viikinkaari 1)

Department of Biosciences
Plant Biology
P.O. Box 65
00014 University of Helsinki
Finland

e-mail: pedro.aph...@helsinki.fi 
Tel. (mobile) +358 50 4150623
Tel. (office) +358 2941 57897


*Web sites and blogs*
Web site (research group): http://blogs.helsinki.fi/senpep-blog/
Web site (own teaching): http://www.helsinki.fi/people/pedro.aphalo/
Web site (using R in photobiology): http://www.r4photobiology.info/



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Daniel Lüdecke
You wrote:

> The correct and logical way (which I use in 'eha') is to check if input is a
> data frame, and if not, throw an error.

If you want to check for a data frame (and a data frame only), because you 
don't want to coerce *any* object to data frames, then this would be one way to 
check for df/tibble, and coerce tibbles only. That's what I had in mind...

But as I mentioned before, since simplifying (dropping dimensions) is the most 
relevant, or perhaps the only, point when dealing with tibbles, I have rewritten 
all parts of my packages that used df[, x] indexing. I replaced them with 
df[[x]] or with df[, x, drop = FALSE]; if a vector is needed, you can also use 
"dplyr::pull()" to make sure you get a vector.
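
To make the difference between these indexing forms concrete, here is a minimal sketch in base R. The data frame `df` and its column names are made up for illustration, and the tibble comparison only runs if the tibble package happens to be installed.

```r
# Plain data frame: selecting one column with `[` drops to a vector by default.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

v1 <- df[, "x"]                 # integer vector (dimensions are dropped)
v2 <- df[["x"]]                 # always a vector, for data frames and tibbles
d1 <- df[, "x", drop = FALSE]   # always a one-column data frame

stopifnot(is.integer(v1), identical(v1, v2), is.data.frame(d1))

# Optional comparison, guarded so the sketch also runs without tibble.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  stopifnot(is.data.frame(tb[, "x"]))   # no dropping: still a one-column tibble
  stopifnot(identical(tb[["x"]], v2))   # `[[` returns the same vector as above
}
```

Code written with `[[` or an explicit `drop = FALSE` should therefore give the same kind of result for both classes, which is the motivation for the rewrite described above.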

Best
Daniel

-----Original Message-----
From: Gábor Csárdi [mailto:csardi.ga...@gmail.com]
Sent: Tuesday, 26 September 2017 12:15
To: Daniel Lüdecke
Cc: R Package Devel
Subject: Re: [R-pkg-devel] tibbles are not data frames

What is the benefit here, compared to just calling as.data.frame() on it?

Gabor

On Tue, Sep 26, 2017 at 11:11 AM, Daniel Lüdecke  wrote:
> Since tibbles add their class attributes first, you could use:
>
> tb <- tibble(a = 5)
> inherits(tb, "data.frame", which = TRUE) == 1
>
> if "tb" is a data frame (only), TRUE is returned, for tibble FALSE. 
> You could then coerce to data frame: as.data.frame(tb)
>
> -----Original Message-----
> From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On
> Behalf Of Göran Broström
> Sent: Tuesday, 26 September 2017 12:09
> To: r-package-devel@r-project.org
> Subject: Re: [R-pkg-devel] tibbles are not data frames
>
>
>
> On 2017-09-26 11:56, Gábor Csárdi wrote:
>> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys  wrote:
>>> I don't like the dropping of dimensions either. That doesn't change 
>>> the fact that a tibble reacts different from a data.frame. So 
>>> tibbles do not inherit correctly from the class data.frame, and it 
>>> can thus be argued that it's against OOP paradigms to pretend 
>>> tibbles inherit from the class data.frame.
>>
>> I have yet to see an OOP system in which a subclass cannot override 
>> the methods of its superclass. Not only is this in line with OOP 
>> paradigms, it is actually one of the essential OOP features.
>>
>> To be more constructive, if you have a function that only works with 
>> data frame inputs, then it is good practice to check that the 
>> supplied input is indeed a data frame. This is independent of tibbles.
>
> It is not. I check input for being a data frame, but tibbles pass that test. 
> That's the essence of the problem.
>
>> In practice it seems to me that an easy fix is to just call 
>> as.data.frame on the input. This should either convert it to a data 
>> frame, or throw an error.
>
> Sure, but I still need to rewrite the package.
>
> Göran
>
>> For tibbles it
>> drops the tbl* classes.
>>
>> Gabor
>>
>>> Defensive coding techniques would check if it's a tibble and return 
>>> an error saying a data.frame is expected. Unless tibbles inherit 
>>> correctly from data.frame.
>>>
>>> I have nothing against tibbles. But calling them "data.frame" raises 
>>> expectations that can't be fulfilled.
>>
>> [...]
>>
>> __
>> R-package-devel@r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>
> __
> R-package-devel@r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
>
> _
>
> Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen 
> Rechts; Gerichtsstand: Hamburg | www.uke.de
> Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr. 
> Dr. Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.) 
> _
>
> SAVE PAPER - THINK BEFORE PRINTING
> __
> R-package-devel@r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-package-devel

--

_

Universitätsklinikum Hamburg-Eppendorf; public-law corporation; 
place of jurisdiction: Hamburg | www.uke.de
Board members: Prof. Dr. Burkhard Göke (Chair), Prof. Dr. Dr. Uwe 
Koch-Gromus, Joachim Prölß, Martina Saurin (acting)
_

SAVE PAPER - THINK BEFORE PRINTING
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Iñaki Úcar
2017-09-26 13:41 GMT+02:00 Holger Hoefling :
> Hi Thierry,
>
> You write:
>
> "If a package requires a data.frame, then it is up to the _user_ to
> provide a data.frame (and a tibble is not a data.frame). "
>
> Actually, as pointed out before, calling
>
> is.data.frame
>
> on a tibble returns TRUE. So I think that R says - yes, a tibble is a data
> frame. What would be the point of having a "is.data.frame" function, if you
> can't trust its answer?

is.data.frame is just a wrapper for inherits(x, "data.frame"). As
Daniel pointed out before, inherits(x, "data.frame", TRUE) == 1
returns TRUE for data frames and FALSE for tibbles.
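
A small sketch of the two checks side by side, in base R only. The object `tb_like` is a stand-in that merely carries the class vector a tibble has; it is an assumption made so the example runs without the tibble package.

```r
df <- data.frame(a = 5)

# Stand-in for a tibble: same class vector, built without the tibble package.
tb_like <- df
class(tb_like) <- c("tbl_df", "tbl", "data.frame")

# is.data.frame() only asks whether "data.frame" appears anywhere in the class vector.
is.data.frame(df)        # TRUE
is.data.frame(tb_like)   # TRUE

# With which = TRUE, inherits() reports the position of the match instead.
pos_df <- inherits(df, "data.frame", which = TRUE)       # 1
pos_tb <- inherits(tb_like, "data.frame", which = TRUE)  # 3

pos_df == 1   # TRUE:  "data.frame" is the first (and only) class
pos_tb == 1   # FALSE: "data.frame" comes after "tbl_df" and "tbl"
```

The position test distinguishes a plain data frame from any subclass, not just from tibbles, so it would also flag other classes that put their own name in front of "data.frame".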

Iñaki

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Holger Hoefling
Hi Thierry,

You write:

"If a package requires a data.frame, then it is up to the _user_ to
provide a data.frame (and a tibble is not a data.frame). "

Actually, as pointed out before, calling

is.data.frame

on a tibble returns TRUE. So I think that R says - yes, a tibble is a data
frame. What would be the point of having a "is.data.frame" function, if you
can't trust its answer?

And you can also look at it from the other side: Why does tibble need to
inherit from a data.frame? I don't know exactly what the original intention
behind this was, but I would guess that it was intended to make tibbles a
drop-in replacement for data.frames. And it looks like it is not succeeding
at this task.

Best

Holger Hoefling

On Tue, Sep 26, 2017 at 1:32 PM, Thierry Onkelinx 
wrote:

> Dear all,
>
> IMHO the problem is being looked at from the wrong perspective. The
> tibble doesn't change the data.frame; it uses all methods from
> data.frame which it doesn't implement itself. Hence it behaves like a
> data.frame to some extent.
>
> If a package requires a data.frame, then it is up to the _user_ to
> provide a data.frame (and a tibble is not a data.frame). Documenting
> this in the package documentation/FAQ or issuing a warning "don't use
> tibble" when the package is loaded should be sufficient.
>
> Best regards,
>
> ir. Thierry Onkelinx
> Statisticus/ Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
> AND FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkel...@inbo.be
> Kliniekstraat 25, B-1070 Brussel
> www.inbo.be
>
> 
> ///
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data. ~ John Tukey
> 
> ///
>
>
> From 14 to 19 December 2017 we are moving from our Brussels office to
> the Herman Teirlinck building on the Thurn & Taxis site.
> From then on you are welcome at the new address: Havenlaan 88 bus 73, 1000
> Brussel.
>
> 
> ///
>
>
> 2017-09-26 13:18 GMT+02:00 Joris Meys :
> >
> > On Tue, Sep 26, 2017 at 11:56 AM, Gábor Csárdi 
> > wrote:
> >
> > >
> > > I have yet to see an OOP system in which a subclass cannot override the
> > > methods
> > > of its superclass. Not only is this in line with OOP paradigms, it is
> > > actually one of
> > > the essential OOP features.
> > >
> >
> > Fair enough. And I shouldn't have used the word "inherit" in the first
> > place, we're talking S3 after all. Fwiw, overriding a method to do the
> > exact same except for one detail isn't encouraged in the OOP world
> either.
> >
> >
> > > To be more constructive, if you have a function that only works with
> > > data frame inputs, then
> > > it is good practice to check that the supplied input is indeed a data
> > > frame. This is
> > > independent of tibbles.
> > >
> >
> > Actually it's not independent of tibbles as illustrated by others.
> > is.data.frame() returns TRUE for tibbles. It doesn't for matrices or
> > vectors.
> >
> >
> > >
> > > In practice it seems to me that an easy fix is to just call
> > > as.data.frame on the input. This should
> > > either convert it to a data frame, or throw an error. For tibbles it
> > > drops the tbl* classes.
> > >
> >
> > This would also allow matrices or vectors to be converted to data.frames,
> > and that might or might not be warranted.
> >
> > I agree that the S3 system allows you to do this, and think it's up to
> the
> > package manager to decide whether or not they would allow their users to
> > use tibbles instead of data.frame objects.
> >
> > I think the bigger frustration is that tibble users are more prone to
> > expect all code to work exactly like it does with data.frames. Which it
> > obviously doesn't.
> >
> > --
> > Joris Meys
> > Statistical consultant
> >
> > Ghent University
> > Faculty of Bioscience Engineering
> > Department of Mathematical Modelling, Statistics and Bio-Informatics
> >
> > tel : +32 9 264 59 87
> > joris.m...@ugent.be
> > ---
> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-package-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> __
> R-package-devel@r-project.org maili

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Thierry Onkelinx
Dear all,

IMHO the problem is being looked at from the wrong perspective. The
tibble doesn't change the data.frame; it uses all methods from
data.frame which it doesn't implement itself. Hence it behaves like a
data.frame to some extent.

If a package requires a data.frame, then it is up to the _user_ to
provide a data.frame (and a tibble is not a data.frame). Documenting
this in the package documentation/FAQ or issuing a warning "don't use
tibble" when the package is loaded should be sufficient.

Best regards,

ir. Thierry Onkelinx
Statisticus/ Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

///
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///


From 14 to 19 December 2017 we are moving from our Brussels office to
the Herman Teirlinck building on the Thurn & Taxis site.
From then on you are welcome at the new address: Havenlaan 88 bus 73, 1000 Brussel.

///


2017-09-26 13:18 GMT+02:00 Joris Meys :
>
> On Tue, Sep 26, 2017 at 11:56 AM, Gábor Csárdi 
> wrote:
>
> >
> > I have yet to see an OOP system in which a subclass cannot override the
> > methods
> > of its superclass. Not only is this in line with OOP paradigms, it is
> > actually one of
> > the essential OOP features.
> >
>
> Fair enough. And I shouldn't have used the word "inherit" in the first
> place, we're talking S3 after all. Fwiw, overriding a method to do the
> exact same except for one detail isn't encouraged in the OOP world either.
>
>
> > To be more constructive, if you have a function that only works with
> > data frame inputs, then
> > it is good practice to check that the supplied input is indeed a data
> > frame. This is
> > independent of tibbles.
> >
>
> Actually it's not independent of tibbles as illustrated by others.
> is.data.frame() returns TRUE for tibbles. It doesn't for matrices or
> vectors.
>
>
> >
> > In practice it seems to me that an easy fix is to just call
> > as.data.frame on the input. This should
> > either convert it to a data frame, or throw an error. For tibbles it
> > drops the tbl* classes.
> >
>
> This would also allow matrices or vectors to be converted to data.frames,
> and that might or might not be warranted.
>
> I agree that the S3 system allows you to do this, and think it's up to the
> package manager to decide whether or not they would allow their users to
> use tibbles instead of data.frame objects.
>
> I think the bigger frustration is that tibble users are more prone to
> expect all code to work exactly like it does with data.frames. Which it
> obviously doesn't.
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
> [[alternative HTML version deleted]]
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Joris Meys
On Tue, Sep 26, 2017 at 11:56 AM, Gábor Csárdi 
wrote:

>
> I have yet to see an OOP system in which a subclass cannot override the
> methods
> of its superclass. Not only is this in line with OOP paradigms, it is
> actually one of
> the essential OOP features.
>

Fair enough. And I shouldn't have used the word "inherit" in the first
place, we're talking S3 after all. Fwiw, overriding a method to do the
exact same except for one detail isn't encouraged in the OOP world either.


> To be more constructive, if you have a function that only works with
> data frame inputs, then
> it is good practice to check that the supplied input is indeed a data
> frame. This is
> independent of tibbles.
>

Actually it's not independent of tibbles as illustrated by others.
is.data.frame() returns TRUE for tibbles. It doesn't for matrices or
vectors.


>
> In practice it seems to me that an easy fix is to just call
> as.data.frame on the input. This should
> either convert it to a data frame, or throw an error. For tibbles it
> drops the tbl* classes.
>

This would also allow matrices or vectors to be converted to data.frames,
and that might or might not be warranted.
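
As a rough illustration of this point, the sketch below shows as.data.frame() accepting a matrix and a vector, followed by one possible guard that rejects such inputs but still normalises subclasses. The helper name `require_df` and the stand-in object `tb_like` are hypothetical, not taken from any package.

```r
# as.data.frame() happily converts things that were never data frames.
m <- matrix(1:4, nrow = 2)
class(as.data.frame(m))     # "data.frame"
class(as.data.frame(1:3))   # "data.frame"

# Hypothetical guard: refuse non-data-frame input, then strip any subclass.
require_df <- function(x) {
  if (!is.data.frame(x)) {
    stop("`x` must be a data frame", call. = FALSE)
  }
  as.data.frame(x)
}

# Stand-in for a tibble (class vector only), so no extra package is needed.
tb_like <- data.frame(a = 5)
class(tb_like) <- c("tbl_df", "tbl", "data.frame")

class(require_df(tb_like))                            # "data.frame"
res <- tryCatch(require_df(m), error = function(e) e)
inherits(res, "error")                                # TRUE: the matrix is rejected
```

Whether matrices should be rejected or silently converted remains a decision for each package; the guard only makes that decision explicit.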

I agree that the S3 system allows you to do this, and think it's up to the
package maintainer to decide whether or not they would allow their users to
use tibbles instead of data.frame objects.

I think the bigger frustration is that tibble users are more prone to
expect all code to work exactly like it does with data.frames. Which it
obviously doesn't.

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Alexandre Courtiol
David is right.

Imagine some silly old code such as:

# Return row "a" when d carries the class "data.frame" (otherwise NULL, invisibly).
get_a.data.frame <- function(d) if ("data.frame" %in% class(d)) d["a", ]

This line of code giving you the row "a" of a data.frame could be in any
package.
No matter how ugly it is, it is technically correct and conforms to the
original definition of data.frames.

Now you have a data.frame:

foo <- data.frame(x=1:3, row.names = c("a", "b", "c"))

> get_a.data.frame(foo)
[1] 1

this is expected

> get_a.data.frame(as.matrix(foo))
[1]

This returns nothing, again it is expected as a matrix is not a data.frame

But here comes the tibble trouble:

> get_a.data.frame(as.tibble(foo))
# A tibble: 1 x 1
      x
  <int>
1    NA

And now the old package is broken.
Also, if we tolerate this, think what would happen if this kind of practice
scaled up!
If anyone can name classes however they want without honouring the rules of
inheritance, we will soon be in big trouble, lost among the mutants.

Tibbles are great and data.frames are widely used, but tibbles should not be
of class data.frame unless they start behaving as data.frames do.

Alex

On 26 September 2017 at 12:21, Stefan McKinnon Høj-Edwards
wrote:

> There is no benefit. It is a rather cumbersome approach to checking whether
> something behaves as you expect it to. `as.data.frame` will force it into
> what you need; if it cannot be forced, then it will fail. That it can be
> converted to a data.frame is the class designer's responsibility, not
> yours. So you can use `as.data.frame` on *any* input that you need to
> behave as a data.frame.
> Consider a grouped tibble; now you have to test 2 different classes.
>
> Kindly,
> Stefan
>
> Stefan McKinnon Høj-Edwards
> ph.d. Genetics
> +44 (0)776 231 2464
> +45 2888 6598
> Skype: stefan_edwards
>
> 2017-09-26 11:15 GMT+01:00 Gábor Csárdi :
>
> > What is the benefit here, compared to just calling as.data.frame() on it?
> >
> > Gabor
> >
> > On Tue, Sep 26, 2017 at 11:11 AM, Daniel Lüdecke 
> > wrote:
> > > Since tibbles add their class attributes first, you could use:
> > >
> > > tb <- tibble(a = 5)
> > > inherits(tb, "data.frame", which = TRUE) == 1
> > >
> > > if "tb" is a data frame (only), TRUE is returned, for tibble FALSE. You
> > could then coerce to data frame: as.data.frame(tb)
> > >
> > > -----Original Message-----
> > > From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On
> > Behalf Of Göran Broström
> > > Sent: Tuesday, 26 September 2017 12:09
> > > To: r-package-devel@r-project.org
> > > Subject: Re: [R-pkg-devel] tibbles are not data frames
> > >
> > >
> > >
> > > On 2017-09-26 11:56, Gábor Csárdi wrote:
> > >> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys 
> > wrote:
> > >>> I don't like the dropping of dimensions either. That doesn't change
> > >>> the fact that a tibble reacts different from a data.frame. So tibbles
> > >>> do not inherit correctly from the class data.frame, and it can thus
> > >>> be argued that it's against OOP paradigms to pretend tibbles inherit
> > >>> from the class data.frame.
> > >>
> > >> I have yet to see an OOP system in which a subclass cannot override
> > >> the methods of its superclass. Not only is this in line with OOP
> > >> paradigms, it is actually one of the essential OOP features.
> > >>
> > >> To be more constructive, if you have a function that only works with
> > >> data frame inputs, then it is good practice to check that the supplied
> > >> input is indeed a data frame. This is independent of tibbles.
> > >
> > > It is not. I check input for being a data frame, but tibbles pass that
> > test. That's the essence of the problem.
> > >
> > >> In practice it seems to me that an easy fix is to just call
> > >> as.data.frame on the input. This should either convert it to a data
> > >> frame, or throw an error.
> > >
> > > Sure, but I still need to rewrite the package.
> > >
> > > Göran
> > >
> > >> For tibbles it
> > >> drops the tbl* classes.
> > >>
> > >> Gabor
> > >>
> > >>> Defensive coding techniques would check if it's a tibble and return
> > >>> an error saying a data.frame is expected. Unless tibbles inherit
> > >>> correctly from data.frame.
> > >>>
> > >>> I have nothing against tibbles. But calling them "data.frame" raises
> > >>> expectations that can't be fulfilled.
> > >>
> > >> [...]
> > >>
> > >> __
> > >> R-package-devel@r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> > >>
> > >
> > > __
> > > R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/
> > listinfo/r-package-devel
> > >
> > > --
> > >
> > > _
> > >
> > > Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen
> > Rechts; Gerichtsstand: Hamburg | www.uke.de
> > > Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr.
> > Dr. Uwe Koch-Gromus, Joachim Pr

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Stefan McKinnon Høj-Edwards
There is no benefit. It is a rather cumbersome approach to checking whether
something behaves as you expect it to. `as.data.frame` will force it into
what you need; if it cannot be forced, then it will fail. That it can be
converted to a data.frame is the class designer's responsibility, not
yours. So you can use `as.data.frame` on *any* input that you need to
behave as a data.frame.
Consider a grouped tibble; now you have to test 2 different classes.
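
A minimal sketch of that argument, using base R only. The objects `tb_like` and `grp_like` are stand-ins that only mimic the class vectors of a tibble and a grouped tibble; they are assumptions for illustration, and the example is meant for a fresh session in which tibble and dplyr are not loaded.

```r
base_df <- data.frame(g = c("a", "a", "b"), x = 1:3)

# Stand-ins carrying only the class vectors (no tibble/dplyr needed).
tb_like <- base_df
class(tb_like) <- c("tbl_df", "tbl", "data.frame")

grp_like <- base_df
class(grp_like) <- c("grouped_df", "tbl_df", "tbl", "data.frame")

inputs <- list(plain = base_df, tibble = tb_like, grouped = grp_like)

# One unconditional as.data.frame() call handles every variant,
# so no per-class test is required.
ok <- vapply(
  inputs,
  function(x) identical(class(as.data.frame(x)), "data.frame"),
  logical(1)
)
ok   # all TRUE
```

The cost is a conversion on every call, which the design accepts in exchange for not having to enumerate subclasses.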

Kindly,
Stefan

Stefan McKinnon Høj-Edwards
ph.d. Genetics
+44 (0)776 231 2464
+45 2888 6598
Skype: stefan_edwards

2017-09-26 11:15 GMT+01:00 Gábor Csárdi :

> What is the benefit here, compared to just calling as.data.frame() on it?
>
> Gabor
>
> On Tue, Sep 26, 2017 at 11:11 AM, Daniel Lüdecke 
> wrote:
> > Since tibbles add their class attributes first, you could use:
> >
> > tb <- tibble(a = 5)
> > inherits(tb, "data.frame", which = TRUE) == 1
> >
> > if "tb" is a data frame (only), TRUE is returned, for tibble FALSE. You
> could then coerce to data frame: as.data.frame(tb)
> >
> > -----Original Message-----
> > From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On
> Behalf Of Göran Broström
> > Sent: Tuesday, 26 September 2017 12:09
> > To: r-package-devel@r-project.org
> > Subject: Re: [R-pkg-devel] tibbles are not data frames
> >
> >
> >
> > On 2017-09-26 11:56, Gábor Csárdi wrote:
> >> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys 
> wrote:
> >>> I don't like the dropping of dimensions either. That doesn't change
> >>> the fact that a tibble reacts different from a data.frame. So tibbles
> >>> do not inherit correctly from the class data.frame, and it can thus
> >>> be argued that it's against OOP paradigms to pretend tibbles inherit
> >>> from the class data.frame.
> >>
> >> I have yet to see an OOP system in which a subclass cannot override
> >> the methods of its superclass. Not only is this in line with OOP
> >> paradigms, it is actually one of the essential OOP features.
> >>
> >> To be more constructive, if you have a function that only works with
> >> data frame inputs, then it is good practice to check that the supplied
> >> input is indeed a data frame. This is independent of tibbles.
> >
> > It is not. I check input for being a data frame, but tibbles pass that
> test. That's the essence of the problem.
> >
> >> In practice it seems to me that an easy fix is to just call
> >> as.data.frame on the input. This should either convert it to a data
> >> frame, or throw an error.
> >
> > Sure, but I still need to rewrite the package.
> >
> > Göran
> >
> >> For tibbles it
> >> drops the tbl* classes.
> >>
> >> Gabor
> >>
> >>> Defensive coding techniques would check if it's a tibble and return
> >>> an error saying a data.frame is expected. Unless tibbles inherit
> >>> correctly from data.frame.
> >>>
> >>> I have nothing against tibbles. But calling them "data.frame" raises
> >>> expectations that can't be fulfilled.
> >>
> >> [...]
> >>
> >> __
> >> R-package-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>
> >
> > __
> > R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/
> listinfo/r-package-devel
> >
> > --
> >
> > _
> >
> > Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen
> Rechts; Gerichtsstand: Hamburg | www.uke.de
> > Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr.
> Dr. Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
> > _
> >
> > SAVE PAPER - THINK BEFORE PRINTING
> > __
> > R-package-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread David Hugh-Jones
These replies seem to be missing the point, which is that old code has to
be rewritten because tibbles don't behave like data frames.

It is true that subclasses can override behaviour, but there is an implicit
contract that the same methods should do the same things.

The as.xxx pattern seems weird to me, though I see it a lot. What is the
point of inheritance if you always have to convert an object upwards before
you can treat it as a member of the superclass?

I can see this argument will run...

David

On 26 September 2017 at 11:15, Gábor Csárdi  wrote:

> What is the benefit here, compared to just calling as.data.frame() on it?
>
> Gabor
>
> On Tue, Sep 26, 2017 at 11:11 AM, Daniel Lüdecke 
> wrote:
> > Since tibbles add their class attributes first, you could use:
> >
> > tb <- tibble(a = 5)
> > inherits(tb, "data.frame", which = TRUE) == 1
> >
> > if "tb" is a data frame (only), TRUE is returned, for tibble FALSE. You
> could then coerce to data frame: as.data.frame(tb)
> >
> > -----Original Message-----
> > From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On
> Behalf Of Göran Broström
> > Sent: Tuesday, 26 September 2017 12:09
> > To: r-package-devel@r-project.org
> > Subject: Re: [R-pkg-devel] tibbles are not data frames
> >
> >
> >
> > On 2017-09-26 11:56, Gábor Csárdi wrote:
> >> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys 
> wrote:
> >>> I don't like the dropping of dimensions either. That doesn't change
> >>> the fact that a tibble reacts different from a data.frame. So tibbles
> >>> do not inherit correctly from the class data.frame, and it can thus
> >>> be argued that it's against OOP paradigms to pretend tibbles inherit
> >>> from the class data.frame.
> >>
> >> I have yet to see an OOP system in which a subclass cannot override
> >> the methods of its superclass. Not only is this in line with OOP
> >> paradigms, it is actually one of the essential OOP features.
> >>
> >> To be more constructive, if you have a function that only works with
> >> data frame inputs, then it is good practice to check that the supplied
> >> input is indeed a data frame. This is independent of tibbles.
> >
> > It is not. I check input for being a data frame, but tibbles pass that
> test. That's the essence of the problem.
> >
> >> In practice it seems to me that an easy fix is to just call
> >> as.data.frame on the input. This should either convert it to a data
> >> frame, or throw an error.
> >
> > Sure, but I still need to rewrite the package.
> >
> > Göran
> >
> >> For tibbles it
> >> drops the tbl* classes.
> >>
> >> Gabor
> >>
> >>> Defensive coding techniques would check if it's a tibble and return
> >>> an error saying a data.frame is expected. Unless tibbles inherit
> >>> correctly from data.frame.
> >>>
> >>> I have nothing against tibbles. But calling them "data.frame" raises
> >>> expectations that can't be fulfilled.
> >>
> >> [...]
> >>
> >> __
> >> R-package-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>
> >
> > __
> > R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/
> listinfo/r-package-devel
> >
> > --
> >
> > _
> >
> > Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen
> Rechts; Gerichtsstand: Hamburg | www.uke.de
> > Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr.
> Dr. Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
> > _
> >
> > SAVE PAPER - THINK BEFORE PRINTING
> > __
> > R-package-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

[[alternative HTML version deleted]]

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Gábor Csárdi
What is the benefit here, compared to just calling as.data.frame() on it?

Gabor

On Tue, Sep 26, 2017 at 11:11 AM, Daniel Lüdecke  wrote:
> Since tibbles add their class attributes first, you could use:
>
> tb <- tibble(a = 5)
> inherits(tb, "data.frame", which = TRUE) == 1
>
> if "tb" is a data frame (only), TRUE is returned, for tibble FALSE. You could 
> then coerce to data frame: as.data.frame(tb)
>
> -----Original Message-----
> From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On
> Behalf Of Göran Broström
> Sent: Tuesday, 26 September 2017 12:09
> To: r-package-devel@r-project.org
> Subject: Re: [R-pkg-devel] tibbles are not data frames
>
>
>
> On 2017-09-26 11:56, Gábor Csárdi wrote:
>> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys  wrote:
>>> I don't like the dropping of dimensions either. That doesn't change
>>> the fact that a tibble reacts different from a data.frame. So tibbles
>>> do not inherit correctly from the class data.frame, and it can thus
>>> be argued that it's against OOP paradigms to pretend tibbles inherit
>>> from the class data.frame.
>>
>> I have yet to see an OOP system in which a subclass cannot override
>> the methods of its superclass. Not only is this in line with OOP
>> paradigms, it is actually one of the essential OOP features.
>>
>> To be more constructive, if you have a function that only works with
>> data frame inputs, then it is good practice to check that the supplied
>> input is indeed a data frame. This is independent of tibbles.
>
> It is not. I check input for being a data frame, but tibbles pass that test. 
> That's the essence of the problem.
>
>> In practice it seems to me that an easy fix is to just call
>> as.data.frame on the input. This should either convert it to a data
>> frame, or throw an error.
>
> Sure, but I still need to rewrite the package.
>
> Göran
>
>> For tibbles it
>> drops the tbl* classes.
>>
>> Gabor
>>
>>> Defensive coding techniques would check if it's a tibble and return
>>> an error saying a data.frame is expected. Unless tibbles inherit
>>> correctly from data.frame.
>>>
>>> I have nothing against tibbles. But calling them "data.frame" raises
>>> expectations that can't be fulfilled.
>>
>> [...]
>>
>> __
>> R-package-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>
> __
> R-package-devel@r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
>
> _
>
> Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; 
> Gerichtsstand: Hamburg | www.uke.de
> Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr. Dr. 
> Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
> _
>
> SAVE PAPER - THINK BEFORE PRINTING
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Daniel Lüdecke
Since tibbles add their class attributes first, you could use:

tb <- tibble(a = 5)
inherits(tb, "data.frame", which = TRUE) == 1

if "tb" is a data frame (only), TRUE is returned, for tibble FALSE. You could 
then coerce to data frame: as.data.frame(tb)

-----Original Message-----
From: R-package-devel [mailto:r-package-devel-boun...@r-project.org] On Behalf 
Of Göran Broström
Sent: Tuesday, 26 September 2017 12:09
To: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] tibbles are not data frames



On 2017-09-26 11:56, Gábor Csárdi wrote:
> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys  wrote:
>> I don't like the dropping of dimensions either. That doesn't change 
>> the fact that a tibble reacts different from a data.frame. So tibbles 
>> do not inherit correctly from the class data.frame, and it can thus 
>> be argued that it's against OOP paradigms to pretend tibbles inherit 
>> from the class data.frame.
> 
> I have yet to see an OOP system in which a subclass cannot override 
> the methods of its superclass. Not only is this in line with OOP 
> paradigms, it is actually one of the essential OOP features.
> 
> To be more constructive, if you have a function that only works with 
> data frame inputs, then it is good practice to check that the supplied 
> input is indeed a data frame. This is independent of tibbles.

It is not. I check input for being a data frame, but tibbles pass that test. 
That's the essence of the problem.

> In practice it seems to me that an easy fix is to just call 
> as.data.frame on the input. This should either convert it to a data 
> frame, or throw an error.

Sure, but I still need to rewrite the package.

Görn

> For tibbles it
> drops the tbl* classes.
> 
> Gabor
> 
>> Defensive coding techniques would check if it's a tibble and return 
>> an error saying a data.frame is expected. Unless tibbles inherit 
>> correctly from data.frame.
>>
>> I have nothing against tibbles. But calling them "data.frame" raises 
>> expectations that can't be fulfilled.
> 
> [...]
> 
> __
> R-package-devel@r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>

__
R-package-devel@r-project.org mailing list 
https://stat.ethz.ch/mailman/listinfo/r-package-devel

--

_

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; 
Gerichtsstand: Hamburg | www.uke.de
Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr. Dr. Uwe 
Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
_

SAVE PAPER - THINK BEFORE PRINTING
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Göran Broström



On 2017-09-26 11:56, Gábor Csárdi wrote:
> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys wrote:
>> I don't like the dropping of dimensions either. That doesn't change
>> the fact that a tibble behaves differently from a data.frame. So
>> tibbles do not inherit correctly from the class data.frame, and it
>> can thus be argued that it's against OOP paradigms to pretend
>> tibbles inherit from the class data.frame.
>
> I have yet to see an OOP system in which a subclass cannot override
> the methods of its superclass. Not only is this in line with OOP
> paradigms, it is actually one of the essential OOP features.
>
> To be more constructive, if you have a function that only works with
> data frame inputs, then it is good practice to check that the supplied
> input is indeed a data frame. This is independent of tibbles.

It is not. I check input for being a data frame, but tibbles pass that
test. That's the essence of the problem.

> In practice it seems to me that an easy fix is to just call
> as.data.frame on the input. This should either convert it to a data
> frame, or throw an error.

Sure, but I still need to rewrite the package.

Göran

> For tibbles it
> drops the tbl* classes.
>
> Gabor
>
>> Defensive coding techniques would check if it's a tibble and return
>> an error saying a data.frame is expected. Unless tibbles inherit
>> correctly from data.frame.
>>
>> I have nothing against tibbles. But calling them "data.frame" raises
>> expectations that can't be fulfilled.
>
> [...]


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Göran Broström

On 2017-09-26 11:35, Joris Meys wrote:
> I don't like the dropping of dimensions either. That doesn't change the
> fact that a tibble behaves differently from a data.frame. So tibbles do
> not inherit correctly from the class data.frame, and it can thus be
> argued that it's against OOP paradigms to pretend tibbles inherit from
> the class data.frame. Defensive coding techniques would check if it's a
> tibble and return an error saying a data.frame is expected. Unless
> tibbles inherit correctly from data.frame.

The correct and logical way (which I use in 'eha') is to check if input
is a data frame, and if not, throw an error. Checking for other things
would soon be too overwhelming.

> I have nothing against tibbles. But calling them "data.frame" raises
> expectations that can't be fulfilled.

Exactly what I think. I wouldn't object to changing base data frames to
behave like tibbles (with a few exceptions).

Göran




Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Gábor Csárdi
On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys  wrote:
> I don't like the dropping of dimensions either. That doesn't change the
> fact that a tibble reacts different from a data.frame. So tibbles do not
> inherit correctly from the class data.frame, and it can thus be argued that
> it's against OOP paradigms to pretend tibbles inherit from the class
> data.frame.

I have yet to see an OOP system in which a subclass cannot override the
methods of its superclass. Not only is this in line with OOP paradigms,
it is actually one of the essential OOP features.

To be more constructive, if you have a function that only works with
data frame inputs, then it is good practice to check that the supplied
input is indeed a data frame. This is independent of tibbles.

In practice it seems to me that an easy fix is to just call
as.data.frame on the input. This should either convert it to a data
frame, or throw an error. For tibbles it drops the tbl* classes.
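
As a sketch of that fix, a package function can normalise its input once at the top and use classic data.frame indexing afterwards. The function name `col_mean` is hypothetical, and `tb_like` is only a stand-in that carries a tibble's class vector so that the example runs in base R.

```r
# Hypothetical package function that expects a data frame.
col_mean <- function(data, col) {
  # Convert once at the boundary; for objects that cannot be coerced,
  # as.data.frame() signals an error.
  data <- as.data.frame(data)
  # From here on, classic data.frame semantics apply: this is a vector.
  x <- data[, col]
  stopifnot(is.numeric(x))
  mean(x)
}

# Stand-in for a tibble (class vector only; assumed for illustration).
tb_like <- structure(data.frame(a = 1:5, b = letters[5:1]),
                     class = c("tbl_df", "tbl", "data.frame"))

col_mean(tb_like, "a")  # 3
```

The conversion costs one line per exported function, which is the rewrite effort discussed elsewhere in this thread.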

Gabor

> Defensive coding techniques would check if it's a tibble and
> return an error saying a data.frame is expected. Unless tibbles inherit
> correctly from data.frame.
>
> I have nothing against tibbles. But calling them "data.frame" raises
> expectations that can't be fulfilled.

[...]



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Joris Meys
I don't like the dropping of dimensions either. That doesn't change the
fact that a tibble behaves differently from a data.frame. So tibbles do
not inherit correctly from the class data.frame, and it can thus be
argued that it's against OOP paradigms to pretend tibbles inherit from
the class data.frame. Defensive coding techniques would check if it's a
tibble and return an error saying a data.frame is expected. Unless
tibbles inherit correctly from data.frame.

I have nothing against tibbles. But calling them "data.frame" raises
expectations that can't be fulfilled.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be

Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Stefan McKinnon Høj-Edwards
Thanks for the examples. Personally, I have been caught out multiple
times by data frames dropping dimensions, so I have a distaste for this
dropping behaviour.

I prefer data frames *not* to drop dimensions. They are not arrays,
where dropping a dimension when slicing makes sense because all entries
have the same data type.
You can pull out a column in vector form from both tibbles and data
frames with the $ index; subsetting a row from a data frame and forcing
it into an atomic vector requires casting all columns to the lowest
common denominator, often character.

So I would argue that yes, tibbles are data.frames with extra bells and
whistles, even if I do not understand the use of list columns.

I suggest a defensive coding technique: if you need a data frame subset
to really be a vector, cast it as a vector. Users *will* attempt to throw
unexpected structures at your methods. When your methods fail in
mysterious ways because they didn't extract a vector, users will be
stupefied. A failure at `as.vector` will indicate why.
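
One possible shape for such a defensive extraction is sketched below. Note the substitution: instead of `as.vector`, this sketch uses `[[` followed by an explicit check, so that the failure carries its own message. The helper name `get_column` is hypothetical.

```r
# Hypothetical helper: pull one column out as an atomic vector, or stop
# with a message that names the problem.
get_column <- function(data, col) {
  x <- data[[col]]          # [[ returns the column itself, no `drop` involved
  if (is.null(x)) stop("no column '", col, "' in input")
  if (!is.atomic(x)) stop("column '", col, "' is not an atomic vector")
  x
}

df <- data.frame(a = 1:5, b = letters[5:1])
get_column(df, "a")         # 1 2 3 4 5

# A list column is rejected with a clear message instead of failing later.
df$l <- I(list(1, 2, 3, 4, 5))
res <- tryCatch(get_column(df, "l"), error = conditionMessage)
```

Because the helper never relies on `[` dropping a dimension, it should behave the same for any object that supports `[[` by column name.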

Kindly,
Stefan

Stefan McKinnon Høj-Edwards
ph.d. Genetics
+44 (0)776 231 2464
+45 2888 6598
Skype: stefan_edwards


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Joris Meys
Here's one difference:

atib <- tibble(a = 1:5, b = letters[5:1])
atib[3,"a"]
as.data.frame(atib)[3,"a"]

The second line returns a tibble (dimensions are not dropped); the third
line returns a plain vector (dimensions are dropped). That is a huge
difference if you use [ , aColumn] to select a vector from a data frame.
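
The mechanism behind this difference can be imitated in base R with a small S3 subclass. This is only an illustration: the class name `nodrop_df` is hypothetical, it is not how tibble is implemented, and the method only handles the two-index form x[i, j].

```r
# Hypothetical subclass whose `[` method never drops to a vector.
`[.nodrop_df` <- function(x, i, j, ...) {
  # Strip the subclass, index with drop = FALSE, then restore the class.
  out <- as.data.frame(x)[i, j, drop = FALSE]
  class(out) <- c("nodrop_df", "data.frame")
  out
}

base_df <- data.frame(a = 1:5, b = letters[5:1])
nd <- structure(base_df, class = c("nodrop_df", "data.frame"))

is.data.frame(nd)     # TRUE: passes the usual input check
class(nd[3, "a"])     # "nodrop_df" "data.frame": still two-dimensional
base_df[3, "a"]       # 3: a plain integer vector
```

Code that expects `x[3, "a"]` to be a vector works on `base_df` but receives a one-cell data frame from `nd`, even though both pass is.data.frame().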

Cheers
Joris




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Göran Broström

Hi Stefan,

On 2017-09-26 10:57, Stefan McKinnon Høj-Edwards wrote:
> Hi Göran,
>
> Could you please elaborate on which kind of subsetting Hadley dislikes?
> I have yet to encounter operations on data frames that are not possible
> on tibbles.


For instance, if 'dat' is a data frame, dat[1:3, 5] returns a vector of 
length 3. If 'dat' is a tibble, you do dat[[5]][1:3] to get the same 
vector. A tibble never 'drops dimensions'. See Hadley's book, on the web.
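
The two forms can be compared on a plain data frame. In this sketch the data frame `dat` and its column names are made up, and column 3 is used instead of column 5 to keep the example small.

```r
dat <- data.frame(id = 1:4,
                  g  = c("a", "b", "c", "d"),
                  x  = c(2.5, 3.5, 4.5, 5.5))

v1 <- dat[1:3, 3]      # classic data.frame indexing: dropped to a vector
v2 <- dat[[3]][1:3]    # column first, then rows: does not rely on dropping
identical(v1, v2)      # TRUE

# Asking for drop = FALSE keeps the data-frame shape.
dim(dat[1:3, 3, drop = FALSE])   # 3 1
```

Rewriting `dat[rows, col]` as `dat[[col]][rows]` is therefore a change that leaves results for plain data frames unchanged.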


Göran




Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Alexandre Courtiol
I could not agree more with Göran, and we had to change code in our
packages because of this too. I also see students often facing bugs
because of it. Again, with all the respect I have for Hadley.



Re: [R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Stefan McKinnon Høj-Edwards
Hi Göran,

Could you please elaborate on which kind of subsetting Hadley dislikes?
I have yet to encounter operations on data frames that are not possible
on tibbles.

Kindly,
Stefan McKinnon Hoj-Edwards

Stefan McKinnon Høj-Edwards
ph.d. Genetics
+44 (0)776 231 2464
+45 2888 6598
Skype: stefan_edwards


[R-pkg-devel] tibbles are not data frames

2017-09-26 Thread Göran Broström
I am beginning to get complaints from users of my CRAN packages
(especially 'eha') to the effect that they get error messages like
"Error: Unsupported use of matrix or array for column indexing".

It turns out that they are passing tibbles to functions that expect
data frames as input. And I am using the kind of subsetting that
Hadley dislikes (eha is an old package, much older than tibbles). It is
of course a simple matter to change the code so it handles both data
frames and tibbles correctly, but this affects many functions, and it
will take some time. And when the next guy introduces 'troubles' as an
improvement of 'tibbles', I will have to rewrite the code again.

While I like Hadley's way of doing it, I think it is a mistake to let a
tibble also be of class data frame. To me it is a matter of inheritance
and backwards compatibility: A tibble should add nice things to a data
frame, not change basic behaviour, in order to call itself a data frame.


Is it correct to let a tibble be of class "data.frame"?

Göran Broström
