Re: [R] How to see if row names of a dataframe are stored compactly

2006-10-13 Thread Gabor Grothendieck
Try this:

> class(attributes(x)$row.names)
[1] "integer"
> rownames(x) <- as.character(rownames(x))
> class(attributes(x)$row.names)
[1] "character"
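A minimal sketch of the same check in a current R release (the small data frames are illustrative, not taken from the thread). It suggests why the class test alone cannot settle the question. Automatic 1:n row names and an explicit 2:4 vector both report "integer". Only the character conversion changes the class.

```r
# Data frame with default (automatic) row names 1:3
x <- as.data.frame(matrix(1:9, nrow = 3))
cls_auto <- class(attributes(x)$row.names)

# Same data with explicit integer row names that are not 1:n
y <- x
row.names(y) <- 2:4
cls_shifted <- class(attributes(y)$row.names)

# Gabor's conversion: character row names change the stored type
z <- x
rownames(z) <- as.character(rownames(z))
cls_char <- class(attributes(z)$row.names)

print(c(auto = cls_auto, shifted = cls_shifted, char = cls_char))
```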

On 10/13/06, Hsiu-Khuern Tang <[EMAIL PROTECTED]> wrote:
> Reading the list of changes for R version 2.4.0, I was happy to see that the
> row names of dataframes can be stored compactly (as the integer n when
> row.names(df) is 1:n).
>
> help(row.names) contains this paragraph:
>
>   Row names of the form '1:n' for 'n > 2' are stored internally in a
>   compact form, which might be seen by calling 'attributes' but never
>   via 'row.names' or 'attr(x, "row.names")'.
>
> I am unable to get attributes(x)$row.names to return just nrow(x).  Am I
> misreading the documentation?  Does "might be seen" mean "possibly in some
> future version of R" in this case?
>
> > (x <- as.data.frame(matrix(1:9, nrow=3)))
>  V1 V2 V3
> 1  1  4  7
> 2  2  5  8
> 3  3  6  9
> > attributes(x)$row.names
> [1] 1 2 3
> > row.names(x) <- seq(len=nrow(x))
> > attributes(x)$row.names
> [1] 1 2 3
>
> Best,
> Hsiu-Khuern.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] How to see if row names of a dataframe are stored compactly

2006-10-13 Thread Hsiu-Khuern Tang
Hi Gabor,

* On Fri 07:59PM, 13 Oct 2006, Gabor Grothendieck ([EMAIL PROTECTED]) wrote:
> Try this:
> 
> > class(attributes(x)$row.names)
> [1] "integer"
> > rownames(x) <- as.character(rownames(x))
> > class(attributes(x)$row.names)
> [1] "character"

Yes, but this doesn't show that row.names was stored as a _single_
integer (3) instead of a vector of integers (1:3).

Reading the changes again:

   The internal storage of row.names = 1:n just records 'n', for
   efficiency with very long vectors.

   The "row.names" attribute must be a character or integer
   vector, and this is now enforced by the C code.
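A short sketch of the second NEWS item, assuming a current R release where the check is still in place. Assigning a double vector directly to the "row.names" attribute should be rejected. The data-frame method `row.names<-` instead coerces non-integer values to character first. The values 10, 20 and 30 are arbitrary.

```r
x <- as.data.frame(matrix(1:9, nrow = 3))

# Assigning a double vector straight to the attribute is expected to fail
err_msg <- tryCatch({
  attr(x, "row.names") <- c(10, 20, 30)
  NULL                                   # reached only if no error occurs
}, error = function(e) conditionMessage(e))
print(err_msg)

# The data-frame method converts non-integer row names to character instead
row.names(x) <- c(10, 20, 30)
stored_type <- typeof(attr(x, "row.names"))
print(stored_type)
```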

I think row.names is always _printed_ as a vector.  I had misinterpreted the
help(row.names) paragraph in my original posting to mean that the internal
storage can be revealed by attributes(x)$row.names.  That paragraph implies
that attributes(x)$row.names and attr(x, "row.names") can have different
classes, but I can't create such an example.

I did this experiment:

> n <- 10000
> x <- as.data.frame(matrix(seq(len=2*n), nrow=n))
> head(x)
  V1    V2
1  1 10001
2  2 10002
3  3 10003
4  4 10004
5  5 10005
6  6 10006
> class(attributes(x)$row.names)
[1] "integer"
> save(x, file="x1", compress=FALSE)
> row.names(x) <- 2:(n+1)
> class(attributes(x)$row.names)
[1] "integer"
> save(x, file="x2", compress=FALSE)
> subset(file.info(c("x1", "x2")), select=size)
     size
x1  80205
x2 120197

The difference in size is about nrow(x) * 4 bytes.  I think this shows that 1:n
was stored compactly as a single integer but 2:(n+1) was not.
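The same experiment can be sketched without writing files by comparing serialized sizes, assuming a current R release. `serialize(..., version = 2)` is chosen on purpose, and this is an assumption about newer R internals: format version 3 can encode ALTREP sequences such as `2:(n+1)` compactly, which would hide the difference. If only the row names change, the expected gap is 4 bytes per extra stored integer, i.e. 4 * (n - 2).

```r
n <- 10000L
x <- as.data.frame(matrix(seq_len(2 * n), nrow = n))

# Size in bytes of the serialized object; version 2 writes vectors in full
size_of <- function(obj) length(serialize(obj, connection = NULL, version = 2))

size_compact <- size_of(x)      # automatic row names 1:n

row.names(x) <- 2:(n + 1)
size_full <- size_of(x)         # row names stored as n explicit integers

cat("difference:", size_full - size_compact,
    "bytes; 4 * (n - 2) =", 4 * (n - 2), "\n")
```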

Best,
Hsiu-Khuern.


Re: [R] How to see if row names of a dataframe are stored compactly

2006-10-13 Thread jim holtman
Take a look with 'dput' and you will see the difference:

> row.names(x) <- 1:n
> dput(x)
structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(NA,
10), class = "data.frame")
> row.names(x) <- 2:(n+1)
> dput(x)
structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(2,
3, 4, 5, 6, 7, 8, 9, 10, 11), class = "data.frame")
>

'row.names' is different.
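For comparison, a sketch of the same `dput()` check in a current R release. Newer output formatting is an assumption here. Automatic row names are expected to be deparsed as `c(NA, -3L)` rather than the 2006 form `c(NA, 10)`, and explicit row names appear as the plain vector. The output lines are pasted together because deparsing may wrap the `c(NA, ...)` call across lines, as it did in the output above.

```r
x <- as.data.frame(matrix(1:6, nrow = 3))

# Capture dput() output as one string (deparsing may wrap long lines)
dput_text <- function(obj) paste(capture.output(dput(obj)), collapse = " ")

auto_txt <- dput_text(x)
cat(auto_txt, "\n")          # row names shown in their stored compact form

row.names(x) <- 2:4
shifted_txt <- dput_text(x)
cat(shifted_txt, "\n")       # row names shown as the full vector
```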

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] How to see if row names of a dataframe are stored compactly

2006-10-13 Thread Hsiu-Khuern Tang
* On Fri 10:14PM, 13 Oct 2006, jim holtman ([EMAIL PROTECTED]) wrote:
> Take a look with 'dput' and you will see the difference:
> 
> > row.names(x) <- 1:n
> > dput(x)
> structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
> 12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(NA,
> 10), class = "data.frame")
> > row.names(x) <- 2:(n+1)
> > dput(x)
> structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
> 12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(2,
> 3, 4, 5, 6, 7, 8, 9, 10, 11), class = "data.frame")
> >
> 
> 'row.names' is different.

So it is!  Thank you!  This also explains why the saved n x 2 data frame
became larger by exactly (n-2) * 4 bytes when row.names changed from 1:n to
2:(n+1): the compact form c(NA, n) stores 2 integers instead of n.
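Worked arithmetic for that claim, taking n = 10000 (the printed V2 column starts at n + 1 = 10001) and 4 bytes per stored integer. The explicit row names store n integers where the compact form stores two:

```latex
\begin{aligned}
\text{size}(x_2) - \text{size}(x_1) &= 4n - 4 \cdot 2 = 4(n - 2) \\
  &= 4 \times 9998 = 39992 \ \text{bytes} \\
  &= 120197 - 80205.
\end{aligned}
```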

Best,
Hsiu-Khuern.


Re: [R] How to see if row names of a dataframe are stored compactly

2006-10-14 Thread Prof Brian Ripley
On Fri, 13 Oct 2006, Hsiu-Khuern Tang wrote:

> Reading the list of changes for R version 2.4.0, I was happy to see that the
> row names of dataframes can be stored compactly (as the integer n when
> row.names(df) is 1:n).
>
> help(row.names) contains this paragraph:
>
>   Row names of the form '1:n' for 'n > 2' are stored internally in a
>   compact form, which might be seen by calling 'attributes' but never
>   via 'row.names' or 'attr(x, "row.names")'.
>
> I am unable to get attributes(x)$row.names to return just nrow(x).  Am I
> misreading the documentation?

Definitely.  It does not say the 'compact form' is 'just nrow(x)', does 
it?  (It is not.)

> Does "might be seen" mean "possibly in some
> future version of R" in this case?

It is not intended that the user ever sees the compact form from R code 
(it can be seen from C code and also by deparsing), but there were 
circumstances under which attributes() would show it (but no longer, I 
believe).
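A minimal sketch of this distinction, assuming a current R release. `attr()` and `attributes()` both return the expanded 1:n. The base helper `.row_names_info(x, type = 0L)` returns the attribute as stored. `is_compact_rownames()` is a hypothetical helper written for this sketch, not a base R function. It relies on the compact form being an integer vector c(NA, +/-n), which matches the deparsed output shown in this thread.

```r
x <- as.data.frame(matrix(1:9, nrow = 3))

# R-level accessors expand the compact form to 1:3
via_attr <- attr(x, "row.names")
via_attributes <- attributes(x)$row.names

# Raw stored attribute (type = 0L): a length-2 integer for compact row names
raw_rn <- .row_names_info(x, type = 0L)

# Hypothetical helper: TRUE when the stored attribute has the compact shape
is_compact_rownames <- function(df) {
  rn <- .row_names_info(df, type = 0L)
  is.integer(rn) && length(rn) == 2L && is.na(rn[1L])
}

compact_start <- is_compact_rownames(x)      # TRUE

row.names(x) <- 2:4
compact_shifted <- is_compact_rownames(x)    # FALSE: 2:4 is stored in full

row.names(x) <- seq_len(nrow(x))
compact_again <- is_compact_rownames(x)      # TRUE: 1:n with n > 2 is compacted
```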

>> (x <- as.data.frame(matrix(1:9, nrow=3)))
>  V1 V2 V3
> 1  1  4  7
> 2  2  5  8
> 3  3  6  9
>> attributes(x)$row.names
> [1] 1 2 3
>> row.names(x) <- seq(len=nrow(x))
>> attributes(x)$row.names
> [1] 1 2 3

But see what dump() gives you on that object.
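A sketch of what dump() gives on such an object in a current R release. The output details are assumed, not taken from the thread. The dumped text should show the stored c(NA, ...) row names, and sourcing it back should reproduce an identical object. The text is parsed into a fresh environment so the original `x` is left untouched.

```r
x <- as.data.frame(matrix(1:9, nrow = 3))

# dump() writes R source text; file = "" sends it to standard output
dump_lines <- capture.output(dump("x", file = "", envir = environment()))
cat(dump_lines, sep = "\n")

# Re-create the object from the dumped text in a separate environment
env <- new.env()
eval(parse(text = dump_lines), envir = env)

round_trip_ok <- identical(env$x, x)   # attributes compared as a set
print(round_trip_ok)
```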

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] How to see if row names of a dataframe are stored compactly

2006-10-14 Thread Prof Brian Ripley
On Fri, 13 Oct 2006, jim holtman wrote:

> Take a look with 'dput' and you will see the difference:
>
>> row.names(x) <- 1:n
>> dput(x)
> structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
> 12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(NA,
> 10), class = "data.frame")
>> row.names(x) <- 2:(n+1)
>> dput(x)
> structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
> 12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(2,
> 3, 4, 5, 6, 7, 8, 9, 10, 11), class = "data.frame")
>>
>
> 'row.names' is different.

Unfortunately, dput() does not give an accurate representation of these 
objects: see the warnings on its help page.  That is why I mention dump() 
not dput() in my reply.
