Re: [R] Re: [Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

2005-02-14 Thread Prof Brian Ripley
On Mon, 14 Feb 2005, Gregor GORJANC wrote:
Sending this to r-help as well, so that anyone can read it there and maybe
help me with my puzzle, if it is trivial and I just don't see it.
Please don't, and especially do not after having removed the context.
So I have replied only on R-devel.
Prof Brian Ripley wrote:
[... removed some ...]
You add a column, not replace part of a non-existent column.  Isn't that 
obvious, given what you wrote?
Not if you subsequently remove what you wrote and re-post elsewhere, of
course.

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595
__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R] Re: [Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

2005-02-14 Thread Prof Brian Ripley
On Mon, 14 Feb 2005, Gregor GORJANC wrote:
Sending this to r-help as well, so that anyone can read it there and maybe
help me with my puzzle, if it is trivial and I just don't see it.
Please don't, and especially do not after having removed the context.
So I have removed R-help from the follow-up.
Prof Brian Ripley wrote:
[... removed some ...]
The question I answered has been removed here, which is discourteous both 
to your helper and to your readers.

You add a column, not replace part of a non-existent column.  Isn't that 
obvious, given what you wrote?
Not if you subsequently remove what you wrote, of course.
# OK. If I do
tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
tmp[1:2, "y2"] <- 2
tmp
# I am changing nonexistent column y2 in data frame tmp.
# If I do
tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
tmp$y2 <- NA
tmp[1:2, "y2"] <- 2
tmp
# I am changing existent column. I understand now the difference. However,
# it is weird for me that this is OK (if column y2 does not yet exist)
tmp["y2"] <- 2
# but this is not
tmp[1:2, "y2"] <- 2
What is `weird' is your insistence that this makes sense.  Columns in a 
data frame are required to be the same length.  How is that supposed to be 
made up to the correct length?  Possible for a numeric column with NAs, 
but not sensible for a raw column or a data frame column or 

There is a lot of basic documentation on data manipulation in R/S, and a 
whole chapter in MASS4.  Somehow most other people don't seem to find this 
a problem.
I just ordered MASS4 last week and I am eager to get it in my hands. In the
meantime I have read quite a lot of documentation, and what I more or less saw is

tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
tmp$y2 <- 1:4
tmp$y3 <- 2*tmp$y1
...
...
i.e. everybody adds a full column to the data frame. But I would like to add
just part of one.
But you cannot do so and not get a corrupt data frame. All you can hope 
for is to add a column and for something arbitrary to be added to your 
input to do so.
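
The point about padding can be made concrete. What follows is a minimal sketch, assuming the goal is a column whose values are known only for the first two rows: build the column at full length with an explicit filler suited to its type, then fill in the known part. The fillers chosen here (`NA_real_`, and a factor of NAs that shares the levels of `f1`) are assumptions made for this illustration, not something R selects on its own.

```r
tmp <- data.frame(y1 = 1:4, f1 = factor(c("A", "B", "C", "D")))

# Numeric column: start from a full-length vector of numeric NAs,
# then replace the part that is known.
tmp$y2 <- rep(NA_real_, nrow(tmp))
tmp$y2[1:2] <- 2

# Factor column: start from a full-length factor of NAs that already
# carries the levels of f1, so the class is factor from the outset.
tmp$f2 <- factor(rep(NA_character_, nrow(tmp)), levels = levels(tmp$f1))
tmp$f2[1:2] <- tmp$f1[1:2]

str(tmp)
```

Because every column is created at the full number of rows, the choice of what fills the unknown rows is made by the user and is visible in the code.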



Re: [Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

2005-02-14 Thread Gregor GORJANC
Hello!
Sending this to r-help as well, so that anyone can read it there and maybe
help me with my puzzle, if it is trivial and I just don't see it.

Prof Brian Ripley wrote:
[... removed some ...]
You add a column, not replace part of a non-existent column.  Isn't that 
obvious, given what you wrote?
# OK. If I do
tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
tmp[1:2, "y2"] <- 2
tmp
# I am changing nonexistent column y2 in data frame tmp.
# If I do
tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
tmp$y2 <- NA
tmp[1:2, "y2"] <- 2
tmp
# I am changing existent column. I understand now the difference. However,
# it is weird for me that this is OK (if column y2 does not yet exist)
tmp["y2"] <- 2
# but this is not
tmp[1:2, "y2"] <- 2
There is a lot of basic documentation on data manipulation in R/S, and a 
whole chapter in MASS4.  Somehow most other people don't seem to find 
this a problem.
I just ordered MASS4 last week and I am eager to get it in my hands. In the
meantime I have read quite a lot of documentation, and what I more or less saw is

tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
tmp$y2 <- 1:4
tmp$y3 <- 2*tmp$y1
...
...
i.e. everybody adds a full column to the data frame. But I would like to add
just part of one.

--
Lep pozdrav / With regards,
Gregor GORJANC
---
University of Ljubljana
Biotechnical Faculty       URI: http://www.bfro.uni-lj.si
Zootechnical Department    mail: gregor.gorjanc  bfro.uni-lj.si
Groblje 3                  tel: +386 (0)1 72 17 861
SI-1230 Domzale            fax: +386 (0)1 72 17 888
Slovenia
__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

2005-02-14 Thread Prof Brian Ripley
On Mon, 14 Feb 2005, Gregor GORJANC wrote:
Prof Brian Ripley wrote:

You did create a corrupt data frame by using *replacement* on part of 
something that did not exist.  The simple workaround is not to do that. One 
can argue about what should happen in such a case and currently R assumes 
that you know what you are doing and will only treat the data frame as a 
list. We could make this an error, but that would add an overhead to be 
paid by careful users too.

I agree to some extent, however I was very surprised by this behaviour. I 
often deal with data that have missing values and now I really do not know 
how to manage such data. How can one add a column to an existing data frame
in such a way that one does not get a corrupted data frame as in my example?
You add a column, not replace part of a non-existent column.  Isn't that 
obvious, given what you wrote?

There is a lot of basic documentation on data manipulation in R/S, and a 
whole chapter in MASS4.  Somehow most other people don't seem to find this 
a problem.
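
As a minimal sketch of "add a column, then replace part of it", assuming the missing values belong in the last two rows (which rows are marked missing is a choice made for this illustration):

```r
tmp <- data.frame(y1 = 1:4, f1 = factor(c("A", "B", "C", "D")))

# Step 1: add a complete column; the single value is recycled to all rows.
tmp["y2"] <- 2

# Step 2: the column now exists at full length, so replacing part of it
# is well defined.
tmp[3:4, "y2"] <- NA

tmp
```

The order matters: the replacement in step 2 operates on a column that already has one element per row.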



Re: [Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

2005-02-14 Thread Gregor GORJANC
Hello!
Prof Brian Ripley wrote:
You did create a corrupt data frame by using *replacement* on part of 
something that did not exist.  The simple workaround is not to do that. 
One can argue about what should happen in such a case and currently R 
assumes that you know what you are doing and will only treat the data 
frame as a list. We could make this an error, but that would add an 
overhead to be paid by careful users too.
I agree to some extent, however I was very surprised by this behaviour. I 
often deal with data that have missing values and now I really do not know 
how to manage such data. How can one add a column to an existing data frame
in such a way that one does not get a corrupted data frame as in my example?

(tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D"))))
  y1 f1
1  1  A
2  2  B
3  3  C
4  4  D
# Add new column, which is not full (missing some data for last
# records)
tmp[1:2, "y2"] <- 2
tmp
  y1 f1   y2
1  1  A    2
2  2  B    2
3  3  C <NA>
4  4  D <NA>
Warning message:
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits)
I hope that this is not the best solution:
tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
tmp$y2 <- NA
tmp[1:2, "y2"] <- 2
tmp
If you really want to understand what is going on here, please read the 
source code: R is a volunteer project and the volunteers do not have 
time to explain each and every one of your error messages to you -- we 
have already had several goes over including data frames in data frames.
I am trying to, and I hope I did not take too much of your time.
[... removed the rest ...]


Re: [Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

2005-02-13 Thread Prof Brian Ripley
On Mon, 14 Feb 2005, Gorjanc Gregor wrote:
Hello!
On Saturday I posted a mail with the same subject on r-help, seeking
help with my work, but I have now realized that this list is more
appropriate for it. I think I found a bug.
You do not tell us what you think it is, though!  It *is* a bug in your 
code.

You did create a corrupt data frame by using *replacement* on part of 
something that did not exist.  The simple workaround is not to do that. 
One can argue about what should happen in such a case and currently R 
assumes that you know what you are doing and will only treat the data 
frame as a list. We could make this an error, but that would add an 
overhead to be paid by careful users too.

If you really want to understand what is going on here, please read the 
source code: R is a volunteer project and the volunteers do not have time 
to explain each and every one of your error messages to you -- we have 
already had several goes over including data frames in data frames.

Below are comments
and reproducible examples:
# Create a data frame
(tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D"))))
  y1 f1
1  1  A
2  2  B
3  3  C
4  4  D
# Add new column, which is not full (missing some data for last
# records)
tmp[1:2, "y2"] <- 2
tmp
  y1 f1   y2
1  1  A    2
2  2  B    2
3  3  C <NA>
4  4  D <NA>
Warning message:
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits)
# Why did I get corrupted data frame?
Because you tried to change elements in a non-existent column.
tmp[[3]]
[1] 2 2
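
That the third element holds only two values is the whole problem. Below is a hedged sketch of how one might test for it. `columns_are_consistent` is a hypothetical helper, not part of R, and the inconsistent object is assembled by hand (via `unclass()`) purely for illustration, since the exact behaviour of partial replacement may differ between R versions.

```r
# Hypothetical helper: TRUE if every column has exactly nrow(df) elements.
columns_are_consistent <- function(df) {
  all(lengths(unclass(df)) == nrow(df))
}

good <- data.frame(y1 = 1:4, f1 = factor(c("A", "B", "C", "D")))

# Build an inconsistent "data frame" deliberately, treating it as a list:
# the new element has 2 entries although the row names describe 4 rows.
bad <- unclass(good)
bad$y2 <- c(2, 2)
class(bad) <- "data.frame"

columns_are_consistent(good)
columns_are_consistent(bad)
lengths(unclass(bad))
```

The check works on the underlying list, which is also how R treats a data frame when it does not validate it.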

# Add new factor column, which is not full (missing some data for last
# records)
tmp[1:2, "f2"] <- tmp[1:2, "f1"]
tmp
  y1 f1   y2   f2
1  1  A    2    1
2  2  B    2    2
3  3  C <NA> <NA>
4  4  D <NA> <NA>
Warning message:
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits)
# New column should have class factor, but got somehow converted to integer
class(tmp$f2)
[1] "integer"
# If new column is completely full, everything is OK
tmp$f3 <- tmp$f1
tmp
  y1 f1   y2   f2 f3
1  1  A    2    1  A
2  2  B    2    2  B
3  3  C <NA> <NA>  C
4  4  D <NA> <NA>  D
Warning message:
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits)
# Let's go further and try to convert one of new numeric column
# to factor
tmp$y2 <- factor(tmp$y2, labels="x")
tmp
  y1 f1 y2   f2 f3
1  1  A  x    1  A
2  2  B  x    2  B
3  3  C  x <NA>  C
4  4  D  x <NA>  D
Warning message:
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits)
# Why did also NAs get converted to level x?
They are *not* NAs: they print as NA with a warning.
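
A small sketch of why no NA survives the conversion, run outside the corrupt data frame. The two-element vector `short` stands in for the short column `y2`. The last step relies on the recycling that `$<-` performs for data frames in current versions of R. Whether the 2005 session did the same is an assumption here.

```r
short <- c(2, 2)                    # stands in for the two-element column y2
f_short <- factor(short, labels = "x")
length(f_short)                     # still two elements
anyNA(f_short)                      # there is nothing missing to convert

# With genuine NAs in a full-length vector, factor() keeps them as NA.
full <- c(2, 2, NA, NA)
f_full <- factor(full, labels = "x")
as.character(f_full)

# Assigning a shorter vector with `$<-` recycles it when the number of
# rows is a multiple of its length.
df <- data.frame(y1 = 1:4)
df$y2 <- f_short
as.character(df$y2)
```

Under these assumptions, the level `x` in rows 3 and 4 comes from repeating the two real values, not from converting missing ones.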
# Let's continue and add additional column, which is again not
# full, but missing some data for first records
tmp[3:4, "y3"] <- 1
tmp
  y1 f1 y2   f2 f3 y3
1  1  A  x    1  A NA
2  2  B  x    2  B NA
3  3  C  x <NA>  C  1
4  4  D  x <NA>  D  1
Warning message:
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits)
# Notice the difference between <NA> in previous example and
# NA in current one.
Yes, we know.  The <NA>s are coming from the print, with the warning.
They are unexpected, hence the headers do not line up.
OTOH, for y3 you need to create a 4-long vector, and that is padded with 
numeric NAs.

# Try to convert this to factor
tmp$y3 <- factor(tmp$y3, labels="y")
tmp
  y1 f1 y2   f2 f3   y3
1  1  A  x    1  A <NA>
2  2  B  x    2  B <NA>
3  3  C  x <NA>  C    y
4  4  D  x <NA>  D    y
Warning message:
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits)



[Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

2005-02-13 Thread Gorjanc Gregor
Hello!

On Saturday I posted a mail with the same subject on r-help, seeking
help with my work, but I have now realized that this list is more
appropriate for it. I think I found a bug. Below are comments
and reproducible examples:

# Create a data frame
(tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D"))))
  y1 f1
1  1  A
2  2  B
3  3  C
4  4  D

# Add new column, which is not full (missing some data for last
# records)
tmp[1:2, "y2"] <- 2
tmp
  y1 f1   y2
1  1  A    2
2  2  B    2
3  3  C <NA>
4  4  D <NA>
Warning message: 
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits) 

# Why did I get corrupted data frame? 

# Add new factor column, which is not full (missing some data for last
# records)
tmp[1:2, "f2"] <- tmp[1:2, "f1"]
tmp
  y1 f1   y2   f2
1  1  A    2    1
2  2  B    2    2
3  3  C <NA> <NA>
4  4  D <NA> <NA>
Warning message: 
corrupt data frame: columns will be truncated or padded with NAs 
in: format.data.frame(x, digits = digits) 

# New column should have class factor, but got somehow converted to integer
class(tmp$f2)
[1] "integer"

# If new column is completely full, everything is OK
tmp$f3 <- tmp$f1
tmp
  y1 f1   y2   f2 f3
1  1  A    2    1  A
2  2  B    2    2  B
3  3  C <NA> <NA>  C
4  4  D <NA> <NA>  D
Warning message: 
corrupt data frame: columns will be truncated or padded with NAs 
in: format.data.frame(x, digits = digits) 

# Let's go further and try to convert one of new numeric column 
# to factor
tmp$y2 <- factor(tmp$y2, labels="x")
tmp
  y1 f1 y2   f2 f3
1  1  A  x    1  A
2  2  B  x    2  B
3  3  C  x <NA>  C
4  4  D  x <NA>  D
Warning message: 
corrupt data frame: columns will be truncated or padded with NAs 
in: format.data.frame(x, digits = digits)

# Why did also NAs get converted to level x?

# Let's continue and add additional column, which is again not
# full, but missing some data for first records
tmp[3:4, "y3"] <- 1
tmp
  y1 f1 y2   f2 f3 y3
1  1  A  x    1  A NA
2  2  B  x    2  B NA
3  3  C  x <NA>  C  1
4  4  D  x <NA>  D  1
Warning message: 
corrupt data frame: columns will be truncated or padded with NAs
in: format.data.frame(x, digits = digits) 

# Notice the difference between <NA> in previous example and
# NA in current one.

# Try to convert this to factor
tmp$y3 <- factor(tmp$y3, labels="y")
tmp
  y1 f1 y2   f2 f3   y3
1  1  A  x    1  A <NA>
2  2  B  x    2  B <NA>
3  3  C  x <NA>  C    y
4  4  D  x <NA>  D    y
Warning message: 
corrupt data frame: columns will be truncated or padded with NAs 
in: format.data.frame(x, digits = digits)

# Works as expected.
# My configuration:
Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = 
 major = 2
 minor = 0.1
 year = 2004
 month = 11
 day = 15
 language = R

Windows XP Professional (build 2600) Service Pack 0.0
