Re: [Rd] read.table: wrong error message? (PR#10592)

2008-01-21 Thread Peter Dalgaard
[EMAIL PROTECTED] wrote:

 I believe read.table may report misleading errors. In this example,
 where a header line in a file has an incorrect number of row names (28
 instead of 29), I get the error message "duplicate row.names are not
 allowed".

 However, I cannot find any duplicate row names. Fixing the header
 line by adding an extra row name avoids the error.

 The behavior is confusing - I would expect a different error message,  
 even if I should have quoted some row names.
   
Arguably, read.table() is too smart for its own good at times, but this
_is_ documented behaviour:


 If 'row.names' is not specified and the header line has one less
 entry than the number of columns, the first column is taken to be
 the row names.  This allows data frames to be read in from the
 format in which they are printed.  If 'row.names' is specified and
 does not refer to the first column, that column is discarded from
 such files.

So, if your COLUMN (sic!) name count is off by one, read.table() takes
the first variable as row.names, and when this is a 0/1 variable, you
will have duplicate names.
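
The rule quoted above can be reproduced on a much smaller input. What follows is a minimal sketch, not the poster's file: `txt` is a made-up table with three header entries and four data columns, and it assumes an R version in which read.table() accepts the `text` argument.

```r
# Made-up input (not the attached file): the header has 3 entries,
# each data line has 4 fields.
txt <- "rule role dist
0 vp i 2
1 pp r 1
0 ap i 3"

# The header is one entry short, so read.table() takes the first column
# (0, 1, 0) as row names and then rejects the repeated values.
msg <- tryCatch(
  {
    read.table(text = txt, header = TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Adding a fourth name to the header line of `txt` should make the same call return an ordinary data frame with automatic row numbers.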

 I am attaching a sample file that reproduces the problem with R --vanilla,
 and then: read.table("m2").  I get it in 2.5.1 and 2.6.0.

 PS.:

 R version 2.6.0 (2007-10-03)
 ...
 > x <- read.table("m2", header=T)
 Error in read.table("m2", header = T) :
   duplicate 'row.names' are not allowed
 > traceback()
 2: stop("duplicate 'row.names' are not allowed")
 1: read.table("m2", header = T)
  

   
 Attachment: m2

 primed rule role dist starttime target.utt rule.freq primeperiod.length 
 dialogue.length dialogue.id words.repeated words.repeated.prop head.repeated 
 head.freq head.pos prime.gaze target.gaze eyecontact familiar convseq length 
 doc.score friend task.familiar same.specrule derivation pathlen distituent
 0 vp---vbg-vp i 2 5.67 4 False 172 70 261.5322 1 0 0/7 True na None None None 
 1 0 1 7 135 - - 1 None - -
 0 vp---to-vp r 1 6.03 4 False 758 13 261.5322 1 0 0/6 True na None None None 
 1 0 1 6 135 - - 1 None - -
 0 vp---to-vp i 2 6.03 4 False 758 70 261.5322 1 0 0/6 True na None None None 
 1 0 1 6 135 - - 1 None - -
 0 s---cc-s i 1 11.3 5 False 813 83 261.5322 1 0 2/9 True na None None None 1 
 0 1 9 135 - - 1 None - -
 0 s---cc-s i 3 11.3 5 False 813 70 261.5322 1 0 0/9 True na None None None 1 
 0 1 9 135 - - 1 None - -
 0 s---advp-s i 3 12.71 5 False 440 70 261.5322 1 0 0/8 True na None None None 
 1 0 1 8 135 - - 1 None - -
 1 vp---ber-vp i 1 12.98 5 False 406 83 261.5322 1 0 2/7 True na None None 
 None 1 0 1 7 135 - - 1 None - -
 1 vp---vbg-vp i 1 13.11 5 False 172 83 261.5322 1 0 3/7 True na None None 
 None 1 0 1 7 135 - - 1 None - -
 1 vp---to-vp i 1 13.52 5 False 758 83 261.5322 1 0 2/6 True na None None None 
 1 0 1 6 135 - - 1 None - -
 1 pp---ql-rp-pp i 1 14.12 5 False 282 83 261.5322 1 0 0/3 True na None None 
 None 1 0 1 3 135 - - 1 None - -
 0 pp---ql-rp-pp r 2 14.12 5 False 282 13 261.5322 1 0 0/3 True na None None 
 None 1 0 1 3 135 - - 1 None - -
 0 ap---ap r 2 18.75 5 False 120 13 261.5322 1 0 0/1 True na None None None 1 
 0 1 1 135 - - 1 None - -
 0 pp---pp-pp i 3 22.7 6 False 429 13 261.5322 1 0 0/4 True na None None None 
 1 0 1 4 135 - - 1 None - -
 1 pp---ql-rp r 2 22.7 6 False 505 83 261.5322 1 0 1/2 True na None None None 
 1 0 1 2 135 - - 1 None - -
 0 pp---rp-pp r 2 23.54 6 False 1124 83 261.5322 1 0 0/2 True na None None 
 None 1 0 1 2 135 - - 1 None - -
 0 pp---pp-cc-rb-pp i 2 25.27 7 False 64 200 261.5322 1 0 1/6 True na None 
 None None 1 0 1 6 135 - - 1 None - -
 0 pp---pp-cc-rb-pp r 4 25.27 7 False 64 13 261.5322 1 0 0/6 True na None None 
 None 1 0 1 6 135 - - 1 None - -
 1 pp---ql-rp-pp i 2 25.99 7 False 282 200 261.5322 1 0 2/3 True na None None 
 None 1 0 1 3 135 - - 1 None - -
 1 pp---ql-rp-pp i 3 25.99 7 False 282 83 261.5322 1 0 0/3 True na None None 
 None 1 0 1 3 135 - - 1 None - -


 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
   


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [Rd] read.table: wrong error message? (PR#10592)

2008-01-21 Thread ripley
On Sun, 20 Jan 2008, [EMAIL PROTECTED] wrote:



 I believe read.table may report misleading errors. In this example,
 where a header line in a file has an incorrect number of row names (28
 instead of 29), I get the error message "duplicate row.names are not
 allowed".

The first column of your file repeats 0 and 1, and your specification 
asked for the first column to be taken as the row names: from the help 
file

  If 'row.names' is not specified and the header line has one less
  entry than the number of columns, the first column is taken to be
  the row names.

 However, I cannot find any duplicate row names. Fixing the header
 line by adding an extra row name avoids the error.

An extra *column* name?  The behaviour of header=TRUE depends on the 
number of fields on the header line: see the quote above.

 The behavior is confusing - I would expect a different error message,
 even if I should have quoted some row names.

It may have confused you, but it has been in R/S for 20 years and is
described in the help, the 'R Data Import/Export Manual', and most books
on R/S.

It is fortunate that you did get an error message: if you give an input
format different from what you intended, it can easily be valid but not
what you intended.

If you want to be safer, supply the argument 'row.names'.
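
As a rough illustration of that advice, the sketch below first compares the header's field count with that of the data lines using count.fields(), then reads the table with an explicit `row.names = NULL`. The input `txt` is made up for the example, and the `text` argument assumes a reasonably recent R.

```r
# Made-up input: 3 header entries, 4 fields on every data line.
txt <- "rule role dist
0 vp i 2
1 pp r 1
0 ap i 3"

# Number of whitespace-separated fields on each line, header first.
con <- textConnection(txt)
n <- count.fields(con)
close(con)

header_fields <- n[1]
data_fields <- unique(n[-1])
mismatch <- length(data_fields) != 1L || header_fields != data_fields
if (mismatch) {
  message("header has ", header_fields, " entries, data lines have ",
          paste(data_fields, collapse = "/"), " fields")
}

# Being explicit about row names: NULL requests automatic row numbers,
# so the first column is kept as data rather than used as row names.
d <- read.table(text = txt, header = TRUE, row.names = NULL)
print(names(d))
print(dim(d))
```

Note that `row.names = NULL` only avoids the duplicate check. With a header that is one entry short, I would expect the column labels to be shifted relative to the intended ones, so the count comparison is still the more informative check.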

 I am attaching a sample file that reproduces the problem with R --vanilla,
 and then: read.table("m2").  I get it in 2.5.1 and 2.6.0.

Also 2.6.1 and R-devel, as it is the documented behaviour.


 PS.:

 R version 2.6.0 (2007-10-03)
 ...
 > x <- read.table("m2", header=T)
 Error in read.table("m2", header = T) :
   duplicate 'row.names' are not allowed
 > traceback()
 2: stop("duplicate 'row.names' are not allowed")
 1: read.table("m2", header = T)
 




-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [Rd] read.table: wrong error message? (PR#10592)

2008-01-21 Thread david . reitter


On 21 Jan 2008, at 11:38, Prof Brian Ripley wrote:

 The first column of your file repeats 0 and 1, and your  
 specification asked for the first column to be taken as the row  
 names: from the help file

 If 'row.names' is not specified and the header line has one less
 entry than the number of columns, the first column is taken to be
 the row names.

I understand now. Thanks for the explanation.

I would have figured it out myself had I paid attention to the fact  
that the error message reports row names rather than column names,  
or had the error message actually contained the conflicting names  
(either 0 or 1) or the full set of read names.
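
The conflicting names can also be listed by hand before calling read.table() with a header. This is only a sketch on a made-up input (`txt` stands in for the attached file), assuming an R version where read.table() accepts `text`:

```r
# Made-up input: the header is one entry short and the first column
# repeats 0 and 1, as in the attached file.
txt <- "rule role dist
0 vp i 2
1 pp r 1
0 ap i 3
1 pp i 2"

# Skip the header and read the data lines only, so no row names are inferred.
first_col <- read.table(text = txt, header = FALSE, skip = 1)[[1]]

# Values that occur more than once, i.e. the would-be duplicate row names.
dups <- unique(first_col[duplicated(first_col)])
print(dups)
```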

--
David Reitter
ICCS/HCRC, Informatics, University of Edinburgh
http://www.david-reitter.com







Re: [Rd] read.table: wrong error message? (PR#10592)

2008-01-21 Thread Prof Brian Ripley

On Mon, 21 Jan 2008, David Reitter wrote:


 On 21 Jan 2008, at 11:38, Prof Brian Ripley wrote:

  The first column of your file repeats 0 and 1, and your specification asked
  for the first column to be taken as the row names: from the help file

    If 'row.names' is not specified and the header line has one less
    entry than the number of columns, the first column is taken to be
    the row names.

 I understand now. Thanks for the explanation.

 I would have figured it out myself had I paid attention to the fact that the
 error message reports row names rather than column names, or had the
 error message actually contained the conflicting names (either 0 or 1) or
 the full set of read names.


We can fairly easily do something along those lines: I've added code to 
R-devel to do



> library(MASS)
> row.names(hills)[35] <- "Lomonds"
Error in `row.names<-.data.frame`(`*tmp*`, value = c("Greenmantle", "Carnethy",  :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘Lomonds’
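
For readers without MASS at hand, the same pair of messages can be captured programmatically. The sketch below uses a toy data frame `df` (made up for the example) and assumes an R version that includes the warning described above:

```r
# Toy data frame with unique row names.
df <- data.frame(x = 1:3, row.names = c("a", "b", "c"))

warn_msg <- NULL
err_msg <- tryCatch(
  withCallingHandlers(
    {
      row.names(df)[3] <- "a"   # "a" is already the name of row 1
      NULL
    },
    warning = function(w) {
      # Record the warning, which names the offending value, and carry on
      # so that the subsequent error is still raised.
      warn_msg <<- conditionMessage(w)
      invokeRestart("muffleWarning")
    }
  ),
  error = function(e) conditionMessage(e)
)

print(warn_msg)
print(err_msg)
```

The warning is signalled before the error, which is why a calling handler is used for it: a plain tryCatch() on the warning would unwind before the error message could be seen.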

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595