Re: [R] Reading Text files from UK Met Office into R again...
First one needs to remove the extraneous line-ends that you created by using an editor that inserts them (or perhaps it was your mail client that added them because you failed to post in plain text). I removed those line-ends "by hand" and then created a text "file":

txt <- "2015-01-01 00:00, 03002, WMO, SYNOP, 1, 12, 1011, 4, 7, 200, 18, 82, , , 8, , , , , 100, 450, 1005.4, 5, , 102, 4, , 129, , , , , , , , 8.7, 7.5, 8.1,1003.6, , , , , , , 1, 1, 1, , , 1, , , , , 1, 1, 1, 1, 1, 1, , 1, , 1, 1, , , , , , , , , , 1, , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , ,
2015-01-01 00:00, 03005, WMO, SYNOP, 1, 9, 1011, 4, 1, 210, 26, 62, 8, 6, ,8, 8, , , 8, 30, 700, 1006, 1, 8, 54, 7, 6, 105, , , , , , , , 8.6, 7.3, 8, 996.1, , 01, , , , , 1, 1, 1, 1, 1, 1, 1, , , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, , , , , , , , 1, , , , , 2014-12-31 23:55, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , 0, 1
2015-01-01 00:00, 03006, WMO, SYNOP, 1, 10, 1011, 4, 6, 210, 23, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 1, 1, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , , , , , , , A, , , ,
2015-01-01 00:00, 03010, WMO, SYNOP, 1, 17, 1011, 4, 6, 230, 21, , , , , , , , , , , 1006.1, , , , , , , , , , , , , , 9.4, 6.2, 7.9, , , , , , , , 1, 1, , , , , , , , , , , 1, 1, 1, 1, , , , , , , , , , , , , , , , , , , ,"

# Then use `count.fields`
count.fields(file=textConnection(txt))
[1] 104 106 105 81

# So I'm guessing you arbitrarily snipped in the middle of one of the text lines

dat <- read.table(text=txt, sep=",", fill=TRUE, row.names=NULL, header=FALSE)
str(dat)
'data.frame': 4 obs. of 105 variables:
 $ V1 : chr "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00"
 $ V2 : int 3002 3005 3006 3010
 $ V3 : chr " WMO" " WMO" " WMO" " WMO"
 $ V4 : chr " SYNOP" " SYNOP" " SYNOP" " SYNOP"
 $ V5 : int 1 1 1 1
 $ V6 : int 12 9 10 17
 $ V7 : int 1011 1011 1011 1011
 $ V8 : int 4 4 4 4
 $ V9 : int 7 1 6 6
 $ V10 : int 200 210 210 230
 $ V11 : int 18 26 23 21
 $ V12 : int 82 62 NA NA
 $ V13 : int NA 8 NA NA
 $ V14 : int NA 6 NA NA
 $ V15 : int 8 NA NA NA
 $ V16 : int NA 8 NA NA
 $ V17 : int NA 8 NA NA
 $ V18 : logi NA NA NA NA
 $ V19 : logi NA NA NA NA
 $ V20 : int 100 8 NA NA
#snipped about 80 lines ...
 $ V99 : num 91.7 NA NA NA
 [list output truncated]

ALWAYS use a programming editor and always post in plain-text.

-- 
David.

> On Oct 9, 2022, at 4:50 PM, Ivan Krylov wrote:
>
> On Sun, 9 Oct 2022 12:01:27 +0100
> Nick Wray wrote:
>
>> Error in read.table("midas_wxhrly_201501-201512.txt", fill = T) :
>> duplicate 'row.names' are not allowed
>
> Since you don't pass the `header` argument, I think that the automatic
> header detection is here at play. This is what ?read.table has to say
> about row names:
>
>>> If there is a header and the first row contains one fewer field than
>>> the number of columns, the first column in the input is used for the
>>> row names. Otherwise if ‘row.names’ is missing, the rows are
>>> numbered.
>
> Perhaps the "one fewer field in the header than the number of columns"
> condition is true for files after 2010? I'm too lazy to sign up for a
> CEDA account and I'm not sure I'd be given access to hourly datasets
> anyway.
>
> If this is the reason for the failure (first column used as rownames()
> and turns out to be non-unique), there's an easy way to fix that:
>
>>> Using ‘row.names = NULL’ forces row numbering.
>
> I don't see a header in your example. If there's actually no header
> containing column names, passing `header = FALSE` will both prevent the
> error and avoid eating the first line of the file.
>
> --
> Best regards,
> Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
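To find where records differ in length, the field counts can also be taken with an explicit separator. `count.fields` splits on white space unless `sep` is given, so for comma-separated records it seems safer to pass `sep = ","`. What follows is only a minimal sketch on invented lines (they are not Met Office data, and the names `toy`, `n` and `short` are purely illustrative):

```r
# Invented comma-separated records; the last one is cut short on purpose.
toy <- "2015-01-01 00:00, 03002, WMO, SYNOP, 1
2015-01-01 00:00, 03005, WMO, SYNOP, 1
2015-01-01 00:00, 03006, WMO"

con <- textConnection(toy)
n <- count.fields(con, sep = ",")   # number of fields per line, splitting on commas
close(con)

print(n)
short <- which(n < max(n))          # line numbers with fewer fields than the longest line
print(short)
```

For a real file one would presumably pass the file name to `count.fields` instead of a text connection; the line numbers in `short` then point at the records worth inspecting by hand.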
Re: [R] Reading Text files from UK Met Office into R again...
Does it say what the new format is?

On 2022-10-09 13:01, Nick Wray wrote:
[...]
> Up to 2010 everything's fine and dandy - the data is in nice neat columns
> and I can download it and filter out what I don't want. But after 2010 the
> format changes (The Met Office in fact say on their guidelines that it
> changes) - it's still a text doc but instead of columns it seems to be one
> long vector. Here is a short sample:
[...]
Re: [R] Reading Text files from UK Met Office into R again...
On Sun, 9 Oct 2022 12:01:27 +0100
Nick Wray wrote:

> Error in read.table("midas_wxhrly_201501-201512.txt", fill = T) :
> duplicate 'row.names' are not allowed

Since you don't pass the `header` argument, I think that the automatic
header detection is here at play. This is what ?read.table has to say
about row names:

>> If there is a header and the first row contains one fewer field than
>> the number of columns, the first column in the input is used for the
>> row names. Otherwise if ‘row.names’ is missing, the rows are
>> numbered.

Perhaps the "one fewer field in the header than the number of columns"
condition is true for files after 2010? I'm too lazy to sign up for a
CEDA account and I'm not sure I'd be given access to hourly datasets
anyway.

If this is the reason for the failure (first column used as rownames()
and turns out to be non-unique), there's an easy way to fix that:

>> Using ‘row.names = NULL’ forces row numbering.

I don't see a header in your example. If there's actually no header
containing column names, passing `header = FALSE` will both prevent the
error and avoid eating the first line of the file.

-- 
Best regards,
Ivan
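A small made-up example may help to see the mechanism described above. This is only a sketch: the three records in `toy` are invented (they are not Met Office data), and the first one is deliberately one field shorter than the others, which is the condition under which `read.table` assumes a header and takes the first column as row names.

```r
# Invented records, not real MIDAS data: the first line has 3 fields,
# the following lines have 4.
toy <- "2015-01-01 00:00,03002,WMO
2015-01-01 00:00,03005,WMO,1
2015-01-01 00:00,03006,WMO,1"

# Without `header`, the short first line is taken as a header and the first
# column (the repeated timestamp) as row names, so the call fails.
msg <- tryCatch(
  suppressWarnings(read.table(text = toy, sep = ",", fill = TRUE)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Stating that there is no header keeps all three lines and numbers the rows.
ok <- read.table(text = toy, sep = ",", fill = TRUE, header = FALSE)
str(ok)
```

Whether the real yearly files trip the same condition cannot be checked from here; the sketch only shows that a short first line combined with repeated values in the first column is enough to produce this error.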
[R] Reading Text files from UK Met Office into R again...
Hello

I've had some invaluable help from folk about downloading files from the UK Met Office - unfortunately I now have another problem which I can't solve and I wonder whether anyone's got any ideas.

I'm trying to download hourly weather records from the Met Office:
https://data.ceda.ac.uk/badc/ukmo-midas/data/WH/yearly_files

Up to 2010 everything's fine and dandy - the data is in nice neat columns and I can download it and filter out what I don't want. But after 2010 the format changes (The Met Office in fact say on their guidelines that it changes) - it's still a text doc but instead of columns it seems to be one long vector. Here is a short sample:

2015-01-01 00:00, 03002, WMO, SYNOP, 1, 12, 1011, 4, 7, 200, 18, 82, , , 8, , , , , 100, 450, 1005.4, 5, , 102, 4, , 129, , , , , , , , 8.7, 7.5, 8.1, 1003.6, , , , , , , 1, 1, 1, , , 1, , , , , 1, 1, 1, 1, 1, 1, , 1, , 1, 1, , 1, , , , , , , , 1, , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , ,
2015-01-01 00:00, 03005, WMO, SYNOP, 1, 9, 1011, 4, 1, 210, 26, 62, 8, 6, 8, 8, , , 8, 30, 700, 1006, 1, 8, 54, 7, 6, 105, , , , , , , , 8.6, 7.3, 8, 996.1, , 01, , , , , 1, 1, 1, 1, 1, 1, 1, , , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, , , , , , , , 1, , , , , 2014-12-31 23:55, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , 0, 1
2015-01-01 00:00, 03006, WMO, SYNOP, 1, 10, 1011, 4, 6, 210, 23, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 1, 1, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , , , , , , , A, , , ,
2015-01-01 00:00, 03010, WMO, SYNOP, 1, 17, 1011, 4, 6, 230, 21, , , , , , , , , , , 1006.1, , , , , , , , , , , , , , 9.4, 6.2, 7.9, , , , , , , , 1, 1, , , , , , , , , , , 1, 1, 1, 1, , , , , , , , , , , , , , , , , , , ,

If I could download it I should still be able to use it, as I could identify each separate line because it will be headed by a date.
The files are very large (c. 1 GB) and when I start to download in R it works for a while and then after maybe 30 seconds I get this error message:

Error in read.table("midas_wxhrly_201501-201512.txt", fill = T) :
  duplicate 'row.names' are not allowed

The instruction shown in the error message works perfectly well for the files up to 2010, and I can’t see where a “duplicate rowname” would come from in this data anyway.

Is there a way of either downloading the file without getting the error message, or of identifying at what point in the file the error message is being generated so that I could, by hand possibly, take out whatever the problem is? I’ve tried putting the downloaded text doc into other formats but nothing seems to work.

If anyone has any ideas I’d be very grateful.

Thanks
Nick Wray

[[alternative HTML version deleted]]