Hi Igor,

It appears that the encoding is UTF-16.
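If you want to double-check the encoding guess, one option is to read the first few bytes of the file and compare them against the common byte order marks. This is only a minimal sketch: detect_bom() is a hypothetical helper (not a base R function), it only knows the UTF-8 and UTF-16 BOMs, and the demo runs on a small temporary file rather than on temp-mon.txt.

```r
# Hypothetical helper (not part of base R): guess the encoding from the BOM.
# Only the UTF-8 and UTF-16 byte order marks are checked.
detect_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)   # first (up to) three bytes
  if (length(b) >= 3L && all(b[1:3] == as.raw(c(0xEF, 0xBB, 0xBF)))) {
    return("UTF-8")
  }
  if (length(b) >= 2L && all(b[1:2] == as.raw(c(0xFE, 0xFF)))) {
    return("UTF-16BE")
  }
  if (length(b) >= 2L && all(b[1:2] == as.raw(c(0xFF, 0xFE)))) {
    return("UTF-16LE")
  }
  NA_character_                              # no known BOM found
}

# Self-contained demo: a tiny temporary file starting with a UTF-16BE BOM.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xFE, 0xFF, 0x00, 0x41, 0x00, 0x0A)), tmp)  # BOM, "A", newline
detect_bom(tmp)
```

For a file that starts with the bytes FE FF (which display as "þÿ" when interpreted as Latin-1), this should report "UTF-16BE".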

> readLines("temp-mon.txt")
 [1] "þÿ" ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""   ""
[14] ""   ""   ""   ""   ""   ""   ""

A search for "þÿ" leads to the Wikipedia page
http://en.wikipedia.org/wiki/Byte_order_mark, specifically the UTF-16
section.

> options(encoding="UTF-16")
> system.time(Temperature<-read.table("temp-mon.txt",skip = 7, header = TRUE,
+ na.strings="NA",sep=""))
   user  system elapsed
 28.556   0.112  28.712
> ncol(Temperature)
[1] 18001
> Temperature[, 1:10]
  YYYYMM X79.75N.49.75W X79.75N.49.25W X79.75N.48.75W X79.75N.48.25W X79.75N.47.75W X79.75N.47.25W
1 176512         -32.61         -32.92         -33.34         -33.65         -34.09         -34.21
2 176601         -31.89         -31.96         -32.26         -32.48         -32.71         -33.03
  X79.75N.46.75W X79.75N.46.25W X79.75N.45.75W
1         -34.65         -34.98         -35.43
2         -33.29         -33.41         -33.76

Note that I downloaded just the first 1 MB of the file, so it only has
two lines after the header, yet it took 28 seconds to read. I'm not sure
how long read.table would take on the whole ~600 MB file.

scan() might be faster (and it does not require setting
options(encoding="UTF-16")):

> system.time(Temperature <- scan("temp-mon.txt", fileEncoding="UTF-16",
+ skip=8))
Read 36002 items
   user  system elapsed
  0.104   0.000   0.104
> Temperature <- matrix(Temperature, ncol=18001, byrow=TRUE)
> Temperature.colnames <- scan("temp-mon.txt", character(),
+ fileEncoding="UTF-16", skip=7, nmax=18001)
Read 18001 items
> colnames(Temperature) <- Temperature.colnames
> Temperature[, 1:10]
     YYYYMM 79.75N/49.75W 79.75N/49.25W 79.75N/48.75W 79.75N/48.25W 79.75N/47.75W 79.75N/47.25W
[1,] 176512        -32.61        -32.92        -33.34        -33.65        -34.09        -34.21
[2,] 176601        -31.89        -31.96        -32.26        -32.48        -32.71        -33.03
     79.75N/46.75W 79.75N/46.25W 79.75N/45.75W
[1,]        -34.65        -34.98        -35.43
[2,]        -33.29        -33.41        -33.76

(Note the different colnames, similar to using check.names=FALSE in
read.table, and that the result is a matrix, not a data frame as returned
by read.table.)

HTH,
Jeff

On Sun, Dec 16, 2012 at 6:23 AM, <igor.drobysh...@uqat.ca> wrote:
> Dear R experts,
>
> For quite some time
> I have been trying to solve a mystery of reading a
> seemingly trouble-free text file. The data is a temperature
> reconstruction arranged as a huge grid, preceded by seven "header lines"
> (which you see better if the file is opened in Firefox or Chrome).
>
> This is the data (gridded temperature reconstruction):
> ftp://ftp.ncdc.noaa.gov/pub/data/paleo/historical/europe/casty2007/temp-mon.txt
>
> And this is the original data description:
> ftp://ftp.ncdc.noaa.gov/pub/data/paleo/historical/europe/casty2007/readme-casty2007.txt
> Basically, it says "space-delimited ASCII format" there ...
>
> I tried this:
>
> Temperature<-read.table(FileName,skip = 7, header = TRUE,
> na.strings="NA",sep="")
>
> But ..
>
> > Temperature <- read.table(FileName, skip = 7, header = FALSE, sep="")
> Error in read.table(FileName, skip = 7, header = FALSE, sep = "") :
>   empty beginning of file
>
> Trying read.csv gives this:
>
> Error: cannot allocate vector of size 370.5 Mb
>
> I attempted to handle this by opening and resaving the file in other
> software, but even though I can still see the first lines of the file in
> the import dialog, the full reading of the file always ends up with an
> error, possibly because of the huge number of columns.
>
> I believe the problem is with some special encoding, but I cannot figure
> out how to get around it.
>
> Could any of you give me a hint on that?
>
> Many thanks in advance,
>
> Igor
>
> Igor Drobyshev
> Dendrochronological laboratory at Station de Recheche FERLD, director
> Chaire industrielle CRSNG-UQAT-UQAM en aménagement forestier durable
> Université du Québec en Abitibi-Témiscamingue
> 445 boul. de l'Université
> Rouyn-Noranda, QC
> Canada J9X5E4
> http://www.dendro.uqat.ca/
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.