Hi Igor,

It appears that the encoding is UTF-16.

> readLines("temp-mon.txt")
 [1] "þÿ" ""      ""      ""      ""      ""      ""      ""      ""      ""      ""      ""      ""
[14] ""      ""      ""      ""      ""      ""      ""

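Searching for "þÿ" leads to the Wikipedia page on the byte order mark,
http://en.wikipedia.org/wiki/Byte_order_mark, specifically its UTF-16
section: "þÿ" is how the bytes FE FF (the big-endian UTF-16 BOM) display
when they are read as Latin-1.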
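
A minimal sketch of checking for the byte order mark directly, rather
than relying on how the bytes happen to print. detect_bom() is a
hypothetical helper (not part of base R), and the demo writes a small
throwaway file instead of touching temp-mon.txt:

```r
# Sketch: guess the encoding of a file from its byte order mark (BOM).
# detect_bom() is a hypothetical helper, not a base R function.
detect_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    length(bytes) >= length(sig) && all(bytes[seq_along(sig)] == as.raw(sig))
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) return("UTF-8-BOM")
  if (starts_with(c(0xFE, 0xFF))) return("UTF-16BE")
  if (starts_with(c(0xFF, 0xFE))) return("UTF-16LE")
  NA_character_  # no BOM recognised
}

# Demo file starting with FE FF, the two bytes that print as "þÿ"
# when interpreted as Latin-1, followed by "A" in big-endian UTF-16.
demo_file <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xFE, 0xFF, 0x00, 0x41)), demo_file)
bom_guess <- detect_bom(demo_file)
print(bom_guess)
```

A BOM is only a hint: a UTF-16 file without one would return NA here and
would need to be identified some other way.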

> options(encoding="UTF-16")
> system.time(Temperature <- read.table("temp-mon.txt", skip = 7, header = TRUE,
+                                       na.strings = "NA", sep = ""))
   user  system elapsed
 28.556   0.112  28.712
> ncol(Temperature)
[1] 18001
> Temperature[, 1:10]
  YYYYMM X79.75N.49.75W X79.75N.49.25W X79.75N.48.75W X79.75N.48.25W X79.75N.47.75W X79.75N.47.25W
1 176512         -32.61         -32.92         -33.34         -33.65         -34.09         -34.21
2 176601         -31.89         -31.96         -32.26         -32.48         -32.71         -33.03
  X79.75N.46.75W X79.75N.46.25W X79.75N.45.75W
1         -34.65         -34.98         -35.43
2         -33.29         -33.41         -33.76

Note that I downloaded only the first 1 MB of the file, so there are
just two data lines after the header, yet read.table needs 28 seconds
to read them. I don't know how long read.table would take on the
whole ~600 MB file.
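
If read.table is still preferred, a variant that may help (a sketch only,
not timed on the real file) is to pass the encoding through the
fileEncoding argument and to declare colClasses so that read.table does
not have to guess a type for each of the 18001 columns. The stand-in file
below and the assumption that every column is numeric are mine:

```r
# Sketch under assumptions: a tiny stand-in replaces temp-mon.txt, and
# every column is assumed to be numeric.
demo_lines <- c(paste("header line", 1:7),
                "YYYYMM 79.75N/49.75W 79.75N/49.25W",
                "176512 -32.61 -32.92",
                "176601 -31.89 -31.96")

# Write the stand-in as big-endian UTF-16 with an FE FF byte order mark.
demo_text <- paste0(paste(demo_lines, collapse = "\n"), "\n")
payload <- iconv(demo_text, from = "UTF-8", to = "UTF-16BE", toRaw = TRUE)[[1]]
demo_file <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xFE, 0xFF)), payload), demo_file)

# fileEncoding avoids changing options(encoding=) globally;
# colClasses = "numeric" is recycled across all columns.
demo_table <- read.table(demo_file, skip = 7, header = TRUE, sep = "",
                         na.strings = "NA", fileEncoding = "UTF-16",
                         colClasses = "numeric", check.names = FALSE)
print(demo_table)
```

check.names = FALSE is optional here; it just keeps the column names as
they appear in the file.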

scan() might be faster, and it does not require setting
options(encoding="UTF-16") because the encoding is passed via
fileEncoding:

> system.time(Temperature <- scan("temp-mon.txt", fileEncoding="UTF-16",
+                                 skip=8))
Read 36002 items
   user  system elapsed
  0.104   0.000   0.104
> Temperature <- matrix(Temperature, ncol=18001, byrow=TRUE)
> Temperature.colnames <- scan("temp-mon.txt", character(),
+                              fileEncoding="UTF-16", skip=7, nmax=18001)
Read 18001 items
> colnames(Temperature) <- Temperature.colnames
> Temperature[, 1:10]
     YYYYMM 79.75N/49.75W 79.75N/49.25W 79.75N/48.75W 79.75N/48.25W 79.75N/47.75W 79.75N/47.25W
[1,] 176512        -32.61        -32.92        -33.34        -33.65        -34.09        -34.21
[2,] 176601        -31.89        -31.96        -32.26        -32.48        -32.71        -33.03
     79.75N/46.75W 79.75N/46.25W 79.75N/45.75W
[1,]        -34.65        -34.98        -35.43
[2,]        -33.29        -33.41        -33.76

(Note that the column names differ: scan() keeps them as they appear in
the file, as read.table would with check.names=FALSE. The result is
also a matrix, not the data frame that read.table returns.)
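
If the data frame form or the read.table-style names are needed
afterwards, something along these lines should work. This is a sketch on
a tiny stand-in matrix (temp_matrix is made up for illustration), not on
the real data:

```r
# Sketch: a small stand-in for the matrix built from scan().
temp_matrix <- matrix(c(176512, -32.61, -32.92,
                        176601, -31.89, -31.96),
                      ncol = 3, byrow = TRUE,
                      dimnames = list(NULL, c("YYYYMM",
                                              "79.75N/49.75W",
                                              "79.75N/49.25W")))

# With check.names = TRUE, read.table passes the header through
# make.names(), which is where the "X" prefix and the dots come from.
syntactic_names <- make.names(colnames(temp_matrix), unique = TRUE)
print(syntactic_names)

# as.data.frame() keeps the matrix column names unchanged.
temp_df <- as.data.frame(temp_matrix)
print(names(temp_df))

# Optional: switch to the names read.table would have produced.
names(temp_df) <- syntactic_names
str(temp_df)
```

For a purely numeric grid of this size, staying with the matrix is likely
the lighter option; the conversion only matters if later code expects a
data frame.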

HTH,
Jeff

On Sun, Dec 16, 2012 at 6:23 AM,  <igor.drobysh...@uqat.ca> wrote:
> Dear R experts,
>
> For quite some time I have been trying to solve the mystery of reading a 
> seemingly trouble-free text file. The data are a temperature reconstruction 
> arranged as a huge grid, preceded by seven "header lines" (which you can see 
> better if the file is opened in Firefox or Chrome).
>
> This is the data (gridded temperature reconstruction)
> ftp://ftp.ncdc.noaa.gov/pub/data/paleo/historical/europe/casty2007/temp-mon.txt
>
> And this is original data description:
> ftp://ftp.ncdc.noaa.gov/pub/data/paleo/historical/europe/casty2007/readme-casty2007.txt
> Basically, it says "space-delimited ASCII format" there ...
>
> I tried this:
> Temperature<-read.table(FileName,skip = 7, header = TRUE, 
> na.strings="NA",sep="")
>
> But ..
>
>
>> Temperature <- read.table(FileName, skip = 7, header = FALSE, sep="")
> Error in read.table(FileName, skip = 7, header = FALSE, sep = "") :
>   empty beginning of file
>
> Trying read.csv gives this:
>
> Error: cannot allocate vector of size 370.5 Mb
>
> I attempted to handle this by opening and resaving the file in other 
> software, but even though I can still see the first lines of the file in the 
> import dialog, the full reading of the file always ends up with an error, 
> possibly because of the huge number of columns ..
>
> I believe the problem is with some special encoding but I cannot figure out 
> how to get around it.
>
> Could some of you give me any hint on that?
>
> Many thanks in advance
>
> Igor
>
> Igor Drobyshev
> Dendrochronological laboratory at Station de Recheche FERLD, director
> Chaire industrielle CRSNG-UQAT-UQAM en aménagement forestier durable
> Université du Québec en Abitibi-Témiscamingue
> 445 boul . de l'Université
> Rouyn-Noranda, QC
> Canada J9X5E4
> http://www.dendro.uqat.ca/
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
