On Wed, 2007-05-09 at 10:55 -0500, Marc Schwartz wrote:
> On Wed, 2007-05-09 at 15:47 +0100, Vittorio wrote:
> > Each day the daily balance in the following link
> >
> > http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf
> >
> > is updated.
> >
> > I would like to set up an R procedure to be run daily in a
> > server able to read the figures in a couple of lines only
> > ("Industriale" and "Termoelettrico", towards the end of the balance)
> > and put the data in a table.
> >
> > Is that possible? If yes, what R-packages should I use?
> >
> > Ciao
> > Vittorio
>
> Vittorio,
>
> Keep in mind that PDF files are typically text files. Thus you can read
> it in using readLines():
>
> PDFFile <- readLines("http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf")
>
> # Clean up
> unlink("http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf")
>
> > str(PDFFile)
>  chr [1:989] "%PDF-1.2" "6 0 obj" "<<" "/Length 7 0 R" ...
>
> # Now find the lines containing the values you wish
> # Use grep() with a regex for either term
> Lines <- grep("(Industriale|Termoelettrico)", PDFFile)
>
> > Lines
> [1] 33 34
>
> > PDFFile[Lines]
> [1] "/F3 1 Tf 9 0 0 9 204 304 Tm (Industriale )Tj 9 0 0 9 420 304 Tm ( 46,6)Tj"
> [2] "9 0 0 9 204 283 Tm (Termoelettrico )Tj 9 0 0 9 420 283 Tm ( 99,3)Tj"
>
> # Now parse the values out of the lines
> Vals <- sub(".*\\((.*)\\).*", "\\1", PDFFile[Lines])
>
> > Vals
> [1] " 46,6" " 99,3"
>
> # Now convert them to numeric
> # need to change the ',' to a '.' at least in my locale
>
> > as.numeric(gsub(",", "\\.", Vals))
> [1] 46.6 99.3
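A note on why the quoted sub() call returns the value and not the label: the leading ".*" is greedy, so the capture group ends up on the last "(...)" in each line. Here is a small sketch that can be run without the URL, using the two lines shown in the quoted output as sample data. The variable names and the lazy variant with perl = TRUE are only for illustration, not part of the original approach:

```r
# Sample lines copied from the quoted PDFFile[Lines] output
x <- c(
  "/F3 1 Tf 9 0 0 9 204 304 Tm (Industriale )Tj 9 0 0 9 420 304 Tm ( 46,6)Tj",
  "9 0 0 9 204 283 Tm (Termoelettrico )Tj 9 0 0 9 420 283 Tm ( 99,3)Tj"
)

# Greedy leading ".*" runs as far right as it can,
# so the capture lands on the LAST "(...)" in the line (the value)
last_group <- sub(".*\\((.*)\\).*", "\\1", x)

# Lazy ".*?" (Perl syntax) stops as early as it can,
# so the capture lands on the FIRST "(...)" in the line (the label)
first_group <- sub("^.*?\\((.*?)\\).*$", "\\1", x, perl = TRUE)

# Drop the padding, then swap the decimal comma for a point before converting
labels <- trimws(first_group)
values <- as.numeric(sub(",", ".", trimws(last_group), fixed = TRUE))

print(labels)
print(values)
```

This only works as long as each line holds exactly one label and one value in that order, which is what the sample lines show.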
Vittorio,

Just a quick tweak here, in case the order of the two lines in the file
changes at some point. Keeping each label together with its value means
you do not have to rely on that order. After reading the file and getting
the lines, use:

# Use sub() with 2 back references, 1 for each value in the line
Vals <- sub(".*\\((.*)\\).*\\((.*)\\).*", "\\1 \\2", PDFFile[Lines])

> Vals
[1] "Industriale 46,6" "Termoelettrico 99,3"

This gives us the labels and the values. Now convert to a data frame and
then coerce the values to numeric:

DF <- read.table(textConnection(Vals))

> DF
              V1   V2
1    Industriale 46,6
2 Termoelettrico 99,3

DF$V2 <- as.numeric(sub(",", "\\.", DF$V2))

> DF
              V1   V2
1    Industriale 46.6
2 Termoelettrico 99.3

> str(DF)
'data.frame': 2 obs. of 2 variables:
 $ V1: Factor w/ 2 levels "Industriale",..: 1 2
 $ V2: num 46.6 99.3

HTH,

Marc

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
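P.S. Since this is meant to run daily, the label/value extraction could be wrapped in a function. What follows is only a sketch under assumptions: parse_balance_lines() is a made-up name, the sample lines stand in for PDFFile[Lines], and it assumes each line of interest contains exactly two "(...)" strings, label first. It uses regmatches() and gregexpr() instead of the sub() call above, which is a substitution on my part and not a requirement.

```r
# Hypothetical helper: turn PDF text-drawing lines into a label/value table
parse_balance_lines <- function(lines) {
  # Every "(...)" string in each line; [^)]* stops at the first ")"
  groups <- regmatches(lines, gregexpr("\\(([^)]*)\\)", lines))
  # Strip the surrounding parentheses and the padding inside them
  groups <- lapply(groups, function(g) trimws(gsub("^\\(|\\)$", "", g)))
  # Keep only lines that hold exactly one label and one value
  groups <- Filter(function(g) length(g) == 2, groups)
  raw_values <- vapply(groups, `[`, character(1), 2)
  data.frame(
    label = vapply(groups, `[`, character(1), 1),
    # Decimal comma to decimal point before converting
    value = as.numeric(sub(",", ".", raw_values, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Sample input standing in for PDFFile[Lines]
sample_lines <- c(
  "/F3 1 Tf 9 0 0 9 204 304 Tm (Industriale )Tj 9 0 0 9 420 304 Tm ( 46,6)Tj",
  "9 0 0 9 204 283 Tm (Termoelettrico )Tj 9 0 0 9 420 283 Tm ( 99,3)Tj"
)

balance <- parse_balance_lines(sample_lines)
print(balance)

# Look a figure up by label, so the order of the lines does not matter
industriale <- balance$value[balance$label == "Industriale"]
```

Looking values up by label is the point of carrying the labels along: the result stays correct even if the two lines swap places in the file.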