On Mon, Nov 16, 2009 at 7:29 PM, Philip Leifeld <leif...@coll.mpg.de> wrote:
> Hi,
>
> how can I parse Google search results? The following code returns
> "integer(0)" instead of "1" although the results of the query clearly
> contain the regex "cran".
>
> ####
> address <- url("http://www.google.com/search?q=cran")
> open(address)
> lines <- readLines(address)
> grep("cran", lines[3])
> ####

Hmmm, how could that be? It's not like you're getting any warnings or anything... Or are you? I get a couple:

> address <- url("http://www.google.com/search?q=cran")
> open(address)
> lines <- readLines(address)
Warning message:
In readLines(address) :
  incomplete final line found on 'http://www.google.com/search?q=cran'

- but that's probably because there's no newline at the end of the data. Ignore that.

> grep("cran",lines[3])
integer(0)
Warning message:
In grep("cran", lines[3]) : input string 1 is invalid in this locale

Oh, now that looks serious. And relevant. Did you get this warning? You didn't say. I'll assume you didn't, because otherwise you surely would have mentioned it. So I won't waste my time typing my solution in now.

Oh, alright. You may need to set the encoding to 'latin1' when you open the url (and then read the lines again through the new connection):

> address <- url("http://www.google.com/search?q=cran",encoding="latin1")
> grep("cran",lines[3])
[1] 1

So is that the problem? Did you get the warning message and not show us? Transcripts (inputs and outputs) are good.

Barry
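
P.S. Here is a rough sketch of the same idea that runs without hitting Google, in case it helps to see the encoding step in isolation. The temporary file and its contents are made up for illustration (a hypothetical stand-in for the page Google returns); the only part taken from the fix above is passing encoding="latin1" when the connection is created, here via file() instead of url().

```r
# Hypothetical stand-in for the downloaded page: one line containing the
# byte 0xE9 ("e acute" in latin1), which is not valid on its own in UTF-8.
tmp <- tempfile(fileext = ".html")
writeBin(c(charToRaw("<title>cran caf"), as.raw(0xE9), charToRaw("</title>\n")),
         tmp)

# Declare the encoding when creating the connection, as with
# url(..., encoding = "latin1"), so readLines() re-encodes the input.
con <- file(tmp, encoding = "latin1")
lines <- readLines(con)
close(con)

# With the input re-encoded, grep() can match on the line.
grep("cran", lines[1])
```

If re-encoding is not an option, grep() also has a useBytes argument that matches byte by byte, which may sidestep the locale warning, but I have not tried that against the Google page.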