Hi Simon,

You'll notice that the "test" data.frame has a whole mix of characters in
the columns you're interested in, including a "-" for missing values, and
that those columns are in fact factors.

as.numeric(factor) returns the factor's internal integer codes (the position
of each value in levels()), not the labels you see printed (see ?levels and
?factor). That's why it gives you those irrelevant integers. I always end up
using something like this handy code snippet to deal with the situation:

unfactor <- function(factors)
# From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
# Transform a factor back into its factor names
{
   return(levels(factors)[factors])
}
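To see the difference concretely, here is a small sketch with made-up values (not taken from your table) comparing the codes with the labels. It also shows two base-R equivalents of unfactor(): as.numeric(as.character(f)) and as.numeric(levels(f))[f], the latter being the idiom given in R FAQ 7.10.

```r
# Made-up values for illustration: labels that look like numbers
f <- factor(c("10", "5", "20"))

levels(f)      # "10" "20" "5"  (levels are sorted as character strings)
as.numeric(f)  # 1 3 2          (internal codes, i.e. positions in levels(f))

# Same helper as above: index the levels by the factor's codes
unfactor <- function(factors) levels(factors)[factors]

as.numeric(unfactor(f))      # 10 5 20
as.numeric(as.character(f))  # 10 5 20 (equivalent)
as.numeric(levels(f))[f]     # 10 5 20 (converts each level only once)

# A "-" placeholder cannot be parsed as a number, so it becomes NA
g <- factor(c("10", "-", "20"))
suppressWarnings(as.numeric(unfactor(g)))  # 10 NA 20
```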

Then, to get your data to where you want it, I'd do this:

require(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test<-data.frame(tables, stringsAsFactors=FALSE)


result <- test[11:42, 1:5] #Extract the actual data we want
names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
for(i in 2:5) {
  # Convert each factor column to numeric; "-" entries become NA
  result[,i] <- as.numeric(unfactor(result[,i]))
}
result

From here you should be able to plot or do whatever else you want.
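Since the queensu.ca page may change or disappear, here is an offline sketch of the same conversion on a small invented data.frame standing in for result (the row labels and numbers are placeholders, not the real survey figures). It assumes the columns arrive as factors, as readHTMLTable returned them at the time, and swaps the for loop for lapply() over the four quarter columns; the result should be the same.

```r
# Invented stand-in for `result`: every column is a factor and "-" marks
# a missing value (placeholder labels and numbers, not the real survey data)
result <- data.frame(
  Response = c("Economy", "Health care", "Environment"),
  Q1 = c("12", "30", "-"),
  Q2 = c("15", "28", "4"),
  Q3 = c("9", "-", "6"),
  Q4 = c("11", "25", "7"),
  stringsAsFactors = TRUE
)

unfactor <- function(factors) levels(factors)[factors]

# Convert the quarter columns from factor to numeric in one step;
# "-" becomes NA (the coercion warning is suppressed on purpose)
quarters <- c("Q1", "Q2", "Q3", "Q4")
result[quarters] <- lapply(result[quarters], function(col)
  suppressWarnings(as.numeric(unfactor(col))))

str(result)

# One row as a plain numeric vector, ready for e.g. plot(1:4, row1)
row1 <- unlist(result[1, quarters])
row1
```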

Hope this helps,
Ethan Brown


On Wed, Oct 6, 2010 at 9:52 AM, Simon Kiss <sjk...@gmail.com> wrote:
> Dear Colleagues,
> I used this code to scrape data from the URL contained within. This code
> should be reproducible.
>
> require("XML")
> library(XML)
> theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm";
> tables <- readHTMLTable(theurl)
> n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
> class(tables)
> test<-data.frame(tables, stringsAsFactors=FALSE)
> test[16,c(2:5)]
> as.numeric(test[16,c(2:5)])
> quartz()
> plot(c(1:4), test[15, c(2:5)])
>
> Calling the values from the row of interest with test[16, c(2:5)] shows
> them as they appear on the web page, but plotting them or coercing them to
> numeric changes the values in a way that doesn't make sense to me. My
> intuition is that something is going on with the way the characters
> are coded or classed when they're scraped into R. I've looked through the
> help files for converting from character to numeric but can't find a
> solution.
>
> I also tried this:
>
> as.numeric(as.character(test[16,c(2:5)])), and that also changed the values
> from what they originally were.
>
> I'm grateful for any suggestions.
> Yours, Simon Kiss
>
>
>
> *********************************
> Simon J. Kiss, PhD
> Assistant Professor, Wilfrid Laurier University
> 73 George Street
> Brantford, Ontario, Canada
> N3T 2C9
> Cell: +1 519 761 7606
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
