Dear William Michels,

On 2020-07-25 10:58 -0700, William Michels wrote:
>
> Dear Spencer Graves (and Rasmus Liland),
>
> I've had some luck just using gsub()
> to alter the offending "<br/>" tags,
> appending a "___" tag at each instance
> of "<br/>" (first I checked the text to
> make sure it didn't contain any
> pre-existing instances of "___"). See
> the output snippet below:
>
> > library(RCurl)
> > library(XML)
> > sosURL <-
> > "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
> > sosChars <- getURL(sosURL)
> > sosChars2 <- gsub("<br/>", "<br/>___", sosChars)
> > MOcan <- readHTMLTable(sosChars2)
> > MOcan[[2]]
>                   Name
> 1       Raleigh Ritter
> 2          Mike Parson
> 3 James W. (Jim) Neely
> 4     Saundra McDowell
>                            Mailing Address
> 1      4476 FIVE MILE RD___SENECA MO 64865
> 2         1458 E 464 RD___BOLIVAR MO 65613
> 3            PO BOX 343___CAMERON MO 64429
> 4 3854 SOUTH AVENUE___SPRINGFIELD MO 65807
>   Random Number Date Filed
> 1           185  2/25/2020
> 2           348  2/25/2020
> 3           477  2/25/2020
> 4                3/31/2020
>
> It's true, there's one 'section' of
> MOcan output that contains odd-looking
> characters (see the "Total" line of
> MOcan[[1]]). But my guess is you'll be
> deleting this 'line' anyway, and
> recalculating totals in R.
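A minimal base-R sketch of what the "___" marker buys: once every <br/> is followed by "___", each Mailing Address cell can be split back into its street line and its city line with strsplit(). The character vector is copied from the MOcan[[2]] output above rather than downloaded, so it runs offline; the names addr, parts, addr_df, street and city are hypothetical, and the sketch assumes each cell contains exactly one marker.

```r
# Mailing Address cells as printed in MOcan[[2]] above, with the "___"
# marker that gsub() appended after each <br/> tag.
addr <- c("4476 FIVE MILE RD___SENECA MO 64865",
          "1458 E 464 RD___BOLIVAR MO 65613",
          "PO BOX 343___CAMERON MO 64429",
          "3854 SOUTH AVENUE___SPRINGFIELD MO 65807")

# Split each cell on the marker; fixed = TRUE matches "___" literally
# instead of treating it as a regular expression.
parts <- strsplit(addr, "___", fixed = TRUE)

# Hypothetical two-column layout: assumes exactly one marker per cell,
# i.e. one street line followed by one city/state/ZIP line.
addr_df <- data.frame(
  street = vapply(parts, `[`, character(1), 1),
  city   = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
print(addr_df)
```

The same strsplit() call would apply to a whole column of the scraped table, e.g. MOcan[[2]][["Mailing Address"]], provided readHTMLTable() returned it as character rather than factor.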
Perhaps this is the table you mean?

                Offices Republican
1              Governor          4
2   Lieutenant Governor          4
3    Secretary of State          1
4       State Treasurer          1
5      Attorney General          1
6   U.S. Representative         24
7         State Senator         28
8  State Representative        187
9         Circuit Judge         18
10                Total 268\r\n___
   Democratic Libertarian    Green
1           5           1        1
2           2           1        1
3           1           1        1
4           1           1        1
5           2           1        0
6          16           9        0
7          22           2        1
8         137           6        2
9           1           0        0
10 187\r\n___   22\r\n___ 7\r\n___
   Constitution      Total
1             0         11
2             0          8
3             1          5
4             0          4
5             0          4
6             0         49
7             0         53
8             1        333
9             0         19
10     2\r\n___ 486\r\n___

Yes, somehow the Windows[1] carriage-return
character "0xD" gets converted to "\r\n"
after your gsub, and "<br/>" is still
ignored.  There is no "0xD" inside the
td.AddressCol cells in the tables we are
interested in, though.

> Now that you have a comprehensive list
> object, you should be able to pull out
> districts/races of interest.  You might
> want to take a look at the "rlist"
> package, to see if it can make your
> work a little easier:
>
> https://CRAN.R-project.org/package=rlist
> https://renkun-ken.github.io/rlist-tutorial/index.html

Thank you, this package seems useful.
Could you give a hint as to which of its
many functions you had in mind?  For
example, something to replace what I do
now: a for loop over the indices of the
list of headers and tables that checks
whether each element's typeof is "list"
or "character", and updates variables so
that the political position gets written
into each table.

V r

[1] https://stackoverflow.com/questions/5843495/what-does-m-character-mean-in-vim
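A minimal base-R sketch of recalculating the totals, as suggested earlier in the thread, assuming the party columns of MOcan[[1]] arrive as character vectors like the ones printed above. The counts are copied from that printout; the names tab, parties, is_total, scraped, body and recomputed are hypothetical. Stripping every non-digit assumes all cells are whole-number counts.

```r
# Counts copied from the MOcan[[1]] printout above. Every column is
# character, and the scraped Total row still carries "\r\n___" debris.
tab <- data.frame(
  Offices = c("Governor", "Lieutenant Governor", "Secretary of State",
              "State Treasurer", "Attorney General", "U.S. Representative",
              "State Senator", "State Representative", "Circuit Judge",
              "Total"),
  Republican   = c("4", "4", "1", "1", "1", "24", "28", "187", "18", "268\r\n___"),
  Democratic   = c("5", "2", "1", "1", "2", "16", "22", "137", "1", "187\r\n___"),
  Libertarian  = c("1", "1", "1", "1", "1", "9", "2", "6", "0", "22\r\n___"),
  Green        = c("1", "1", "1", "1", "0", "0", "1", "2", "0", "7\r\n___"),
  Constitution = c("0", "0", "1", "0", "0", "0", "0", "1", "0", "2\r\n___"),
  stringsAsFactors = FALSE
)
parties <- c("Republican", "Democratic", "Libertarian", "Green", "Constitution")

# Remove every non-digit (the \r, \n and ___ debris) and convert to numbers.
# Assumes all cells are whole-number counts.
tab[parties] <- lapply(tab[parties],
                       function(x) as.numeric(gsub("[^0-9]", "", x)))

# Separate the scraped Total row from the per-office rows.
is_total <- tab$Offices == "Total"
scraped  <- unlist(tab[is_total, parties])
body     <- tab[!is_total, ]

# Recalculate the per-office totals and the per-party totals in R.
body$Total <- rowSums(body[parties])
recomputed <- colSums(body[parties])

# Sanity check: the recalculated party totals should match the scraped row.
stopifnot(all(recomputed == scraped))
print(body)
```

Comparing against the scraped Total row is only a consistency check; once it passes, that row can simply be discarded and the recalculated totals used instead.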
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.