Dear William Michels,

On 2020-07-25 10:58 -0700, William Michels wrote:
> 
> Dear Spencer Graves (and Rasmus Liland),
> 
> I've had some luck just using gsub() 
> to alter the offending "<br/>" 
> characters, appending a "___" tag at 
> each instance of "<br/>" (first I 
> checked the text to make sure it 
> didn't contain any pre-existing 
> instances of "___"). See the output 
> snippet below:
> 
> > library(RCurl)
> > library(XML)
> > sosURL <- 
> > "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975";
> > sosChars <- getURL(sosURL)
> > sosChars2 <- gsub("<br/>", "<br/>___", sosChars)
> > MOcan <- readHTMLTable(sosChars2)
> > MOcan[[2]]
>                   Name
> 1       Raleigh Ritter
> 2          Mike Parson
> 3 James W. (Jim) Neely
> 4     Saundra McDowell
>                            Mailing Address
> 1      4476 FIVE MILE RD___SENECA MO 64865
> 2         1458 E 464 RD___BOLIVAR MO 65613
> 3            PO BOX 343___CAMERON MO 64429
> 4 3854 SOUTH AVENUE___SPRINGFIELD MO 65807
>   Random Number Date Filed
> 1           185  2/25/2020
> 2           348  2/25/2020
> 3           477  2/25/2020
> 4                3/31/2020
> >
> 
> It's true, there's one 'section' of 
> MOcan output that contains odd-looking 
> characters (see the "Total" line of 
> MOcan[[1]]). But my guess is you'll be 
> deleting this 'line' anyway--and 
> recalculating totals in R.

Perhaps it's this table you mean?

                        Offices Republican
        1              Governor          4
        2   Lieutenant Governor          4
        3    Secretary of State          1
        4       State Treasurer          1
        5      Attorney General          1
        6   U.S. Representative         24
        7         State Senator         28
        8  State Representative        187
        9         Circuit Judge         18
        10                Total 268\r\n___
           Democratic Libertarian    Green
        1           5           1        1
        2           2           1        1
        3           1           1        1
        4           1           1        1
        5           2           1        0
        6          16           9        0
        7          22           2        1
        8         137           6        2
        9           1           0        0
        10 187\r\n___   22\r\n___ 7\r\n___
           Constitution      Total
        1             0         11
        2             0          8
        3             1          5
        4             0          4
        5             0          4
        6             0         49
        7             0         53
        8             1        333
        9             0         19
        10     2\r\n___ 486\r\n___

Yes, somehow the Windows[1] carriage 
return character "0xD" gets converted to 
"\r\n" after your gsub, and "<br/>" is 
still ignored.
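
If the "Total" row is kept rather than 
deleted, the stray "\r\n___" could be 
stripped before converting the counts to 
integers.  This is only a minimal sketch: 
`tab` is a hypothetical two-row stand-in 
for MOcan[[1]] (values copied from the 
output above), and `party_cols` is a 
helper name made up for illustration.

```r
# Hypothetical stand-in for the last two rows of MOcan[[1]];
# the real table has more rows and more party columns.
tab <- data.frame(
  Offices    = c("Circuit Judge", "Total"),
  Republican = c("18", "268\r\n___"),
  Democratic = c("1", "187\r\n___"),
  stringsAsFactors = FALSE
)

# Every column except the office label holds counts.
party_cols <- setdiff(names(tab), "Offices")

# Remove carriage returns, newlines and the "___" tag,
# then convert the remaining digits to integers.
tab[party_cols] <- lapply(tab[party_cols], function(x) {
  as.integer(gsub("[\r\n]|___", "", x))
})
```

The same gsub() pattern should work on the 
full table, provided no count cell 
contains other non-digit text.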

There is no "0xD" inside the 
td.AddressCol cells in the tables we are 
interested in.
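
Since those address cells are clean, the 
"___" tag can serve as a separator.  A 
minimal sketch, assuming each address 
contains exactly one "___" (as in the 
MOcan[[2]] output quoted above); the 
names `addr`, `parts`, "Street" and 
"CityStateZip" are made up for 
illustration:

```r
# Two addresses copied from the MOcan[[2]] output above.
addr <- c("4476 FIVE MILE RD___SENECA MO 64865",
          "PO BOX 343___CAMERON MO 64429")

# Split on the literal tag; fixed = TRUE avoids regex interpretation.
# Assumes exactly one "___" per address, so every split has two pieces.
parts <- do.call(rbind, strsplit(addr, "___", fixed = TRUE))
colnames(parts) <- c("Street", "CityStateZip")
```

Addresses with zero or several tags would 
give pieces of unequal length, so it may 
be worth checking lengths(strsplit(...)) 
first.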

> Now that you have a comprehensive list 
> object, you should be able to pull out 
> districts/races of interest. You might 
> want to take a look at the "rlist" 
> package, to see if it can make your 
> work a little easier:
> 
> https://CRAN.R-project.org/package=rlist
> https://renkun-ken.github.io/rlist-tutorial/index.html

Thank you, this package seems useful.  

Could you give a hint as to which of its 
many functions you were thinking of?  
E.g. something to replace a for loop over 
the index of the list of headers and 
tables, which checks whether each element 
is a list or a character and updates 
variables in order to write the political 
position into each table.
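
For comparison, a loop-free version in 
base R (not rlist) might look roughly 
like the sketch below.  It assumes a 
hypothetical list `items` in which each 
character header is followed by the data 
frames that belong to it; the real 
scraped object may be shaped differently, 
and "Candidate A" is a placeholder, not 
real data.

```r
# Hypothetical stand-in: headers (character) followed by their tables.
items <- list(
  "Governor",
  data.frame(Name = c("Raleigh Ritter", "Mike Parson"),
             stringsAsFactors = FALSE),
  "Lieutenant Governor",
  data.frame(Name = "Candidate A", stringsAsFactors = FALSE)
)

# TRUE for headers, FALSE for tables.
is_header <- vapply(items, is.character, logical(1))

# The running count of headers gives, for each element,
# the index of the most recent header seen so far.
header_id <- cumsum(is_header)
offices   <- unlist(items[is_header])

# Attach the matching office to each table, then stack the tables.
# Assumes the list starts with a header, so header_id is never 0
# for a table.
tables <- Map(function(tab, office) {
  tab$Office <- office
  tab
}, items[!is_header], offices[header_id[!is_header]])

combined <- do.call(rbind, tables)
```

The cumsum() trick replaces the variable 
that a for loop would update each time it 
meets a new header.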

V

r

[1] 
https://stackoverflow.com/questions/5843495/what-does-m-character-mean-in-vim


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
