Hello, All:

      I've failed with multiple attempts to scrape the table of candidates from the website of the Missouri Secretary of State:


https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975


      I've tried base::url, base::readLines, xml2::read_html, and XML::readHTMLTable; see summary below.


      Suggestions?
      Thanks,
      Spencer Graves


sosURL <- "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975";

str(baseURL <- base::url(sosURL))
# this might give me something, but I don't know what

sosRead <- base::readLines(sosURL) # 404 Not Found
sosRb <- base::readLines(baseURL) # 404 Not Found

sosXml2 <- xml2::read_html(sosURL) # HTTP error 404.

sosXML <- XML::readHTMLTable(sosURL)
# List of 0;  does not seem to be XML

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2    curl_4.3
[4] xml2_1.3.2     XML_3.99-0.3

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to