Dear Rasmus Liland et al.:
On 2020-07-25 11:30, Rasmus Liland wrote:
On 2020-07-25 09:56 -0500, Spencer Graves wrote:
Dear Rasmus et al.:
It is LILAND et al., is it not? ... else it's customary to
put a comma in there, isn't it? ...
The APA Style recommends "Sharp et al., 2007":
https://blog.apastyle.org/apastyle/2011/11/the-proper-use-of-et-al-in-apa-style.html
Regarding Confucius, I'm confused.
right, moving on:
On 2020-07-25 04:10, Rasmus Liland wrote:
<snip>
Please research using Thunderbird, Claws
mail, or some other sane e-mail client;
they are great, I promise.
Thanks. I researched it and turned off HTML. Please excuse: I noticed
it was a problem, but hadn't prioritized time to research and fix it
until your comment. Thanks.
Please excuse: Before my last post, I
had written code to do all that.
Good!
In brief, the political offices are
"h3" tags.
Yes, some type of header element at
least, in between the various tables,
all of them children of the div in the
element tree.
I used "strsplit" to split the string
at "<h3>". I then wrote a
function to find "</h3>", extract the
political office and pass the rest to
"XML::readHTMLTable", adding columns
for party and political office.
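
A minimal sketch of the splitting step described above, in base R only.
The page fragment is hypothetical (the real markup is not shown in this
thread), and "split_office" is an illustrative name, not the function
actually used. Passing each "body" on to "XML::readHTMLTable" is left out
here.

```r
# Hypothetical page fragment: offices are <h3> headings, each followed by
# one or more tables. The real page structure is an assumption.
html <- paste0(
  "<div>",
  "<h3>Governor</h3>",
  "<table><caption>Republican</caption>",
  "<tr><th>Name</th><th>Address</th></tr>",
  "<tr><td>Jane Doe</td><td>1 Main St<br/>Springfield</td></tr></table>",
  "<h3>Secretary of State</h3>",
  "<table><caption>Democratic</caption>",
  "<tr><th>Name</th><th>Address</th></tr>",
  "<tr><td>John Roe</td><td>2 Oak Ave<br/>Columbia</td></tr></table>",
  "</div>"
)

# Split at each opening <h3>; the first piece is whatever precedes the
# first office heading, so drop it.
pieces <- strsplit(html, "<h3>", fixed = TRUE)[[1]][-1]

# For one piece, find "</h3>", take the text before it as the office name
# and keep the remainder as the HTML holding that office's tables.
split_office <- function(piece) {
  end <- as.integer(regexpr("</h3>", piece, fixed = TRUE))
  list(
    office = substr(piece, 1, end - 1),
    body   = substr(piece, end + nchar("</h3>"), nchar(piece))
  )
}

offices <- lapply(pieces, split_office)
office_names <- vapply(offices, function(x) x$office, character(1))
print(office_names)
```

Each element of "offices" then carries the office name next to the HTML
that still has to be parsed, so the name can be added as a column later.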
Yes, doing that for the political office
is also possible, but the party is
inside the table's caption tag, which
ends up as the name of the table in the
XML::readHTMLTable list ...
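
A hedged sketch of how the caption could be used, assuming each table
has a "caption" with the party and no "id" attribute. The fragment,
the office value, and the column names "party" and "office" are made up
for illustration.

```r
library(XML)

# Hypothetical fragment for one office; the party sits in <caption>.
office <- "Governor"
body <- paste0(
  "<table><caption>Republican</caption>",
  "<tr><th>Name</th><th>City</th></tr>",
  "<tr><td>Jane Doe</td><td>Springfield</td></tr></table>",
  "<table><caption>Democratic</caption>",
  "<tr><th>Name</th><th>City</th></tr>",
  "<tr><td>John Roe</td><td>Columbia</td></tr></table>"
)

doc <- htmlParse(body, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# The list elements are named after their tables; with no id attribute
# present, the name is taken from the caption text.
parties <- names(tables)

# Tag every table with its party and the office, then stack them.
tagged <- Map(function(tab, party) {
  tab$party <- party
  tab$office <- office
  tab
}, tables, parties)
candidates <- do.call(rbind, tagged)
rownames(candidates) <- NULL
print(candidates)
```

If a table had neither an id nor a caption, its name would be missing,
so real pages may need a check before relying on names(tables).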
However, this suppressed "<br/>"
everywhere.
Why is that, please explain.
I don't know why the Missouri Secretary of State's web site includes
"<br/>" to signal a new line, but it does. I also don't know why
XML::readHTMLTable suppressed "<br/>" everywhere it occurred, but it did
that. After I used gsub to replace "<br/>" with "\n", I found that
XML::readHTMLTable did not replace "\n", so I got what I wanted.
I thought there should be
an option with something like
"XML::readHTMLTable" that would not
delete "<br/>" everywhere, but I
couldn't find it.
No, there is not, AFAIK. If anyone
else knows of one, please say so *echoes
in the forest*
If you aren't aware of one, I can
gsub("<br/>", "\n", ...) on the string
for each political office before
passing it to "XML::readHTMLTable". I
just tested this: It works.
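
For the archive, a small sketch of that workaround. The table is a
hypothetical stand-in for the real page, and "read_first_table" is only
a helper name made up for this example.

```r
library(XML)

# Hypothetical table; the address cell uses <br/> as a line break.
html <- paste0(
  "<table>",
  "<tr><th>Name</th><th>Address</th></tr>",
  "<tr><td>Jane Doe</td><td>1 Main St<br/>Springfield</td></tr>",
  "</table>"
)

# Parse a string of HTML and return its first table as a data frame.
read_first_table <- function(x) {
  readHTMLTable(htmlParse(x, asText = TRUE), which = 1,
                stringsAsFactors = FALSE)
}

# Without the substitution the <br/> contributes no text to the cell.
raw <- read_first_table(html)

# Replace the literal "<br/>" with a newline before parsing.
fixed <- read_first_table(gsub("<br/>", "\n", html, fixed = TRUE))

# writeLines shows whether each address cell contains a line break.
writeLines(raw$Address)
writeLines(fixed$Address)
```

This only matches the exact string "<br/>"; pages that write "<br>" or
"<br />" would need a different pattern.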
Such a great hack! IMHO, this is much
more flexible than using
xml2::read_html, rvest::read_table,
dplyr::mutate like here[1]
I have other web scraping problems in
my work plan for the next few days.
Maybe, idk ...
I will definitely try
XML::htmlTreeParse, etc., as you
suggest.
I wish you good luck,
Rasmus
[1]
https://stackoverflow.com/questions/38707669/how-to-read-an-html-table-and-account-for-line-breaks-within-cells
And I added my solution to this problem to that Stack Overflow thread.
Thanks again,
Spencer
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.