Humphrey -

Any "correct" method requires you to specify _uniquely_ what you are looking 
for. If the bookmark keyword is necessary and unique, it appears you have a 
working solution. Or what else were you trying to accomplish?
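
For reference, here is a minimal sketch of that approach, assuming the anchor you want is the only one carrying rel="bookmark". The markup is a stand-in built from the three anchors you quoted (the link text is made up), and I use htmlParse() with xpathSApply() from the XML package rather than your base.html object, which I do not have. Note that the predicate sits on the <a> element and the attribute step comes last.

```r
library(XML)

# Stand-in markup modeled on the three anchors quoted in the question;
# the link text ("Post", "External", "More") is invented for illustration.
html <- '<html><body>
<a href="http://cos.name/2015/05/the-data-wisdom-for-data-science/" rel="bookmark">Post</a>
<a href="http://blog.shakirm.com/2015/03/a-statistical-view-of-deep-learning-ii-auto-encoders-and-free-energy/" target="_blank">External</a>
<a href="http://cos.name/2015/05/the-data-wisdom-for-data-science/#more-10947" class="more-link">More</a>
</body></html>'

# Parse the string as HTML rather than treating it as a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# Filter the <a> elements on rel, then take the href attribute of the matches.
links <- as.character(xpathSApply(doc, "//a[@rel='bookmark']/@href"))

print(links)
```

If more than one anchor carries rel="bookmark", this returns all of them, so you would need a further criterion to single one out.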

Cheers,
Boris


On Jun 16, 2015, at 9:01 AM, Humphrey Zhao <humphrey.z...@yahoo.com> wrote:

> Dear Sir/Madam:
> 
> Thank you for your attention to my question. I have downloaded the source 
> code of some web pages with RCurl, and I am trying to extract URLs from 
> them. In these web pages, many nodes contain the same URL, such as the 
> following:
> 
> <a href="http://cos.name/2015/05/the-data-wisdom-for-data-science/" 
> rel="bookmark">
> 
> <a 
> href="http://blog.shakirm.com/2015/03/a-statistical-view-of-deep-learning-ii-auto-encoders-and-free-energy/"
>  target="_blank">
> 
> <a 
> href="http://cos.name/2015/05/the-data-wisdom-for-data-science/#more-10947" 
> class="more-link">
> 
> I want to select exactly the URL I need (the "href" in the first one). I 
> have tried many approaches, and the most accurate one so far is the following:
> 
> library(XML)
> 
> #links<-getHTMLLinks(base.html, xpQuery = "//a/@href")
> 
> links<-getHTMLLinks(base.html, xpQuery = "//a[@rel='bookmark']/@href")
> 
> However, I still believe that there is a correct method to do this very well, 
> but I could not find it. I wonder if you could give me some advice on solving 
> this problem. And I would be most grateful if you could reply at your 
> earliest convenience. Looking forward to hearing from you. Thank you very 
> much.
> 
>                                      Sincerely yours 
> 
>                                      Humphrey Zhao
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
