Maybe you could modify the following to suit your situation (I use this XPath expression to get links from Google):

?htmlTreeParse
?getNodeSet

> library(XML)
> link <- 'http://www.google.co.uk/search?hl=en&client=firefox-a&rls=org.mozilla:en-GB:official&hs=2XR&ei=mxa6SojjOeaMjAfJkcDuBQ&sa=X&oi=spell&resnum=0&ct=result&cd=1&q=Doctor+Who&spell=1'
> html <- htmlTreeParse(link, useInternalNodes = TRUE, error=function(...){})
> nodes <- getNodeSet(html, "//a[@href][@class='l']")
> sapply(nodes, function(x) x <- xmlAttrs(x)[[1]])
 [1] "http://www.bbc.co.uk/doctorwho/"
 [2] "http://www.bbc.co.uk/doctorwho/classic/"
 [3] "http://en.wikipedia.org/wiki/Doctor_Who"
 [4] "http://www.youtube.com/watch?v=LF2x5IKxmAQ"
 [5] "http://www.youtube.com/watch?v=DnKNupdSH8g"
 [6] "http://www.telegraph.co.uk/culture/tvandradio/doctor-who/6199603/Doctor-Who-Top-10-fans-vote-for-all-time-best-episode.html"
 [7] "http://www.google.com/hostednews/ap/article/ALeqM5i17A4FXTLhJX10-sCbhhnhdqY9HwD9ASO6A00"
 [8] "http://www.telegraph.co.uk/news/newstopics/celebritynews/6200053/Doctor-Who-star-David-Tennant-voted-pupils-dream-head-teacher.html"
 [9] "http://www.imdb.com/title/tt0436992/"
[10] "http://www.imdb.com/title/tt0056751/"
[11] "http://www.gallifreyone.com/"
[12] "http://www.doctorwho.co.uk/"
[13] "http://www.drwhoguide.com/"
[14] "http://www.bbcamerica.com/content/123/index.jsp"

On 23 Sep, 13:29, "Rene" <kaixinma...@gmail.com> wrote:
> Dear All,
>
> Can someone please guide me how to get the certain part from a long html
> language?
>
> e.g.
>
> "<td><a href='2005-01.html'>2005-01</a></td><td><a
> href='2006-01.html'>2006-01</a></td><td><a
> href='2007-01.html'>2007-01</a></td><td><a
> href='2008-01.html'>2008-01</a></td><td><a
> href='2009-01.html'>2009-01</a></td>"
>
> How to get only the wording of "2005-01.html", "2006-01.html",
> "2007-01.html", "2008-01.html", "2009-01.html" from the above html code? I
> have tried to use gsub function, but not working.
>
> Please guide me on this.
>
> Thanks a lot.
>
> Rene.
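
For the HTML fragment in the question itself, a base-R sketch along these lines may be enough. It assumes the fragment is held in a single string with single-quoted href values, as in the quoted message; the variable names (txt, m, tags, hrefs) are only illustrative and not from the original post.

```r
# Sketch: pull the href values out of the HTML fragment from the question.
# Base R only; assumes every href value is wrapped in single quotes.
txt <- paste0(
  "<td><a href='2005-01.html'>2005-01</a></td>",
  "<td><a href='2006-01.html'>2006-01</a></td>",
  "<td><a href='2007-01.html'>2007-01</a></td>",
  "<td><a href='2008-01.html'>2008-01</a></td>",
  "<td><a href='2009-01.html'>2009-01</a></td>"
)

# gregexpr() locates every href='...' match in the string;
# regmatches() extracts the matched substrings.
m <- gregexpr("href='[^']*'", txt)
tags <- regmatches(txt, m)[[1]]

# Strip the leading href=' and the trailing quote, leaving the file names.
hrefs <- gsub("^href='|'$", "", tags)
print(hrefs)
```

gsub on its own only substitutes text, which is probably why it did not work here; extracting with gregexpr/regmatches first and cleaning up afterwards is one way round that. With the XML package loaded, parsing the string with htmlParse(txt, asText = TRUE) and then calling xpathSApply(doc, "//a[@href]", xmlGetAttr, "href") should give the same values, and is likely to be more robust for messier HTML.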

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.