Currently I am in the Scrapy shell working out which XPath expressions retrieve the data I need. The site I am using is http://stats.rleague.com/rl/rl_index.html
From there, I am using http://stats.rleague.com/rl/seas/2014.html as my start URL. This page contains a summary of all rounds and matches on one page, with page navigation links that jump down to the relevant round.

    In [9]: sel.xpath('//body/center/center/a').extract()
    Out[9]:
    [u'<a href="#1">1</a>',
     u'<a href="#2">2</a>',
     u'<a href="#3">3</a>',
     u'<a href="#4">4</a>',
     u'<a href="#5">5</a>',
     u'<a href="#6">6</a>',
     u'<a href="#7">7</a>',
     u'<a href="#8">8</a>',
     u'<a href="#9">9</a>',

...and so on to number 26, each number representing the round number.

In each round there are links called "match details" (one link per match), which take you to a URL such as http://stats.rleague.com/rl/scorers/games/2014/201403060921.html, with the 201403060921.html part being different for each link, based on the date.

If I use that link as a start URL:

    sayth:~$ scrapy shell "http://stats.rleague.com/rl/scorers/games/2014/201403060921.html"

then I can access most of the data in the table (excluding player names) with:

    In [2]: sel.xpath('//tr/td/text()').extract()
    Out[2]:
    [u'Pos', u'Player', u'T', u'G', u'FG', u'Pts',
     u'Pos', u'Player', u'T', u'G', u'FG', u'Pts',
     u'FB', u'3', u'\xa0', u'\xa0', u'12',
     u'FB', u'\xa0', u'\xa0', u'\xa0', u'\xa0',
     u'WG',

... and so on.

And I can extract the player names with:

    In [5]: sel.xpath('//tr/td/a/text()').extract()
    Out[5]:
    [u'Greg Inglis',
     u'Anthony Minichiello',
     u'Nathan Merritt',
     u'Daniel Tupou',
     u'Beau Champion',
     u'Michael Jennings',
     u'Bryson Goodwin',
     u'Shaun Kenny-Dowall',
     u'Lote Tuqiri',
     u'Roger Tuivasa-Sheck',

... and so on.

I have two questions:

1. How should I best loop from the first start URL into all of the subsequent "match details" URLs to extract the tables?
2. How should I correctly combine sel.xpath('//tr/td/text()').extract() and sel.xpath('//tr/td/a/text()').extract() so that the data comes out as one table?

So, for example, I would get: Pos, FB; Player, Greg Inglis; Tries, 3
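For the second question, one possible approach is to stop selecting text across the whole page and instead walk the table row by row, taking the full text of each cell (including text nested inside a link). That keeps each player name in the same position as its stats. The sketch below illustrates the idea with only the standard library, so it runs without Scrapy or network access. The SAMPLE markup is a hand-written guess at the table layout, based only on the shell output quoted above, and cell_text and table_records are hypothetical helper names; the real page may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for one match table. This is an ASSUMPTION about
# the layout (two teams side by side, player names inside <a> tags),
# inferred from the shell output in the question, not the real page.
SAMPLE = """
<table>
  <tr><td>Pos</td><td>Player</td><td>T</td><td>G</td><td>FG</td><td>Pts</td>
      <td>Pos</td><td>Player</td><td>T</td><td>G</td><td>FG</td><td>Pts</td></tr>
  <tr><td>FB</td><td><a href="#">Greg Inglis</a></td>
      <td>3</td><td>&#160;</td><td>&#160;</td><td>12</td>
      <td>FB</td><td><a href="#">Anthony Minichiello</a></td>
      <td>&#160;</td><td>&#160;</td><td>&#160;</td><td>&#160;</td></tr>
</table>
"""


def cell_text(cell):
    """Return all text in a cell, including text inside nested tags."""
    # itertext() also yields the text of child elements such as <a>,
    # so a linked player name is not lost. Non-breaking spaces that
    # mark empty cells are turned into empty strings.
    return "".join(cell.itertext()).replace("\xa0", " ").strip()


def table_records(table):
    """Turn the table into one dict per player (hypothetical helper)."""
    rows = [[cell_text(td) for td in tr.findall("td")]
            for tr in table.iter("tr")]
    header, body = rows[0], rows[1:]
    half = len(header) // 2  # assumes the two teams share one header layout
    records = []
    for row in body:
        for start in (0, half):  # left-hand team, then right-hand team
            records.append(dict(zip(header[:half], row[start:start + half])))
    return records


records = table_records(ET.fromstring(SAMPLE))
for record in records:
    print(record)
```

In the Scrapy shell the same row-wise idea would look roughly like looping over sel.xpath('//tr'), then over row.xpath('td'), and calling cell.xpath('string(.)').extract() on each cell, since string(.) also returns text nested inside the link. For the first question, one option worth trying is a CrawlSpider with a Rule whose LinkExtractor only allows URLs containing /scorers/games/, so that each match page is fetched and passed to a callback that does the row-wise extraction.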
