Hi Andrew,

You might enjoy these links from a talk I gave at PyCon (and OSCON):
* http://asheesh.org/pub/scrapy-talk/#1
* http://pyvideo.org/video/1685/scrapy-it-gets-the-web

On Wed, Mar 4, 2015 at 4:27 AM, Aaron Tao <[email protected]> wrote:
> http://doc.scrapy.org/en/latest/intro/tutorial.html
>
> This is the scrapy tutorial from official website :)
>
>
> On Wednesday, March 4, 2015 at 2:38:58 AM UTC+8, Andrew Stringfield wrote:
>>
>> Hello all,
>>
>> I am totally new to scrapy. I have tried writing spiders but to little
>> avail. I finally figured out how to decently get the general data that I
>> want. I could sed and awk until I get it right, but I have seen the power
>> of scrapy. I would like to get this right. The data fields that I want are
>> full url of the links, date information, and location. Here is what I have
>> so far:
>> #####My Bash script
>> #!/bin/bash
>> scrapy shell https://www.24hoursoflemons.com -c
>> "hxs.select('//a[contains(@href, \"events-results\")]').extract()" | sed
>> -n "/\[/,/\]/p" > output.txt
>>
>>
>> #####The output of my bash script
>> [u'<a href="/events-results">Events & Results</a>', u'<a
>> href="/events-results/article/159-good-effort-grand-prix">JAN 18-19:
>> Sonoma Raceway, CA</a>', u'<a
>> href="/events-results/article/143-sears-pointless">MAR
>> 21-22: Sonoma Raceway, CA</a>', u'<a
>> href="/events-results/article/148-button-turrible">JUNE
>> 20-21: Buttonwillow, CA</a>', u'<a href="/events-results/article/
>> 149-pacific-northworst-gp">JULY 11-12: The Ridge, WA</a>', u'<a
>> href="/events-results/article/152-vodden-the-hell-are-we-doing">SEPT
>> 12-13: Thunderhill, CA</a>', u'<a href="/events-results/article/
>> 153-return-of-the-lemonites">OCT 3-4: Miller Mtrspts Park, UT</a>', u'<a
>> href="/events-results/article/158-arse-freeze-apalooza">DEC 5-6: Sonoma
>> <span style="font-family: Verdana, Arial, Helvetica, sans-serif; font-size:
>> 11px; line-height: normal;">Raceway</span>, CA</a>', u'<a
>> href="/events-results/article/142-north-dallas-hooptie">FEB 28-MAR 1:
>> Eagles Cyn Rcy, TX</a>',
>> u'<a
>> href="/events-results/article/157-gator-o-rama">NOV
>> 14-15: MSR Houston, TX</a>', u'<a href="/events-results/article/
>> 144-the-cure-for-gingervitis">APR 25-26: Gingerman Rcwy, MI</a>', u'<a
>> href="/events-results/article/147-the-b-f-e-gp">JUNE 13-14: High Plains
>> Rcwy, CO</a>', u'<a
>> href="/events-results/article/150-doing-time-in-joliet">JULY
>> 25-26: Autobahn Country Club, IL</a>', u'<a href="/events-results/article/
>> 155-where-the-elite-meet-to-cheat">OCT 10-11: Autobahn Country Club,
>> IL</a>', u'<a
>> href="/events-results/article/146-the-real-hoopties-of-new-jersey">MAY
>> 9-10: New Jersey MP, NJ</a>', u'<a href="/events-results/article/
>> 151-there-goes-the-neighborhood">AUGUST 8-9: Thompson Speedway
>> Motorsports Park, CT</a>', u'<a
>> href="/events-results/article/156-halloween-hooptiefest">OCT
>> 24-25: New Hampshire Motor Speedway, NH</a>', u'<a
>> href="/events-results/article/141-shine-country-classic">FEB 7-8: Barber
>> Motorsports Park, AL</a>', u'<a
>> href="/events-results/article/145-southern-discomfort">MAY
>> 2-3: Carolina Motorsports Park, SC</a>', u'<a
>> href="/events-results/article/154-lemons-south-fall">SEP
>> 19-20: Carolina Motorsports Park, SC</a>']
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
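
For the three fields Andrew asks for (full URL, date, location), one possible approach is to split each link's text at the first colon and resolve the href against the site root. The sketch below uses only the Python standard library rather than Scrapy, so it can be tried without a crawl. `EventLinkParser`, `parse_event`, `extract_events`, `BASE_URL`, and `SAMPLE` are hypothetical names chosen for illustration, and the code assumes every event link's text has the form "DATE: Location", as in the output quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: relative hrefs are resolved against the site from the original post.
BASE_URL = "https://www.24hoursoflemons.com"


class EventLinkParser(HTMLParser):
    """Collect (href, text) for every <a> whose href contains 'events-results'."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the matching <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if "events-results" in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Also picks up text inside nested tags such as <span>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def parse_event(href, text, base=BASE_URL):
    """Split 'DATE: Location' link text; return None when there is no colon."""
    if ":" not in text:
        return None
    date, location = text.split(":", 1)
    return {
        "url": urljoin(base, href.strip()),
        "date": date.strip(),
        "location": location.strip(),
    }


def extract_events(html, base=BASE_URL):
    """Return one dict per event link found in the given HTML string."""
    parser = EventLinkParser()
    parser.feed(html)
    parser.close()
    events = (parse_event(href, text, base) for href, text in parser.links)
    return [event for event in events if event is not None]


# Two links copied from the quoted output, used only as a demonstration input.
SAMPLE = (
    '<a href="/events-results">Events &amp; Results</a>'
    '<a href="/events-results/article/159-good-effort-grand-prix">'
    "JAN 18-19: Sonoma Raceway, CA</a>"
)

if __name__ == "__main__":
    for event in extract_events(SAMPLE):
        print(event)
```

The "Events & Results" navigation link has no colon in its text, so `parse_event` skips it. Inside a Scrapy spider the same split could presumably be applied to the href and text taken from each selected link, instead of post-processing the shell output with sed.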
