Hello all,
I am totally new to Scrapy. I have tried writing spiders, but with little success.
I finally figured out how to get roughly the data that I want. I
could keep piping it through sed and awk until it looks right, but having seen
the power of Scrapy, I would like to do this properly. The data fields that I
want are the full URL of each link, the date information, and the location.
Here is what I have so far:
#####My Bash script
#!/bin/bash
# Run one XPath query in the Scrapy shell, then keep only the printed list.
scrapy shell https://www.24hoursoflemons.com \
  -c "hxs.select('//a[contains(@href, \"events-results\")]').extract()" \
  | sed -n "/\[/,/\]/p" > output.txt
#####The output of my bash script
[u'<a href="/events-results">Events & Results</a>', u'<a
href="/events-results/article/159-good-effort-grand-prix">JAN 18-19: Sonoma
Raceway, CA</a>', u'<a href="/events-results/article/143-sears-pointless">MAR
21-22: Sonoma Raceway, CA</a>', u'<a
href="/events-results/article/148-button-turrible">JUNE 20-21: Buttonwillow,
CA</a>', u'<a href="/events-results/article/149-pacific-northworst-gp">JULY
11-12: The Ridge, WA</a>', u'<a
href="/events-results/article/152-vodden-the-hell-are-we-doing">SEPT 12-13:
Thunderhill, CA</a>', u'<a
href="/events-results/article/153-return-of-the-lemonites">OCT 3-4: Miller
Mtrspts Park, UT</a>', u'<a
href="/events-results/article/158-arse-freeze-apalooza">DEC 5-6: Sonoma <span
style="font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px;
line-height: normal;">Raceway</span>, CA</a>', u'<a
href="/events-results/article/142-north-dallas-hooptie">FEB 28-MAR 1: Eagles
Cyn Rcy, TX</a>', u'<a href="/events-results/article/157-gator-o-rama">NOV
14-15: MSR Houston, TX</a>', u'<a
href="/events-results/article/144-the-cure-for-gingervitis">APR 25-26:
Gingerman Rcwy, MI</a>', u'<a
href="/events-results/article/147-the-b-f-e-gp">JUNE 13-14: High Plains Rcwy,
CO</a>', u'<a href="/events-results/article/150-doing-time-in-joliet">JULY
25-26: Autobahn Country Club, IL</a>', u'<a
href="/events-results/article/155-where-the-elite-meet-to-cheat">OCT 10-11:
Autobahn Country Club, IL</a>', u'<a
href="/events-results/article/146-the-real-hoopties-of-new-jersey">MAY 9-10:
New Jersey MP, NJ</a>', u'<a
href="/events-results/article/151-there-goes-the-neighborhood">AUGUST 8-9:
Thompson Speedway Motorsports Park, CT</a>', u'<a
href="/events-results/article/156-halloween-hooptiefest">OCT 24-25: New
Hampshire Motor Speedway, NH</a>', u'<a
href="/events-results/article/141-shine-country-classic">FEB 7-8: Barber
Motorsports Park, AL</a>', u'<a
href="/events-results/article/145-southern-discomfort">MAY 2-3: Carolina
Motorsports Park, SC</a>', u'<a
href="/events-results/article/154-lemons-south-fall">SEP 19-20: Carolina
Motorsports Park, SC</a>']
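
To make the goal concrete, here is a rough sketch of the three fields I am after, pulled from anchors like the ones above. It uses only the Python standard library rather than Scrapy's selectors, so it is an illustration and not the Scrapy way of doing it. The names EventLinkParser and parse_events are made up for this example, and it assumes every event link reads "DATE: LOCATION".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.24hoursoflemons.com"  # site used in the script above


class EventLinkParser(HTMLParser):
    """Collect (href, text) for each <a> whose href contains 'events-results'."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if "events-results" in href:
                self._href, self._text = href, []

    def handle_data(self, data):
        # Text inside nested tags such as <span> is kept; the tags are not.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces into single spaces.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def parse_events(html, base_url=BASE_URL):
    """Return a list of dicts with 'url', 'date' and 'location' keys."""
    parser = EventLinkParser()
    parser.feed(html)
    parser.close()
    events = []
    for href, text in parser.links:
        # Assumption: event links read "DATE: LOCATION". The navigation link
        # ("Events & Results") has no colon, so it is skipped.
        date, sep, location = text.partition(": ")
        if sep:
            events.append({
                "url": urljoin(base_url, href),
                "date": date,
                "location": location,
            })
    return events
```

Calling parse_events() on the page HTML should give one dict per event, with the relative href joined onto the site address. Inside a spider, the same split on the first ": " could presumably be applied to the text of each selected link instead.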