Hello there,
I am new to Scrapy and trying to use it.
I tried to debug it with the scrapy shell and inspect_response, but that didn't help.
My script simply does not do what I expect.
Here is the script:
from scrapy.selector import Selector, HtmlXPathSelector
from uksw.items import DataItem
from scrapy.spider import Spider
from scrapy.shell import inspect_response
from scrapy.utils.response import open_in_browser

class MySpider(Spider):
    name = "ecolex"
    allowed_domains = ["www.ecolex.org"]
    start_urls = [
        "http://www.ecolex.org/ecolex/ledge/view/SearchResults?screen=Common&listingField=&allFields=&allFields_allWords=allWords&titleOfText=&titleOfText_allWords=allWords&subject=&subject_allWords=allWords&country=&country_allWords=allWords&region=&region_allWords=allWords&basin=&basin_allWords=allWords&keyword=&keyword_allWords=allWords&languageOfDocument=&languageOfDocument_allWords=allWords&searchDate_start=1960&searchDate_end=2014&sortField=searchDate"
    ]
    # rules = (
    #     Rule(SgmlLinkExtractor(allow=("http://www.ecolex.org/",)), callback='parse_items')
    # )

    def parse_items(self, response):
        hxs = HtmlXPathSelector(response)
        inspect_response(response, self)
        items = []
        item = DataItem()
        item["name"] = response.xpath('//title/text()').extract()
        #hxs.select("//div/text()").extract()
        items.append(item)
        return items
I am trying to get, e.g., the title or some text in a div. Neither works.
I am running the script with the option -o name.json, and the output file
contains only a single character: '['.
Any suggestions?
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.