I have a working scraper that targets this site: start_urls = ["http://www.domu.com/chicago/apartment-search/list?"]. Here is one target page where I'm trying to download files:
http://www.domu.com/chicago/neighborhoods/arlington-heights/central-park-east

I am able to download images with no problem, using the ImagesPipeline built into Scrapy. I'm trying to download some PDFs with the FilesPipeline and cannot get it working. Here is my relevant code:

---- settings.py

    ITEM_PIPELINES = {
        'scrapy.contrib.pipeline.images.ImagesPipeline': 100,
        'scrapy.contrib.pipeline.files.FilesPipeline': 200,
        'scrapy_mongodb.MongoDBPipeline': 300,
    }
    IMAGES_STORE = '/home/dfriestedt/PycharmProjects/domu/images/'

----- items.py

    import scrapy

    class BuildingData(scrapy.Item):
        file_urls = scrapy.Field()
        files = scrapy.Field()

------ spider.py

    def parse_building(self, response):
        file_urls = response.css('#available table.sticky-enabled > tbody > tr > td:nth-child(6) a::attr(href)').extract()
        yield BuildingData(file_urls=file_urls)

------

I'm confident the response.css selector is working. Here is the output in scrapy shell:

    >>> response.css('#available table.sticky-enabled > tbody > tr > td:nth-child(6) a::attr(href)').extract()
    [u'http://www.domu.com/sites/default/files/filefield/field_units/11-26-2013%201-07-01%20PM.png',
     u'http://www.domu.com/sites/default/files/filefield/field_units/11-26-2013%201-07-49%20PM.png',
     u'http://www.domu.com/sites/default/files/filefield/field_units/11-26-2013%201-08-22%20PM.png',
     u'http://www.domu.com/sites/default/files/filefield/field_units/11-26-2013%201-10-01%20PM.png',
     u'http://www.domu.com/sites/default/files/filefield/field_units/11-26-2013%201-10-48%20PM.png',
     u'http://www.domu.com/sites/default/files/filefield/field_units/11-26-2013%201-11-09%20PM.png']

What am I missing? I was careful to follow the instructions here: https://groups.google.com/forum/#!msg/scrapy-users/kzGHFjXywuY/O6PIhoT3thsJ
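One thing worth checking against the settings.py in this post: it defines IMAGES_STORE but no FILES_STORE. As far as I understand, Scrapy disables FilesPipeline when FILES_STORE is empty, in the same way ImagesPipeline depends on IMAGES_STORE, so this may be the missing piece. Below is a minimal stand-alone sketch of that check. The names REQUIRED_STORE and missing_store_settings are hypothetical helpers for illustration, not Scrapy APIs, and the files/ directory is an assumed location.

```python
# Hypothetical helper, not part of Scrapy: flag media pipelines that are
# enabled in ITEM_PIPELINES but lack the storage setting they read.
REQUIRED_STORE = {
    "scrapy.contrib.pipeline.images.ImagesPipeline": "IMAGES_STORE",
    "scrapy.contrib.pipeline.files.FilesPipeline": "FILES_STORE",
}


def missing_store_settings(settings):
    """Return the store settings that enabled media pipelines still need."""
    enabled = settings.get("ITEM_PIPELINES", {})
    return sorted(
        store
        for pipeline, store in REQUIRED_STORE.items()
        if pipeline in enabled and not settings.get(store)
    )


# Settings as posted: IMAGES_STORE is set, FILES_STORE is not.
posted = {
    "ITEM_PIPELINES": {
        "scrapy.contrib.pipeline.images.ImagesPipeline": 100,
        "scrapy.contrib.pipeline.files.FilesPipeline": 200,
        "scrapy_mongodb.MongoDBPipeline": 300,
    },
    "IMAGES_STORE": "/home/dfriestedt/PycharmProjects/domu/images/",
}

# Same settings plus FILES_STORE; the files/ directory is an assumption.
patched = dict(
    posted,
    FILES_STORE="/home/dfriestedt/PycharmProjects/domu/files/",
)

print(missing_store_settings(posted))   # ['FILES_STORE']
print(missing_store_settings(patched))  # []
```

If that is the cause, adding a FILES_STORE line next to IMAGES_STORE in settings.py would be the corresponding change; the crawl log should also list which item pipelines were actually enabled at startup.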
