I have an item called "movie", and its data is spread across two pages. I
have something like this:
    f = Movie()
    # Here I set a lot of metadata, but not all of it.

    request = Request(url_prefix + movie_url[0], callback=self._parse_details)
    request.meta['movie'] = f
    request.meta['movie_list'] = movie_list
    yield request

This code runs inside a for loop that builds the movie list. In the callback,
I complete the data:
    def _parse_details(self, response):
        f = response.meta['movie']
        movie_list = response.meta['movie_list']

        f['debug'] = "It's working!"

        return f

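Because `request.meta` carries a reference to the same `Movie` object rather than a copy, filling in `f` in the callback also changes the object already sitting in `movie_list`. The sketch below models that round trip with plain dicts and two hypothetical stand-in classes (`FakeRequest`, `FakeResponse`); it is not Scrapy's implementation, and it assumes `f` was appended to `movie_list` inside the loop.

```python
class FakeRequest(object):
    """Hypothetical stand-in for a Scrapy Request: a URL, a callback and a meta dict."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback
        self.meta = {}


class FakeResponse(object):
    """Hypothetical stand-in for a Response; in Scrapy, response.meta is a shortcut
    to response.request.meta, so both point at the same dict."""

    def __init__(self, request):
        self.url = request.url
        self.meta = request.meta


def parse_details(response):
    f = response.meta['movie']
    f['debug'] = "It's working!"
    return f


movie_list = []
f = {'title': 'Some movie'}  # a plain dict standing in for Movie()
movie_list.append(f)  # assumption: the loop appends each movie to the list

request = FakeRequest('http://example.com/movie/1', callback=parse_details)
request.meta['movie'] = f
request.meta['movie_list'] = movie_list

# Later, when the "download" finishes, the callback mutates the same object.
result = request.callback(FakeResponse(request))

print(movie_list[0]['debug'])  # It's working!
print(result is movie_list[0])  # True
```

The mutation itself works; the catch is *when* it happens relative to the moment the list is exported.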
Outside the for loop that creates all the movie items, I have this:
    theater['name'] = name_movie.extract()[0]
    theater['address'] = address_movie.extract()[0]
    theater['movies'] = movies_list
    yield theater

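The ordering follows from how callbacks are scheduled: yielding a detail-page request only queues it, while `theater` is yielded in the same run of the listing callback, so the pipeline receives the theater before any `_parse_details` call has filled in the movies. The toy model below uses plain generators and a hypothetical `run` scheduler; it is not Scrapy's engine (which downloads pages concurrently), but it reproduces the order described here.

```python
import copy
from collections import deque

exported = []  # what the pipeline receives, in arrival order


def parse_theater():
    movie_list = []
    for title in ['A', 'B']:
        f = {'title': title}  # only part of the movie data is known here
        movie_list.append(f)
        # Yielding a request only schedules the detail page; nothing is fetched yet.
        yield ('request', parse_details, f)
    # Outside the loop: the theater is yielded in the same run of this callback.
    yield ('item', {'name': 'Cine X', 'movies': movie_list})


def parse_details(f):
    f['debug'] = "It's working!"
    yield ('item', f)


def run(start):
    """Hypothetical toy scheduler: items go straight to the pipeline, requests wait in a queue."""
    pending = deque([start])
    while pending:
        for output in pending.popleft()():
            if output[0] == 'item':
                # Snapshot the item, as an exporter serializes it when it arrives.
                exported.append(copy.deepcopy(output[1]))
            else:
                _, callback, movie = output
                pending.append(lambda cb=callback, m=movie: cb(m))


run(parse_theater)
for entry in exported:
    print(entry)
```

The first exported entry is the theater whose movies still lack `debug`; the completed movies follow as separate entries.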
And my pipeline:
    from scrapy.xlib.pydispatch import dispatcher
    from scrapy import signals
    from scrapy.contrib.exporter import JsonItemExporter

    import io
    import json

    class NewCineToJsonPipeline(object):

        def __init__(self):
            dispatcher.connect(self.spider_opened, signals.spider_opened)
            dispatcher.connect(self.spider_closed, signals.spider_closed)
            self.files = {}

        def spider_opened(self, spider):
            #file = open('%s_items.json' % spider.name, 'w+b')
            file = io.BytesIO(b"")
            self.files[spider] = file
            self.exporter = JsonItemExporter(file)
            self.exporter.start_exporting()

        def spider_closed(self, spider):
            self.exporter.finish_exporting()

            file = self.files.pop(spider)

            self._save_to_mongodb(file)

            file.close()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item

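The pipeline above never merges items: each object passed to `process_item` becomes a separate entry in the exported JSON array, so a theater and a movie that arrive separately are stored separately. The sketch below imitates that behaviour with only `io` and `json`; `ListJsonPipeline`, `open_spider` and `close_spider` are hypothetical names, and the class only approximates the JSON-array output of `JsonItemExporter`.

```python
import io
import json


class ListJsonPipeline(object):
    """Hypothetical stdlib stand-in for the exporter-based pipeline (not Scrapy's exporter)."""

    def open_spider(self):
        self.file = io.BytesIO()
        self.first = True
        self.file.write(b'[')  # the exported document is a single JSON array

    def process_item(self, item):
        # Every item is written as its own array entry, in the order it arrives.
        if not self.first:
            self.file.write(b',')
        self.first = False
        self.file.write(json.dumps(dict(item)).encode('utf-8'))
        return item

    def close_spider(self):
        self.file.write(b']')
        # getvalue() returns the whole buffer; file.read() here would return b''
        # because the position is at the end after writing.
        data = json.loads(self.file.getvalue().decode('utf-8'))
        self.file.close()
        return data


pipeline = ListJsonPipeline()
pipeline.open_spider()
pipeline.process_item({'name': 'Cine X', 'movies': [{'title': 'A'}]})
pipeline.process_item({'title': 'A', 'debug': "It's working!"})
saved = pipeline.close_spider()
print(len(saved))  # 2
```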
My problem is that the database first stores the theaters with incomplete
movie information, and only afterwards stores the movies with complete
information, outside of any Theater item. Does anyone know how to solve this
little problem?
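One common way to get complete theaters (an approach suggested here, not something from the original post) is to chain the detail requests: keep the theater out of the pipeline until the last detail page has been parsed, passing the theater and the remaining URLs along in `request.meta`. The toy model below shows only the control flow with plain generators; `run`, the tuple protocol and the URL strings are hypothetical stand-ins. In a real spider each `('request', url, meta)` tuple would correspond to something like `yield Request(url, callback=self._parse_details, meta=meta)`.

```python
from collections import deque

exported = []  # what the pipeline receives


def parse_theater(urls):
    theater = {'name': 'Cine X', 'movies': []}
    remaining = deque(urls)
    if not remaining:
        yield ('item', theater)  # nothing to chain: the theater is already complete
        return
    # Do NOT yield the theater here; start the chain with the first detail page.
    yield ('request', remaining.popleft(), {'theater': theater, 'remaining': remaining})


def parse_details(url, meta):
    theater = meta['theater']
    remaining = meta['remaining']
    theater['movies'].append({'url': url, 'debug': "It's working!"})
    if remaining:
        # Chain the next detail page, carrying the same state along.
        yield ('request', remaining.popleft(), meta)
    else:
        # Last detail page: only now is the theater complete.
        yield ('item', theater)


def run(start):
    """Hypothetical toy scheduler: items are exported, requests are fed to parse_details."""
    pending = deque([start])
    while pending:
        for output in pending.popleft():
            if output[0] == 'item':
                exported.append(output[1])
            else:
                _, url, meta = output
                pending.append(parse_details(url, meta))


run(parse_theater(['/movie/1', '/movie/2']))
print(exported)
```

Because the detail pages are fetched one after another, this is slower than fetching them concurrently, and a failed detail request would break the chain unless it is handled (for example with an `errback` on the request).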
PS: Sorry for my bad English. I had to use Google Translate to write this.
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.