Hi,
I have an issue with Scrapy: my crawl process gets stopped after several
hours. It is a broad crawl, so I expect it to keep running until I decide to
stop it. I've read that long-running processes killed by the system can be
victims of memory leaks, but since I'm not a Python pro I can't tell whether
there are any in my code. Here is the parsing method of my spider:
# imports needed at the top of the spider module
import codecs
import os
import subprocess

from scrapy.selector import Selector

def parse_page(self, response):
    sel = Selector(response)
    temp = response.url.split("/")
    directory = "flux/" + temp[2]
    filename = "".join(temp)  # renamed from `file` to avoid shadowing the builtin
    for flux in sel.xpath('//channel').extract():
        if not os.path.exists(directory):
            os.mkdir(directory, 0o755)
        # build the file path in a separate variable so `directory`
        # still points at the folder on the next iteration
        filepath = directory + "/" + filename
        with codecs.open(filepath, 'w', encoding="utf-8") as f:
            f.write(flux)  # the with block closes the file, no explicit close() needed
        subprocess.call(['java', '-cp', '.:nano.jar', 'Post-process', filepath])
        subprocess.call(['java', '-jar', 'Post-process.jar', filepath])
There are no hints in the logs; the console just prints "Process stopped"
after a few hours.
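
To check whether memory really is the problem, I'm thinking of enabling
Scrapy's memory usage extension in my settings.py. This is just a sketch and
the MB limits are arbitrary guesses on my part, I haven't tried it yet:

    # settings.py (rough sketch, the MB values are arbitrary guesses)
    MEMUSAGE_ENABLED = True      # enable the memusage extension (records memory stats)
    MEMUSAGE_WARNING_MB = 1024   # log a warning once the process passes 1 GB
    MEMUSAGE_LIMIT_MB = 2048     # shut the spider down cleanly at 2 GB

If memory keeps climbing towards the limit, that would at least confirm there
is a leak somewhere.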
Thank you for your replies.