Hi,

I have an issue with Scrapy where my process is stopped after several 
hours. It is a broad crawl, so I expect it to keep running until I want 
it to stop. I've read that processes killed by the system can be due to 
memory leaks, but since I'm not a Python pro I can't figure out whether 
there are any in my code. Here's the parsing method of my spider:

    import codecs
    import os
    import subprocess

    from scrapy.selector import Selector

    def parse_page(self, response):
        sel = Selector(response)
        parts = response.url.split("/")
        directory = "flux/" + parts[2]      # one folder per host
        filename = "".join(parts)           # flatten the whole URL into a file name

        for flux in sel.xpath('//channel').extract():
            if not os.path.exists(directory):
                os.mkdir(directory, 0o755)

            # keep the directory and the file path in separate variables,
            # otherwise the next iteration appends to an already-built path
            filepath = os.path.join(directory, filename)
            with codecs.open(filepath, 'w', encoding="utf-8") as f:
                f.write(flux)               # the with block closes the file itself

            subprocess.call(['java', '-cp', '.:nano.jar', 'Post-process', filepath])
            subprocess.call(['java', '-jar', 'Post-process.jar', filepath])

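For what it's worth, here is a minimal sketch of how I could log the 
process's memory from inside the spider to check whether it really keeps 
growing, using only the standard resource and logging modules (the value 
is in kilobytes on Linux); I haven't wired this into the real crawl yet:

    import logging
    import resource

    def parse_page(self, response):
        # log the peak resident set size seen so far; on Linux ru_maxrss is in kB
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        logging.info("peak RSS so far: %d kB", peak)

        # ... rest of the parsing as above ...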

There are no hints in the logs; the console just says "Process stopped" 
after a few hours.

Thank you for your replies.
