In this project, I want Scrapy and the forked process to both execute concurrently, without either having to block and wait for the other. This is possible, but it's not going to be easy to program using the Python standard library, and it will take some time to get right. Before going down that road, I think it's worth looking at Twisted's support for processes:
https://twistedmatrix.com/documents/current/core/howto/process.html

I haven't used this module before, but it seems like a good starting place to investigate.
Separating the input and output processes complicates the programming API for users, in my opinion, so I don't think this is a good solution. Maybe you could instead study the Twisted process docs and see if that approach works better?

When to terminate the crawl is another interesting problem. Don't try to solve it now, but have a think about how we can approach it when the time comes.

> I actually wanted INPUT.py and OUTPUT.py to be a single file, i.e., I wanted
> this single file to send the requests to the spider, then wait while the
> spider does the computations, and then take the response from the spider.
> But the reason I have separated them is that I use the subprocess
> library to fork the process, and it does blocking I/O.
>
> Let me explain this clearly. When we run
>
> >scrapy crawl streaming -a Input=/home/faisal/Dropbox/PROGRAMS/SCRAPY/sandbox/INPUT.py -a Output=/home/faisal/Dropbox/PROGRAMS/SCRAPY/sandbox/OUPUT.py
>
> the init function of the spider class will fork a process that
> executes the INPUT.py file using the subprocess library, which uses a
> PIPE for I/O. Now the issue is that I can send some data to INPUT.py on
> stdin and take some data from INPUT.py through stdout. But I have only one
> chance to do this, and *the spider will have to wait*, whereas I want
> *INPUT.py to wait* while the spider does the computations.
>
> In short, subprocess does not support *non-blocking I/O*. Thus, I had to
> separate input and output.
>
> I hope you have understood what I am trying to convey. Do you have any
> suggestions? I also request other Scrapy members reading this to help me
> with my problem.
>
> Thanks
> Faisal
>
> On Monday, February 17, 2014 8:41:04 PM UTC+4, faisal anees wrote:
>>
>> Hi guys,
>>
>> I am Mohammed Faisal Anees, a second year Computer Science undergrad at
>> IIIT Hyderabad, India.
>> I was really happy when I got to know that
>> Scrapy (which has helped me a lot in my projects :) ) is taking part in GSoC
>> 2014. What's better than contributing to an organisation that has helped
>> you a lot?!
>>
>> I was interested in this idea on the ideas page: "Support for spiders in
>> other languages". I had some questions regarding this:
>>
>> 1) Do we have to make wrappers, or should the code be written in the other
>> language from scratch?
>>
>> 2) Quoting from the ideas page: "The goal of this project is to allow
>> developers to write spiders simply and easily in any programming language,
>> while permitting Scrapy to manage concurrency, scheduling, item exporting,
>> caching, etc." Does this mean this project will enable any programming
>> language to use Scrapy, or will we be adding support for languages
>> separately, one by one?
>>
>> 3) Which language will be better? This will depend on who the
>> target audience is: developers or scientists? We can expect developers
>> to be familiar with JavaScript/Ruby/Java/Python/etc., whereas scientists
>> would know C/C++/Python/Java. This is just my view; I might be wrong too!
>>
>> Thanks
>>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/groups/opt_out.
