Hi, I need some pointers in the right direction on using Scrapy for my purpose.
*What I want to do:*

I am running an ASP.NET website on a Windows Server where users can register. They should be able to create and run crawl jobs, similar to Scrapinghub. I am also running an Ubuntu server with Portia/Scrapy and a database installed. I chose Linux/Ubuntu because it looks like Scrapy runs better on Linux than on Windows (I may be wrong here). Everything should be scalable: if I run out of resources, I would rent another server.

The plan is that people configure crawl jobs (using Portia and adding some additional crawl settings on the ASP.NET website), and the website deploys these crawl jobs to the Linux server, which crawls and saves the results in a central database. Once this is done, the user can view the crawled data on the ASP.NET website.

I don't want to rewrite the whole ASP.NET website to run on Ubuntu; that would be too much work. I am more of a Windows guy and pretty new to Linux.

*Problems I see here:*

- I want to use Portia to annotate the pages AND I want ASP.NET to set additional crawler settings which cannot be set in Portia. How do I transfer the projects from the Linux server to the Windows server and back?
- Users create a project in ASP.NET, get redirected to my Linux server for page annotation, and are then sent back to the ASP.NET website. This could be confusing for the user and difficult for me to code.

*Possible solution:*

Install the ASP.NET website and Portia/Scrapy on the Windows Server. Portia will be used for annotation, ASP.NET for the additional crawler settings. *scrapyd* will then be used to deploy the created project to the crawl machine (the Linux server), where the crawl job will run and save the results into the central database. The ASP.NET website will retrieve the crawled results from the central database. Scrapy-Redis could be used to create a distributed network of Linux servers for crawling (for the scalability requirement mentioned above).

What do you think about that? Did I miss anything? Are there better approaches?
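To make the scrapyd step more concrete, here is a minimal sketch of the request the website would need to send to start a crawl with per-job settings. It is written in Python only for brevity (the ASP.NET side would send the same HTTP POST), and it assumes the project has already been deployed to scrapyd on the crawl machine. The host name `crawler.example`, the project name `user42`, and the spider name `shop` are made up for illustration; `schedule.json` and its `project`, `spider`, and `setting` fields are scrapyd's HTTP API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider, settings=None):
    """Build the POST request for scrapyd's schedule.json endpoint."""
    fields = [("project", project), ("spider", spider)]
    # Each per-job Scrapy setting goes in a repeated "setting=NAME=value" field.
    for name, value in sorted((settings or {}).items()):
        fields.append(("setting", f"{name}={value}"))
    body = urlencode(fields).encode("utf-8")
    url = base_url.rstrip("/") + "/schedule.json"
    return Request(url, data=body, method="POST")


def schedule_job(base_url, project, spider, settings=None):
    """Send the request and return scrapyd's decoded JSON reply."""
    request = build_schedule_request(base_url, project, spider, settings)
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; requires a reachable scrapyd instance:
# schedule_job("http://crawler.example:6800", "user42", "shop",
#              {"DOWNLOAD_DELAY": 2})
```

With this shape, the additional crawler settings collected on the website would not have to be written into the Portia project at all; they could be passed per job at scheduling time.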
Thanks,
Christoph
