Hi,

I need some pointers in the right direction on using Scrapy for my use case.

*What I want to do:*
I run an ASP.NET website on a Windows Server where users can register. 
They should be able to create and run crawl jobs, similar to Scrapinghub.
I also run an Ubuntu server with Portia/Scrapy and a database installed. 
I chose Linux/Ubuntu because Scrapy seems to run better on Linux than on 
Windows (I may be wrong here).

Everything should be scalable. If I run out of resources I would rent 
another server.

The plan is that users configure crawl jobs (annotating pages in Portia and 
adding some extra crawl settings on the ASP.NET website), and the website 
deploys these crawl jobs to the Linux server, which crawls and saves the 
results in a central database.
Once a job is done, the user can view the crawled data on the ASP.NET 
website.
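
To make the "save the results in a central database" step concrete, here is 
a minimal sketch of a Scrapy-style item pipeline. Everything specific in it 
is an assumption for illustration: the table layout, the `url`/`title` 
fields, the `job_id` spider attribute (assumed to be passed in as a spider 
argument), and SQLite standing in for whatever central database is actually 
used.

```python
import sqlite3


class CentralDbPipeline:
    """Scrapy-style item pipeline that stores each scraped item as a row."""

    def __init__(self, db_path=":memory:"):
        # db_path is a hypothetical parameter; a real deployment would
        # connect to the shared database server instead of SQLite.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection and
        # make sure the (assumed) results table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "job_id TEXT, url TEXT, title TEXT)"
        )

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and disconnect.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # job_id is assumed to be set on the spider so the website can
        # later look up the rows belonging to one user's crawl job.
        job_id = getattr(spider, "job_id", None)
        self.conn.execute(
            "INSERT INTO results (job_id, url, title) VALUES (?, ?, ?)",
            (job_id, item.get("url"), item.get("title")),
        )
        return item
```

The ASP.NET side would then only need read access to that table, filtered 
by the job id it handed out.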

I don't want to rewrite the whole ASP.NET website so that it runs on Ubuntu; 
that would be too much work. I am more of a Windows guy and pretty new to 
Linux.

*Problems I see here:*
- I want to use Portia to annotate the pages AND I want ASP.NET to set 
additional crawler settings that cannot be set in Portia. How do I transfer 
the projects from the Linux server to the Windows server and back?
- Users create a project in ASP.NET, get redirected to my Linux server for 
page annotation, and are then sent back to the ASP.NET website. This could 
be confusing for the user and difficult for me to code.
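
One option I can imagine for the first point is not moving project files at 
all, but passing the extra settings per job when the crawl is scheduled. 
scrapyd's `schedule.json` endpoint accepts repeated `setting=NAME=VALUE` 
fields, and any other field is handed to the spider as an argument. A rough 
sketch of building such a request follows; the host name, project name, 
spider name and `job_id` argument are made up, and the request is only 
built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider,
                           settings=None, spider_args=None):
    """Build (but do not send) a POST request for scrapyd's schedule.json."""
    fields = [("project", project), ("spider", spider)]
    # Each Scrapy setting override is sent as its own "setting" field.
    for name, value in (settings or {}).items():
        fields.append(("setting", f"{name}={value}"))
    # Remaining fields are passed through to the spider as arguments.
    for name, value in (spider_args or {}).items():
        fields.append((name, str(value)))
    body = urlencode(fields).encode("utf-8")
    url = f"{base_url.rstrip('/')}/schedule.json"
    return Request(url, data=body, method="POST")


# Hypothetical values: crawler.example, shop_project, shop_spider, job_id.
request = build_schedule_request(
    "http://crawler.example:6800",
    project="shop_project",
    spider="shop_spider",
    settings={"DOWNLOAD_DELAY": 2},
    spider_args={"job_id": "job-1"},
)
```

The website would send the same form fields from .NET; the Python version 
is only meant to show the shape of the request.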

*Possible solution:*
Install the ASP.NET website and Portia/Scrapy on the Windows Server. Portia 
will be used for annotation, ASP.NET for the additional crawler settings. 
*scrapyd* will then be used to deploy the finished project to the crawl 
machine (Linux server), where the crawl job runs and saves its results into 
the central database. The ASP.NET website will retrieve the crawled results 
from the central database.
Scrapy-Redis could be used to build a distributed network of Linux servers 
for crawling (to meet the scalability requirement mentioned above).
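
As far as I understand the Scrapy-Redis README, sharing one crawl queue 
across several servers mostly comes down to a few lines in each project's 
`settings.py`. A sketch, assuming a Redis instance reachable by all crawl 
servers (the host name below is made up):

```python
# settings.py fragment for a Scrapy project using scrapy-redis.

# Keep the request queue in Redis so every crawl server pulls from it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests across all servers through Redis as well.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Do not clear the Redis queue when a spider stops, so jobs can resume.
SCHEDULER_PERSIST = True

# Hypothetical address of the shared Redis instance.
REDIS_URL = "redis://redis.example:6379"
```

Renting another server would then mean installing the same project there 
and pointing it at the same Redis URL.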

What do you think about that? Did I miss anything? Are there better 
approaches?

Thanks,
Christoph
