scrapyd will start many Scrapy crawler processes and will use your server's cores fully. If you see any problems, adjust max_proc and max_proc_per_cpu. If your spiders use 100% of a CPU, you probably won't gain much from setting max_proc_per_cpu higher than 1 or 2, but it shouldn't matter too much. If you want your crawls to start faster, decrease poll_interval slightly.

You may or may not use all 32 GB of RAM. Since you are paying for it, consider co-hosting something like Redis, which benefits from RAM and doesn't use much CPU. Maybe use it as a cache to reduce Scrapy's CPU load, if possible.

It's interesting that your spider is using 100% of the CPU. Consider off-loading, simplifying, and batching some of the operations you do there. The way I do it is that in step 1 I quickly download a GB of files with Scrapy, and in step 2 I post-process them with a batch-processing system. Step 1 uses (inbound) network bandwidth, little memory and some CPU, and can be hosted on many small servers (with different IPs). Step 2 uses CPU and memory and can be hosted on higher-end machines. Your spider shouldn't do many CPU-heavy operations (e.g. image processing), because that means spending more time before it schedules the next Request to download. The recommended CPU usage <http://doc.scrapy.org/en/latest/topics/broad-crawls.html#increase-concurrency> is 80%-90%.

If you provide more info, I will probably be able to suggest something more specific.
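To make the three scrapyd options above concrete, here is a minimal sketch in Python. It relies on the documented scrapyd behaviour that max_proc = 0 (the default) means "number of CPUs times max_proc_per_cpu". The helper names effective_max_proc and render_scrapyd_conf are made up for illustration, and the values 2 and 2.0 are only starting points to tune, not recommendations from the scrapyd docs.

```python
import configparser
import io
import os


def effective_max_proc(max_proc, max_proc_per_cpu, cpu_count):
    """Upper bound on concurrent Scrapy processes scrapyd will start.

    max_proc = 0 means "cpu_count * max_proc_per_cpu"; any other
    value is used as the limit directly.
    """
    if max_proc:
        return max_proc
    return cpu_count * max_proc_per_cpu


def render_scrapyd_conf(max_proc=0, max_proc_per_cpu=2, poll_interval=2.0):
    """Return the text of a scrapyd.conf with the options discussed above.

    The default values here are illustrative assumptions, not tuned numbers.
    """
    parser = configparser.ConfigParser()
    parser["scrapyd"] = {
        "max_proc": str(max_proc),
        "max_proc_per_cpu": str(max_proc_per_cpu),
        "poll_interval": str(poll_interval),
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs (hardware threads), or None.
    cpus = os.cpu_count() or 1
    print(render_scrapyd_conf())
    print("process limit on this machine:", effective_max_proc(0, 2, cpus))
```

On a 4-core/8-thread machine, assuming all 8 hardware threads are counted as CPUs, max_proc_per_cpu = 2 would allow up to 16 concurrent crawler processes. If each spider already saturates a core, lowering it to 1 is a reasonable thing to try.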
On Thursday, March 17, 2016 at 10:02:42 AM UTC, Romain Marchand wrote:
> Hi,
>
> I would like to host my scrapyd service on a good server.
>
> I currently have this configuration on a dedicated server:
>
> 32 GB RAM: DDR4 ECC 2133 MHz
> CPU: 4c/8t, 2.2 / 2.6 GHz
>
> My spider uses 100% of the CPU.
>
> Can you advise me on a good configuration to run my crawler?
>
> Thanks
