Thanks for your suggestions, Travis.

On Mon, Dec 15, 2014 at 10:35 PM, Travis Leleu <[email protected]> wrote:
>
> Hi Jun,
>
> Twisted is the networking library that Scrapy uses to make its HTTP
> requests. It also provides the asynchronous capabilities, which are a big
> part of why Scrapy is so scalable.
>
> The error you received means the remote server abruptly terminated the
> connection with your computer (Twisted). Depending on the frequency, it
> could be anti-bot logic (for example, if it fails consistently and then
> works the next day or from another IP). It could also just be unreliable
> hosts: HTTP connections can drop for many reasons, given how many systems
> sit between you and the server.
>
> As for what to do about it, I would recommend you simply ignore it,
> unless getting every single image is worth a lot of effort. In that case,
> I would look into catching these errors somewhere in code and logging
> which requests were rejected. Once your crawler has finished running (and
> the "rejected / error / no-response" request queue is thus populated),
> you can pop through the queue and re-request the files. (I'd do this with
> a different spider that consumes from the queue directly, but that is
> just me.)
>
>
> On Mon, Dec 15, 2014 at 10:15 PM, Jun Liu <[email protected]> wrote:
>
>> Ping? Can anyone help, please?
>>
>>
>> Thanks,
>> Jun
>>
>> On Sat, Dec 13, 2014 at 6:01 PM, Jun Liu <[email protected]> wrote:
>>>
>>> Hi Scrapy experts,
>>>
>>> I have a spider that scrapes product data from
>>> http://www.katespade.com/. It has an image pipeline similar to the one
>>> in the Scrapy tutorial:
>>>
>>> class MyImagesPipeline(ImagesPipeline):
>>>
>>>     ...
>>>
>>> I pretty much copied and pasted it from the tutorial. However, when I
>>> run my spider, I occasionally get an unknown error when downloading
>>> images. The error looks something like this:
>>>
>>> 2014-12-13 17:08:01-0800 [spider_ks] WARNING: File (unknown-error):
>>> Error downloading image from <GET
>>> http://a248.e.akamai.net/f/248/9086/10h/origin-d4.scene7.com/is/image/KateSpade/NJMU4368_473?wid=750&fmt=jpg>
>>> referred in <None>: [<twisted.python.failure.Failure <class
>>> 'twisted.internet.error.ConnectionLost'>>, <twisted.python.failure.Failure
>>> <class 'twisted.web.http._DataLoss'>>]
>>>
>>> My question is: what is this error, and how can I solve it?
>>>
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>
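
For reference, the "log the failures, then re-request them from a queue" idea Travis describes could be sketched roughly as below. This is a framework-agnostic illustration, not Scrapy code: `fetch_with_retry_queue`, the `fetch` callable, and `max_passes` are all hypothetical names made up for the example, and `fetch` is assumed to raise an exception when the connection is dropped. Wiring the same idea into a Scrapy spider or pipeline is left out here.

```python
from collections import deque


def fetch_with_retry_queue(urls, fetch, max_passes=2):
    """Fetch every URL, queueing failures and re-requesting them later.

    `fetch` is a hypothetical callable(url) -> bytes that raises on failure
    (for example, when the remote host drops the connection).
    Returns (downloaded, still_failed).
    """
    downloaded = {}
    pending = deque(urls)
    for _ in range(max_passes):
        failed = deque()
        while pending:
            url = pending.popleft()
            try:
                downloaded[url] = fetch(url)
            except Exception as exc:
                # Log the rejected request and keep it for the next pass.
                print(f"download failed, queued for retry: {url} ({exc!r})")
                failed.append(url)
        if not failed:
            break
        # The failures of this pass become the work list of the next one.
        pending = failed
    # Anything still pending failed on every pass.
    return downloaded, list(pending)
```

Anything left in the second return value failed on every pass and is probably worth inspecting by hand, since repeated failures may point at anti-bot logic rather than a flaky connection.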
