Thanks, Travis, for your suggestions.

On Mon, Dec 15, 2014 at 10:35 PM, Travis Leleu <[email protected]> wrote:
>
> Hi Jun,
>
> Twisted is the networking library that Scrapy uses to make HTTP requests.
> It also provides the asynchronous capabilities, which are a big part of why
> Scrapy is so scalable.
>
> The error you received is because the remote server abruptly terminated
> the connection with your client (Twisted).  Depending on the frequency,
> it could be anti-bot logic (if it happens consistently but then works the
> next day or from another IP).  It could also just be an unreliable host:
> HTTP connections can terminate for many reasons, given the complexity
> involved everywhere along the line.
>
> As far as what to do about it, I would recommend you simply ignore it,
> unless getting every single image is worth a lot of effort.  In that case,
> I would look into trying to catch these errors in code somewhere, and
> logging which requests were rejected.  Once your crawler is done running
> (and the "rejected / error / no-response" requests queue is thus
> populated), you can pop through the queue and re-request the files.  (I'd
> do this with a different spider that consumed from the queue directly, but
> that is just me.)
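A minimal sketch of the log-and-retry idea above, written in plain Python so it runs without Scrapy installed. It assumes the failures are collected in the image pipeline's item_completed(results, item, info) hook, where results is a list of (success, value) tuples in the same order as the requested URLs. The names failed_urls and FailedRequestQueue and the JSON-lines file are illustrative and not part of Scrapy.

```python
import json
from pathlib import Path


def failed_urls(image_urls, results):
    """Return the URLs whose download failed.

    `results` is assumed to be a list of (success, value) tuples in the
    same order as `image_urls`; `value` describes the failure when
    success is False.
    """
    return [url for url, (ok, _value) in zip(image_urls, results) if not ok]


class FailedRequestQueue:
    """Hypothetical file-backed queue of URLs to re-request later."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, url, reason=""):
        # Append one JSON object per line, so earlier entries are never rewritten.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"url": url, "reason": reason}) + "\n")

    def pending(self):
        # Return each recorded URL once, in first-seen order.
        if not self.path.exists():
            return []
        seen = {}
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    entry = json.loads(line)
                    seen.setdefault(entry["url"], entry["reason"])
        return list(seen)
```

Under these assumptions, the pipeline would call queue.record(url) for each URL returned by failed_urls(item["image_urls"], results), and a second spider could take queue.pending() as its list of start URLs once the main crawl has finished.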
>
>
> On Mon, Dec 15, 2014 at 10:15 PM, Jun Liu <[email protected]> wrote:
>
>> Ping? Can anyone help, please?
>>
>>
>> Thanks,
>> Jun
>>
>> On Sat, Dec 13, 2014 at 6:01 PM, Jun Liu <[email protected]> wrote:
>>>
>>> Hi Scrapy experts,
>>>
>>> I have a spider that scrapes product data from
>>> http://www.katespade.com/. It has an image pipeline similar to the one
>>> in the Scrapy tutorial:
>>>
>>> class MyImagesPipeline(ImagesPipeline):
>>>
>>> ...
>>>
>>> I pretty much copied and pasted it from the tutorial. However, when I run
>>> my spider, I occasionally get an unknown error while downloading images.
>>> The error looks like this:
>>>
>>> 2014-12-13 17:08:01-0800 [spider_ks] WARNING: File (unknown-error):
>>> Error downloading image from <GET
>>> http://a248.e.akamai.net/f/248/9086/10h/origin-d4.scene7.com/is/image/KateSpade/NJMU4368_473?wid=750&fmt=jpg>
>>> referred in <None>: [<twisted.python.failure.Failure <class
>>> 'twisted.internet.error.ConnectionLost'>>, <twisted.python.failure.Failure
>>> <class 'twisted.web.http._DataLoss'>>]
>>>
>>> My question is: What is this error, and how can I solve it?
>>>
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>
>>>  --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>>

