That limit doesn't exist; the problem lives in your code. You mention that you are using Rules; are your regexes correct?
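For comparison, here is a minimal CrawlSpider sketch showing how the allow regex on a Rule limits which links get followed. The spider name, domain, URL pattern, and selector below are placeholders, not taken from your project, and the imports use the current scrapy.spiders / scrapy.linkextractors paths:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class PruebaSpider(CrawlSpider):
    # Placeholder names and patterns; substitute your real ones.
    name = "prueba"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    rules = (
        # Only links whose URLs match the allow regex are followed and
        # passed to parse_item; a too-strict pattern silently drops the
        # rest of the site and the crawl "finishes" early.
        Rule(
            LinkExtractor(allow=r"/catalogo/.+\.html$"),
            callback="parse_item",
            follow=True,
        ),
    )

    def parse_item(self, response):
        yield {
            "url": response.url,
            "title": response.css("title::text").get(),
        }

If the allow pattern is stricter than you intended, or allowed_domains excludes hosts that the site links to, the crawl will look "finished" after only a handful of pages.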
On Thursday, December 18, 2014 at 06:49:36 UTC-2, ROBERTO ANGUITA MARTIN wrote:
>
> I need to get 600 items, but since the crawl closes at 34, I cannot obtain
> them all. The site being scanned has many sub-links, and I have filtered
> them by rule (allowed URLs and allowed domain).
>
> Can I get more information somehow, to know why it stops? Is there some
> limit in the item filter?
>
> On Thursday, December 18, 2014 at 00:06:30 UTC+1, Travis Leleu wrote:
>>
>> What makes you think it's closing prematurely? I see a lot of duplicate
>> requests filtered out by scrapy; if you aren't getting as many items as you
>> expected, that could be why. Check your assumptions.
>>
>> On Wed, Dec 17, 2014 at 2:31 PM, ROBERTO ANGUITA MARTIN <
>> [email protected]> wrote:
>>>
>>> I am trying my first crawl. I launch my spider with this command:
>>>
>>> nohup scrapy crawl prueba -o prueba.csv -t csv -s LOG_FILE=salida.out
>>> -s JOBDIR=work -L DEBUG &
>>>
>>> I have configured CsvExportPipeline.py as in the manual example, but the
>>> spider finishes after scraping only 34 items.
>>>
>>> Why? I have searched the internet and everybody says it is a memory
>>> problem, but I don't find anything about memory in the log.
>>>
>>> The log level is DEBUG, but I still cannot tell why it only read 34 items.
>>>
>>> The final log is this:
>>>
>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Closing spider (finished)
>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Stored csv feed (34 items) in: prueba.csv
>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Dumping Scrapy stats:
>>> {'downloader/request_bytes': 14603,
>>>  'downloader/request_count': 35,
>>>  'downloader/request_method_count/GET': 35,
>>>  'downloader/response_bytes': 551613,
>>>  'downloader/response_count': 35,
>>>  'downloader/response_status_count/200': 35,
>>>  'dupefilter/filtered': 363,
>>>  'finish_reason': 'finished',
>>>  'finish_time': datetime.datetime(2014, 12, 17, 16, 2, 32, 392134),
>>>  'item_scraped_count': 34,
>>>  'log_count/DEBUG': 72,
>>>  'log_count/ERROR': 1,
>>>  'log_count/INFO': 48,
>>>  'request_depth_max': 5,
>>>  'response_received_count': 35,
>>>  'scheduler/dequeued': 35,
>>>  'scheduler/dequeued/disk': 35,
>>>  'scheduler/enqueued': 35,
>>>  'scheduler/enqueued/disk': 35,
>>>  'start_time': datetime.datetime(2014, 12, 17, 15, 21, 55, 218630)}
>>> 2014-12-17 17:02:32+0100 [bodegas] INFO: Spider closed (finished)
>>>
>>> Can anybody help me?
>>>
>>> Regards
>>>
>>> Roberto
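A side note on the 'dupefilter/filtered': 363 line in the stats quoted above: one way to check whether those dropped requests were pages you expected to scrape is to make the duplicate filter log every request it drops. A minimal settings sketch (DUPEFILTER_DEBUG is a standard Scrapy setting):

# settings.py (sketch)

# Log every request dropped by the duplicate filter, not only the first one,
# so the DEBUG log shows exactly which URLs the 363 filtered requests were.
DUPEFILTER_DEBUG = True

Also note that running with -s JOBDIR=work persists the set of already-seen requests on disk, so restarting the crawl with the same JOBDIR will filter out every request already made in previous runs.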
