That limit doesn't exist; the problem lives in your code. You mention that you are using Rules; are your regexes correct?
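For comparison, here is a minimal CrawlSpider sketch showing how the allow regex on a Rule limits which links get followed. The spider name, domain, URL pattern, and selector below are placeholders, not taken from your project, and the imports use the current scrapy.spiders / scrapy.linkextractors paths:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class PruebaSpider(CrawlSpider):
    # Placeholder names and patterns; substitute your real ones.
    name = "prueba"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    rules = (
        # Only links whose URLs match the allow regex are followed and
        # passed to parse_item; a too-strict pattern silently drops the
        # rest of the site and the crawl "finishes" early.
        Rule(
            LinkExtractor(allow=r"/catalogo/.+\.html$"),
            callback="parse_item",
            follow=True,
        ),
    )

    def parse_item(self, response):
        yield {
            "url": response.url,
            "title": response.css("title::text").get(),
        }

If the allow pattern is stricter than you intended, or allowed_domains excludes hosts that the site links to, the crawl will look "finished" after only a handful of pages.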
On Thursday, December 18, 2014 at 06:49:36 UTC-2, ROBERTO ANGUITA MARTIN wrote:
>
> I need to get 600 items, but since the crawl closes at 34, I cannot obtain
> them all. The site being scanned has many sub-links, and I have filtered
> them by rule (allowed URLs and allowed domain).
>
> Can I get more information somehow, to know why it stops? Is there some
> limit in the item filter?
>
> On Thursday, December 18, 2014 at 00:06:30 UTC+1, Travis Leleu wrote:
>>
>> What makes you think it's closing prematurely? I see a lot of duplicate
>> requests filtered out by scrapy; if you aren't getting as many items as you
>> expected, that could be why. Check your assumptions.
>>
>> On Wed, Dec 17, 2014 at 2:31 PM, ROBERTO ANGUITA MARTIN <
>> [email protected]> wrote:
>>>
>>> I am trying my first crawl. I launch my spider with this command:
>>>
>>> nohup scrapy crawl prueba -o prueba.csv -t csv -s LOG_FILE=salida.out
>>> -s JOBDIR=work -L DEBUG &
>>>
>>> I have configured CsvExportPipeline.py as in the manual example, but the
>>> spider finishes after scraping only 34 items.
>>>
>>> Why? I have searched the internet and everybody says it is a memory
>>> problem, but I don't find anything about memory in the log.
>>>
>>> The log level is DEBUG, but I still cannot tell why it only read 34 items.
>>>
>>> The final log is this:
>>>
>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Closing spider (finished)
>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Stored csv feed (34 items) in: prueba.csv
>>> 2014-12-17 17:02:32+0100 [prueba] INFO: Dumping Scrapy stats:
>>> {'downloader/request_bytes': 14603,
>>>  'downloader/request_count': 35,
>>>  'downloader/request_method_count/GET': 35,
>>>  'downloader/response_bytes': 551613,
>>>  'downloader/response_count': 35,
>>>  'downloader/response_status_count/200': 35,
>>>  'dupefilter/filtered': 363,
>>>  'finish_reason': 'finished',
>>>  'finish_time': datetime.datetime(2014, 12, 17, 16, 2, 32, 392134),
>>>  'item_scraped_count': 34,
>>>  'log_count/DEBUG': 72,
>>>  'log_count/ERROR': 1,
>>>  'log_count/INFO': 48,
>>>  'request_depth_max': 5,
>>>  'response_received_count': 35,
>>>  'scheduler/dequeued': 35,
>>>  'scheduler/dequeued/disk': 35,
>>>  'scheduler/enqueued': 35,
>>>  'scheduler/enqueued/disk': 35,
>>>  'start_time': datetime.datetime(2014, 12, 17, 15, 21, 55, 218630)}
>>> 2014-12-17 17:02:32+0100 [bodegas] INFO: Spider closed (finished)
>>>
>>> Can anybody help me?
>>>
>>> Regards
>>>
>>> Roberto
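A side note on the 'dupefilter/filtered': 363 line in the stats quoted above: one way to check whether those dropped requests were pages you expected to scrape is to make the duplicate filter log every request it drops. A minimal settings sketch (DUPEFILTER_DEBUG is a standard Scrapy setting):

# settings.py (sketch)

# Log every request dropped by the duplicate filter, not only the first one,
# so the DEBUG log shows exactly which URLs the 363 filtered requests were.
DUPEFILTER_DEBUG = True

Also note that running with -s JOBDIR=work persists the set of already-seen requests on disk, so restarting the crawl with the same JOBDIR will filter out every request already made in previous runs.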
