Hi Daniel,
Actually, I ran

/crawlscrape/urls_listing/urls_listing$ scrapy crawl urls_grasping -o items.json -t json

and a JSON file was created:

/crawlscrape/urls_listing/urls_listing$ ls -lh items.json
-rw-rw-r-- 1 marco marco 93K Jan 23 06:20 items.json

So it seems that some settings in scrapyd-deploy have to be fine-tuned.
Do you know anyone I could ask?

Thank you very much for your kind help.
Marco
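A likely culprit for the missing output.json discussed below is the two-slash file:// URI used throughout this thread: with `file://home/...`, "home" is parsed as the URI's host rather than the start of the path, so the feed can silently go nowhere. A quick standard-library check of the difference (the paths are the ones from the thread):

```python
from urllib.parse import urlparse

# With only two slashes, "home" is treated as the URI's host,
# so the intended absolute path is mangled.
broken = urlparse('file://home/marco/crawlscrape/sole24ore/output.json')
# With three slashes, the host is empty and the path is intact.
fixed = urlparse('file:///home/marco/crawlscrape/sole24ore/output.json')

print(broken.netloc, broken.path)
print(fixed.netloc, fixed.path)
```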

2015-01-22 20:16 GMT+01:00 Daniel Fockler <[email protected]>:
> Alright, so it looks like running your project using scrapyd-deploy is
> changing the output settings, so for some reason your item output files
> are going to
>
> /var/lib/scrapyd/items/urls_listing/urls_grasping/8d8e0dfaa20d11e48c91c04a00090e80.jl
>
> In your project you can try just running scrapyd without scrapyd-deploy,
> and that should let you scrape with the correct settings. I don't have a
> ton of experience using the deploy features with scrapyd-deploy, so I'm
> not sure I can help much with that.
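For context on why items land under /var/lib/scrapyd/items/: scrapyd writes scraped items to the directory named by `items_dir` in its configuration file, independently of the project's FEED settings. A minimal scrapyd.conf sketch (values illustrative; an empty `items_dir` disables this storage):

```ini
[scrapyd]
# Where scrapyd stores scraped items as .jl (JSON Lines) files.
# Leave empty to disable item storage and rely on feed exports instead.
items_dir = /var/lib/scrapyd/items
```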
>
> On Thursday, January 22, 2015 at 12:16:27 AM UTC-8, Marco Ippolito wrote:
>>
>> Hi Daniel,
>> Not seeing any logs, I decided to create a new project from scratch,
>> but I still have the same problems.
>> I have attached the compressed (.tar) log files together with the
>> compressed (.tar) urls_listing project directory.
>> Feel free to have a try.
>>
>> Looking forward to your kind help.
>> Marco
>>
>> 2015-01-21 22:09 GMT+01:00 Daniel Fockler <[email protected]>:
>> > If you end up with an empty output.json, or a file that just has a '['
>> > character, that could mean that Scrapy couldn't find any items from your
>> > spider. If that is not the case, then there is another issue. Scrapyd
>> > should output logs for every spider that you run, in a logs directory.
>> >
>> > On Wednesday, January 21, 2015 at 11:41:49 AM UTC-8, Marco Ippolito
>> > wrote:
>> >>
>> >> Hi Daniel,
>> >> thanks again for helping.
>> >>
>> >> I tried with
>> >>
>> >> FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
>> >> FEED_FORMAT = 'json'
>> >>
>> >> and with
>> >>
>> >> FEED_URI = 'output.json'
>> >> FEED_FORMAT = 'json'
>> >>
>> >> In both cases there is no output and no error message.
>> >>
>> >> Any hints?
>> >>
>> >> Marco
>> >>
>> >> 2015-01-21 20:28 GMT+01:00 Daniel Fockler <[email protected]>:
>> >> > You'll want to make sure in your settings.py that the feed format is
>> >> > set, like
>> >> >
>> >> > FEED_FORMAT = 'json'
>> >> >
>> >> > If it doesn't work after that, then try changing the feed URI to
>> >> >
>> >> > FEED_URI = 'output.json'
>> >> >
>> >> > and Scrapy will dump it in your project root.
>> >> >
>> >> > On Tuesday, January 20, 2015 at 11:00:50 PM UTC-8, Marco Ippolito
>> >> > wrote:
>> >> >>
>> >> >> Hi Daniel,
>> >> >> thank you very much for your kind help.
>> >> >>
>> >> >> After scheduling the spider run, an output is actually produced:
>> >> >>
>> >> >> Opening file
>> >> >>
>> >> >>
>> >> >> /var/lib/scrapyd/items/sole24ore/sole/89d644f8a13a11e4a2afc04a00090e80.jl
>> >> >> Read output!
>> >> >> This is my output:
>> >> >> {"url": ["http://m.bbc.co.uk", "http://www.bbc.com/news/", " .....
>> >> >>
>> >> >> But modifying the feed settings as:
>> >> >> BOT_NAME = 'sole24ore'
>> >> >>
>> >> >> SPIDER_MODULES = ['sole24ore.spiders']
>> >> >> NEWSPIDER_MODULE = 'sole24ore.spiders'
>> >> >>
>> >> >> FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
>> >> >>
>> >> >> does not produce an output.json in
>> >> >> /home/marco/crawlscrape/sole24ore.
>> >> >>
>> >> >> Am I missing some other step?
>> >> >>
>> >> >> Marco
>> >> >>
>> >> >> 2015-01-20 18:45 GMT+01:00 Daniel Fockler <[email protected]>:
>> >> >> > For your first problem: you've started the scrapyd project, but
>> >> >> > you need to schedule a spider run using the schedule.json endpoint.
>> >> >> > Something like
>> >> >> >
>> >> >> > curl http://localhost:6800/schedule.json -d project=sole24ore -d
>> >> >> > spider=yourspidername
>> >> >> >
>> >> >> > For your second problem: your settings.py is misconfigured; your
>> >> >> > feed settings should be like
>> >> >> >
>> >> >> > FEED_URI = 'file://home/marco/crawlscrape/sole24ore/output.json'
>> >> >> > FEED_FORMAT = 'json'
>> >> >> >
>> >> >> > Hope that helps
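The curl scheduling call above can also be issued from Python with only the standard library; the project and spider names here are taken from the thread and may need adjusting for your setup:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Names from the thread; replace with your own project/spider.
params = {'project': 'sole24ore', 'spider': 'sole'}
data = urlencode(params).encode('ascii')

# Supplying a body makes this a POST, matching `curl -d`.
req = Request('http://localhost:6800/schedule.json', data=data)

# Uncomment when a scrapyd instance is listening on port 6800:
# print(urlopen(req).read().decode())
print(data.decode())
```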
>> >> >> >
>> >> >> > On Tuesday, January 20, 2015 at 4:23:04 AM UTC-8, Marco Ippolito
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >> I've got two situations to solve.
>> >> >> >>
>> >> >> >> It seems that everything is OK:
>> >> >> >>
>> >> >> >> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy
>> >> >> >> sole24ore -p sole24ore
>> >> >> >> Packing version 1421755479
>> >> >> >> Deploying to project "sole24ore" in
>> >> >> >> http://localhost:6800/addversion.json
>> >> >> >> Server response (200):
>> >> >> >> {"status": "ok", "project": "sole24ore", "version": "1421755479",
>> >> >> >> "spiders": 1}
>> >> >> >>
>> >> >> >>
>> >> >> >> marco@pc:/var/lib/scrapyd/dbs$ ls -lah
>> >> >> >> total 12K
>> >> >> >> drwxr-xr-x 2 scrapy nogroup 4.0K Jan 20 13:04 .
>> >> >> >> drwxr-xr-x 5 scrapy nogroup 4.0K Jan 20 06:55 ..
>> >> >> >> -rw-r--r-- 1 root   root    2.0K Jan 20 13:04 sole24ore.db
>> >> >> >>
>> >> >> >>
>> >> >> >> marco@pc:/var/lib/scrapyd/eggs/sole24ore$ ls -lah
>> >> >> >> total 16K
>> >> >> >> drwxr-xr-x 2 scrapy nogroup 4.0K Jan 20 13:04 .
>> >> >> >> drwxr-xr-x 3 scrapy nogroup 4.0K Jan 20 12:47 ..
>> >> >> >> -rw-r--r-- 1 scrapy nogroup 5.5K Jan 20 13:04 1421755479.egg
>> >> >> >>
>> >> >> >>
>> >> >> >> But nothing is executed:
>> >> >> >>
>> >> >> >> marco@pc:/var/lib/scrapyd/items/sole24ore/sole$ ls -a
>> >> >> >> .  ..
>> >> >> >>
>> >> >> >> [detached from 2515.pts-4.pc]
>> >> >> >> marco@pc:~/crawlscrape/sole24ore$ curl
>> >> >> >> http://localhost:6800/listjobs.json?project=sole24ore
>> >> >> >> {"status": "ok", "running": [], "finished": [], "pending": []}
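The listjobs.json response above is valid JSON and can be inspected programmatically; all three queues being empty confirms that deploying only uploaded the egg and no run was ever scheduled:

```python
import json

# The exact response shown above.
resp = json.loads('{"status": "ok", "running": [], "finished": [], "pending": []}')

# Deploying only uploads the egg; until schedule.json is called,
# every job queue stays empty.
no_jobs = not (resp['running'] or resp['finished'] or resp['pending'])
print(no_jobs)
```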
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> The second question regards how to save the output into a JSON
>> >> >> >> file. What is the correct form to put into settings.py?
>> >> >> >>
>> >> >> >> # Scrapy settings for sole24ore project
>> >> >> >> #
>> >> >> >> # For simplicity, this file contains only the most important
>> >> >> >> # settings by default. All the other settings are documented here:
>> >> >> >> #
>> >> >> >> #     http://doc.scrapy.org/en/latest/topics/settings.html
>> >> >> >> #
>> >> >> >>
>> >> >> >> BOT_NAME = 'sole24ore'
>> >> >> >>
>> >> >> >> SPIDER_MODULES = ['sole24ore.spiders']
>> >> >> >> NEWSPIDER_MODULE = 'sole24ore.spiders'
>> >> >> >>
>> >> >> >> FEED_URI=file://home/marco/crawlscrape/sole24ore/output.json
>> >> >> >> --set
>> >> >> >> FEED_FORMAT=json
>> >> >> >>
>> >> >> >>
>> >> >> >> (SCREEN)marco@pc:~/crawlscrape/sole24ore$ scrapyd-deploy
>> >> >> >> sole24ore -p sole24ore
>> >> >> >> Packing version 1421756389
>> >> >> >> Deploying to project "sole24ore" in
>> >> >> >> http://localhost:6800/addversion.json
>> >> >> >> Server response (200):
>> >> >> >> {"status": "error", "message": "SyntaxError: invalid syntax"}
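The "SyntaxError: invalid syntax" above comes from pasting command-line syntax into settings.py: `--set` is a scrapy CLI flag, not Python. settings.py has to contain plain Python assignments, e.g. (note the three slashes in the file URI; the two-slash form would treat "home" as a host):

```python
# settings.py is executed as Python, so only valid Python is allowed.
# The "--set" token from the CLI is what caused the SyntaxError above.
FEED_URI = 'file:///home/marco/crawlscrape/sole24ore/output.json'
FEED_FORMAT = 'json'
```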
>> >> >> >>
>> >> >> >>
>> >> >> >> # Crawl responsibly by identifying yourself (and your website)
>> >> >> >> # on the user-agent
>> >> >> >> #USER_AGENT = 'sole24ore (+http://www.yourdomain.com)'
>> >> >> >>
>> >> >> >> Looking forward to your kind help.
>> >> >> >> Kind regards.
>> >> >> >> Marco
>> >> >> >
>> >> >> > --
>> >> >> > You received this message because you are subscribed to a topic in
>> >> >> > the
>> >> >> > Google Groups "scrapy-users" group.
>> >> >> > To unsubscribe from this topic, visit
>> >> >> >
>> >> >> >
>> >> >> > https://groups.google.com/d/topic/scrapy-users/0b4xqaHUOSA/unsubscribe.
>> >> >> > To unsubscribe from this group and all its topics, send an email
>> >> >> > to
>> >> >> > [email protected].
>> >> >> > To post to this group, send email to [email protected].
>> >> >> > Visit this group at http://groups.google.com/group/scrapy-users.
>> >> >> > For more options, visit https://groups.google.com/d/optout.
>> >> >
>> >
>

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
