I am using dirbot from https://github.com/scrapy/dirbot and trying to generate 
two spiders in it, without changing anything else first (i.e. no custom code yet).

Here is the file structure of the newly unzipped dirbot archive:
    Wed Mar 05 - 02:12 PM > find .
    .
    ./.gitignore
    ./dirbot
    ./dirbot/items.py
    ./dirbot/pipelines.py
    ./dirbot/settings.py
    ./dirbot/spiders
    ./dirbot/spiders/dmoz.py
    ./dirbot/spiders/__init__.py
    ./dirbot/__init__.py
    ./README.rst
    ./scrapy.cfg
    ./setup.py
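
For context, dirbot's settings.py (which I haven't touched) points the spider 
machinery at the dirbot package; unless I'm misreading the repo, the relevant 
lines are at least these (abridged, possibly not verbatim):
    # dirbot/settings.py (relevant lines only)
    SPIDER_MODULES = ['dirbot.spiders']
    NEWSPIDER_MODULE = 'dirbot.spiders'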

It doesn't contain any reference to scrapybot yet; grepping for it returns nothing:
    Wed Mar 05 - 02:13 PM > find . -type f -print0 | xargs -0 grep -i scrapybot

Then I generate my first spider:
    Wed Mar 05 - 02:14 PM > scrapy genspider confluenceChildPages confluence
    Created spider 'confluenceChildPages' using template 'crawl' in module:
      dirbot.spiders.confluenceChildPages

The new directory structure is:
    Wed Mar 05 - 02:14 PM > find .
    .
    ./.gitignore
    ./dirbot
    ./dirbot/items.py
    ./dirbot/items.pyc
    ./dirbot/pipelines.py
    ./dirbot/settings.py
    ./dirbot/settings.pyc
    ./dirbot/spiders
    ./dirbot/spiders/confluenceChildPages.py
    ./dirbot/spiders/dmoz.py
    ./dirbot/spiders/dmoz.pyc
    ./dirbot/spiders/__init__.py
    ./dirbot/spiders/__init__.pyc
    ./dirbot/__init__.py
    ./dirbot/__init__.pyc
    ./README.rst
    ./scrapy.cfg
    ./setup.py

And I find that it now references scrapybot:
    Wed Mar 05 - 02:17 PM > find . -type f -print0 | xargs -0 grep -i scrapybot
    ./dirbot/spiders/confluenceChildPages.py:from scrapybot.items import ScrapybotItem
    ./dirbot/spiders/confluenceChildPages.py:        i = ScrapybotItem()
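
Based on those grep hits, I assume the generated file is Scrapy's stock 'crawl' 
template with scrapybot substituted in, i.e. roughly the following (reconstructed 
from the template, not a verbatim paste of my file):
    # dirbot/spiders/confluenceChildPages.py, roughly as generated
    from scrapy.selector import Selector
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapybot.items import ScrapybotItem

    class ConfluencechildpagesSpider(CrawlSpider):
        name = 'confluenceChildPages'
        allowed_domains = ['confluence']
        start_urls = ['http://www.confluence/']

        rules = (
            Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
        )

        def parse_item(self, response):
            sel = Selector(response)
            i = ScrapybotItem()
            # (the template leaves a few commented-out field assignments here)
            return i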

When I try to generate a second spider, it fails:
    Thu Feb 27 - 01:59 PM > scrapy genspider xxx confluence
    Traceback (most recent call last):
      File "/usr/bin/scrapy", line 5, in <module>
        pkg_resources.run_script('Scrapy==0.22.2', 'scrapy')
      File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 505, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1245, in run_script
        execfile(script_filename, namespace, namespace)
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
        execute()
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 143, in execute
        _run_print_help(parser, _run_command, cmd, args, opts)
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 89, in _run_print_help
        func(*a, **kw)
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
        cmd.run(args, opts)
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/commands/genspider.py", line 68, in run
        crawler = self.crawler_process.create_crawler()
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/crawler.py", line 87, in create_crawler
        self.crawlers[name] = Crawler(self.settings)
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/crawler.py", line 25, in __init__
        self.spiders = spman_cls.from_crawler(self)
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 35, in from_crawler
        sm = cls.from_settings(crawler.settings)
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 31, in from_settings
        return cls(settings.getlist('SPIDER_MODULES'))
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/spidermanager.py", line 22, in __init__
        for module in walk_modules(name):
      File "/usr/lib/python2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/utils/misc.py", line 68, in walk_modules
        submod = import_module(fullpath)
      File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
      File "/d/some/dir/dirbot-master/dirbot/spiders/confluenceChildPages.py", line 4, in <module>
        from scrapybot.items import ScrapybotItem
    ImportError: No module named scrapybot.items

So, I understand that the second run is failing because the spider generated by 
the first run imports scrapybot.items, which doesn't exist.
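
For reference, the items dirbot actually defines live in dirbot/items.py, which 
(unless I'm misreading the repo) contains a Website item and nothing called 
ScrapybotItem:
    from scrapy.item import Item, Field

    class Website(Item):
        name = Field()
        description = Field()
        url = Field()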

1) Have I done something wrong with my first genspider command for it to 
create a reference to an item class that doesn't exist? Should I do something 
differently?
2) Why exactly did the second genspider run fail? Why is it choking on a 
reference to something that doesn't exist in the *first* spider?
3) How do I fix this? Should I change the commands I use to generate the 
spiders, or modify the files that the first command created? (My best guess at 
such an edit is sketched below.)
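
To make question 3 concrete, my guess is that the hand-edit would be to point 
the generated spider at dirbot's real item class, along these lines (untested, 
just my assumption of the intended fix):
    # in dirbot/spiders/confluenceChildPages.py
    from dirbot.items import Website      # instead of: from scrapybot.items import ScrapybotItem

        def parse_item(self, response):
            sel = Selector(response)
            i = Website()                 # instead of: i = ScrapybotItem()
            i['url'] = response.url       # Website defines name/description/url fields
            return i
But I'd like to know whether that is the right approach, or whether genspider 
can be made to reference dirbot.items in the first place.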
