You can subclass `SitemapSpider` (see https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/spiders/sitemap.py) and override its `_parse_sitemap` method to rewrite each URL before it is requested.
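A minimal sketch of that idea, in case it helps. It assumes Scrapy 1.0 or later (where the class is importable from `scrapy.spiders`) and that the stock `_parse_sitemap` yields `Request` objects, which is how I read the linked source. The spider name, the sitemap location, and the `rewrite_url` helper are made up for illustration; only the two host names come from your message. The import is guarded just so the regex helper can be tried without Scrapy installed.

```python
import re

try:
    from scrapy.spiders import SitemapSpider
except ImportError:  # lets the URL helper be tried without Scrapy installed
    SitemapSpider = None

# Matches the unwanted host at the start of a URL, keeping the scheme in group 1.
OLD_HOST = re.compile(r"^(https?://)store\.websitename\.com\.au(?=[/:?#]|$)")


def rewrite_url(url):
    """Return url with store.websitename.com.au swapped for store.com.au."""
    return OLD_HOST.sub(r"\g<1>store.com.au", url)


if SitemapSpider is not None:

    class StoreSitemapSpider(SitemapSpider):
        name = "store_sitemap"  # hypothetical spider name
        # Hypothetical sitemap location; replace with the real one.
        sitemap_urls = ["http://store.websitename.com.au/sitemap.xml"]

        def _parse_sitemap(self, response):
            # Let the stock implementation build the requests, then fix the URLs.
            for request in super()._parse_sitemap(response):
                if request.callback == self._parse_sitemap:
                    # Nested sitemap: fetch it from its original address.
                    yield request
                else:
                    yield request.replace(url=rewrite_url(request.url))

        def parse(self, response):
            # Default callback for page requests; here it only records the URL.
            yield {"url": response.url}
```

Calling `rewrite_url("http://store.websitename.com.au/palsonic-dvd-player-compact")` gives `http://store.com.au/palsonic-dvd-player-compact`, and URLs on any other host pass through unchanged. Note that `_parse_sitemap` is a private method, so the override may need adjusting if Scrapy's internals change.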
On Saturday, 15 February 2014 at 21:22:10 UTC-2, BrendanB wrote:
> Hi,
>
> I have a sitemap in which I need to replace part of each URL before it's
> parsed. Otherwise it will load and parse the wrong URL. Can a regex be
> applied before the URL is scanned?
>
> Source URL:
> http://store.websitename.com.au/palsonic-dvd-player-compact
>
> It needs to look like this:
> http://store.com.au/palsonic-dvd-player-compact
>
> Regards,
> Brendan
