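I also wanted to mention, following Travis' message, that Scrapy's "view" command (scrapy view <url>) is very helpful here when JavaScript and browser logic change the DOM: it opens the page in your browser as the spider sees it, so you can compare that against the live DOM.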
http://doc.scrapy.org/en/1.0/topics/commands.html#std:command-view

On Wed, Sep 16, 2015 at 11:18 AM, Jeremy D <[email protected]> wrote:

> Bruce speaks the truth. Not only may it not know the reason, it might not
> even know a problem has happened. Whatever site you're scraping may change
> so slightly that your full xpath starts getting the wrong data, or data you
> don't care about.
>
> The documentation is right on this one: using relative paths and being
> specific (contains(@href, 'image') etc.) is the way to go here. Even so,
> there's always the potential for something to be incorrect. If you know what
> the values should look like going in, you can check with things like len()
> or whether a value begins/ends with specific text. Those are ways to be sure
> you're getting the right data from the right spot. "Fragile" is a good word
> here.
>
> On Wed, Sep 16, 2015 at 11:14 AM, bruce <[email protected]> wrote:
>
>> Hi.
>>
>> When dealing with scraping, and using xpath/dom operations, you need
>> to always keep in mind that the overall structure of the content is
>> subject to change. It's fragile. If you use a "complete" xpath from
>> the root (top) to the item in question, any "change" along the way can
>> result in an error. Unless you have sufficient error checking, your
>> app might not "know" the reason for the error.
>>
>> If you create an xpath that has the "minimum" of attributes to get
>> you to where/what you need, it's more robust. But it's still fragile,
>> just not as fragile as using the complete xpath...
>>
>> On Wed, Sep 16, 2015 at 5:50 AM, michio basya <[email protected]> wrote:
>> > Hi,
>> >
>> > I have a question about why you should never use a full xpath.
>> > I have been developing a crawler with full xpaths, and I noticed this
>> > sentence in the documentation.
>> >
>> > Are the reasons the tbody problem and the live browser DOM problem? Or
>> > are there other reasons?
>> > If those two problems are the only reasons, I will keep developing with
>> > full xpaths.
>> > So please tell me about any other reasons, so I can prevent future problems.
>> > Thanks,
>> >
>> > http://doc.scrapy.org/en/1.0/topics/firefox.html#caveats-with-inspecting-the-live-browser-dom
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> > "scrapy-users" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an
>> > email to [email protected].
>> > To post to this group, send email to [email protected].
>> > Visit this group at http://groups.google.com/group/scrapy-users.
>> > For more options, visit https://groups.google.com/d/optout.
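
For what it's worth, here is a minimal sketch of the advice quoted above: anchor on a stable attribute rather than the full path from the root, then sanity-check what comes back with len() and begins/ends-with tests. It is not Scrapy code. It uses only Python's standard-library xml.etree.ElementTree, whose XPath subset has no contains(), so the 'image' substring test is done in Python instead of in the expression. The markup, the "gallery" id, the FULL_PATH expression, and all function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, for illustration only.
GALLERY = (
    '<div id="gallery">'
    '<a href="/image/1.jpg">one</a>'
    '<a href="/about">about</a>'
    '<a href="/image/2.jpg">two</a>'
    '</div>'
)
PAGE = '<html><body><div id="wrapper">' + GALLERY + '</div></body></html>'
# Same content after the site adds one extra wrapper <div>.
CHANGED_PAGE = (
    '<html><body><div id="wrapper"><div class="layout">'
    + GALLERY +
    '</div></div></body></html>'
)

# A "complete" path from the root: every ancestor is spelled out.
FULL_PATH = "./body/div/div/a"


def full_path_links(markup):
    """Collect hrefs using the full path; silently returns [] if it no longer matches."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(FULL_PATH)]


def robust_links(markup):
    """Collect image hrefs by anchoring on the container's id, wherever it sits."""
    root = ET.fromstring(markup)
    gallery = root.find(".//div[@id='gallery']")
    if gallery is None:
        raise ValueError("gallery container not found; the page layout may have changed")
    hrefs = [a.get("href") for a in gallery.findall(".//a[@href]")]
    # ElementTree has no contains(), so the substring filter happens here.
    return [h for h in hrefs if "image" in h]


def check_links(links, expected_count):
    """Sanity checks: expected number of items, expected prefix and suffix."""
    if len(links) != expected_count:
        raise ValueError("expected %d links, got %d" % (expected_count, len(links)))
    for href in links:
        if not (href.startswith("/image/") and href.endswith(".jpg")):
            raise ValueError("unexpected link: %r" % href)
    return links


if __name__ == "__main__":
    for name, markup in (("original", PAGE), ("changed", CHANGED_PAGE)):
        print(name, "full path:", full_path_links(markup))
        print(name, "relative: ", check_links(robust_links(markup), expected_count=2))
```

On the changed page the full path just returns an empty list without raising anything, which is the "might not even know a problem has happened" case. In an actual spider you would presumably put the contains(@href, 'image') predicate in the XPath itself, as suggested in the thread.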
