Hi there,

Thank you very much for the replies. I now understand what the documentation at the link really means. It's helpful for me.

Thanks,
On Thursday, September 17, 2015 at 0:20:42 UTC+9, Jeremy D wrote:
>
> Wanted to also mention after Travis' message: Scrapy's "view" command is
> very helpful here if JavaScript and browser logic change the DOM.
>
> http://doc.scrapy.org/en/1.0/topics/commands.html#std:command-view
>
> On Wed, Sep 16, 2015 at 11:18 AM, Jeremy D <[email protected]> wrote:
>
>> Bruce speaks the truth. Not only may your app not know the reason, it
>> might not even know a problem has happened. Whatever site you're scraping
>> may change so slightly that your full xpath starts getting the wrong
>> data, or data you don't care about.
>>
>> The documentation is right on this one: using relative paths and being
>> specific (contains(@href, 'image') etc.) is the way to go here. Even so,
>> there's always the potential for something to be incorrect. If you know
>> what the values should look like going in, you can check them with things
>> like len(), or test whether they begin/end with specific values. Those
>> are ways to be sure you're getting the right data from the right spot.
>> "Fragile" is a good word here.
>>
>> On Wed, Sep 16, 2015 at 11:14 AM, bruce <[email protected]> wrote:
>>
>>> Hi.
>>>
>>> When dealing with scraping, and using xpath/dom operations, you need
>>> to always keep in mind that the overall structure of the content is
>>> subject to change. It's fragile. If you use a "complete" xpath from
>>> the root (top) to the item in question, any "change" along the way can
>>> result in an error. Unless you have sufficient error checking, your
>>> app might not "know" the reason for the error.
>>>
>>> If you create an xpath that has the "minimum" of attributes to get
>>> you to where/what you need, it's more robust. But it's still fragile,
>>> just not as fragile as using the complete xpath...
>>>
>>> On Wed, Sep 16, 2015 at 5:50 AM, michio basya <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I have a question: why should I never use a full xpath?
>>> > I have been developing a crawler with full xpaths, and I noticed this
>>> > sentence in the documentation.
>>> >
>>> > Are the reasons the tbody problem and the live browser DOM problem?
>>> > Or are there any other reasons?
>>> > If those two problems are the only reasons, I will keep developing
>>> > with full xpaths.
>>> > So please tell me about any other reasons, so that I can prevent
>>> > future problems.
>>> > Thanks,
>>> >
>>> > http://doc.scrapy.org/en/1.0/topics/firefox.html#caveats-with-inspecting-the-live-browser-dom
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups "scrapy-users" group.
>>> > To unsubscribe from this group and stop receiving emails from it,
>>> > send an email to [email protected].
>>> > To post to this group, send email to [email protected].
>>> > Visit this group at http://groups.google.com/group/scrapy-users.
>>> > For more options, visit https://groups.google.com/d/optout.
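
For anyone reading this thread later, here is a rough sketch of the point made in the replies: a full path from the root breaks (or silently returns the wrong things) when the page layout shifts, while a short attribute-based path keeps working, and a len()/prefix check catches the cases where it does not. This is only an illustration under assumptions: the two HTML snippets, the paths, and the helper names hrefs() and checked_image_hrefs() are made up for the example, and Python's standard-library ElementTree stands in for Scrapy selectors. ElementTree supports only a subset of XPath and has no contains(), so the sketch matches on an exact class attribute instead.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page. The second one only adds
# a wrapper <div> around the content, a typical small layout change.
PAGE_V1 = """
<html><body>
  <div id="content">
    <a class="image" href="/image/1.png">one</a>
    <a class="image" href="/image/2.png">two</a>
    <a class="nav" href="/about">about</a>
  </div>
</body></html>
"""

PAGE_V2 = """
<html><body>
  <div id="wrapper">
    <div id="content">
      <a class="image" href="/image/1.png">one</a>
      <a class="image" href="/image/2.png">two</a>
      <a class="nav" href="/about">about</a>
    </div>
  </div>
</body></html>
"""

# "Full" path: spells out every step from the root element.
FULL_PATH = "./body/div/a"
# Relative path: only the minimum needed to identify the links we want.
RELATIVE_PATH = ".//a[@class='image']"


def hrefs(page, path):
    """Return the href of every element matched by path (hypothetical helper)."""
    root = ET.fromstring(page.strip())
    return [a.get("href") for a in root.findall(path)]


def checked_image_hrefs(page, expected_count=2):
    """Extract image links and verify them, as suggested in the thread."""
    found = hrefs(page, RELATIVE_PATH)
    # Check the number of results with len() ...
    if len(found) != expected_count:
        raise ValueError("expected %d links, got %d" % (expected_count, len(found)))
    # ... and check that each value begins with the prefix we expect.
    for href in found:
        if not href.startswith("/image/"):
            raise ValueError("unexpected link: %r" % href)
    return found


if __name__ == "__main__":
    print("full path, v1:    ", hrefs(PAGE_V1, FULL_PATH))
    print("full path, v2:    ", hrefs(PAGE_V2, FULL_PATH))
    print("relative path, v2:", checked_image_hrefs(PAGE_V2))
```

On the first page the full path also picks up the navigation link (data you don't care about), and on the second page it matches nothing at all, without raising any error. The relative path returns the same two image links on both pages. With Scrapy itself you would write the same idea using response.xpath() and could use contains(@href, 'image') as mentioned above.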
