Following up on Travis' message: Scrapy's "view" command is also very helpful
here when JavaScript and browser logic change the DOM. It opens the page in
your browser as the spider sees it, so you can compare it with the live DOM.

http://doc.scrapy.org/en/1.0/topics/commands.html#std:command-view
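
To make the point from the quoted messages below concrete, here is a minimal
sketch of how a full path can break on a small layout change while a relative
one survives, plus the kind of len()/prefix check Jeremy describes. Some
assumptions: it uses Python's standard-library ElementTree as a stand-in for
Scrapy selectors so it runs without Scrapy, and the two pages and the names
image_links and check_links are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two versions of the same hypothetical page. In V2 the site wrapped the
# content in one extra <div>, a typical "slight" layout change.
PAGE_V1 = """
<html><body>
  <div id="content">
    <a href="/image/cat.png">cat</a>
    <a href="/about">about</a>
  </div>
</body></html>
""".strip()

PAGE_V2 = """
<html><body>
  <div class="wrapper">
    <div id="content">
      <a href="/image/cat.png">cat</a>
      <a href="/about">about</a>
    </div>
  </div>
</body></html>
""".strip()

FULL_PATH = "./body/div/a"                 # every step spelled out from the root
RELATIVE_PATH = ".//div[@id='content']/a"  # anchored on one stable attribute


def image_links(page, path):
    """Return the hrefs found under `path` that contain 'image'.

    ElementTree has no contains(), so the substring test is done in Python.
    With Scrapy it could live in the XPath itself: contains(@href, 'image').
    """
    root = ET.fromstring(page)
    return [a.get("href") for a in root.findall(path)
            if "image" in a.get("href", "")]


def check_links(links, expected_count=1, prefix="/image/"):
    """Sanity-check a result so a silent layout change becomes a loud error."""
    if len(links) != expected_count:
        raise ValueError(f"expected {expected_count} link(s), got {len(links)}")
    for href in links:
        if not href.startswith(prefix):
            raise ValueError(f"unexpected href: {href!r}")
    return links


if __name__ == "__main__":
    for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        print(name, "full:", image_links(page, FULL_PATH))
        print(name, "relative:", image_links(page, RELATIVE_PATH))
```

Note that the full path does not raise on the second page; it just returns an
empty list, which is exactly the "might not even know a problem has happened"
case. Passing the result through check_links is one way to surface it.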

On Wed, Sep 16, 2015 at 11:18 AM, Jeremy D <[email protected]> wrote:

> Bruce speaks the truth. Not only may your app not know the reason, it might
> not even know a problem has happened. Whatever site you're scraping may
> change so slightly that your full xpath starts getting the wrong data, or
> data you don't care about.
>
> The documentation is right on this one: using relative paths and being
> specific (contains(@href, 'image'), etc.) is the way to go here. Even so,
> there's always the potential for something to be incorrect. If you know in
> advance what the values should look like, you can check them with things
> like len(), or test whether they begin or end with specific values. Those
> are ways to be sure you're getting the right data from the right spot.
> "Fragile" is a good word here.
>
> On Wed, Sep 16, 2015 at 11:14 AM, bruce <[email protected]> wrote:
>
>> Hi.
>>
>> When dealing with scraping and using xpath/DOM operations, you need
>> to always keep in mind that the overall structure of the content is
>> subject to change. It's fragile. If you use a "complete" xpath from
>> the root (top) to the item in question, any change along the way can
>> result in an error. Unless you have sufficient error checking, your
>> app might not know the reason for the error.
>>
>> If you create an xpath that has the "minimum" of attributes to get
>> you to where/what you need, it's more robust. But it's still fragile,
>> just not as fragile as using the complete xpath...
>>
>>
>>
>> On Wed, Sep 16, 2015 at 5:50 AM, michio basya <[email protected]> wrote:
>> > Hi,
>> >
>> >
>> > I have a question about why one should never use full xpaths.
>> > I have been developing a crawler with full xpaths, and I noticed this
>> > sentence in the documentation.
>> >
>> > Are the reasons the tbody problem and the live browser DOM problem? Or
>> > are there other reasons?
>> > If those two problems are the only reasons, I will keep developing with
>> > full xpaths.
>> > So please tell me about any other reasons, so I can prevent future
>> > problems.
>> > Thanks,
>> >
>> >
>> > http://doc.scrapy.org/en/1.0/topics/firefox.html#caveats-with-inspecting-the-live-browser-dom
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups
>> > "scrapy-users" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an
>> > email to [email protected].
>> > To post to this group, send email to [email protected].
>> > Visit this group at http://groups.google.com/group/scrapy-users.
>> > For more options, visit https://groups.google.com/d/optout.
