Hi there

Thank you very much for the replies. I now understand what the linked documentation really means.
It's helpful for me.
Thanks,


On Thursday, September 17, 2015 at 0:20:42 UTC+9, Jeremy D wrote:
>
> I also wanted to mention, following Travis' message, that Scrapy's "view" 
> command is very helpful here when JavaScript and browser logic change the 
> DOM.
>
> http://doc.scrapy.org/en/1.0/topics/commands.html#std:command-view
>
> On Wed, Sep 16, 2015 at 11:18 AM, Jeremy D <[email protected]> wrote:
>
>> Bruce speaks the truth. Not only may your app not know the reason, it might 
>> not even know a problem has happened. Whatever site you're scraping may 
>> change so slightly that your full xpath starts getting the wrong data, or 
>> data you don't care about. 
>>
>> The documentation is right on this one: using relative paths and being 
>> specific (contains(@href, 'image') etc.) is the way to go here. Even so, 
>> there's always the potential for something to be incorrect. If you know what 
>> the values should look like, you can check them with things like len(), or 
>> test whether they begin/end with specific values; those are ways to be sure 
>> you're getting the right data from the right spot. "Fragile" is a good word 
>> here.
>>
>> On Wed, Sep 16, 2015 at 11:14 AM, bruce <[email protected]> wrote:
>>
>>> Hi.
>>>
>>> When dealing with scraping and using xpath/dom operations, you need
>>> to always keep in mind that the overall structure of the content is
>>> subject to change. It's fragile. If you use a "complete" xpath from
>>> the root (top) to the item in question, any change along the way can
>>> result in an error. Unless you have sufficient error checking, your
>>> app might not "know" the reason for the error.
>>>
>>> If you create an xpath that has the "minimum" of attributes to get
>>> you to where/what you need, it's more robust. But it's still fragile,
>>> just not as fragile as using the complete xpath...
>>>
>>>
>>>
>>> On Wed, Sep 16, 2015 at 5:50 AM, michio basya <[email protected]> wrote:
>>> > Hi,
>>> >
>>> >
>>> > I have a question about why one should never use a full xpath.
>>> > I have been developing a crawler with full xpaths, and I noticed this
>>> > advice in the documentation.
>>> >
>>> > Are the reasons the tbody problem and the live browser DOM problem? Or
>>> > are there other reasons?
>>> > If those two problems are the only reasons, I will keep developing with
>>> > full xpaths.
>>> > So please tell me of any other reasons, so I can prevent future problems.
>>> > Thanks,
>>> >
>>> > http://doc.scrapy.org/en/1.0/topics/firefox.html#caveats-with-inspecting-the-live-browser-dom
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google Groups
>>> > "scrapy-users" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send an
>>> > email to [email protected].
>>> > To post to this group, send email to [email protected].
>>> > Visit this group at http://groups.google.com/group/scrapy-users.
>>> > For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>
>>
>
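
For anyone finding this thread later, here is a minimal sketch of the point made in the replies above: a full path from the root can silently return the wrong data after a small layout change, while a relative, attribute-based path keeps working, and a len()/startswith() check catches the failure. Everything in it is an assumption for illustration: the two page snippets, the names FULL_PATH, RELATIVE_PATH, hrefs() and looks_valid() are hypothetical, and it uses Python's standard xml.etree.ElementTree instead of Scrapy selectors so that it runs without Scrapy installed. ElementTree only supports a small XPath subset (no contains()), so the relative path matches attributes exactly.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page. In V2 the site has added
# one extra <div> (a banner) at the top of <body>.
PAGE_V1 = """
<html><body>
  <div id="header"><a href="/home">home</a></div>
  <div id="content">
    <a class="thumb" href="/image/1.jpg">one</a>
    <a class="thumb" href="/image/2.jpg">two</a>
  </div>
</body></html>
"""

PAGE_V2 = """
<html><body>
  <div id="banner">Sale!</div>
  <div id="header"><a href="/home">home</a></div>
  <div id="content">
    <a class="thumb" href="/image/1.jpg">one</a>
    <a class="thumb" href="/image/2.jpg">two</a>
  </div>
</body></html>
"""

# "Full" path: depends on the exact position of every ancestor.
FULL_PATH = "./body/div[2]/a"
# Relative path: depends only on the attributes we actually care about.
RELATIVE_PATH = ".//div[@id='content']/a[@class='thumb']"


def hrefs(page, path):
    """Return the href of every element matched by path in page."""
    root = ET.fromstring(page)
    return [a.get("href") for a in root.findall(path)]


def looks_valid(links):
    """Sanity check: at least one link, each shaped like an image URL."""
    return len(links) > 0 and all(
        link.startswith("/image/") and link.endswith(".jpg") for link in links
    )


if __name__ == "__main__":
    for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        for label, path in (("full", FULL_PATH), ("relative", RELATIVE_PATH)):
            links = hrefs(page, path)
            print(name, label, links, "ok" if looks_valid(links) else "SUSPICIOUS")
```

On the second page the full path raises no error; it just returns the header link instead of the images, which is the "might not even know a problem has happened" case. In a real spider the same idea would be written with response.xpath(), which accepts full XPath 1.0 including contains(@href, 'image').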
