Wow, thank you so much for reminding me about the "tbody". It works
well now!


On Sun, Apr 14, 2013 at 7:27 PM, Tom Evans <[email protected]> wrote:

> On Sun, Apr 14, 2013 at 10:29 AM,  <[email protected]> wrote:
> > Hi all,
> >
> > I am trying to crawl the information from this link:
> >
> > http://muaban.net/mua-ban-nha-quan-thu-duc-l5924-c32/quan-thu-duc-ban-nha1lau-2mt-truoc-sau-dg-ng-cong-tru-p-hiep-phu-q9-dt-4x21-5m--id15946781
> >
> > and this is the code I use
> >
> >> link = "http://muaban.net/mua-ban-nha-quan-thu-duc-l5924-c32/quan-thu-duc-ban-nha1lau-2mt-truoc-sau-dg-ng-cong-tru-p-hiep-phu-q9-dt-4x21-5m--id15946781"
> >> xPath =  "id('pC_DV_tableHeader')/x:tbody/x:tr[4]/x:td[3]"
> >> namespace = {'x': 'http://www.w3.org/1999/xhtml'}
> >>
> >> tree = lxml.html.parse(link)
> >> arrayContent = tree.xpath(xPath + "/text()", namespaces=namespace)
> >>
> >> if len(arrayContent):
> >>      content = cgi.escape(arrayContent[0].encode("utf-8"))
> >
> >
> > I use the XPath Checker add-on for Firefox to read the XPath value and
> > the namespace. However, when I run the code, the content is always empty.
> > How can I solve this?
> >
>
> Are you sure your xpath is correct? I'm not sure about that "id()" syntax.
> Try:
>
> //x:table[@id="pC_DV_tableHeader"]//x:tr[4]/x:td[3]
>
> Another thing to note: the DOM presented by Firefox is the result of
> Firefox parsing and potentially fixing up the HTML code. For instance,
> there is no <tbody> in the actual HTML for that table; Firefox always
> inserts a <tbody> if it is missing when parsing a table. Does lxml
> also insert a <tbody> if there is not one? If it doesn't, then your
> xpath would never work.
>
> Cheers
>
> Tom
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/django-users?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
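
For anyone finding this thread later, here is a minimal sketch of the
<tbody> problem discussed above. It is not the original poster's code: the
table markup is made up (only the id pC_DV_tableHeader comes from the
thread), and it uses the standard library's xml.etree.ElementTree instead of
lxml so that it runs without extra packages. The point it illustrates is the
same: a parser that keeps the markup as written has no <tbody> for the path
to match.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the page: a table written without <tbody>,
# i.e. the markup as served, before a browser "fixes it up".
PAGE = """
<html><body>
<table id="pC_DV_tableHeader">
  <tr><td>r1c1</td><td>r1c2</td><td>r1c3</td></tr>
  <tr><td>r2c1</td><td>r2c2</td><td>r2c3</td></tr>
  <tr><td>r3c1</td><td>r3c2</td><td>r3c3</td></tr>
  <tr><td>r4c1</td><td>r4c2</td><td>r4c3</td></tr>
</table>
</body></html>
""".strip()


def cell_texts(root, path):
    """Return the text of every element matched by path."""
    return [cell.text for cell in root.findall(path)]


root = ET.fromstring(PAGE)

# Path as copied from the browser DOM: it requires a <tbody> step
# that does not exist in the source markup.
with_tbody = cell_texts(
    root, ".//table[@id='pC_DV_tableHeader']/tbody/tr[4]/td[3]")

# Same path with the <tbody> step dropped.
without_tbody = cell_texts(
    root, ".//table[@id='pC_DV_tableHeader']/tr[4]/td[3]")

print(with_tbody)      # []
print(without_tbody)   # ['r4c3']
```

The paths stay within the XPath subset that ElementTree supports, and the
sample markup has no namespace, so no x: prefix is used.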
