Wow, thank you so much for reminding me about the "tbody". This works well now!
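
For anyone who lands on this thread later, here is a minimal sketch of the tbody difference. It is not the code from the original post: the inline HTML is a made-up stand-in for the real page, the cell values are placeholders, and the `x:` namespace prefix is left out on the assumption that the elements returned by lxml.html carry no namespace.

```python
import lxml.html

# Hypothetical stand-in for the real page: a table written without <tbody>,
# the way the raw HTML arrives before a browser has touched it.
RAW_HTML = """
<html><body>
<table id="pC_DV_tableHeader">
  <tr><td>r1c1</td><td>r1c2</td><td>r1c3</td></tr>
  <tr><td>r2c1</td><td>r2c2</td><td>r2c3</td></tr>
  <tr><td>r3c1</td><td>r3c2</td><td>r3c3</td></tr>
  <tr><td>r4c1</td><td>r4c2</td><td>r4c3</td></tr>
</table>
</body></html>
"""

doc = lxml.html.fromstring(RAW_HTML)

# XPath as copied from the browser DOM: it expects a <tbody> step
# that does not exist in the source markup.
with_tbody = doc.xpath(
    '//table[@id="pC_DV_tableHeader"]/tbody/tr[4]/td[3]/text()'
)

# The same path with the <tbody> step removed.
without_tbody = doc.xpath(
    '//table[@id="pC_DV_tableHeader"]/tr[4]/td[3]/text()'
)

# Using // between table and tr matches with or without a <tbody>.
either_way = doc.xpath(
    '//table[@id="pC_DV_tableHeader"]//tr[4]/td[3]/text()'
)

print(with_tbody)
print(without_tbody)
print(either_way)
```

On the lxml versions I am aware of, the first query comes back empty and the other two return the text of the cell in row 4, column 3. The `//` form is the safer choice when the same XPath has to work against both the browser DOM and the raw HTML.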

On Sun, Apr 14, 2013 at 7:27 PM, Tom Evans <[email protected]> wrote:

> On Sun, Apr 14, 2013 at 10:29 AM, <[email protected]> wrote:
> > Hi all,
> >
> > I am trying to crawl the information from this link
> >
> > http://muaban.net/mua-ban-nha-quan-thu-duc-l5924-c32/quan-thu-duc-ban-nha1lau-2mt-truoc-sau-dg-ng-cong-tru-p-hiep-phu-q9-dt-4x21-5m--id15946781
> >
> > and this is the code I use
> >
> >> link = "http://muaban.net/mua-ban-nha-quan-thu-duc-l5924-c32/quan-thu-duc-ban-nha1lau-2mt-truoc-sau-dg-ng-cong-tru-p-hiep-phu-q9-dt-4x21-5m--id15946781"
> >> xPath = "id('pC_DV_tableHeader')/x:tbody/x:tr[4]/x:td[3]"
> >> namespace = {'x': 'http://www.w3.org/1999/xhtml'}
> >>
> >> tree = lxml.html.parse(link)
> >> arrayContent = tree.xpath(xPath + "/text()", namespaces=namespace)
> >>
> >> if len(arrayContent):
> >>     content = cgi.escape(arrayContent[0].encode("utf-8"))
> >
> > I use the XPath Checker add-on for Firefox to read the XPath value and the
> > namespace. However, when I run the code, the content is always empty.
> > How can I solve this?
> >
>
> Are you sure your xpath is correct? I'm not sure about that "id()" syntax.
> Try:
>
> //x:table[@id="pC_DV_tableHeader"]//x:tr[4]/x:td[3]
>
> Another thing to note: the DOM presented by Firefox is the result of
> Firefox parsing and potentially fixing up the HTML code. For instance,
> there is no <tbody> in the actual HTML for that table; Firefox always
> inserts a <tbody> if it is missing when parsing a table. Does lxml
> also insert a <tbody> if there is not one? If it doesn't, then your
> xpath would never work.
>
> Cheers
>
> Tom
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/django-users?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.

