Something like:
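for newitem in sel.xpath(u'//div[@id="wb_Text4"]//u'):
    # text of the <u> element itself (the company name)
    print(newitem.xpath(u'./text()').extract())
    # first text node among the siblings that follow the <u>
    print(newitem.xpath(u'(./following-sibling::text())[1]').extract())
    print('')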
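
For anyone who wants to try the pairing idea without Scrapy, here is a minimal standard-library sketch. It is not the Scrapy code above: it uses xml.etree.ElementTree, it assumes the <u> elements are direct children of the div, and the SAMPLE markup and the pair_headings_with_text helper are hypothetical stand-ins rather than the real page. Each <u> starts a new item, and any text found before the next <u> is attached to it, which is roughly the "count the <u> nodes manually" approach suggested further down the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page's markup (not the real HTML).
SAMPLE = """
<div id="wb_Text4">
  <u>Tata Motors Ltd</u><br/>Automobile major reported a net profit.<br/>
  <u>Trent Ltd</u><br/>Tesco is going ahead with its investment.<br/>
</div>
"""


def pair_headings_with_text(div):
    """Group each <u> heading with the text that follows it, up to the next <u>."""
    items = []
    for child in div:  # direct children only, in document order
        if child.tag == "u":
            # Every <u> starts a new item.
            items.append({"name": (child.text or "").strip(), "text": []})
        # Text that follows a child element is stored in its .tail;
        # attach it to the most recent <u>, if there is one.
        tail = (child.tail or "").strip()
        if tail and items:
            items[-1]["text"].append(tail)
    return [(item["name"], " ".join(item["text"])) for item in items]


div = ET.fromstring(SAMPLE.strip())
pairs = pair_headings_with_text(div)
for name, text in pairs:
    print(name, "->", text)
```

ElementTree only parses well-formed XML, so for the real page you would still want Scrapy or lxml to do the parsing; the grouping loop is the part that carries over.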
Regards,
-- luismiguel (@lmorillas)
2014-06-01 11:16 GMT+02:00 Nikolaos-Digenis Karagiannis <[email protected]>:
> I misinterpreted the specification there. In other implementations I found
> that it is possible to start with a text node as the context node and select
> its parents, siblings, etc. With lxml.etree, once you select a text node you
> get a text result and you are done: there are no more xpath() methods on that
> object. Scrapy suppresses this
> https://github.com/scrapy/scrapy/blob/554102fd70b14ee83109003cf77ab3a4f91f4f58/scrapy/selector/unified.py#L88-L92
> and I didn't notice at first.
> I'd call this a bug in lxml (or libxml2).
>
>
> On Sunday, 1 June 2014 11:18:56 UTC+3, Nikolaos-Digenis Karagiannis wrote:
>>
>> Usually you can just compute count(preceding-sibling::u|self::u) and group
>> the nodes by this count.
>> But alas, here you cannot, because the sibling axis does not work on
>> text() nodes.
>> http://www.w3.org/TR/xpath/#node-tests -> bullet point 3: "For other axes,
>> the principal node type is element"
>> Types of nodes: http://www.w3.org/TR/xpath/#data-model
>> Try counting <u> nodes manually.
>>
>> On Sunday, 1 June 2014 04:57:34 UTC+3, Jaspreet Singh wrote:
>>>
>>> Hi,
>>>
>>> I am looking to scrape a page where the required items appear as adjacent
>>> pairs under a single parent node.
>>>
>>> The page is http://www.intradaystocktips.org/stocks_to_watch_today.php
>>>
>>> I want the XPath to be specified such that "Tata Motors Ltd" and the
>>> following text i.e. "Automobile major reported a net profit of Rs 3,920
>>> crore during Jan-March quarter, down 0.3 per cent, against a net profit of
>>> Rs 3,931 crore, in the corresponding quarter last fiscal" is the first item.
>>> Similarly the second item will be "Trent Ltd" followed by "Undeterred by
>>> the BJP's apparently unyielding stance on foreign direct investment (FDI) in
>>> multi-brand retail, Tesco is going ahead with its proposed $110 million
>>> investment to open stores in a joint venture with Tata's Trent Hypermarket."
>>>
>>> In short I need to select a node along with its adjacent node (i.e.
>>> combining adjacent nodes) in a single item of the selection list.
>>>
>>> How can I create a selection using an XPath for the above rule?
>>>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.