Thanks a lot. It worked!

On Sunday, June 1, 2014 4:02:45 PM UTC+5:30, Luis Miguel Morillas wrote:
>
> Something like:
>
> for newitem in sel.xpath(u'//div[@id="wb_Text4"]//u'):
>     print newitem.xpath(u'./text()').extract()
>     print newitem.xpath(u'(./following-sibling::text())[1]').extract()
>     print
>
> Regards,
>
> -- luismiguel (@lmorillas)
>
>
> 2014-06-01 11:16 GMT+02:00 Nikolaos-Digenis Karagiannis <[email protected]>:
> > I misinterpreted the specification there. Also, in other implementations I
> > found it possible to start with a text node as the context node and select
> > parents, siblings etc. With lxml.etree, once you select a text node you get a
> > text result and you are done, no more xpath() methods on this object. Scrapy
> > suppresses this
> > https://github.com/scrapy/scrapy/blob/554102fd70b14ee83109003cf77ab3a4f91f4f58/scrapy/selector/unified.py#L88-L92
> > and I didn't notice at first.
> > I'd call this a bug in lxml (or libxml2).
> >
> >
> > On Sunday, 1 June 2014 11:18:56 UTC+3, Nikolaos-Digenis Karagiannis wrote:
> >>
> >> Usually you can just count(preceding-sibling::u|self::u) and group them by
> >> this count.
> >> But alas! Here you cannot, because the sibling axis does not work on
> >> text() nodes.
> >> http://www.w3.org/TR/xpath/#node-tests -> Bullet point 3: "For other axes,
> >> the principal node type is element"
> >> Types of nodes: http://www.w3.org/TR/xpath/#data-model
> >> Try counting <u> nodes manually.
> >>
> >> On Sunday, 1 June 2014 04:57:34 UTC+3, Jaspreet Singh wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am looking to scrape a page where the required items are adjacent in
> >>> pairs having a single parent node.
> >>>
> >>> The page is http://www.intradaystocktips.org/stocks_to_watch_today.php
> >>>
> >>> I want the xpath to be specified such that "Tata Motors Ltd" and the
> >>> following text, i.e. "Automobile major reported a net profit of Rs 3,920
> >>> crore during Jan-March quarter, down 0.3 per cent, against a net profit of
> >>> Rs 3,931 crore, in the corresponding quarter last fiscal", is the first item.
> >>> Similarly, the second item will be "Trent Ltd" followed by "Undeterred by
> >>> the BJP's apparently unyielding stance on foreign direct investment (FDI) in
> >>> multi-brand retail, Tesco is going ahead with its proposed $110 million
> >>> investment to open stores in a joint venture with Tata's Trent Hypermarket."
> >>>
> >>> In short, I need to select a node along with its adjacent node (i.e.
> >>> combining adjacent nodes) in a single item of the selection list.
> >>>
> >>> How can I create a selection using an xpath for the above rule?
> >>>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "scrapy-users" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > To post to this group, send email to [email protected].
> > Visit this group at http://groups.google.com/group/scrapy-users.
> > For more options, visit https://groups.google.com/d/optout.
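The pairing suggested above can also be tried without Scrapy. Below is a minimal sketch using only Python's standard library; it is not the Selector code from the thread. It relies on ElementTree keeping the text that directly follows an element in that element's `tail` attribute, which here stands in for `(./following-sibling::text())[1]`. The `SAMPLE` markup and the function name `pair_headings_with_text` are made up for illustration, and the real page may be structured differently (for example, if a `<br>` sits between the `<u>` and its description, `tail` on the `<u>` would be empty).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page fragment discussed in the
# thread: each <u> heading is followed directly by its description text.
SAMPLE = (
    '<div id="wb_Text4">'
    '<u>Tata Motors Ltd</u> Automobile major reported a net profit of Rs 3,920 crore '
    '<u>Trent Ltd</u> Tesco is going ahead with its proposed $110 million investment'
    '</div>'
)


def pair_headings_with_text(xml_source):
    """Return a list of (heading, following_text) tuples, one per <u> element."""
    root = ET.fromstring(xml_source)
    pairs = []
    for u in root.iter("u"):
        heading = (u.text or "").strip()
        # .tail is the text between this element's end tag and the next tag.
        following = (u.tail or "").strip()
        pairs.append((heading, following))
    return pairs


if __name__ == "__main__":
    for heading, following in pair_headings_with_text(SAMPLE):
        print(heading, "->", following)
```

Each tuple corresponds to one item in the sense of the original question: the `<u>` heading combined with the text next to it. With real-world HTML that is not well-formed XML, an HTML-aware parser such as the one behind Scrapy's selectors would be needed instead of `ET.fromstring`.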
