Hello,
I am trying to capture data from a website whose pages do not have a fixed
structure, so I would rather not write a separate XPath for each part. I have
spent hours trying to remove child nodes from the element selected by my
XPath, without success; the only approach that occurs to me is regular
expressions.
The website has the following HTML:
<div id="highlighted">
<div class="bullets">
<p class="headLine"><span>text</span></p>
<ul>
<li>text 1</li>
<li>text 2</li>
<li>text 3</li>
<li>text 4</li>
</ul>
</div>
<p>Other text</p>
<br>
<h3>title</h3>
<p>description</p>
<p>&nbsp;</p>
<h3>title 2</h3>
<ul>
<li>optional</li>
</ul>
<p>
<br>Text</p>
<ul>
<li>option 1</li>
<li>option 2</li>
<li>option 3</li>
</ul>
<h3>title 3</h3>
<p>text</p>
<ul>
<li>text</li>
</ul>
<p>text</p>
<h3>title 4</h3>
</div>
I have tested the following options:
response.xpath('//*[@id="highlighted"][not(@class="bullets")]').extract()
# Returns the div's HTML unchanged.
response.xpath('//*[@id="highlighted"]/*[not(@class="bullets")]').extract()
# Drops the div with class "bullets", but returns the remaining children as a
# list (one item per element matched by *). I need the content in a single
# field, for use on the web.
I ran several more tests, but they returned no values.
Is it possible to get the content of a div while excluding some of its
children? Perhaps I am attempting something impossible in believing that it
can be done with XPath alone, without writing Python code, and it really has
to be done with regular expressions.
I need to remove the div with class "bullets", the first p node and the last
h3 node. For now, I would appreciate it if someone could tell me whether this
is feasible with selectors alone or whether I have to write code; in the
meantime I will keep trying with regular expressions in Python (I am new to
this language). Thank you.
Regards