I need advice: what is the correct way to parse an HTML table of product
attributes that are organized into attribute groups, and save the results to
four MySQL tables: attribute, attribute_description, attribute_group, and
attribute_group_description?

The number of attribute groups in the HTML table is not known in advance, but
we can count them with:

    product_attribute_group_number = response.xpath('count(//th[@class="tech-specs-category"])').extract()
    print '###product_attribute_group_number###', int(float(product_attribute_group_number[0]))

We can loop over every attribute group with:

    group_count = int(float(product_attribute_group_number[0]))
    item['product_attributes'] = {}
    # XPath positions are 1-based, so iterate over 1..group_count inclusive.
    for x in range(1, group_count + 1):
        # Header row of group x, plus the rows between it and header x+1.
        product_attributes = response.xpath('//tr[th[@class="tech-specs-category"]][%s]/following-sibling::tr[count(.|//tr[th[@class="tech-specs-category"]][%s]/preceding-sibling::tr)=count(//tr[th[@class="tech-specs-category"]][%s]/preceding-sibling::tr)]|//tr[th[@class="tech-specs-category"]][%s]' % (x, x+1, x+1, x))
        product_attribute_group_name = product_attributes.xpath('th[@class="tech-specs-category"]/text()').extract()
        print '###product_attribute_group_name###', product_attribute_group_name
        for prop_row in product_attributes:
            try:
                prop = prop_row.xpath('th/text()').extract()[0]
                val = prop_row.xpath('td/text()').extract()[0]
            except IndexError as e:
                # No th or no td (e.g. the category header row): skip the row.
                print e
                continue
            item['product_attributes'][prop.strip()] = val.strip()
    yield item

Is this the correct approach, and is the XPath selector correct?

Next question: what is the correct XPath selector for the last attribute
group? (No category row follows it, so there is no next header to bound its
following-sibling::tr rows.)

Is there a more elegant way to parse an HTML table whose product attributes
are grouped into attribute groups?

Table example:

Operating System: (attribute group name)
    OS (attribute name)            Windows 8 (attribute value)
    OS Language (attribute name)   English (attribute value)
Audio: (attribute group name)
    Speakers (attribute name)      Stereo Speakers (attribute value)
    Mic In (attribute name)        Yes (attribute value)
    Headphone (attribute name)     Yes (attribute value)
Battery: (attribute group name)
    Battery Type (attribute name)  4 Cell Li-ion (attribute value)
    Battery life (attribute name)  41 WHr (attribute value)

HTML source of the table:

<div class="parameters-wrapper">
<table class="techSpecs">
<tr>
<th class="tech-specs-category" colspan="2">Operating System:</th>
</tr>
<tr>
<th>OS</th>
<td>Windows 8</td>
</tr>
<tr>
<th>OS Language</th>
<td>English</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Audio:</th>
</tr>
<tr>
<th>Speakers</th>
<td>Stereo Speakers</td>
</tr>
<tr>
<th>Mic In</th>
<td>Yes</td>
</tr>
<tr>
<th>Headphone</th>
<td>Yes</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Battery:</th>
</tr>
<tr>
<th>Battery Type</th>
<td>4 Cell Li-ion</td>
</tr>
<tr>
<th>Battery life</th>
<td>41 WHr</td>
</tr>
</table>
</div>
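
For comparison, a single-pass alternative might look like the sketch below: walk the rows once in document order and treat each category row as the start of a new group, so the last group needs no special case. This is only an illustration under stated assumptions. The function name parse_grouped_attributes and the shortened SAMPLE markup are hypothetical, and the standard-library ElementTree parser stands in for Scrapy so the snippet runs on its own (written for Python 3). In a spider the same loop would presumably iterate over response.xpath('//table[@class="techSpecs"]/tr') and read cells with row.xpath('th/text()') and row.xpath('td/text()').

```python
import xml.etree.ElementTree as ET

# Shortened copy of the table above (hypothetical sample, well-formed markup).
SAMPLE = """
<div class="parameters-wrapper">
<table class="techSpecs">
<tr><th class="tech-specs-category" colspan="2">Operating System:</th></tr>
<tr><th>OS</th><td>Windows 8</td></tr>
<tr><th>OS Language</th><td>English</td></tr>
<tr><th class="tech-specs-category" colspan="2">Audio:</th></tr>
<tr><th>Speakers</th><td>Stereo Speakers</td></tr>
<tr><th class="tech-specs-category" colspan="2">Battery:</th></tr>
<tr><th>Battery Type</th><td>4 Cell Li-ion</td></tr>
</table>
</div>
"""


def parse_grouped_attributes(markup):
    """Return [(group_name, [(attribute_name, attribute_value), ...]), ...]."""
    root = ET.fromstring(markup.strip())
    groups = []
    for row in root.findall(".//table[@class='techSpecs']/tr"):
        th = row.find("th")
        td = row.find("td")
        if th is None:
            continue  # row without a header cell: nothing to record
        name = (th.text or "").strip()
        if th.get("class") == "tech-specs-category":
            # A category row opens a new group.
            groups.append((name, []))
        elif td is not None and groups:
            # An ordinary row belongs to the most recently opened group.
            groups[-1][1].append((name, (td.text or "").strip()))
    return groups


groups = parse_grouped_attributes(SAMPLE)
for group_name, attributes in groups:
    print(group_name, attributes)
```

Keeping the result as an ordered list of (group, attributes) pairs preserves both the group order and the attribute order, which may be convenient when inserting into attribute_group first and attribute afterwards.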