Need advice: what is the correct way to parse an HTML table of product 
attributes that are organized into attribute groups, and save the results to 
four MySQL tables: attribute, attribute_description, attribute_group, 
attribute_group_description? The number of attribute groups in the HTML table 
is not known in advance, but we can count them with:
product_attribute_group_number = response.xpath(
    'count(//th[@class="tech-specs-category"])').extract()
# count() returns a float rendered as a string, e.g. u'3.0'
print('###product_attribute_group_number### %d'
      % int(float(product_attribute_group_number[0])))

We can loop over every attribute group with:
group_count = int(float(product_attribute_group_number[0]))
# Rows of group x: its header row, plus every following <tr> that also
# precedes the header row of group x+1 (a node-set intersection).
group_rows_xpath = (
    '//tr[th[@class="tech-specs-category"]][%s]/following-sibling::tr'
    '[count(.|//tr[th[@class="tech-specs-category"]][%s]/preceding-sibling::tr)'
    '=count(//tr[th[@class="tech-specs-category"]][%s]/preceding-sibling::tr)]'
    '|//tr[th[@class="tech-specs-category"]][%s]')
item['product_attributes'] = {}
for x in range(1, group_count + 1):  # XPath positions are 1-based
    for prop_row in response.xpath(group_rows_xpath % (x, x + 1, x + 1, x)):
        product_attribute_group_name = prop_row.xpath(
            'th[@class="tech-specs-category"]/text()').extract()
        if product_attribute_group_name:  # this is the group's header row
            print('###product_attribute_group_name### %s'
                  % product_attribute_group_name[0])
            continue
        try:
            prop = prop_row.xpath('th/text()').extract()[0].strip()
            val = prop_row.xpath('td/text()').extract()[0].strip()
        except IndexError as e:
            print(e)  # row has no name or no value; ignore that row
            continue
        item['product_attributes'][prop] = val
yield item  # yield once, after all groups have been collected
Is this approach correct, and is the XPath selector right?
Next question: what is the correct XPath selector for the last attribute 
group? (No category row follows it, so there is nothing to bound the 
following-sibling::tr set.)
Is there a more elegant way to parse an HTML table whose product attributes 
are grouped into attribute groups?

Table example:
Operating System:                    (attribute group name)
    OS (attribute name)              Windows 8 (attribute value)
    OS Language (attribute name)     English (attribute value)
Audio:                               (attribute group name)
    Speakers (attribute name)        Stereo Speakers (attribute value)
    Mic In (attribute name)          Yes (attribute value)
    Headphone (attribute name)       Yes (attribute value)
Battery:                             (attribute group name)
    Battery Type (attribute name)    4 Cell Li-ion (attribute value)
    Battery life (attribute name)    41 WHr (attribute value)

<div class="parameters-wrapper">
<table class="techSpecs">
<tr>
<th class="tech-specs-category" colspan="2">Operating System:</th>
</tr>
<tr>
    <th>OS</th>
    <td>Windows 8</td>
</tr>
<tr>
    <th>OS Language</th>
    <td>English</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Audio:</th>
</tr>
<tr>
    <th>Speakers</th>
    <td>Stereo Speakers</td>
</tr>
<tr>
    <th>Mic In</th>
    <td>Yes</td>
</tr>
<tr>
    <th>Headphone</th>
    <td>Yes</td>
</tr>
<tr>
<th class="tech-specs-category" colspan="2">Battery:</th>
</tr>
<tr>
    <th>Battery Type</th>
    <td>4 Cell Li-ion</td>
</tr>
<tr>
    <th>Battery life</th>
    <td>41 WHr</td>
</tr>
</table>
</div>
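
One possible simplification, given only as a rough sketch: instead of building one XPath per group, walk the rows once in document order and remember the most recent category row. The last group then needs no special case, because nothing has to bound it. The sketch below uses only the standard library (the sample markup happens to be well-formed, so xml.etree.ElementTree can parse it). The names SAMPLE and group_attributes are hypothetical and the sample is abbreviated.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the table above (hypothetical test data).
SAMPLE = """<table class="techSpecs">
<tr><th class="tech-specs-category" colspan="2">Operating System:</th></tr>
<tr><th>OS</th><td>Windows 8</td></tr>
<tr><th>OS Language</th><td>English</td></tr>
<tr><th class="tech-specs-category" colspan="2">Audio:</th></tr>
<tr><th>Speakers</th><td>Stereo Speakers</td></tr>
<tr><th class="tech-specs-category" colspan="2">Battery:</th></tr>
<tr><th>Battery Type</th><td>4 Cell Li-ion</td></tr>
<tr><th>Battery life</th><td>41 WHr</td></tr>
</table>"""


def group_attributes(table):
    """Return {group name: {attribute name: value}} in a single pass."""
    groups = {}
    current = None  # attributes of the group whose header was seen last
    for row in table.iter('tr'):
        th, td = row.find('th'), row.find('td')
        if th is None:
            continue  # no name cell: skip the row
        name = (th.text or '').strip()
        if th.get('class') == 'tech-specs-category':
            # A category row starts a new group.
            current = groups.setdefault(name, {})
        elif current is not None and td is not None:
            # An ordinary row belongs to the most recent group.
            current[name] = (td.text or '').strip()
    return groups


groups = group_attributes(ET.fromstring(SAMPLE))
for group_name, attributes in groups.items():
    print(group_name, attributes)
```

In a spider the same loop could presumably run over response.xpath('//table[@class="techSpecs"]//tr'), reading cells with row.xpath('th/text()').extract() and row.xpath('td/text()').extract(). If a per-group XPath is still preferred, counting the preceding category rows should also cover the last group, for example //tr[not(th[@class="tech-specs-category"])][count(preceding-sibling::tr[th[@class="tech-specs-category"]])=N] for the N-th group.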
