How to write CSS selectors for the title, next-page link, and description?

Bolt Clock Tue, 04 Nov 2014 06:26:13 -0800


I am practicing with Scrapy and have a question about this site:


https://eapplicant.northshore.org/psc/psapp/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL

Please let me know how to write the CSS selectors for the title, the
description, and the next-page link.

Language: Python + Scrapy + scrapinghub/splash

The website I want to scrape has a structure like this:

        <div id="divgbHRS_CE_JO_EXT_I$0" style="width:613px;height:206px; ">    ## The full path to the table starts here.
        <table id="gbHRS_CE_JO_EXT_I$0" width="613" cellspacing="0" 
cellpadding="0" border="0" dir="ltr" style="overflow:hidden;">
        <tbody>
        <tr>
        <td valign="top" style="width:613px;">
        <div id="divgbrHRS_CE_JO_EXT_I$0" 
onscroll="ptGridObj_win0.doOnScroll('HRS_CE_JO_EXT_I$0',1);" 
style="width:613px;height:206px; overflow-x:hidden;overflow-y:hidden;">
        <table id="tdgbrHRS_CE_JO_EXT_I$0" cellspacing="0" cellpadding="2" 
border="0" cols="5" dir="ltr" style="width:613px;">
        <tbody>
        <tr id="trHRS_CE_JO_EXT_I$0_row1" 
onmouseout="hoverLightTR('rgb(249,254,203)','',1,'trHRS_CE_JO_EXT_I$0_row1');" 
onmouseover="hoverLightTR('rgb(249,254,203)','',0,'trHRS_CE_JO_EXT_I$0_row1');" 
onclick="HighLightTR('rgb(212,219,217)','','trHRS_CE_JO_EXT_I$0_row1');">
        <td id="tdHRS_CE_JO_EXT_I$0#0" class="PSLEVEL1GRIDODDROW" width="20" 
nowrap="nowrap" height="54" align="center" style="">
        <div id="win0divSELECT$0">
        <input id="SELECT$chk$0" type="hidden" value="N" name="SELECT$chk$0">
        <input id="SELECT$0" class="PSCHECKBOX" type="checkbox" 
onclick="setupTimeout2(); 
this.form.SELECT$chk$0.value=(this.checked?'Y':'N');doFocus_win0(this,false,true);"
 value="Y" tabindex="99" name="SELECT$0">
        </div>
        </td>
        <td id="tdHRS_CE_JO_EXT_I$0#1" class="PSLEVEL1GRIDODDROW" width="83" 
align="left" style="">
        <td id="tdHRS_CE_JO_EXT_I$0#2" class="PSLEVEL1GRIDODDROW" width="182" 
align="left" style="">
        <div id="win0divPOSTINGTITLE$0" style="width:182px;">    ## I need a CSS selector for this title.
        </td>
        <td id="tdHRS_CE_JO_EXT_I$0#3" class="PSLEVEL1GRIDODDROW" width="83" 
align="left" style="">
        <td id="tdHRS_CE_JO_EXT_I$0#4" class="PSLEVEL1GRIDODDROW" align="left" 
style="">
        </tr>
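
Two things stand out in the markup above: every `id` contains a `$`, which is not valid unescaped in a CSS identifier, and `.name` in a selector matches a class, not an id. A minimal sketch of one way around both, using quoted attribute selectors. It assumes the title text sits somewhere inside the `win0divPOSTINGTITLE$N` div (the snippet does not show the div's contents) and that the ids contain no double quotes; `id_equals`, `id_starts_with`, and `extract_titles` are names made up for this example.

```python
def id_equals(raw_id):
    """CSS attribute selector matching one exact id, e.g. a single row."""
    # Quoting the value sidesteps the '$', which '#id' syntax would need escaped.
    return '[id="%s"]' % raw_id


def id_starts_with(prefix):
    """CSS attribute selector matching every id that begins with `prefix`."""
    return '[id^="%s"]' % prefix


# The '$' is kept in the prefix so that only the numbered per-row divs match.
TITLE_DIVS_CSS = 'div' + id_starts_with('win0divPOSTINGTITLE$')
FIRST_ROW_CSS = 'tr' + id_equals('trHRS_CE_JO_EXT_I$0_row1')


def extract_titles(response):
    """Return the whitespace-normalised text of each title div.

    `response` is anything exposing Scrapy's selector interface
    (a Response or a Selector); nothing is imported here on purpose.
    """
    return [div.xpath('normalize-space(.)').get()
            for div in response.css(TITLE_DIVS_CSS)]


print(TITLE_DIVS_CSS)  # div[id^="win0divPOSTINGTITLE$"]
print(FIRST_ROW_CSS)   # tr[id="trHRS_CE_JO_EXT_I$0_row1"]
```

Inside a spider callback this would be called as `extract_titles(response)`. For a single known id, the backslash-escaped form `#win0divPOSTINGTITLE\$0` should also be accepted, but the attribute form avoids having to escape anything.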

This is the next-page HTML:

<div id="win0divHRS_APPL_WRK_HRS_LST_NEXT">
    <span class="PSHYPERLINK" title="Next In List">
    <a id="HRS_APPL_WRK_HRS_LST_NEXT" class="PSHYPERLINK" 
href="javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT');" 
tabindex="74" ptlinktgt="pt_replace" name="HRS_APPL_WRK_HRS_LST_NEXT">Next</a>  
## Here I have to extract the next page of jobs using Splash.
    </span>
    </div>
    </td>
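
The `href` of the Next link is a `javascript:` call rather than a URL, so there is nothing to join onto `response.url`; the call has to be run inside a browser. A rough sketch of how the pieces could be prepared for Splash's `execute` endpoint. It assumes the listing page can be loaded directly in Splash and that a fixed wait is long enough for the grid to refresh after each call; `NEXT_LINK_CSS`, `js_from_href`, and `splash_args_for_page` are names made up for this example.

```python
# This id has no '$', so the plain '#id' form is fine.
NEXT_LINK_CSS = 'a#HRS_APPL_WRK_HRS_LST_NEXT::attr(href)'

# Lua script for Splash: load the page, run the link's JavaScript
# `clicks` times with a wait after each run, and return the final HTML.
LUA_SOURCE = """
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(splash.args.wait))
  for _ = 1, splash.args.clicks do
    assert(splash:runjs(splash.args.js))
    assert(splash:wait(splash.args.wait))
  end
  return {html = splash:html()}
end
"""


def js_from_href(href):
    """Strip the 'javascript:' scheme and return the bare JavaScript call."""
    scheme = 'javascript:'
    if not href.startswith(scheme):
        raise ValueError('not a javascript: link: %r' % href)
    return href[len(scheme):]


def splash_args_for_page(href, page, wait=2.0):
    """Build the `args` dict for a Splash 'execute' request.

    `page` is 1-based, so reaching page N takes N - 1 runs of the link's script.
    """
    return {
        'lua_source': LUA_SOURCE,
        'js': js_from_href(href),
        'clicks': page - 1,
        'wait': wait,
    }


href = "javascript:submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT');"
args = splash_args_for_page(href, page=2)
print(args['js'])  # submitAction_win0(document.win0,'HRS_APPL_WRK_HRS_LST_NEXT');
```

If the scrapy-splash package is used, the dict could be passed along as `SplashRequest(response.url, self.parse, endpoint='execute', args=args, dont_filter=True)`, with the `href` itself read from `response.css(NEXT_LINK_CSS).get()`. Replaying the script from page 1 on every request is slow, but it avoids depending on server-side session state.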

Spider code:
============

import re
from urllib.parse import urljoin

from scrapy.http import Request
from scrapy.selector import Selector


def parse(self, response):
    selector = Selector(response)
    # Job links. This is my selector and it does not work.
    for link in selector.css(
            'div.win0divHRS_CE_JO_EXT_I$0 '
            'div.trHRS_CE_JO_EXT_I$0_row1 > '
            'a.title.heading.trHRS_CE_JO_EXT_I$0_row1-title::attr(href)').extract():
        yield Request(urljoin(response.url, link),
                      callback=self.parse_listing_page,
                      #meta={"use_splash": False}
                      )

    # Next-page link. This is my selector and it does not work either.
    next_page_link = selector.css('div.pages > a:last-child:not(.disabled)')
    if next_page_link:
        def increment10(matchobj):
            # Advance the "st=" offset in the URL by one page of 10 results.
            return "st=" + str(int(matchobj.group("pagenum")) + 10)
        next_page_url = re.sub(r'st=(?P<pagenum>\d+)', increment10, response.url)
        print("next page:", next_page_url)
        yield Request(next_page_url, self.parse,
                      #meta={"use_splash": True},
                      dont_filter=True)
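
For reference, the `increment10` idea only applies when the listing URL carries a numeric offset such as `st=10`; the URL in this post has no such parameter, so the following is purely illustrative. A small sketch of that substitution on its own, where the `example.com` URL and the name `next_offset_url` are hypothetical:

```python
import re

# Matches an offset parameter such as "st=20" and names the digits "pagenum".
# The \b keeps parameters that merely end in "st" (e.g. "list=3") from matching.
OFFSET_RE = re.compile(r'\bst=(?P<pagenum>\d+)')


def next_offset_url(url, step=10):
    """Return `url` with its st= offset increased by `step`."""
    def bump(match):
        return 'st=%d' % (int(match.group('pagenum')) + step)
    # count=1 so only the first st= parameter is rewritten.
    return OFFSET_RE.sub(bump, url, count=1)


print(next_offset_url('http://example.com/jobs?st=10'))
# http://example.com/jobs?st=20
```

A URL without an `st=` parameter comes back unchanged, which is a cheap way to detect that this kind of pagination does not apply to a site.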
