Please do it again, but after step one run the following command:
print response.url
and send us the output.
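For anyone following along, here is a minimal, self-contained sketch (stdlib only, not Scrapy/lxml) of why a selector returns an empty list in a case like this: if the server-sent HTML lacks the node the browser renders (for example, one injected by JavaScript), the lookup finds nothing. The HTML snippet below is made up for illustration.

```python
# Sketch: the jobs <ul> exists in the raw response but is empty, so the
# selector for the job-title links returns []. xml.etree supports only a
# subset of XPath, but the attribute predicate below is enough here.
import xml.etree.ElementTree as ET

# What the server might actually send (illustrative, not LinkedIn's markup):
server_html = (
    '<html><body>'
    '<div id="results-rail"><ul class="jobs"></ul></div>'
    '</body></html>'
)

root = ET.fromstring(server_html)
titles = root.findall(".//a[@class='title']")
print(titles)  # an empty list: the job links were never in the raw HTML
```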
Morad Edwar,
Software Developer | Bkam.com
On Mar 17, 2015 1:13 PM, "Kais DAI" <[email protected]> wrote:
> This is what I did:
>
> 1. I opened the command line in Windows and ran the following command:
> *scrapy shell "https://www.linkedin.com/job/jobs-in-san-francisco-ca/?page_num=1&trk=jserp_pagination_1"*
> (the URL is quoted, since an unquoted & splits the command on Windows)
> 2. Then, I ran this command:
>
> *sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()*
>
> In this case, an empty list *[]* is returned. The same thing happens with
> this XPath selection:
>
> *sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()*
>
> Did you obtain a result by following the same steps?
> Thank you for your help.
>
> Regards,
> K.
>
> 2015-03-17 11:34 GMT+01:00 Morad Edwar <[email protected]>:
>
>> I used 'scrapy shell' and your XPath worked fine!
>> When I changed 'li[1]' to 'li', it scraped all the job titles.
>>
>>
>> On Monday, March 16, 2015 at 6:19:01 PM UTC+2, DataScience wrote:
>>>
>>> Actually, I've checked response.body and it doesn't match the
>>> content that I see in the browser.
>>> I am really confused; what can I do in this case?
>>>
>>> On Monday, March 16, 2015 at 17:15:14 UTC+1, Travis Leleu wrote:
>>>>
>>>> It doesn't look to me like it's writing the HTML to the DOM with JS,
>>>> as you noted.
>>>>
>>>> The big concern I have is that you are assuming the HTML content in
>>>> your browser is the same as what your code receives. How have you verified this?
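One hedged way to check that assumption: dump the raw bytes the crawler received to a file and open it in a browser next to the live page. In `scrapy shell` you would write `response.body`; the sketch below uses a stand-in byte string so it runs on its own.

```python
# Sketch: save exactly what the crawler received so it can be eyeballed in a
# browser. `response_body` stands in for Scrapy's `response.body` here.
import os
import tempfile

response_body = b"<html><body><p>what the server actually sent</p></body></html>"

path = os.path.join(tempfile.mkdtemp(), "fetched.html")
with open(path, "wb") as f:
    f.write(response_body)

print("saved %d bytes to %s" % (len(response_body), path))
# Open fetched.html in a browser and compare it with the live page's DOM.
```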
>>>>
>>>> On Mon, Mar 16, 2015 at 9:02 AM, DataScience <[email protected]> wrote:
>>>>
>>>>> Thank you, Travis, for your quick feedback.
>>>>>
>>>>> I am testing Scrapy on this specific web page, trying to get the job
>>>>> offers (and not profiles).
>>>>> I read in some forums that the problem may be the website using
>>>>> JavaScript to build most of the page, so the elements I want would not
>>>>> appear in the HTML source. I checked by disabling JavaScript and
>>>>> reloading the page, but the results were still displayed (I also
>>>>> checked the network tab in Firebug, filtered on XHR, and looked into
>>>>> the POST requests... nothing).
>>>>>
>>>>> Any help would be more than welcome.
>>>>> Thank you.
>>>>>
>>>>>
>>>>> On Monday, March 16, 2015 at 16:26:41 UTC+1, Travis Leleu wrote:
>>>>>>
>>>>>> LinkedIn can be a tough site to scrape, as they generally don't want
>>>>>> their data in other people's hands. You will need to use a user-agent
>>>>>> switcher (you don't mention what UA you are sending), and most likely
>>>>>> a proxy as well.
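For the user-agent part, here is a minimal sketch of a rotating-UA downloader middleware. The class name and UA strings are illustrative, not canonical; Scrapy's documented hook is `process_request`, which is called for every outgoing request.

```python
# Sketch of a rotating User-Agent downloader middleware for Scrapy.
# The UA strings below are examples only; use a larger, realistic pool.
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:36.0) Gecko/20100101 Firefox/36.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)",
]

class RandomUserAgentMiddleware(object):
    """Assign a random User-Agent header to every outgoing request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
```

You would enable it via DOWNLOADER_MIDDLEWARES in settings.py; combined with a proxy this helps, but it is no guarantee against LinkedIn's anti-scraping measures.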
>>>>>>
>>>>>> If you are looking to scrape the entirety of LinkedIn, that's more
>>>>>> than 30 million profiles. I've found it more economical to purchase a
>>>>>> LinkedIn data dump from scrapinghub.com than to scrape it myself.
>>>>>>
>>>>>> On Mon, Mar 16, 2015 at 8:05 AM, DataScience <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Scrapy Guys,
>>>>>>>
>>>>>>> Scrapy returns an empty list when I use the shell to pick a simple
>>>>>>> "title" field from this web page: http://goo.gl/dBR8P4
>>>>>>> I've used:
>>>>>>>
>>>>>>> - sel.xpath('//div[@id="results-rail"]/ul[@class="jobs"]/li[1]/div[@class="content"]/span/a[@class="title"]/text()').extract()
>>>>>>> - sel.xpath('html/body/div[3]/div/div[2]/div[2]/div[1]/ul/li[1]/div/span/a').extract()
>>>>>>> - ...
>>>>>>>
>>>>>>> I checked for POST/XHR requests using Firebug, and I don't think the
>>>>>>> content is related to information generated by JS code (what do you
>>>>>>> think?).
>>>>>>>
>>>>>>> Can you please help me figure out this problem?
>>>>>>> Thank you in advance.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> K.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "scrapy-users" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>>
>
>