YouTube pages rely on JavaScript to create the <video> element.
Your browser's XPath tool works because it operates on the rendered
page, after JavaScript has done its work.

Scrapy is not a browser and does not interpret JavaScript,
so it can only work on what is in the HTML source code at the time the
web page is fetched.

You can see, for example, that the element with ID "player-api", which
contains "movie_player" in your screenshot,
is empty in the source code:

 <div id="player-api" class="player-width player-height off-screen-target player-api"></div>
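
As a minimal sketch of why the XPath finds nothing, the snippet below counts tags in a made-up, trimmed-down copy of the fetched source, using only Python's standard html.parser. The RAW_HTML string and the TagCounter class are illustrative assumptions, not the real YouTube page and not what Scrapy uses internally:

```python
from html.parser import HTMLParser

# Hypothetical, trimmed-down stand-in for the HTML that the server sends.
RAW_HTML = """
<div id="player">
  <div id="player-api" class="player-width player-height off-screen-target player-api"></div>
  <script>/* the player configuration is set here */</script>
</div>
"""


class TagCounter(HTMLParser):
    """Count the start tags seen in the raw markup, without running any script."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


counter = TagCounter()
counter.feed(RAW_HTML)

# No <video> start tag exists until a browser executes the script.
print(counter.counts.get("video", 0))
# The data a scraper can reach lives in the <script> element instead.
print(counter.counts.get("script", 0))
```

Any parser that does not execute scripts sees the same thing: an empty #player-api and a <script> next to it.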

You can also see that this #player-api element is followed by
<script> elements.
While it is not straightforward to read what this JavaScript code does,
you can parse it with js2xml (disclaimer: I wrote and maintain js2xml).

Below is an example of js2xml usage in the scrapy shell.

It parses the JavaScript statements from the <script> elements in #player,
and then extracts dicts from them.
The main script has an "args" key, which itself contains a
url_encoded_fmt_stream_map key with some URLs for the video you may be after.

I'm using urlparse to decode what looks like a query string.

(The full scrapy shell session
is at https://gist.github.com/redapple/8269818915cc2c337dc2)

$ scrapy shell "https://www.youtube.com/watch?v=1EFnX1UkXVU"
2014-12-30 15:18:09+0100 [default] DEBUG: Crawled (200) <GET https://www.youtube.com/watch?v=1EFnX1UkXVU> (referer: None)

In [1]: import js2xml
In [2]: import urlparse
In [3]: import pprint

In [4]: for script in response.css('#player script').xpath('string()').extract():
   ...:     jstree = js2xml.parse(script)
   ...:     data = js2xml.jsonlike.getall(jstree)
   ...:     for d in data:
   ...:         pprint.pprint(d)
   ...:     
{}
{'args': {'account_playback_token': 
'QUFFLUhqa0sweExRZno5OHZEaGcwWVVQaXAxVWh0NUNFZ3xBQ3Jtc0tseE9DRUw3cFVRbkFGN1hub2VmQlNERGl3WjFIQV84aTI0b0lxZnhwdDZKRl96N1g5eWN3dkZER1pFbVM4dS1FeWJoc1FJeTBXdS0tbU5LY1NsWngtSHY1R0hoTl9xdy1iWUNoam1nRFM2czEweVdMNA==',
          'adaptive_fmts': 
'size=1280x720&clen=51269588&fps=15&itag=136&init=0-709...bitrate=80798',
          'allow_embed': '1',
          'allow_ratings': '1',
          'atc': 
'a=3&b=nhjwMM7ySu8wj8OhutnokFK8Dvs&c=1419949090&d=1&e=1EFnX1UkXVU&c3a=28&c1a=1&hh=hKbH2J9f2WwblpFs2hvo0H17oZo',
          'author': 'Michael Herman',
          'avg_rating': '4.948387146',
          'c': 'WEB',
          'cc3_module': '1',
          'cc_asr': '1',
          'cc_font': 'Arial Unicode MS, arial, verdana, _sans',
          'cc_fonts_url': 
'https://s.ytimg.com/yts/swfbin/player-vfly1u_c5/fonts708.swf',
          'cc_load_policy': '2',
          'cc_module': 
'https://s.ytimg.com/yts/swfbin/player-vfly1u_c5/subtitle_module.swf',
          'cl': '82697338',
          'cr': 'FR',
          'csi_page_type': 'watch,watch7',
          'dash': '1',
          'dashmpd': 'http://manifest.googlevideo.com/api/...',
          'enablecsi': '1',
          'enablejsapi': 1,
          'eventid': 'IrSiVP-kC4v4cKrwgRg',
          'fexp': 
'900718,927622,931342,932404,938809,9405699,9406022,940927,940940,941004,943917,947209,947218,948124,952302,952605,952901,955110,955301,957103,957105,957201',
          'fmt_list': 
'22/1280x720/9/0/115,43/640x360/99/0/0,18/640x360/9/0/115,5/426x240/7/0/0,36/426x240/99/1/0,17/256x144/99/1/0',
          'hl': 'en_US',
          'host_language': 'en',
          'idpj': '-6',
          'iurl': 'https://i.ytimg.com/vi/1EFnX1UkXVU/hqdefault.jpg',
          'iurlhq': 'https://i.ytimg.com/vi/1EFnX1UkXVU/hqdefault.jpg',
          'iurlmaxres': 
'https://i.ytimg.com/vi/1EFnX1UkXVU/maxresdefault.jpg',
          'iurlmq': 'https://i.ytimg.com/vi/1EFnX1UkXVU/mqdefault.jpg',
          'iurlsd': 'https://i.ytimg.com/vi/1EFnX1UkXVU/sddefault.jpg',
          'iv3_module': '1',
          'iv_invideo_url': 
'https://www.youtube.com/annotations_invideo?cta=2&video_id=1EFnX1UkXVU',
          'iv_load_policy': '1',
          'iv_module': 
'https://s.ytimg.com/yts/swfbin/player-vfly1u_c5/iv_module.swf',
          'keywords': 'Scrapy,Python,scraping,python scrapy,web scraping',
          'ldpj': '-25',
          'length_seconds': '717',
          'loaderUrl': 'https://www.youtube.com/watch?v=1EFnX1UkXVU',
          'no_get_video_log': '1',
          'of': 'lNeUuIm8BRrYa4UFYW3Vbw',
          'plid': 'AAULb6kfjbEHoNwt',
          'pltype': 'contentugc',
          'probe_url': 
'http://r5---sn-5hn7ym7z.googlevideo.com/videogoodput?id=o-ACe-sIXL0cLvgJC4v5mIahOxT1PHw4zDPr8ZGMCgqwQI&source=goodput&range=0-99999&expire=1419952690&ip=89.84.122.217&ms=pm&mm=35&nh=EAk&sparams=id,source,range,expire,ip,ms,mm,nh&signature=3B4094AEE2FC1C0142BCEDB115F785607DEC0CF1.04988A5889C0348F50D45D76A7D6831155C91407&key=cms1',
          'ptk': 'youtube_none',
          'ssl': '1',
          'storyboard_spec': 
'https://i.ytimg.com/sb/1EFnX1UkXVU/storyboard3_L$L/$N.jpg|48#27#100#10#10#0#default#28F7DFM7_rVji4ZXj1Inr3KDPBE|80#45#145#10#10#5000#M$M#oy8NWkx8UFfdFYJoDyKoK-F6EUo|160#90#145#5#5#5000#M$M#RPAH69FExaDD6f0lYwoCjc64vI8',
          't': '1',
          'thumbnail_url': 'https://i.ytimg.com/vi/1EFnX1UkXVU/default.jpg',
          'timestamp': '1419949090',
          'title': 'Scraping Web Pages with Scrapy',
          'tmi': '1',
          'token': '1',
          'ttsurl': 'https://www.youtube.com/api/timedtext?...',
          'ucid': 'UCt7yOnL7bI7yCa1Xe_GTjJQ',
          'url_encoded_fmt_stream_map': 
'fallback_host=tc.v18.cache4.googlevideo.com&quality=hd720...',
          'video_id': '1EFnX1UkXVU',
          'view_count': '52035',
          'vq': 'auto',
          'watermark': 
',https://s.ytimg.com/yts/img/watermark/youtube_watermark-vflHX6b6E.png,https://s.ytimg.com/yts/img/watermark/youtube_hd_watermark-vflAzLcD6.png'},
 'assets': {'css': '//s.ytimg.com/yts/cssbin/www-player-vflPfi1TF.css',
            'html': '/html5_player_template',
            'js': 
'//s.ytimg.com/yts/jsbin/html5player-en_US-vflw4H1P-/html5player.js'},
 'attrs': {'id': 'movie_player'},
 'html5': False,
 'messages': {'player_fallback': ['Adobe Flash Player or an HTML5 supported browser is required for video playback.<br><a href="http://get.adobe.com/flashplayer/">Get the latest Flash Player </a><br><a href="/html5">Learn more about upgrading to an HTML5 browser</a>']},
 'min_version': '8.0.0',
 'params': {'allowfullscreen': 'true',
            'allowscriptaccess': 'always',
            'bgcolor': '#000000'},
 'sts': 16427,
 'url': 'https://s.ytimg.com/yts/swfbin/player-vfly1u_c5/watch_as3.swf',
 'url_v8': 'https://s.ytimg.com/yts/swfbin/player-vfly1u_c5/cps.swf',
 'url_v9as2': 'https://s.ytimg.com/yts/swfbin/player-vfly1u_c5/cps.swf'}
[]

In [5]: for script in response.css('#player script').xpath('string()').extract():
   ...:    jstree = js2xml.parse(script)
   ...:    data = js2xml.jsonlike.getall(jstree)
   ...:    for d in data:
   ...:        try:
   ...:            if d:
   ...:                pprint.pprint(urlparse.parse_qsl(d.get("args", {}).get("url_encoded_fmt_stream_map", "")))
   ...:        except:
   ...:             pass
   ...:         
[('fallback_host', 'tc.v18.cache4.googlevideo.com'),
 ('quality', 'hd720'),
 ('itag', '22'),
 ('type', 'video/mp4; codecs="avc1.64001F, mp4a.40.2"'),
 ('url',
  
'http://r3---sn-25ge7n7d.googlevideo.com/videoplayback?dur=716.985&id=o-AMERlvuyknt71bMvL2Sjki6y2WsGz0TDKn11unO3_SQy&mm=31&ip=89.84.122.217&key=yt5&itag=22&mime=video%2Fmp4&source=youtube&ms=au&fexp=900718%2C927622%2C931342%2C932404%2C938809%2C9405699%2C9406022%2C940927%2C940940%2C941004%2C943917%2C947209%2C947218%2C948124%2C952302%2C952605%2C952901%2C955110%2C955301%2C957103%2C957105%2C957201&mv=m&mt=1419949043&sver=3&initcwndbps=872500&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Cratebypass%2Csource%2Cupn%2Cexpire&ratebypass=yes&signature=75A8510F49A9C73C72BC4F4A8759320481305D26.EA7ABB7DD01D7B4BA5228ABD8DF8DD47AB73A3A1&expire=1419970690&upn=5QFvFRIqKzs&ipbits=0,fallback_host=tc.v20.cache6.googlevideo.com'),
 ('quality', 'medium'),
 ('itag', '43'),
 ('type', 'video/webm; codecs="vp8.0, vorbis"'),
 ('url',
  
'http://r3---sn-25ge7n7d.googlevideo.com/videoplayback?dur=0.000&id=o-AMERlvuyknt71bMvL2Sjki6y2WsGz0TDKn11unO3_SQy&mm=31&ip=89.84.122.217&key=yt5&itag=43&mime=video%2Fwebm&source=youtube&ms=au&fexp=900718%2C927622%2C931342%2C932404%2C938809%2C9405699%2C9406022%2C940927%2C940940%2C941004%2C943917%2C947209%2C947218%2C948124%2C952302%2C952605%2C952901%2C955110%2C955301%2C957103%2C957105%2C957201&mv=m&mt=1419949043&sver=3&initcwndbps=872500&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Cratebypass%2Csource%2Cupn%2Cexpire&ratebypass=yes&signature=E17363F74C7068BEB4DB31FC90AEF2EA70A3C233.F634AC2BD1B5A6B27E1DDFB4FB09DE7C04D1DF0E&expire=1419970690&upn=5QFvFRIqKzs&ipbits=0,fallback_host=tc.v13.cache4.googlevideo.com'),
 ('quality', 'medium'),
 ('itag', '18'),
 ('type', 'video/mp4; codecs="avc1.42001E, mp4a.40.2"'),
 ('url',
  
'http://r3---sn-25ge7n7d.googlevideo.com/videoplayback?dur=716.985&id=o-AMERlvuyknt71bMvL2Sjki6y2WsGz0TDKn11unO3_SQy&mm=31&ip=89.84.122.217&key=yt5&itag=18&mime=video%2Fmp4&source=youtube&ms=au&fexp=900718%2C927622%2C931342%2C932404%2C938809%2C9405699%2C9406022%2C940927%2C940940%2C941004%2C943917%2C947209%2C947218%2C948124%2C952302%2C952605%2C952901%2C955110%2C955301%2C957103%2C957105%2C957201&mv=m&mt=1419949043&sver=3&initcwndbps=872500&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Cratebypass%2Csource%2Cupn%2Cexpire&ratebypass=yes&signature=78201511AECE7F328D67AA08EC40E22777C62616.6B0C1787F391F30F1D28D8C2BCD6E67C71F1BB5F&expire=1419970690&upn=5QFvFRIqKzs&ipbits=0,fallback_host=tc.v4.cache4.googlevideo.com'),
 ('quality', 'small'),
 ('itag', '5'),
 ('type', 'video/x-flv'),
 ('url',
  
'http://r3---sn-25ge7n7d.googlevideo.com/videoplayback?dur=716.983&id=o-AMERlvuyknt71bMvL2Sjki6y2WsGz0TDKn11unO3_SQy&mm=31&ip=89.84.122.217&key=yt5&itag=5&mime=video%2Fx-flv&source=youtube&ms=au&fexp=900718%2C927622%2C931342%2C932404%2C938809%2C9405699%2C9406022%2C940927%2C940940%2C941004%2C943917%2C947209%2C947218%2C948124%2C952302%2C952605%2C952901%2C955110%2C955301%2C957103%2C957105%2C957201&mv=m&mt=1419949043&sver=3&initcwndbps=872500&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Csource%2Cupn%2Cexpire&signature=DE27A5283FB425F79CC1ACBB67D0B20FF07D5BD5.DBACE3E830A573BF4092AC442C99278D4CFF549F&expire=1419970690&upn=5QFvFRIqKzs&ipbits=0,fallback_host=tc.v4.cache5.googlevideo.com'),
 ('quality', 'small'),
 ('itag', '36'),
 ('type', 'video/3gpp; codecs="mp4v.20.3, mp4a.40.2"'),
 ('url',
  
'http://r3---sn-25ge7n7d.googlevideo.com/videoplayback?dur=717.125&id=o-AMERlvuyknt71bMvL2Sjki6y2WsGz0TDKn11unO3_SQy&mm=31&ip=89.84.122.217&key=yt5&itag=36&mime=video%2F3gpp&source=youtube&ms=au&fexp=900718%2C927622%2C931342%2C932404%2C938809%2C9405699%2C9406022%2C940927%2C940940%2C941004%2C943917%2C947209%2C947218%2C948124%2C952302%2C952605%2C952901%2C955110%2C955301%2C957103%2C957105%2C957201&mv=m&mt=1419949043&sver=3&initcwndbps=872500&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Csource%2Cupn%2Cexpire&signature=E9DD3B41DDA5B39F12D7311682DEB24A376F04C9.0C3EEEFED598AF77E877D361B57385CE5941303F&expire=1419970690&upn=5QFvFRIqKzs&ipbits=0,fallback_host=tc.v9.cache5.googlevideo.com'),
 ('quality', 'small'),
 ('itag', '17'),
 ('type', 'video/3gpp; codecs="mp4v.20.3, mp4a.40.2"'),
 ('url',
  
'http://r3---sn-25ge7n7d.googlevideo.com/videoplayback?dur=717.217&id=o-AMERlvuyknt71bMvL2Sjki6y2WsGz0TDKn11unO3_SQy&mm=31&ip=89.84.122.217&key=yt5&itag=17&mime=video%2F3gpp&source=youtube&ms=au&fexp=900718%2C927622%2C931342%2C932404%2C938809%2C9405699%2C9406022%2C940927%2C940940%2C941004%2C943917%2C947209%2C947218%2C948124%2C952302%2C952605%2C952901%2C955110%2C955301%2C957103%2C957105%2C957201&mv=m&mt=1419949043&sver=3&initcwndbps=872500&sparams=dur%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Cmime%2Cmm%2Cms%2Cmv%2Csource%2Cupn%2Cexpire&signature=E4199F944FC4A5A1DFBAD4562EB628E62B53FD27.FA0A2D69378E3AB8B4E50FD55A2F64CA7A048EA1&expire=1419970690&upn=5QFvFRIqKzs&ipbits=0')]
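
In the output above, each 'url' value except the last ends with ",fallback_host=...", which suggests that url_encoded_fmt_stream_map is a comma-separated list of query strings, one per stream. The following is a minimal sketch of splitting on commas first and then decoding each chunk. It assumes Python 3's urllib.parse (the session above uses Python 2's urlparse). The split_stream_map helper and the shortened sample value are made up for illustration:

```python
from urllib.parse import parse_qs


def split_stream_map(stream_map):
    """Return one dict per stream from a comma-separated list of query strings."""
    streams = []
    for chunk in stream_map.split(","):
        if not chunk:
            continue
        # parse_qs returns a list of values per key; keep the first of each.
        fields = {key: values[0] for key, values in parse_qs(chunk).items()}
        streams.append(fields)
    return streams


# Made-up, shortened sample; the real values carry long signed URLs.
sample = (
    "quality=hd720&itag=22&url=http%3A%2F%2Fexample.com%2Fvideoplayback%3Fitag%3D22"
    ",quality=medium&itag=18&url=http%3A%2F%2Fexample.com%2Fvideoplayback%3Fitag%3D18"
)

for stream in split_stream_map(sample):
    print(stream["itag"], stream["quality"], stream["url"])
```

With one dict per stream, picking a format by 'itag' or 'quality' becomes a simple filter over the returned list.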




On Tuesday, December 30, 2014 6:49:51 AM UTC+1, Gaurang shah wrote:
>
> Following is the details. 
> Os: Windows 7 64 bit
> Python 2.7
> Scrapy 0.25.1
>
> I don't understand the last question. I am using the Selector provided by 
> Scrapy to get the node using XPath. Here is the code:
>
> selector = Selector(response)
> view_count = selector.xpath("//div[@class='watch-view-count']/text()")[0].extract().strip()
> video_url = selector.xpath("//video[contains(@class,'html5-main-video')]/@src").extract()
>
>
> Gaurang Shah
> Blog: qtp-help.blogspot.com
> Mobile: +91 738756556
>
> On Tue, Dec 30, 2014 at 1:24 AM, bruce <[email protected]> wrote:
>
>> Hey Gaurang,
>>
>> What's the OS, version of python, version of scrapy you're using?
>>
>> Does scrapy use urllib? Or better, if you know, what lib does scrapy use 
>> for the url/xpath processing?
>>
>>
>>
>> On Mon, Dec 29, 2014 at 11:32 AM, Gaurang shah <[email protected]> wrote:
>>
>>> Sorry guys, I forgot to mention: all of these XPaths are able to identify 
>>> the element using the FirePath add-on for Firefox. 
>>>
>>> *//video *
>>> *//video[contains(@class,'html5-main-video')]/@src*
>>>
>>> *//div[@class='html5-video-container']/video/@src*
>>>
>>> *//div[@id='movie_player']/div[1]/video/@src*
>>>
>>> *//div[@id='player-api']/div[1]/div[1]/video/@src*
>>>
>>> *However, none of them works in Scrapy.*
>>>
>>> Gaurang Shah
>>> Blog: qtp-help.blogspot.com
>>> Mobile: +91 738756556
>>>
>>> On Mon, Dec 29, 2014 at 9:41 PM, bruce <[email protected]> wrote:
>>>
>>>> Are you able to effectively create an xpath using your browser's 
>>>> xpath/dev tools?
>>>>
>>>> in firefox, you can use dom inspector, there are others as well, not 
>>>> sure of your browser..
>>>>
>>>> In other words, is the issue with the "video" element, or something 
>>>> else in your xpath?
>>>>
>>>> If you can resolve the xpath with a separate tool, that should give you 
>>>> direction to solve the issue.
>>>>
>>>>
>>>>
>>>> On Mon, Dec 29, 2014 at 7:38 AM, Gaurang shah <[email protected]> wrote:
>>>>
>>>>> Hi Guys, 
>>>>>
>>>>> I am trying to scrape the YouTube site, and somehow the XPath which 
>>>>> fetches the video src is not working in Scrapy. 
>>>>>
>>>>> Url: https://www.youtube.com/watch?v=1EFnX1UkXVU
>>>>>
>>>>>
>>>>> The following XPaths are not working: 
>>>>> *//video *
>>>>> *//video[contains(@class,'html5-main-video')]/@src*
>>>>>
>>>>>
>>>>> <https://lh3.googleusercontent.com/--_vqbGQxgWg/VKFLFyraflI/AAAAAAAACLY/2352f1VU0ds/s1600/Image%2B004.jpg>
>>>>> I am able to retrieve nodes down to *//div[@id='player-api']*, but after 
>>>>> that it's a dead end: Scrapy is not able to find any more nodes inside it. 
>>>>> However, there are nodes inside it as well. 
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "scrapy-users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>
>>
>
>
