So, NBC is putting quite a bit of video online at nbcolympics.com.
But if you try to check out a bit of backstage action on your Linux box, you'll be told (politely) "Go away, kid, ya bother me".

Fortunately, the intrepid folks over in the Ubuntu forums have come up with a way, albeit a convoluted one, for those of us on the fringes to take part.

Basically, they've created three scripts that search the Olympic site and parse out the stream URLs, which you can then pass to your favorite media player.

The first, and least effective for me, is nbcsched.py.
It takes two arguments: the sport, and whether you want live or recorded ('live' or 'rewind'). An optional third argument is a pattern to look for in the match name.
It then returns a list of streams matching those criteria:

~/olympics$ python nbcsched.py badminton live
Mixed Doubles Bronze-Medal Match
mmsh://msolympics-ENC11-high.wm.llnwd.net/msftolympicslive-live/msolympics_ENC11_high?e=1218970072&h=5ad555469147663221ba5ed234a30a0b
~/olympics$ mplayer 'mms://msolympics-ENC11-high.wm.llnwd.net/msftolympicslive-live/msolympics_ENC11_high?e=1218970072&h=5ad555469147663221ba5ed234a30a0b'
(Note that I removed the "h" from "mmsh" in the URL, as mplayer won't play mmsh URLs, and quoted the URL so the shell leaves the & alone.)
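
If you'd rather not edit the URL by hand, the scheme swap is easy to script. This is only a minimal sketch in Python 3 (the attached scripts are Python 2), and `to_mms` is a name I made up for illustration; it is not part of any of the scripts:

```python
def to_mms(url):
    """Rewrite an mmsh:// URL as mms://; leave any other URL untouched."""
    prefix = "mmsh://"
    if url.startswith(prefix):
        return "mms://" + url[len(prefix):]
    return url


# Example with the host and path from the badminton stream above.
print(to_mms("mmsh://msolympics-ENC11-high.wm.llnwd.net/"
             "msftolympicslive-live/msolympics_ENC11_high"))
```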

~/olympics$ python nbcsched.py badminton rewind
Women's Singles Round of 64 Juliane SCHENK (GER) - Maria Kristin YULIANTI (INA)
mms://msolympics-73.wmod.llnwd.net/a2439/d1/msftolympics/nbcs/wmp/rewinds/BADH-BJ-SD11-080908-085503--SBR--HIGH.wmv?e=1218972595&h=372a35a66848b88d35f20b9143adcf5b

However, NBC appears to be playing games with the URLs: they "expire", and you get a "Technical Difficulties" stream instead of your match. If you are fast on the cut and paste, it works about half the time. Feeding the script output straight to the player works at about the same rate.
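
I don't know how the expiry actually works, but the "e=" value in these URLs looks like a Unix timestamp (1218970072 falls on 17 August 2008 UTC). Assuming that guess is right, and it is only a guess, a rough Python 3 sketch for seeing how long a URL has left could look like this. The function names are mine, not from the scripts:

```python
import time
from urllib.parse import urlsplit, parse_qs


def expiry_of(url):
    """Return the integer value of the 'e' query parameter, or None."""
    values = parse_qs(urlsplit(url).query).get("e")
    return int(values[0]) if values else None


def seconds_left(url, now=None):
    """Seconds until 'e' is reached, ASSUMING 'e' is a Unix expiry time."""
    expiry = expiry_of(url)
    if expiry is None:
        return None
    if now is None:
        now = time.time()
    return expiry - now


url = ("mms://msolympics-ENC11-high.wm.llnwd.net/msftolympicslive-live/"
       "msolympics_ENC11_high?e=1218970072&h=5ad555469147663221ba5ed234a30a0b")
print(time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(expiry_of(url))))
```

A negative result from `seconds_left` would mean the URL is already stale under that assumption, so fetch a fresh one before starting the player.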

The second method gives me greater success.
nbcquick-rev2.py takes an "asset" from a URL string and converts that into the mms stream:

~/$ python nbcquick-rev2.py fn2h-bj-sd09-081708-175003
...
mms://msolympics-ENC09-high.wm.llnwd.net/msftolympicslive-live/msolympics_ENC09_high?e=1218970407\&h=a9da4c51e9efe8a28c7a08a7968a8198

$ mplayer 'mms://<stream here>'
(Note the escapes)
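
The script output shows up with two different escapings of the ampersand: a backslash form (`\&`) from nbcquick-rev2.py and an entity form (`&amp;`) from the others. If you want a plain URL to hand to a player without going through the shell, something like this Python 3 sketch would normalize both. `clean_stream_url` is a hypothetical helper, not something the scripts provide:

```python
def clean_stream_url(url):
    """Turn the escaped ampersands seen in the script output back into '&'."""
    url = url.replace("\\&", "&")    # shell-style escape
    url = url.replace("&amp;", "&")  # XML/HTML entity left over from the ASX file
    return url


print(clean_stream_url(r"mms://example.invalid/stream?e=1\&h=2"))
print(clean_stream_url("mmsh://example.invalid/stream?e=1&amp;h=2"))
```

The host name in the example is a placeholder, not a real stream.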

The asset ID comes from the URL on nbcolympics.com when you click on the video feed link:
<http://www.nbcolympics.com/video/player.html?assetid=fn2h-bj-sd09-081708-175003&channelcode=sportfe>
It is the bit between the "assetid=" and the following "&".
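
If you are copying a lot of these, pulling the asset ID out of the player URL can be automated. A minimal Python 3 sketch using only the standard library (the function name `asset_id_from` is my own invention):

```python
from urllib.parse import urlsplit, parse_qs


def asset_id_from(player_url):
    """Return the 'assetid' query parameter of a player URL, or None."""
    values = parse_qs(urlsplit(player_url).query).get("assetid")
    return values[0] if values else None


print(asset_id_from(
    "http://www.nbcolympics.com/video/player.html"
    "?assetid=fn2h-bj-sd09-081708-175003&channelcode=sportfe"))
```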

The third script, nbcquick-sl.py, is essentially the same but attempts to locate the "Silverlight" stream, which is supposedly higher quality.

The syntax is the same as the nbcquick script:
~/olympics$ python nbcquick-sl.py fn2h-bj-sd09-081708-175003
mmsh://msolympics-ENC09-high.wm.llnwd.net/msftolympicslive-live/msolympics_ENC09_high?e=1218970930&amp;h=972f0abe9eed99b93823287563601b11

You can also feed its output to mplayer directly:

$ mplayer -cache 8096 `python nbcquick-sl.py tb4h-bj-sd30-081308-095504`
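
The same thing can be done from a small wrapper instead of backticks, which avoids the shell touching the & in the URL. This is only a sketch in Python 3. The interpreter name "python2" is an assumption (the attached scripts are Python 2), and `pick_stream_url` and `play` are hypothetical names of mine:

```python
import subprocess


def pick_stream_url(output):
    """Return the last line of script output that looks like an mms/mmsh URL."""
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith(("mms://", "mmsh://")):
            return line
    return None


def play(asset_id, script="nbcquick-sl.py"):
    """Run the lookup script, then start mplayer on the URL it printed."""
    # "python2" is an assumed interpreter name; adjust for your system.
    result = subprocess.run(["python2", script, asset_id],
                            capture_output=True, text=True, check=True)
    url = pick_stream_url(result.stdout)
    if url is None:
        raise RuntimeError("no stream URL found in script output")
    # Arguments are passed as a list, so no shell quoting is needed.
    subprocess.run(["mplayer", "-cache", "8096", url], check=True)


# Usage (needs the script, a Python 2 interpreter and mplayer installed):
#   play("tb4h-bj-sd30-081308-095504")
```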

Why would you want to do this? Well, apart from the obvious fact that most Olympic events don't have a chance of getting broadcast (they don't feature skimpy bikinis or Michael Phelps), the footage is often the raw camera feed: no announcers speaking over the action, no inane commentary. The sound is generally raw too; you can hear the referees admonishing competitors, for example.

Also, it seems that a lot of the time the NBC folks forget the cameras are live, or just don't care. My moment of zen was watching the sailing docks, where a very angry Chinese official was tearing into a sailing crew loading their boat. I have no idea what the problem was, but you wouldn't see that on the broadcast edit. Another was a fencing referee having a stern talk with an Italian(?) fencer. I don't speak French, so I'm not sure what that was about either.

You can also use mimms to save the files for later viewing if you prefer. mimms also accepts mmsh URLs:

$ mimms `python ~/olympics/nbcquick-sl.py 0813_sd_ttm_ch_l0381` KeyPoints.wmv

$ mimms 'mmsh://msolympics-ENC36-high.wm.llnwd.net/msftolympicslive-live/msolympics_ENC36_high?e=1218969242&amp;h=d41c23ed3dfcdd4d99c7d3cb02ad82c1' - | tee dock.wmv | mplayer -

This is definitely not the easiest route, but if you are a fan of sports that don't get much airtime, with a bit of fussing you can see what the first-class citizens of the Windows world see.

I will attach the scripts to this email; if they don't survive, you can find them on the euglug server:
<http://test.euglug.org/files/nbcquick-rev2.py.txt>
<http://test.euglug.org/files/nbcquick-sl.py.txt>
<http://test.euglug.org/files/nbcsched.py.txt>

If that fails, the original Ubuntu thread can be found here:
<http://ubuntuforums.org/showthread.php?t=883142>

-ajb
#!/usr/bin/env python
import re
import sys, os

from sys import argv
from urllib import urlopen
sys.path.append('/usr/lib/python2.5/site-packages/oldxml')
from xml.dom import Node
from xml.dom.ext.reader import PyExpat
reader = PyExpat.Reader()

sport = mode = mpat = mre = None
if len(argv) > 1:
    aid = argv[1] # direct asset ID

#aidpat = r'assetid=([0-9]+)&'
aidpat = r"assetid=([0-9a-zA-Z\-]+)&"
aidre = re.compile(aidpat,re.MULTILINE)
asxpat = r'"high":"([^"]+)"'
asxre = re.compile(asxpat,re.MULTILINE)
refpat = r'REF HREF="([^"]+)"'
refre = re.compile(refpat,re.MULTILINE)

def fixent(s):
    s = re.sub(r'Videos & Photos',r'Videos &amp; Photos',s)
    s = re.sub(r'<script([^>]*)>',r'<script\1><![CDATA[',s)
    s = re.sub(r'</script>',r']]></script>',s)
    return s

def getText(e):
    "Recursively get all the text from element e."
    texts = ['']
    for n in e.childNodes:
        if n.nodeType is Node.TEXT_NODE:
            texts.append(n.data)
        else:
            texts.append(getText(n))
    return ''.join(texts)

class Sched:
    def __init__(self):
        #self.live = [self.split(a) for a in self.alinks if mode in a.childNodes[0].data]
        self.live = []
        aurl = "http://www.nbcolympics.com/video/modules/json/resourcedata/" + aid + "/asset.html"
        u = urlopen(aurl)
        asset = u.read()
        u.close()
        # Skip stream URLs for "sl" (silverlight?).
        asset = asset[asset.find('"wmp"'):]
        m = re.search(asxre,asset)
        asxurl = m.groups()[0]
        # Fetch ASX file.
        u = urlopen(asxurl)
        asxfile = u.read();
        u.close()
        print asxfile
        print
        # Extract stream URL.
        m = re.search(refre,asxfile)
        mmsurl = m.groups()[0]
        mmsurl = re.sub(r'^http','mms',mmsurl)
        mmsurl = re.sub(r'&amp;','\\&',mmsurl)
        #chomp mmsurl
        self.live.append( (mmsurl) )

nowsched = Sched()

for (asset) in nowsched.live:
#    print asset
#    print
    cmd = "totem-xine %s" % asset
    print cmd
    os.system(cmd)

# vim:ts=2:sw=2:tw=72:ci:
#!/usr/bin/env python
import re
import sys

from sys import argv
from urllib import urlopen
sys.path.append('/usr/lib/python2.5/site-packages/oldxml')
from xml.dom import Node
from xml.dom.ext.reader import PyExpat
reader = PyExpat.Reader()

sport = mode = mpat = mre = None
if len(argv) > 1:
	aid = argv[1] # direct asset ID

aidpat = r'assetid=([0-9]+)&'
aidre = re.compile(aidpat,re.MULTILINE)
asxpat = r'"high":"([^"]+)"'
asxre = re.compile(asxpat,re.MULTILINE)
pippat = r'"pip":"([^"]+)"'
pipre = re.compile(pippat,re.MULTILINE)
lowpat = r'"low":"([^"]+)"'
lowre = re.compile(lowpat,re.MULTILINE)
refpat = r'REF HREF="([^"]+)"'
refre = re.compile(refpat,re.MULTILINE)

def fixent(s):
	s = re.sub(r'Videos & Photos',r'Videos &amp; Photos',s)
	s = re.sub(r'<script([^>]*)>',r'<script\1><![CDATA[',s)
	s = re.sub(r'</script>',r']]></script>',s)
	return s

def getText(e):
	"Recursively get all the text from element e."
	texts = ['']
	for n in e.childNodes:
		if n.nodeType is Node.TEXT_NODE:
			texts.append(n.data)
		else:
			texts.append(getText(n))
	return ''.join(texts)

class Sched:
	def __init__(self):
		#self.live = [self.split(a) for a in self.alinks if mode in a.childNodes[0].data]
		self.live = []
		aurl = "http://www.nbcolympics.com/video/modules/json/resourcedata/" + aid[:4] + "/" + aid[4:] + "/asset.html"
		u = urlopen(aurl)
		asset = u.read()
		u.close()
		# Skip to stream URLs for "sl" (silverlight?).
		asset = asset[asset.find('"sl"'):]
		s = re.search(asxre,asset)
		# Skip to stream URLs for "wmp"
		asset = asset[asset.find('"wmp"'):]
		m = re.search(asxre,asset)
		# fall back to silverlight if we have to
		if m is None:
			m = s
		asxurl = m.groups()[0]
		# Fetch ASX file.
		u = urlopen(asxurl)
		asxfile = u.read();
		u.close()
		# Extract stream URL.
		m = re.search(refre,asxfile)
		mmsurl = m.groups()[0]
		mmsurl = re.sub(r'^http','mmsh',mmsurl)
		self.live.append( (mmsurl) )

nowsched = Sched()

for (asset) in nowsched.live:
	print asset
	print


# vim:ts=2:sw=2:tw=72:ci:
#!/usr/bin/env python
import re
import sys

from sys import argv
from urllib import urlopen
sys.path.append('/usr/lib/python2.5/site-packages/oldxml')
from xml.dom import Node
from xml.dom.ext.reader import PyExpat
reader = PyExpat.Reader()

sport = mode = mpat = mre = None
if len(argv) > 1:
	sport = argv[1] # which sport
if len(argv) > 2:
	mode = argv[2] # live or recorded ('live' or 'rewind')
if len(argv) > 3:
	mpat = argv[3] # pattern to be contained in match name
	mre = re.compile(mpat,re.IGNORECASE)
if sport is None:
	sport = 'badminton'
if mode is None or mode == 'live':
	mode = 'Live'
elif mode == 'rewind':
	mode = 'Rewind'

nowurl = "http://www.nbcolympics.com/"+sport+"/resultsandschedules/index.html"

aidpat = r'assetid=([0-9]+)&'
aidre = re.compile(aidpat,re.MULTILINE)
asxpat = r'"high":"([^"]+)"'
asxre = re.compile(asxpat,re.MULTILINE)
refpat = r'REF HREF="([^"]+)"'
refre = re.compile(refpat,re.MULTILINE)

def fixent(s):
	s = re.sub(r'Videos & Photos',r'Videos &amp; Photos',s)
	s = re.sub(r'<script([^>]*)>',r'<script\1><![CDATA[',s)
	s = re.sub(r'</script>',r']]></script>',s)
	return s

def getText(e):
	"Recursively get all the text from element e."
	texts = ['']
	for n in e.childNodes:
		if n.nodeType is Node.TEXT_NODE:
			texts.append(n.data)
		else:
			texts.append(getText(n))
	return ''.join(texts)

class Sched:
	def livelinks(self):
		"For each a-tag, get asset information and match name."
		#self.live = [self.split(a) for a in self.alinks if mode in a.childNodes[0].data]
		self.live = []
		for a in self.alinks:
			if mode not in a.childNodes[0].data: # live or recorded
				continue
			# Extract match title.
			tr = a.parentNode.parentNode.parentNode
			match = getText(tr.childNodes[1])
			match = match.replace(u'\xa0',' ')
			match = match.replace('\n',' ')
			match = match.replace('\t',' ')
			match = re.sub(' +',' ',match)
			if mpat and not re.search(mre,match): # skip unwanted matches
				continue
			# Extract asset file URL and fetch it.
			m = re.search(aidre,a.getAttribute('href'))
			aid = m.groups()[0]
			aurl = "http://www.nbcolympics.com/video/modules/json/resourcedata/" + aid[:4] + "/" + aid[4:] + "/asset.html"
			u = urlopen(aurl)
			asset = u.read()
			u.close()
			# Skip stream URLs for "sl" (silverlight?).
			asset = asset[asset.find('"wmp"'):]
			m = re.search(asxre,asset)
			asxurl = m.groups()[0]
			# Fetch ASX file.
			u = urlopen(asxurl)
			asxfile = u.read();
			u.close()
			# Extract stream URL.
			m = re.search(refre,asxfile)
			mmsurl = m.groups()[0]
			mmsurl = re.sub(r'^http','mmsh',mmsurl)
			self.live.append( (match,mmsurl) )

	def __init__(self,url):
		self.url = url
		u = urlopen(self.url) # open url
		self.html = u.read()  # load html
		u.close()
		self.html = fixent(self.html)
		self.doc = reader.fromString(self.html) # parse html
		# Find all links with an assetid
		links = self.doc.getElementsByTagName('a')
		self.alinks = [a for a in links if re.search(aidre,a.getAttribute('href'))]
		# For each link, extract the match title, whether it's live, and the
		# asset file.
		#self.live = [self.split(a) for a in self.alinks if 'Live' in a.childNodes[0].data]
		#self.live = [self.split(a) for a in self.alinks if mode in a.childNodes[0].data]
		self.livelinks()

nowsched = Sched(nowurl)

for (match,asset) in nowsched.live:
	print match
	print asset
	print


# vim:ts=2:sw=2:tw=72:ci:
_______________________________________________
EUGLUG mailing list
[email protected]
http://www.euglug.org/mailman/listinfo/euglug
