Re: FLV download script works, but I want to enhance it

2009-05-10 Thread Aahz
In article <mailman.5307.1241805968.11746.python-l...@python.org>,
The Music Guy <music...@alphaios.net> wrote:
On Thu, May 7, 2009 at 9:29 AM, Aahz <a...@pythoncraft.com> wrote:

 Here's my download script to get you started figuring this out; it does
 the wget in the background so that several downloads can run in parallel
 from a single terminal window:

 #!/bin/bash

 echo "Downloading $1"
 wget "$1" > /dev/null 2>&1 &

Thanks for the reply, but unfortunately that script is going in the
complete wrong direction.

Not really; my point was that you could use something similar to process
files after downloading.

Firstly, downloading multiple files in tandem does not speed up the
process as it merely cuts up the already limited bandwidth into even
smaller pieces and delays every download in progress. It is much
better to queue downloads to occur one-by-one.

Secondly, that approach is based on bash rather than Python. I know I
could use the `&` operator on the command line to background processes,
but I would like to have more control through Python and the use of the
subprocess or threading modules.

Threading probably won't get you anywhere; I bet processing the files is
CPU-intensive.  I suggest looking into the multiprocessing module, which
can run the file processing in separate worker processes.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

It is easier to optimize correct code than to correct optimized code.
--Bill Harlan
--
http://mail.python.org/mailman/listinfo/python-list


Re: FLV download script works, but I want to enhance it

2009-05-08 Thread The Music Guy
On Thu, May 7, 2009 at 9:29 AM, Aahz <a...@pythoncraft.com> wrote:

 Here's my download script to get you started figuring this out; it does
 the wget in the background so that several downloads can run in parallel
 from a single terminal window:

 #!/bin/bash

 echo "Downloading $1"
 wget "$1" > /dev/null 2>&1 &


Aahz,

Thanks for the reply, but unfortunately that script is going in the
complete wrong direction.

Firstly, downloading multiple files in tandem does not speed up the
process as it merely cuts up the already limited bandwidth into even
smaller pieces and delays every download in progress. It is much
better to queue downloads to occur one-by-one.

Secondly, that approach is based on bash rather than Python. I know I
could use the `&` operator on the command line to background processes,
but I would like to have more control through Python and the use of the
subprocess or threading modules.
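For instance, a sketch of that kind of control via subprocess, with a
no-op download placeholder and 'echo' standing in for the real converter:

```python
import subprocess

def download(url, name):
    # Placeholder: the real script would use urllib.urlretrieve(url, name)
    # or a wget subprocess here.
    pass

def fetch_and_convert(urls, convert_cmd):
    procs = []
    for url in urls:
        name = url.rsplit('/', 1)[-1]
        download(url, name)  # downloads stay strictly one-by-one
        # Launch the converter without waiting, so the next download
        # starts while this conversion runs in the background.
        procs.append(subprocess.Popen(convert_cmd + [name]))
    # Collect the converters' exit codes once everything is queued.
    return [p.wait() for p in procs]

codes = fetch_and_convert(
    ['http://example.com/a.flv', 'http://example.com/b.flv'],
    ['echo'])  # 'echo' is a stand-in for the real conversion command
```

The downloads queue one-by-one while the conversions overlap, which is
exactly the split described above.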


Re: FLV download script works, but I want to enhance it

2009-05-07 Thread Aahz
In article <mailman.5134.1241579669.11746.python-l...@python.org>,
The Music Guy <music...@alphaios.net> wrote:

After I download the files, I usually want to convert them to another
video format using command line tools, and I usually convert each one
in a separate terminal since that way they can all be converted at the
same time rather than one-by-one. Oddly enough, converting tends to
take longer than downloading, but converting multiple files at once is
faster than converting them one-by-one.

Here's my download script to get you started figuring this out; it does
the wget in the background so that several downloads can run in parallel
from a single terminal window:

#!/bin/bash

echo "Downloading $1"
wget "$1" > /dev/null 2>&1 &


FLV download script works, but I want to enhance it

2009-05-05 Thread The Music Guy
I've written a script that can scan any number of web pages for urls
to .flv (Flash Video) files and then download them using urllib2. It's
kind of like a YouTube download utility, but more general-purpose (and
it doesn't actually work with YouTube. :P). The script works fine; it
even has support for outputting URLs of files it's found to stdout for
chaining to tools like wget.

After I download the files, I usually want to convert them to another
video format using command line tools, and I usually convert each one
in a separate terminal since that way they can all be converted at the
same time rather than one-by-one. Oddly enough, converting tends to
take longer than downloading, but converting multiple files at once is
faster than converting them one-by-one.

What I want to do is add an option to my script that will make it
begin converting each video using my command line tool after each is
downloaded, while at the same time continuing to download other files
in the background. I know this is possible using threading or
subprocesses, but I need help figuring out how to do it. Another thing
I want to do is allow the script to use wget for downloading instead
of urllib2.
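As a sketch of the wget option, assuming wget is on the PATH, the command
could be built and run through subprocess (the -q flag just silences
wget's progress output):

```python
import subprocess

def build_download_cmd(url, use_wget):
    # Build the downloader command line; None means fall back to the
    # internal urllib-based downloader instead.
    if use_wget:
        return ['wget', '-q', url]
    return None

def run_download(url, use_wget):
    cmd = build_download_cmd(url, use_wget)
    if cmd is not None:
        # wget exits 0 on success, non-zero on failure.
        return subprocess.call(cmd) == 0
    # ... urllib.urlretrieve(url, filename) would go here ...
    return False
```

Keeping the command construction separate from execution makes it easy
to test and to add more downloader backends later.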

The script is below. I've already added the options for the new
features, but they don't do anything yet.

#!/usr/bin/env python

import urllib
import re
import sys
import optparse

HELP = """%prog [-p] url1, url2, url3, ...

Flash Video Snagger (FLVSnag)

Tool for downloading files referenced in the source code of some
video-displaying websites. Simply pass a website URL as the first
argument and the HTML will be scanned for URLs to .flv files and
they will be downloaded.

Note that all given pages are scanned before the first video is
downloaded.

FIXME: Downloads may not occur in the order that they are found.

  - The Music Guy, 5/5/09
music...@alphaios.net
"""

def main():
    # Create the option parser and parse the options.
    parser = optparse.OptionParser(usage=HELP)
    parser.add_option(
        '-p', '--print',
        dest = 'p',
        action = 'store_true',
        default = False,
        help =
            "Causes the URLs to be printed to stdout (one URL per line) "
            "without actually downloading them. Good for piping."
    )
    parser.add_option(
        '-d', '--dpg',
        action = 'store_true',
        default = False,
        help =
            "TODO: Converts each downloaded file to DPG if dpgconv.py "
            "is available from the PATH."
    )
    parser.add_option(
        '-w', '--wget',
        action = 'store_true',
        default = False,
        help =
            "TODO: Downloads each file using wget instead of using internal "
            "downloader. Only works if wget is available from the PATH."
    )
    ops, args = parser.parse_args()
    if not args:
        parser.print_usage()
        return

    # Create a pattern that matches embedded .flv file URLs.
    urlre = re.compile(r'http\:\/\/(?!\?)[a-zA-Z0-9_ \.\/]*\.flv', re.I)

    # Download the given URLs and scan the files for video URLs.
    m = set()
    for u in args:
        # Try to download the pages.
        try:
            f = urllib.urlopen(u)
        except IOError:
            sys.exit('ERROR: Unable to open url %s.' % u)
        s = f.read()
        f.close()

        # Add any located URLs to the set.
        m = m.union(urlre.findall(s))

        # Delete page to save space.
        del s

    # If the user did not specify to print video URLs, download the videos.
    if not ops.p:
        # Print a notice if no URLs are found.
        if not m:
            print 'No URLs found.'
            return

        # Create a pattern to match video filenames.
        fnre = re.compile(r'[a-zA-Z0-9_ \.]*\.flv', re.I)

        # Download each video.
        for v in m:
            p = fnre.findall(v)[0]
            print 'Downloading %s to %s...' % (v, p)
            urllib.urlretrieve(v, p)
            print 'done.'

    # If the user said to print video URLs, just print, don't download.
    else:
        for v in m:
            print v


if __name__ == '__main__':
    main()