Re: Suggestions on mechanism or existing code - maintain persistence of file download history

Chris Angelico Wed, 29 Jan 2020 12:30:23 -0800

On Thu, Jan 30, 2020 at 7:06 AM jkn <[email protected]> wrote:
>
> Hi all
>     I'm almost embarrassed to ask this as it's "so simple", but thought I'd
> give it a go...


Hey, nothing wrong with that!

> I want to be able to use a simple 'download manager' which I was going to
> write (in Python), but then wondered if there was something suitable
> already out there. I haven't found it, but thought people here might have
> some ideas for existing work, or approaches.
>
> The situation is this - I have a long list of file URLs and want to
> download these as a 'background task'. I want this process to be 'crudely
> persistent' - you can CTRL-C out, and next time you run things it will
> pick up where it left off.

A decent project. I've done this before but in restricted ways.

> The download part is not difficult. It is the persistence bit I am
> thinking about.
> It is not easy to tell the name of the downloaded file from the URL.
>
> I could have a file with all the URLs listed and work through each line in
> turn. But then I would have to rewrite the file (say, with the
> previously-successful lines commented out) as I go.
>

Hmm. The easiest way would be to have something from the URL in the
file name. For instance, you could hash the URL and put the first few
digits of the hash in the file name, so
http://some.domain.example/some/path/filename.html might get saved
into "a39321604c - filename.html". That way, if you want to know if
it's been downloaded already, you just hash the URL and see if any
file begins with those digits.
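
A rough sketch of that naming scheme, in case it helps. The function
names, the choice of SHA-1, the ten-digit prefix, and the ".part" suffix
for unfinished downloads are all just assumptions for illustration, not
anything from an existing tool:

```python
import hashlib
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def url_prefix(url, digits=10):
    """Return the first few hex digits of a hash of the URL (SHA-1 assumed)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:digits]


def target_name(url):
    """Build a file name of the form '<hash prefix> - <last path component>'."""
    # Fall back to a fixed name when the URL path has no usable last component.
    base = PurePosixPath(unquote(urlsplit(url).path)).name or "download"
    return f"{url_prefix(url)} - {base}"


def already_downloaded(url, directory):
    """True if a finished file in `directory` starts with this URL's prefix."""
    prefix = url_prefix(url)
    return any(
        p.name.startswith(prefix) and not p.name.endswith(".part")
        for p in Path(directory).iterdir()
    )


def pending(urls, directory):
    """Return the URLs that still need downloading, in their original order."""
    return [u for u in urls if not already_downloaded(u, directory)]
```

The ".part" check is there because a CTRL-C in the middle of a download
would otherwise leave a partial file that looks finished. One way around
that is to download to target_name(url) + ".part" and rename the file
only once the transfer completes.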

Would that kind of idea work?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
