eflyjason added a comment.
I assume that with this rule:
> If the same file already exists in the folder:
> - If the filename doesn't contain "latest", it shouldn't be downloaded again.
> - Otherwise, add the current date as a suffix to the name.
the dump file can be re-downloaded every day?
However, as stated on https://dumps.wikimedia.org:
> These snapshots are provided at the very least monthly and usually twice a month.
This means the user will usually still be downloading the same file the next day.
One solution is T183789: download_dump.py: Support for "date specified" dumps. But we will have to make sure that when the user doesn't specify -revision, the script first finds the latest date listed at https://dumps.wikimedia.org/frwiki/, then downloads the file from that folder (e.g. https://dumps.wikimedia.org/frwiki/20171220/frwiki-20171220-abstract.xml.gz).
After the download, the script will have to link a -latest file (e.g. frwiki-latest-abstract.xml.gz) to the downloaded file (e.g. frwiki-20171220-abstract.xml.gz) so that other scripts can keep using it automatically.
Then, every time the user runs the script without -revision, it checks whether the file that frwiki-latest-abstract.xml.gz links to carries the same date as the latest date on the website.
If the dates differ, it downloads the new file and relinks the -latest file to it.
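To make the proposal concrete, here is a minimal sketch of the check-and-relink logic only; it is not the actual download_dump.py code. The function names are hypothetical, the download itself is left out, and it assumes the index page at https://dumps.wikimedia.org/frwiki/ lists dated folders as links of the form href="20171220/". It also assumes a filesystem that supports symlinks.

```python
import os
import re


def latest_dump_date(index_html):
    """Return the newest YYYYMMDD folder name found in an index page.

    Assumes dated folders appear as href="YYYYMMDD/" (an assumption
    about the listing format, not a documented guarantee).
    """
    dates = re.findall(r'href="(\d{8})/"', index_html)
    return max(dates) if dates else None


def linked_dump_date(latest_path):
    """Return the date in the name of the file the -latest symlink targets."""
    if not os.path.islink(latest_path):
        return None
    target = os.path.basename(os.readlink(latest_path))
    match = re.search(r'-(\d{8})-', target)
    return match.group(1) if match else None


def needs_download(latest_path, remote_date):
    """True when the local -latest link is missing or points to an older dump."""
    return linked_dump_date(latest_path) != remote_date


def relink_latest(latest_path, dated_path):
    """Point the -latest symlink at the newly downloaded dated file."""
    tmp_path = latest_path + '.tmp'
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    # Use a relative target so the folder can be moved as a whole.
    os.symlink(os.path.basename(dated_path), tmp_path)
    # Renaming over the old link swaps it in one step on POSIX systems.
    os.replace(tmp_path, latest_path)
```

A caller would fetch the index page, pass it to latest_dump_date(), and only download and call relink_latest() when needs_download() returns True. The older dated files stay on disk untouched.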
However, one problem is how we would manage (or whether we have to manage) all the non-latest files downloaded earlier.
Or is there any simpler solution?
