Fae added a comment.
Wrapping the upload script in an infinite loop, so that it simply restarts on
the first API error, is a practical but crude //brute-force// workaround.
However, this is incredibly slow, wasteful of processing time and bandwidth,
and not a realistic option for the vast majority of Commons contributors.
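A less wasteful pattern than an unbounded loop is a bounded retry with exponential backoff. A minimal sketch, where the `do_upload` callable, retry count, and delay values are all illustrative assumptions rather than anything in pywikibot:

```python
import random
import time


def upload_with_backoff(do_upload, max_retries=5, base_delay=1.0):
    """Retry a flaky upload a bounded number of times with exponential backoff.

    `do_upload` is any zero-argument callable that returns True on success
    and raises an exception (or returns False) on an API error.
    """
    for attempt in range(max_retries):
        try:
            if do_upload():
                return True
        except Exception as err:  # real code should catch the specific API error
            print(f"attempt {attempt + 1} failed: {err}")
        # Back off before retrying, with a little jitter so parallel tasks
        # do not all hammer the server at the same moment.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return False
```

This at least caps the total time and bandwidth spent per file, unlike an infinite loop that only stops on the first clean run.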
Here's what I found:
1. The most common pattern is an indefinite series of retries which, after a
little over an hour, gives up with `Maximum retries attempted without success.`
In reality the file may well have been uploaded during those retries, but the
API failed to return a success message, and then also failed to return a
'duplicate' message when the script tried to re-upload the file it had just
uploaded.
2. The second most likely outcome is a successful upload on the first run, but
the chance of this decreases with the size of the file. In the example of
"catalogofcopyri12libr" there have been around 10 attempts to upload the 185MB
file, and it has yet to succeed (most of those attempts taking an hour to time
out).
3. Third most likely is that, after a series of retries, the API returns a
message like `duplicate: Uploaded file is a duplicate of
[u'Catalog_of_Copyright_Entries_1977_Books_and_Pamphlets_Jan-June.pdf'].`
Though technically correct, this message is itself a symptom of the API
failing to return a successful upload message in the first place.
4. Lastly there is `http-curl-error`, which appears to be the InternetArchive
falling over under these repeated mass requests.
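Given patterns 1 and 3, a script could avoid most of the wasted hour-long retries by checking whether the file actually landed before re-uploading, and by treating a 'duplicate' warning as success. A minimal sketch, assuming hypothetical `upload` and `file_exists` callables standing in for the real API calls (e.g. pywikibot's upload and file-existence checks):

```python
def safe_reupload(upload, file_exists, title, max_retries=3):
    """Verify-before-retry loop (sketch, not the actual pywikibot API).

    A 'duplicate' result is treated as success, since per the observations
    above it usually means an earlier attempt already landed.
    """
    for _ in range(max_retries):
        try:
            result = upload(title)
        except Exception:
            result = None  # timeout or http-curl-error
        if result in ("success", "duplicate"):
            return True
        # The upload may have succeeded even though the API reported failure,
        # so check the wiki before burning another hour-long attempt.
        if file_exists(title):
            return True
    return False
```

The key design point is that the existence check happens after every failed attempt, not only at the end, so a silently successful upload stops the loop immediately.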
Here's an example of one of the parallel looping tasks we are running. This
report of InternetArchive idents is after the second attempt, so the files in
bold have timed out twice, while the others succeeded on the first or second
attempt:
- **//2 catalogofcopyri12libr 189.8M//**
- **//9 catalogofcopyrig17libr 173.9M//**
- **//10 catalogofco11libr 217.7M//**
- **//11 catalogofcop13libr 136.6M//**
- **//13 catalogofcopyr11libr 199.4M//**
- 17 1977booksandpamp33112libr 143.1M
- 18 1977musicjanjune33152libr 106.8M
- **//19 1977booksandpamp33112library 128.1M//**
- 29 1977booksandpamphle33111libr 128.8M
- 30 1976worksofartja330711libr 83.3M
- 33 catalogofcopyrig33051libr 91.6M
- 37 catalogofcopyrig33012library 134.4M
- 38 catalogofcopyrig33011library 122.1M
- 39 catalogofcopyrig33012libr 139.5M
- 40 catalogofcopyrig33011libr 128.2M
- 41 catalogofcopyrig33051library 88.3M
- 43 catalogofcopyrig33052libr 114.5M
- 44 catalogofcopyrig33052library 109.9M
- 45 catalogofcopyrig32912lib 131.4M
- 49 catalogofcopyrig32952libr 103.3M
One conclusion would be that PDFs over 125MB are highly unlikely to be
successfully uploaded by ordinary volunteers, and that significantly larger
PDFs cannot be batch uploaded in any practical way by anyone at this time.
TASK DETAIL
https://phabricator.wikimedia.org/T254459