Xover added a comment.
Possibly relevant to keep in mind when debugging this: PDF and DjVu files are not treated as just blobs of binary data on disk with a little metadata in the database. Last I heard, the hidden text layer gets extracted from the file, wrapped in an XML structure, and stuffed into a database field that is sized for metadata and can overflow. The more pages in a PDF, the more text, and the more likely that is to start playing a role. If this happens before the "upload" is considered complete, it might be a factor.

It is also a known issue that thumbnail generation for PDFs (which uses Ghostscript, iirc) is bog slow (10+ seconds for a single page). Depending on where in the process this happens, it may be a relevant factor. The File information page shows at least two thumbnails, but they are possibly only generated on demand, when the information page is actually loaded or those thumbnails are requested.

TASK DETAIL https://phabricator.wikimedia.org/T254459
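To make the overflow concern in the comment above concrete, here is a minimal Python sketch that wraps per-page text in XML and checks the serialized size against a field limit. The element names, the helpers `wrap_text_layer` and `fits_in_metadata_field`, and the 16 MiB limit are hypothetical assumptions, not MediaWiki's actual schema or column size. The point is only that the stored size grows roughly linearly with page count.

```python
import xml.etree.ElementTree as ET

# Hypothetical size limit for the metadata field (assumption: 16 MiB, the
# capacity of a MySQL MEDIUMBLOB). The real column type may differ.
METADATA_LIMIT_BYTES = 16 * 1024 * 1024


def wrap_text_layer(pages):
    """Wrap extracted per-page text in a simple XML document.

    The <textlayer>/<page> schema is hypothetical, for illustration only.
    """
    root = ET.Element("textlayer")
    for number, text in enumerate(pages, start=1):
        page = ET.SubElement(root, "page", {"value": str(number)})
        page.text = text
    return ET.tostring(root, encoding="utf-8")  # bytes, incl. XML declaration


def fits_in_metadata_field(pages, limit=METADATA_LIMIT_BYTES):
    """Return (size_in_bytes, fits) for the XML-wrapped text layer."""
    size = len(wrap_text_layer(pages))
    return size, size <= limit


if __name__ == "__main__":
    # About 3,000 characters per page, a plausible density for a book scan.
    page_text = "lorem ipsum " * 250
    for page_count in (10, 1_000, 10_000):
        size, fits = fits_in_metadata_field([page_text] * page_count)
        print(f"{page_count:>6} pages -> {size:>12,} bytes, fits: {fits}")
```

With these assumed numbers, a 10,000-page text layer exceeds the hypothetical limit while a 1,000-page one does not; the real threshold depends on the actual column type and the text density of the file.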
