Xover added a comment.

  Possibly relevant to keep in mind when debugging this: PDF and DjVu files are 
not treated as just blobs of binary data on disk with a little metadata in 
the db. Last I heard, the hidden text layer gets extracted from the file, 
wrapped in an XML structure, and stuffed into a database field that is 
sized for metadata and can overflow. The more pages in a PDF, the more text, 
and the more likely that is to start playing a role. If this happens before 
the "upload" is considered complete, it might be a factor.
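
  To make the overflow concern concrete, here is a rough sketch of how the 
wrapped text grows with page count. Everything in it is an assumption for 
illustration: the XML element names, the helper functions, and the field 
limit are made up, and none of them reflect the actual MediaWiki schema or 
extraction code.

```python
from xml.sax.saxutils import escape

# Hypothetical limit, chosen only for illustration; the real field size
# is not stated in this thread.
ASSUMED_FIELD_LIMIT_BYTES = 65_535


def wrap_text_layer(pages):
    """Wrap per-page text in a made-up XML structure (not MediaWiki's format)."""
    body = "".join(
        f'<page n="{i}">{escape(text)}</page>'
        for i, text in enumerate(pages, 1)
    )
    return f"<textlayer>{body}</textlayer>"


def overflows(pages, limit=ASSUMED_FIELD_LIMIT_BYTES):
    """Return True if the UTF-8 encoded XML would not fit in `limit` bytes."""
    return len(wrap_text_layer(pages).encode("utf-8")) > limit


if __name__ == "__main__":
    page = "lorem ipsum " * 100  # 1200 characters of text per page
    print(overflows([page] * 10))   # small document
    print(overflows([page] * 500))  # large document
```

  With these made-up numbers a 10-page file fits and a 500-page file does 
not, which is the kind of page-count dependence worth checking for.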
  
  It is also a known issue that thumbnail generation for PDFs (which uses 
ghostscript iirc) is bog slow (like 10+ seconds for a single page). Depending 
on where in the process this happens, it may be a relevant factor. The File 
information page shows at least two thumbnails, but those are possibly only 
generated on demand, when the information page is actually loaded and the 
thumbs are requested.

TASK DETAIL
  https://phabricator.wikimedia.org/T254459
