Mitar created this task.
Mitar added projects: Wikidata, Dumps-Generation.
Restricted Application added a project: wdwb-tech.

TASK DESCRIPTION
  My understanding is that dumps are in fact already produced in multiple 
shards, which are then combined into one file. I wonder why the multiple files 
are not simply kept, since that would also make it easier to process dumps in 
parallel, one file per worker. There are already no guarantees on the order of 
documents in dumps. Currently, parallel processing is hard because a compressed 
file cannot easily be split into chunks without decompressing it first (and 
then potentially recompressing the chunks). So, given that dump size has grown 
over time, maybe it is time to provide dumps as multiple files, each up to some 
reasonable maximum size?
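A minimal sketch of what the proposal would enable, assuming each shard is a gzip file with one JSON entity per line. The shard naming (`dump-00000.json.gz`), the use of an entity count as a stand-in for a byte-size limit, and all function names are hypothetical, not the actual dump tooling. It shows both sides: a producer that starts a new file once the current one reaches the limit, and a consumer that decompresses each shard independently in parallel instead of first decompressing one large file.

```python
import gzip
import json
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parse_entity_line(line: str):
    """Parse one line of a shard, tolerating JSON-array framing
    ('[', ']', trailing commas) that a one-entity-per-line dump may use."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None
    return json.loads(line)


def write_shards(entities, out_dir: str, max_entities_per_shard: int) -> list:
    """Hypothetical producer: write entities into gzip shards, starting a
    new shard whenever the current one reaches the (assumed) size limit."""
    paths = []
    out = None
    n_in_shard = 0
    for entity in entities:
        if out is None or n_in_shard >= max_entities_per_shard:
            if out is not None:
                out.close()
            path = os.path.join(out_dir, f"dump-{len(paths):05d}.json.gz")
            paths.append(path)
            out = gzip.open(path, "wt", encoding="utf-8")
            n_in_shard = 0
        out.write(json.dumps(entity) + "\n")
        n_in_shard += 1
    if out is not None:
        out.close()
    return paths


def count_entities(path: str) -> int:
    """Worker: decompress a single shard on its own and count its entities."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if parse_entity_line(line) is not None:
                count += 1
    return count


def count_all(paths, use_threads: bool = False) -> int:
    """Fan shards out to a pool. Shard order is irrelevant because the dump
    gives no ordering guarantee anyway. Processes suit CPU-bound JSON
    parsing; threads avoid process start-method issues."""
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls() as pool:
        return sum(pool.map(count_entities, paths))
```

For example, writing 10,000 entities with `max_entities_per_shard=3000` produces four shards, and `count_all` over them returns 10000. When using the default process pool, call it under an `if __name__ == "__main__":` guard.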

TASK DETAIL
  https://phabricator.wikimedia.org/T278204

To: Mitar
Cc: Mitar, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs