Mitar added a comment.
I learned today that Wikipedia has a nice approach with a multistream bz2 archive <https://dumps.wikimedia.org/enwiki/> plus an additional index file, which tells you the offset into the bz2 archive at which to start decompressing a chunk in order to access a particular page. Wikidata could do the same, just for items and properties. This would allow one to extract only the entities one cares about. Multistream also makes it possible to decompress parts of the file in parallel on multiple machines, by distributing offsets among them. Wikipedia additionally provides the same multistream archive split into multiple files, which makes it even easier to spread the whole dump across multiple machines. I like that approach.
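For concreteness, here is a minimal sketch (Python) of how a reader could use such an index. It assumes the Wikipedia-style index format of one "offset:page_id:title" line per page and that each offset points at a self-contained bz2 stream; for a Wikidata dump the lookup key would presumably be the entity ID (Q.../P...) instead of the title. File names in the usage comment are hypothetical; this is only an illustration, not existing tooling.

import bz2

# Index file format (assumed): one "offset:page_id:title" line per page,
# where offset is the byte position of the bz2 stream containing that page.
def find_offset(index_path, wanted_title):
    with open(index_path, encoding="utf-8") as idx:
        for line in idx:
            offset, _, rest = line.rstrip("\n").partition(":")
            _page_id, _, title = rest.partition(":")
            if title == wanted_title:
                return int(offset)
    return None

def read_stream(dump_path, offset, chunk_size=64 * 1024):
    # Decompress only the single bz2 stream that starts at `offset`;
    # BZ2Decompressor sets .eof once the end of that stream is reached.
    decompressor = bz2.BZ2Decompressor()
    parts = []
    with open(dump_path, "rb") as dump:
        dump.seek(offset)
        while not decompressor.eof:
            raw = dump.read(chunk_size)
            if not raw:
                break
            parts.append(decompressor.decompress(raw))
    return b"".join(parts).decode("utf-8")

# Usage (file names are hypothetical):
# offset = find_offset("enwiki-multistream-index.txt", "Douglas Adams")
# xml_chunk = read_stream("enwiki-pages-articles-multistream.xml.bz2", offset)
# xml_chunk then holds the batch of <page> elements that includes the wanted page.

Handing different offsets to different workers gives exactly the parallel decompression described above.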