[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-04-28 Thread ArielGlenn
ArielGlenn added a subscriber: hoo. ArielGlenn added a comment. I am proactively adding @hoo as he can provide some insight and perhaps tag others as well. TASK DETAIL https://phabricator.wikimedia.org/T209390

[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-04-28 Thread Sascha
Sascha added a comment. Hm, good point. Could the dumps be made consistent? Maybe like this: Before starting a dump, find the current last revision; pass this cut-off revision ID to the dumping shards; change the dump-producing code

[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-04-28 Thread Mitar
Mitar added a comment. Are you sure `lastrevid` works like that for the whole dump? I think the dump is made from multiple shards, so `lastrevid` might not be consistent across all items.

[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-04-28 Thread Sascha
Sascha added a comment. To find the timestamp of the last Wikidata change that went into a dump file, couldn’t one — while processing the dump — extract the entity and revision ID with the highest `lastrevid` value in the entire dump, and then retrieve the corresponding `modified` timestamp
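The scan Sascha describes could be sketched as follows. This is only an illustration, assuming the dump is laid out as a JSON array with one entity per line and that each entity carries `id` and `lastrevid` fields; the function name `find_max_lastrevid` and the sample data are hypothetical. Looking up the `modified` timestamp for the resulting revision is left out, since the message is cut off before it says where that would come from.

```python
import json
from typing import Iterable, Optional, Tuple


def find_max_lastrevid(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (entity ID, lastrevid) for the highest lastrevid seen, or None."""
    best: Optional[Tuple[str, int]] = None
    for raw in lines:
        # Assumed layout: "[" on the first line, "]" on the last,
        # and one entity per line with a trailing comma in between.
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        revid = entity.get("lastrevid")
        if revid is None:
            continue  # skip entities without a revision ID
        if best is None or revid > best[1]:
            best = (entity["id"], revid)
    return best


# Hypothetical sample data, not taken from a real dump.
sample = [
    "[",
    '{"id": "Q1", "lastrevid": 100},',
    '{"id": "Q2", "lastrevid": 250},',
    '{"id": "Q3", "lastrevid": 175}',
    "]",
]
print(find_max_lastrevid(sample))
```

Because the function takes any iterable of lines, it could be fed from a decompressing file object so the whole dump never has to be held in memory.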

[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-03-23 Thread Mitar
Mitar added a comment. I realized I have exactly the same need as the poster on Stack Overflow: get a dump and then use a real-time feed to keep it updated. But you have to know where to start with the real-time feed through EventStreams, using historical consumption
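To illustrate why the starting point matters, here is a minimal sketch of building an EventStreams URL that resumes from a given time. It assumes the public `recentchange` stream endpoint and a `since` query parameter that accepts an ISO 8601 timestamp; both the URL and the parameter are assumptions not stated in this thread, and the helper name `stream_url_since` is hypothetical. The sketch only constructs the URL and does not open a connection.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint; not given in the thread.
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"


def stream_url_since(start: datetime) -> str:
    """Build a stream URL asking for events from `start` onwards."""
    # Normalise to UTC and format as an ISO 8601 timestamp.
    stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumed parameter name: "since".
    return f"{STREAM_URL}?{urlencode({'since': stamp})}"


# Hypothetical starting point; in practice this would be the dump's cut-off time.
print(stream_url_since(datetime(2021, 3, 23, tzinfo=timezone.utc)))
```

The timestamp passed in is exactly the piece of metadata the task asks the dump to expose: without it, a consumer has to guess how far back to replay.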

[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-03-21 Thread Mitar
Mitar added a comment. Personally, I would love to have for each item in the dump a timestamp when it was created and a timestamp when it was last modified. Related: https://phabricator.wikimedia.org/T278031

[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-03-21 Thread Mitar
Restricted Application added a project: wdwb-tech.