[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-06-13 Thread Yurik
Yurik added a comment. Yep, sounds good. TASK DETAIL https://phabricator.wikimedia.org/T120452 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Yurik Cc: RobLa-WMF, TheDJ, Eloy, Jdforrester-WMF, brion, ThurnerRupert, intracer, TerraCodes, Pokefan95, gerritbot,

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-25 Thread Yurik
Yurik added a comment. Another alternative is to actually reuse `.tabular` for storing this data, instead of creating a custom format. Also, .tabular could benefit from `"ordered"` meta tag to automatically resort values on save. No sorting is allowed on the fields of "localized" data

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-25 Thread Yurik
Yurik added a comment. JsonConfig can easily allow us to store all allowed licenses as a "config" page - a JSON with a custom schema for licenses. This way we could have a wiki page named **Config:Licenses.json** (bikeshedding is welcome): { // License ID "CC0-1.0": {

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-25 Thread matmarex
matmarex added a comment. In https://phabricator.wikimedia.org/T120452#2232451, @Yurik wrote: > I found MediaWiki:Licenses - I am not sure what it is used for. They're used for the licensing dropdown at Special:Upload

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread Yurik
Yurik added a comment. I found MediaWiki:Licenses - I am not sure what it is used for. There are translations in the subpages. Also, there is a list in preferences

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread brion
brion added a comment. I'm a fan of "inheritMetadata" :)

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread Yurik
Yurik added a comment. Name game: "inheritFrom", "deriveFrom", "ref", "link", "metadata", ...? Still open question: where to get the license ids and their descriptions :)

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread brion
brion added a comment. Side note -- the referenced source data: page should get recorded as a template link in the link tables maybe? Or a file link at least. Some kind of reference. :)

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread brion
brion added a comment. @yurik I like that -- maybe generalize it as a metadata inheritance model; anything not filled out in the local json is taken from the referenced .tabular item.
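
The inheritance idea in this comment can be sketched roughly as below. This is only an illustration in Python, not code from JsonConfig: the key name `inheritMetadata`, the `load_page` callback, and the sample page contents are all assumptions (the thread was still choosing a name), and chained inheritance is not handled.

```python
from typing import Any, Callable, Dict

# Hypothetical key naming the page to inherit from; the thread was still
# debating the name ("inheritFrom", "deriveFrom", "ref", ...).
INHERIT_KEY = "inheritMetadata"


def resolve_metadata(local: Dict[str, Any],
                     load_page: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Return the local page with missing fields filled from the referenced page."""
    parent_title = local.get(INHERIT_KEY)
    if parent_title is None:
        return dict(local)
    parent = load_page(parent_title)
    # Start from the parent's metadata (never its rows), then let every
    # field present in the local page override the inherited value.
    merged = {key: value for key, value in parent.items() if key != "data"}
    merged.update(local)
    return merged


# Stand-in for a wiki page store; a real lookup would go here.
pages = {
    "Base.tabular": {"license": "CC0-1.0", "headers": ["a", "b"], "data": []},
}
local_page = {INHERIT_KEY: "Base.tabular", "data": [[1, 2]]}
resolved = resolve_metadata(local_page, pages.__getitem__)
```

Here `resolved` takes `license` and `headers` from the referenced page and keeps its own `data`.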

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread Yurik
Yurik added a comment. An extra feature could be `"headersRef": "Some other table.tabular"` instead of "headers" and "titles", allowing headers to be defined in another table. This way many identically structured tables can benefit from the shared localization.

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread Yurik
Yurik added a comment. @jeumerus, adding a tag to the structured content (json) is not as obvious as for free form wiki markup. We could interpret some meta fields as wiki markup... So far this is the structure I'm going for: { "license": "licence-id", // do we have

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-23 Thread JEumerus
JEumerus added a comment. On Commons, usually a subpage of "Commons:Deletion requests/Page under discussion" is used for a deletion request, with a tag being placed on the page being discussed. Same on English Wikipedia, save for different names.

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread Yurik
Yurik added a comment. Can the talk pages be used for deletion requests?

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread TheDJ
TheDJ added a comment. And how do you request deletion? One idea is to use a multipart contenthandler. The Page namespace of wikisource does this. That way, you can have a part wikitext and a part json. But it also potentially adds to confusion.

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread Yurik
Yurik added a comment. More important semi-bikeshed questions: - How should we store licenses? Is there a license ID of any sorts? I wouldn't want free form license field text if possible. - Are there any other metadata fields required? - Should we support datetime fields in

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread Jdforrester-WMF
Jdforrester-WMF added a comment. In https://phabricator.wikimedia.org/T120452#2231244, @brion wrote: > @Jdforrester-WMF data cube sounds awesome. :D someday later! Yessir. :-)

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread brion
brion added a comment. @yurik: The W3C CSV on the Web working group's metadata model recommendation refers to "columns" with attributes for "name" and "titles" (plural, allowing alternates or per-language variants), with similar recommended character restrictions on "name" for ease of

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread Jdforrester-WMF
Jdforrester-WMF added a comment. In https://phabricator.wikimedia.org/T120452#2230912, @Yurik wrote: > I'm totally ok to bikeshed about the naming: > > - for ID, it will be a list of strings named: "headers", "ids", "columns", "header_id", ... > - for localized column name, it's a

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread Yurik
Yurik added a comment. I'm totally ok to bikeshed about the naming: - for ID, it will be a list of strings named: "headers", "ids", "columns", "header_id", ... - for localized column name, it's a list of objects, each object having (language id -> string). We can call it "columns",

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-22 Thread brion
brion added a comment. Re headers -- yeah need to distinguish between header labels (i18nable text) and column ids (identifiers for programs). As long as capability is there I don't mind the terms used, sounds like you're already working on that :)
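
The split between column ids and header labels could look something like the following Python sketch. It is an illustration under stated assumptions, not the extension's behaviour: the identifier pattern, the function names, and the fallback order (requested language, then a fallback language, then the id itself) are all hypothetical.

```python
import re
from typing import Dict, List

# Assumed rule: ids are for programs, so restrict them to a simple
# identifier pattern with no spaces. The exact pattern is a guess.
ID_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_columns(ids: List[str], titles: List[Dict[str, str]]) -> None:
    """Raise ValueError if ids are malformed or titles do not line up with them."""
    if len(ids) != len(titles):
        raise ValueError("need exactly one titles object per column id")
    if len(set(ids)) != len(ids):
        raise ValueError("column ids must be unique")
    for col_id in ids:
        if not ID_PATTERN.match(col_id):
            raise ValueError(f"invalid column id: {col_id!r}")


def label_for(ids: List[str], titles: List[Dict[str, str]],
              col_id: str, lang: str, fallback: str = "en") -> str:
    """Pick a localized label; fall back to another language, then to the id."""
    labels = titles[ids.index(col_id)]
    return labels.get(lang, labels.get(fallback, col_id))


# Illustrative table header: ids for programs, titles for readers.
ids = ["year", "population"]
titles = [{"en": "Year", "de": "Jahr"}, {"en": "Population"}]
validate_columns(ids, titles)
german_year = label_for(ids, titles, "year", "de")
german_population = label_for(ids, titles, "population", "de")
```

With this layout a program can address `population` regardless of the reader's language, while the displayed header changes per language where a translation exists.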

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-21 Thread Jdforrester-WMF
Jdforrester-WMF added a comment. In https://phabricator.wikimedia.org/T120452#2226245, @brion wrote: > Population of every US census place for every 10-year census since 1790. That's probably a lot. Now add more columns for various breakdown information. Argh. :-) The 'right'

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-21 Thread brion
brion added a comment. Side note: headers are rejected if they contain spaces. That seems odd?

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-21 Thread brion
brion added a comment. In https://phabricator.wikimedia.org/T120452#2227240, @matmarex wrote: > As I understand these are stored as regular MediaWiki pages now, so they have a maximum length of 2 MB. Even naive queries pulling the whole thing into memory would be fast enough at these

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-21 Thread matmarex
matmarex added a comment. As I understand these are stored as regular MediaWiki pages now, so they have a maximum length of 2 MB. Even naive queries pulling the whole thing into memory would be fast enough at these scales. If we want to think about performance for large data, we should

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-20 Thread brion
brion added a comment. Pulling individual data items out of large lists; pulling relevant columns in order to sum them; pulling or updating a small number of cells during editing; subsetting a large data set to graph the subset; subsetting a large data set to perform operations on it Ina

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-20 Thread Yurik
Yurik added a comment. @brion, could you think of use cases for partial data reads? I think it will be mostly "draw data as a wiki list or a table with some magical highlighting/string concatenation/...", or "draw a graph with all the data". That's why at this point I simply provide Lua's

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-20 Thread brion
brion added a comment. Couple quick notes: - pretty cool. :) - I worry about efficiency of storage and queries; for small tables json blobs are fine but for large data sets this'll get extremely verbose, and loading/saving small updates to a large table will get very slow. Consider

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-20 Thread ThurnerRupert
ThurnerRupert added a comment. Just to add another example, which might help with the zillions of sports results on Wikipedia. Take kicker.de as an example: - basedata: http://www.kicker.de/news/fussball/bundesliga/vereine/1-bundesliga/2015-16/bayern-muenchen-14/vereinstermine.html - league

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-04-19 Thread Yurik
Yurik added a comment. Merged, please take a look at https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#Tabular_data_storage_for_Commons.21 @matmarex, JsonConfig is an extension I built a while ago for storing structured data on wiki and making it available from another

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-10 Thread Yurik
Yurik added a comment. @Bawolff Lua supports mw.text.jsonDecode(), which is great for these pages - if we keep TSV in a schema like { "columns": ... "data":[ [1,2,3], [4,5,6], ... ] }
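
To show how a consumer might read the schema in this comment, here is a small Python sketch using the standard `json` module (on-wiki code would use Lua's `mw.text.jsonDecode()` instead). The column names `a`, `b`, `c` and the helper names are assumptions, since the comment elides the contents of "columns".

```python
import json
from typing import Any, Dict, List

# Sample page text following the schema in the comment; the column
# names are made up because the original elides them.
raw = '{"columns": ["a", "b", "c"], "data": [[1, 2, 3], [4, 5, 6]]}'


def rows_as_dicts(text: str) -> List[Dict[str, Any]]:
    """Decode the page and pair each row's cells with the column names."""
    table = json.loads(text)
    columns = table["columns"]
    rows = []
    for row in table["data"]:
        if len(row) != len(columns):
            raise ValueError("row length does not match number of columns")
        rows.append(dict(zip(columns, row)))
    return rows


def column_sum(text: str, name: str) -> float:
    """Sum one column across all rows, one of the use cases raised in the thread."""
    return sum(row[name] for row in rows_as_dicts(text))


records = rows_as_dicts(raw)
total_b = column_sum(raw, "b")
```

Keeping rows as plain arrays and naming the columns once keeps the stored JSON compact; the per-row dictionaries are built only when the data is read.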

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-10 Thread ekkis
ekkis added a comment. > Lua tables I know nothing of that technology but if it can be parsed readily in any environment, that would be fine, otherwise TSVs or JSON are very common formats

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-10 Thread Bawolff
Bawolff added a comment. > I think there's a very valuable use-case in data sets well below the 100mb range That was just a general example, because it wasn't clear to me if this was being sold as a general purpose solution for all data-sets. 4 mb is the actual limit for wikipages, and

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-10 Thread ekkis
ekkis added a comment. I think there's a very valuable use-case in data sets well below the 100 MB range, particularly all manner of reference data. Think of a list of countries, cities within countries, telephone area codes, the names of HTML entities, colour name/codes, et cetera... or

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-10 Thread Bawolff
Bawolff added a comment. > We already have pretty robust support for pages containing text. Not if your data-set is 100 mb big. I think this bug maybe needs some scope clarification before deciding which approach is best. > I think the best way forward is to post on commons

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-09 Thread Yurik
Yurik added a comment. I think the best way forward is to post on commons proposing to add a new namespace there. Also, to email all the relevant mailing lists, including various ambassadors, etc, and post on wikitech news for the next week, all linking to the proposal to host it on commons,

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-09 Thread Milimetric
Milimetric added a comment. I'd like to help with this discussion, but not sure where to start. Commons - seems like a decent fit, but not sure what the process is to get approval Wikidata - seems like the best fit, but the team managing it seems to disagree strongly, so do we just

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-08 Thread Yurik
Yurik added a comment. @MZMcBride agree that this is highly needed, especially now with graphs. Deciding which wiki should host it seems to be the hardest problem (I can easily do the rest with a content handler in the JsonConfig extension).

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2016-03-08 Thread MZMcBride
MZMcBride added a comment. In https://phabricator.wikimedia.org/T120452#1924101, @MarkTraceur wrote: > - It should be a Multimedia team project, though we can't take it on for a while, and I don't believe we need to, because this isn't exactly high priority for anyone. This

[Wikidata-bugs] [Maniphest] [Commented On] T120452: Allow tabular datasets on Commons (or some similar central repository) (CSV, TSV, JSON, XML)

2015-12-23 Thread matmarex
matmarex added a comment. MediaWiki generally should already support uploading files of these types, Wikimedia wikis (or just Commons, or just Wikidata) would just have to be configured to accept them. CSV/TSV and JSON are very simple plaintext formats and could easily be just allowed. XML is