I'd like to put a placeholder in Phab or Trello for this work, but please help me out because I am still new. Could someone help summarize the context and what we are trying to solve?
Also, would this go into the Research, Eng or Refinery backlog? Thanks!

On Thu, Dec 11, 2014 at 1:52 PM, Dan Andreescu <dandree...@wikimedia.org> wrote:

>> Bikeshed indeed -- this seems to be a project that could soak up a
>> lot of time. I'm with Aaron -- let's be consistent with the principle
>> of least surprise and use an existing identifier. The database seems
>> as good a place to start as any.
>
> I disagree that this is bikeshedding. The reason people look back
> after a year at a project and go "yuck, wish we named those things
> differently" is precisely because this type of effort is incorrectly
> labeled as bikeshedding. We are *not* talking about a bike shed.
> We're talking about a schema that will hopefully serve hundreds or
> thousands of researchers and our own growing team (I'm considering
> both Aaron's revision schema and the data warehouse schema).
>
>>> So, I'm not sure that is necessary for the term "identifier" which I
>>> assume that "id" abbreviates. Regardless, it seems clear that these
>>> numbers are thought of as primary identifiers of a namespace that
>>> can otherwise have many names. For example, see this snippet from
>>> the result of this query:
>>> http://es.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases&format=jsonfm
>>>
>>> "1": {
>>>     "id": 1,
>>>     "case": "first-letter",
>>>     "*": "Discusi\u00f3n",
>>>     "subpages": "",
>>>     "canonical": "Talk"
>>> },
>
> Fair enough, namespace_id seems like a good name for a property of a
> page entity then.
>
>>> I don't see us getting rid of legacy naming right now. I don't see
>>> how adding a new name helps anyone -- veteran or newbie.
>
> I disagree that we have to care at all about legacy names. I disagree
> that the principle of least surprise leads one to prefer database
> names. To me, that's more surprising because database conventions
> have no place in JSON. If I were new to this world, it would also seem
> more surprising. If I were an existing user, I don't think I would be
> at all surprised as long as the names were clear and the schemas well
> documented. This page_namespace_id is a bit of a red herring because
> we have harder things to tackle like "restrictions".
>
>>> However, if we were to develop a mapping of canonical names and
>>> pursue that from here forward, we might be able to move beyond the
>>> old names for the most important data sources in a few years.
>>> However, I'm skeptical that we'll ever be able to change any
>>> production DB field names.
>
> We need not be tied to the production db names. The data warehouse
> effort is trying to transform a confusing schema riddled with
> idiosyncrasies into a clean, easy to understand, and easy to work
> with dimensional model. In the process, we are also trying to capture
> changes to objects over time, so we are greatly expanding the
> usefulness of the database. Good naming matters and we should take
> our time.
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
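
For anyone else catching up on the context: the siteinfo snippet quoted above is what makes the numeric id the stable identifier, since one namespace can carry a local name and a canonical name at the same time. Here is a minimal Python sketch of that idea, not anything from the actual schema work. The function name `names_to_namespace_id` is hypothetical, and the input is just the single namespace entry from the quoted excerpt rather than a live API response.

```python
import json

# The single namespace entry quoted in the thread (es.wikipedia.org siteinfo).
# A raw string keeps the \u00f3 escape intact so json.loads decodes it.
SITEINFO_EXCERPT = json.loads(r'''
{
  "1": {
    "id": 1,
    "case": "first-letter",
    "*": "Discusi\u00f3n",
    "subpages": "",
    "canonical": "Talk"
  }
}
''')


def names_to_namespace_id(namespaces):
    """Map each non-empty name of a namespace (local or canonical) to its id.

    Hypothetical helper for illustration only; aliases from
    siprop=namespacealiases are not handled here.
    """
    lookup = {}
    for entry in namespaces.values():
        for key in ("*", "canonical"):
            name = entry.get(key)
            if name:  # skip missing or empty names
                lookup[name] = entry["id"]
    return lookup


lookup = names_to_namespace_id(SITEINFO_EXCERPT)
print(lookup)
```

Both "Discusión" and "Talk" resolve to the same id, which is the argument for treating namespace_id as the property that identifies the namespace and the names as labels on top of it.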