I'd like to put a placeholder in Phab or Trello for this work, but please
help me out because I am still new. Could someone summarize the context
and what we are trying to solve?

Also, would this go into the Research, Eng, or Refinery backlog?

Thanks!

On Thu, Dec 11, 2014 at 1:52 PM, Dan Andreescu <dandree...@wikimedia.org>
wrote:

>> Bikeshed indeed -- this seems to be a project that could soak up a lot of
>> time. I'm with Aaron -- let's be consistent with the principle of least
>> surprise and use an existing identifier. The database seems as good a place
>> to start as any.
>>
>
> I disagree that this is bikeshedding.  The reason people look back after a
> year at a project and go "yuck, wish we named those things differently" is
> precisely because this type of effort is incorrectly labeled as
> bikeshedding.  We are *not* talking about a bike shed.  We're talking
> about a schema that will hopefully serve hundreds or thousands of
> researchers and our own growing team (I'm considering both Aaron's revision
> schema and the data warehouse schema).
>
>
>>> So, I'm not sure that is necessary for the term "identifier" which I
>>> assume that "id" abbreviates.  Regardless it seems clear that these numbers
>>> are thought of as primary identifiers of a namespace that can otherwise
>>> have many names.  For example, see this snippet from the result of this
>>> query:
>>> http://es.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases&format=jsonfm
>>>
>>> "1": {
>>>                 "id": 1,
>>>                 "case": "first-letter",
>>>                 "*": "Discusi\u00f3n",
>>>                 "subpages": "",
>>>                 "canonical": "Talk"
>>>
>>> },
>>>
>>
> Fair enough, namespace_id seems like a good name for a property of a page
> entity then.
>
>
>>> I don't see us getting rid of legacy naming right now.  I don't see how
>>> adding a new name helps anyone -- veteran or newbie.
>>>
>>
> I disagree that we have to care at all about legacy names.  I disagree
> that the principle of least surprise leads one to prefer database names.
> To me, that's more surprising because database conventions have no place in
> json.  If I was new to this world, it also seems more surprising.  If I was
> an existing user, I don't think I would be at all surprised as long as the
> names were clear and the schemas well documented.  This page_namespace_id
> is a bit of a red herring because we have harder things to tackle like
> "restrictions".
>
>
>>> However, if we were to develop a mapping of canonical names and pursue
>>> that from here forward, we might be able to move beyond the old names for
>>> the most important data sources in a few years.  However, I'm skeptical
>>> that we'll ever be able to change any production DB field names.
>>>
>>
> We need not be tied to the production db names.  The data warehouse effort
> is trying to transform a confusing schema riddled with idiosyncrasies into
> a clean, easy-to-understand, and easy-to-work-with dimensional model.  In
> the process, we are also trying to capture changes to objects over time so
> we are greatly expanding the usefulness of the database.  Good naming
> matters and we should take our time.
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>