Thank you all for your thoughtful opinions.

Since people want to know the top pages over an arbitrary time period, we
think Druid would be the best back-end for that kind of query.  But we're
not going to push that for the first release.  It's very useful to know
that's the consensus: we can now start talking to Jaime Crespo about Druid
or alternatives, make plans, etc.  Until then, the first release is going to
have the top endpoint that Joseph wrote about.  This is easy to
pre-aggregate and dump into Cassandra.  Also, the /v1/pageviews/ prefix is
going to be on all the endpoints we launch with, because these are
endpoints in a "pageviews" RESTBase module.  So we'll have:

/v1/pageviews/top/{project}/{access}/{year}/{month}/{day}

for now, with {month} and {day} being optional parameters.  This will give
you the top pageviews for the selected calendar date.  And as soon as we
can, we'll have:

/v1/pageviews/top/{project}/{access}/from/{start}{/end}

as proposed by Gabriel, with {start} and {end} each accepting either a full
date or a "now"-relative negative integer.

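To make the two path shapes above concrete, here is a minimal Python sketch
of how a client might build them.  Nothing here is a released API: the
helper names (top_path, range_path, resolve_bound) are hypothetical, and the
zero-padded month/day, the YYYY-MM-DD date format, and "days" as the unit
for the "now"-relative integers are all assumptions that this thread has not
settled.

```python
from datetime import date, timedelta
from typing import Optional

PREFIX = "/v1/pageviews/top"


def top_path(project: str, access: str, year: int,
             month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Calendar-date form; {month} and {day} are optional."""
    if day is not None and month is None:
        raise ValueError("a day needs a month")
    parts = [PREFIX, project, access, f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")  # zero-padding is an assumption
    if day is not None:
        parts.append(f"{day:02d}")
    return "/".join(parts)


def range_path(project: str, access: str, start: str,
               end: Optional[str] = None) -> str:
    """Range form; the trailing {end} segment is optional."""
    path = f"{PREFIX}/{project}/{access}/from/{start}"
    return path if end is None else f"{path}/{end}"


def resolve_bound(value: str, today: date) -> date:
    """Turn a {start} or {end} value into a concrete date.

    Assumes full dates are YYYY-MM-DD and that a negative integer
    counts days back from today (both are assumptions).
    """
    try:
        offset = int(value)
    except ValueError:
        return date.fromisoformat(value)
    if offset >= 0:
        raise ValueError("relative bounds must be negative integers")
    return today + timedelta(days=offset)
```

For example, top_path("en.wikipedia", "all-access", 2015, 9) would give
/v1/pageviews/top/en.wikipedia/all-access/2015/09 under these assumptions,
and resolve_bound("-30", today) would mean "30 days before today".
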
The initial endpoint we launch won't have hourly resolution; that seems
like too much data to pre-aggregate.  But we'll see how Druid handles very
specific dates (should be fine) and make that a feature in the second
version.  We'll have to look into the privacy implications of short time
ranges, like an hour.



On Mon, Sep 14, 2015 at 10:18 AM, Andrew Otto <ao...@wikimedia.org> wrote:

> Also, maybe *top-articles* instead of *top*, to avoid naming collision in
> the future?
>
> +1 for prefixing whatever paths you are doing now with something
> relevant.  I sense that there might be more than just pageview data in the
> future.
>
> /pageviews/top/…?
>
>
>
>
> On Sep 11, 2015, at 18:38, Marcel Ruiz Forns <mfo...@wikimedia.org> wrote:
>
> +1 Adam
>
> Also, maybe *top-articles* instead of *top*, to avoid naming collision in
> the future?
>
> On Sat, Sep 12, 2015 at 12:27 AM, Adam Baso <ab...@wikimedia.org> wrote:
>
>> I'd be in favor of both. Maybe with a little tweak to the pathing:
>>
>> /top/{project}/{access}/days/{days-in-the-past}
>>
>>  /top/{project}/{access}/range/{start}/{end}
>>
>> with "days" or "range" maybe being earlier in the forward slash separated
>> spec if it doesn't read well semantically.
>>
>>
>> On Fri, Sep 11, 2015 at 3:14 PM, Dan Andreescu <dandree...@wikimedia.org>
>> wrote:
>>
>>> It wouldn't be too hard to offer both, but I'm thinking it might be
>>> confusing for a consumer.  I think ultimately the decision should be up to
>>> the people using this data, because the use cases are fairly different for
>>> each form.  If people ask for both, we'll do both.
>>>
>>> Leila, we'd love to have page_ids as well, but we'd have to block the
>>> release on a bigger effort to reliably mirror mediawiki databases in Hadoop
>>> for processing, so we'll probably punt on that for now.  But we have more
>>> than enough reasons to work on that sooner rather than later.
>>>
>>> On Fri, Sep 11, 2015 at 6:09 PM, Gabriel Wicke <gwi...@wikimedia.org>
>>> wrote:
>>>
>>>> The former might be slightly easier to cache, and can be linked to /
>>>> pulled in statically, without a need to dynamically construct a URL. Would
>>>> it be hard to offer both?
>>>>
>>>> On Fri, Sep 11, 2015 at 3:06 PM, Leila Zia <le...@wikimedia.org> wrote:
>>>>
>>>>> It's getting exciting. :-)
>>>>>
>>>>> I'd go with choice 2 since it gives more control to the user while
>>>>> offering what the user can get through choice 1 as well.
>>>>>
>>>>> Question: will we get page_ids or page_titles or both? It's good to
>>>>> have both.
>>>>>
>>>>> Leila
>>>>>
>>>>> On Fri, Sep 11, 2015 at 3:00 PM, Dan Andreescu <dandreescu@wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> Hi everyone.  End of quarter is rapidly approaching and I wanted to
>>>>>> ask a quick question about one of the endpoints we want to push out.  We
>>>>>> want to let you ask "what are the top articles" but we're not sure how to
>>>>>> structure the URL so it's most useful to you.  Here are the choices:
>>>>>>
>>>>>> Choice 1. /top/{project}/{access}/{days-in-the-past}
>>>>>>
>>>>>> Example: top articles via all en.wikipedia sites for the past 30
>>>>>> days: /top/en.wikipedia/all-access/30
>>>>>>
>>>>>>
>>>>>> Choice 2. /top/{project}/{access}/{start}/{end}
>>>>>>
>>>>>> Example: top articles via all en.wikipedia sites from June 12th, 2014
>>>>>> to August 30th, 2015: /top/en.wikipedia/all-access/2014-06-12/2015-08-30
>>>>>>
>>>>>>
>>>>>> In all of those:
>>>>>>
>>>>>> * {project} means en.wikipedia, commons.wikimedia, etc.
>>>>>> * {access} means access method as in desktop, mobile web, mobile app
>>>>>>
>>>>>> Which do you prefer?  Would any other query style be useful?
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> Analytics@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Gabriel Wicke
>>>> Principal Engineer, Wikimedia Foundation
>>>>
>>>
>>
>
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation