>
> dumps.wikimedia.org/analytics

Does "analytics" mean anything in this context?  Why not aim for something
like dumps.wikimedia.org/views?

-Aaron

On Thu, Feb 11, 2016 at 9:39 AM, Oliver Keyes <oke...@wikimedia.org> wrote:

> It's also the International Day of Women and Girls in Science!
>
> Sounds like a good summary.
>
> On 11 February 2016 at 07:31, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
> > I almost revived this thread on Mardi Gras, but I didn't want to be known as
> > The Holiday Crusher so I waited.  Today is relatively safe [1] :)
> >
> > Ok, there are three main points being made:
> >
> > 1. deprecating the old datasets
> > 2. liberating ourselves from the old format
> > 3. reorganizing the dumps page
> >
> > My thoughts on each:
> >
> > 1. I agree with Dario and Erik's points.  Let's keep the old files around,
> > but stop generating new files in May 2016.  To explain this, we'll make a
> > new section called "Deprecated" and put links to the pagecounts-* datasets
> > there.
> >
> > 2. I wasn't expecting to talk about format, but it makes sense because, for
> > example, Erik's dataset is just a pivoted format.  So, we could have a
> > section for the Pageview datasets, with links for each format we already
> > have: Domasz archive format, Erik Z compressed format.  We could then add a
> > new format that's easier to understand and could even include some of the
> > data we expose via the pageview API.  But from an organizational point of
> > view, treating "format" as a separate concept from "dataset" will be an
> > improvement.
> >
> > 3. I think it's time we had our own page instead of just being under
> > dumps.wikimedia.org/other.  Let's have dumps.wikimedia.org/analytics and
> > link to it from both the main dumps page and /other.  The separation will
> > make it easier to reference other places where we have static data file
> > dumps, like datasets.wikimedia.org.  And it'll also make it easier to add
> > links and references to how this work is being done and where people can
> > interact with us or help us.
> >
> >
> > I hope I captured what everyone was saying.  If there aren't any objections,
> > I'll send a list of next steps needed to accomplish this, and get to work :)
> >
> >
> >
> > [1] Today is Be Electrific Day, Get Out Your Guitar Day, Grandmother
> > Achievement Day, National Don't Cry Over Spilled Milk Day, National
> > Inventors' Day, National Make a Friend Day, National Peppermint Patty Day,
> > National Shut-in Visitation Day, Pro Sports Wives Day, Promise Day,
> > Satisfied Staying Single Day, White Shirt Day
> >
> >
> > On Wed, Jan 6, 2016 at 7:13 PM, Dario Taraborelli
> > <dtarabore...@wikimedia.org> wrote:
> >>
> >> Erik's proposal sounds very reasonable.
> >>
> >> There might be some confusion about what we mean by "keeping the old
> >> datasets for longitudinal analysis". No one is planning to remove the old
> >> static dumps, just stop generating them/maintaining them going forward.
> >>
> >> I also want to echo Nuria regarding the human cost of maintaining multiple
> >> definitions. I just finished preparing a response to a reporter who was
> >> asking about project-level mobile PV data and I was not immediately able to
> >> answer if a specific data source I wanted to cite was using the old or new
> >> definition (until I talked to Dan and we looked up a Gerrit patch together).
> >>
> >> How do people feel about turning off the generation of old dumps by May
> >> 2016, i.e. one year after having the two series of data available in
> >> parallel?
> >>
> >>
> >>
> >> On Wed, Jan 6, 2016 at 10:17 AM, Nuria Ruiz <nu...@wikimedia.org>
> >> wrote:
> >>>
> >>> >As I just mentioned to Dan in a private email conversation, keeping
> >>> > datasets even with imperfect measurements is important. Particularly for
> >>> > longitudinal analysis.
> >>> Keep in mind that maintaining these old dumps is not "free": having several
> >>> pageview definitions around causes a lot of confusion and maintenance cost.
> >>> We get a lot of questions about the spikiness of the old definition, and we
> >>> need to maintain the software that generates the old files. Thus, we think it
> >>> is reasonable to ask our users to transition to the new definition and
> >>> eventually (in a period of months) turn off the old dumps.
> >>>
> >>> On Thu, Dec 24, 2015 at 6:12 AM, Maurice Vergeer <m.verg...@maw.ru.nl>
> >>> wrote:
> >>>>
> >>>> Dear all,
> >>>>
> >>>> As I just mentioned to Dan in a private email conversation, keeping
> >>>> datasets even with imperfect measurements is important. Particularly for
> >>>> longitudinal analysis.
> >>>>
> >>>> Also, from what I understand (me being a newbie here), the data
> >>>> are stored in separate files. Dan suggested reordering the page into
> >>>> categories. Maybe another option is to create more extensive datasets with
> >>>> more measurements in a single data file. On the other hand, the
> >>>> files would become even bigger in size. Not an issue for me, but for users
> >>>> in the field accessibility (download bandwidth) could become an issue.
> >>>>
> >>>> my two cents
> >>>> Maurice
> >>>>
> >>>>
> >>>> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.d...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Nothing against this approach!
> >>>>>
> >>>>> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu
> >>>>> <dandree...@wikimedia.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.d...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi Dan,
> >>>>>>> Happy holidays!
> >>>>>>> Good idea to combine these datasets! However we have one more dataset
> >>>>>>> by Erik Zachte: http://dumps.wikimedia.org/other/pagecounts-ez/
> >>>>>>
> >>>>>>
> >>>>>> And that's an important one!  But I was thinking we could re-organize
> >>>>>> the page into categories.  Erik's dataset could go into a "processed data"
> >>>>>> category or something like that.  The three I wanted to talk about on this
> >>>>>> thread are just the raw data.
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Analytics mailing list
> >>>>>> Analytics@lists.wikimedia.org
> >>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Thank you.
> >>>>>
> >>>>> Alex Druk
> >>>>> alex.d...@gmail.com
> >>>>> (775) 237-8550 Google voice
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> ________________________________________________
> >>>> Maurice Vergeer
> >>>> To contact me, see http://mauricevergeer.nl/node/5
> >>>> To see my publications, see http://mauricevergeer.nl/node/1
> >>>> ________________________________________________
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >>
> >>
> >> Dario Taraborelli  Head of Research, Wikimedia Foundation
> >> wikimediafoundation.org • nitens.org • @readermeter
> >>
> >>
> >>
> >
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
>