> > dumps.wikimedia.org/analytics
Does "analytics" mean anything in this context? Why not aim for something
like dumps.wikimedia.org/views?

-Aaron

On Thu, Feb 11, 2016 at 9:39 AM, Oliver Keyes <oke...@wikimedia.org> wrote:

> It's also the International Day of Women and Girls in Science!
>
> Sounds like a good summary.
>
> On 11 February 2016 at 07:31, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
> > I almost revived this thread on Mardi Gras, but I didn't want to be
> > known as The Holiday Crusher so I waited. Today is relatively safe [1] :)
> >
> > Ok, there are three main points being made:
> >
> > 1. deprecating the old datasets
> > 2. liberating ourselves from the old format
> > 3. reorganizing the dumps page
> >
> > My thoughts on each:
> >
> > 1. I agree with Dario and Erik's points. Let's keep the old files
> > around, but stop generating new files in May 2016. To explain this,
> > we'll make a new section called "Deprecated" and put links to the
> > pagecounts-* datasets there.
> >
> > 2. I wasn't expecting to talk about format, but it makes sense because,
> > for example, Erik's dataset is just a pivoted format. So, we could have
> > a section for the Pageview datasets, with links for each format we
> > already have: Domasz archive format, Erik Z compressed format. We could
> > then add a new format that's easier to understand and could even
> > include some of the data we expose via the pageview API. But from an
> > organizational point of view, treating "format" as a separate concept
> > from "dataset" will be an improvement.
> >
> > 3. I think it's time we had our own page instead of just being under
> > dumps.wikimedia.org/other. Let's have dumps.wikimedia.org/analytics and
> > link to it from both the main dumps page and /other. The separation
> > will make it easier to reference other places we have data static file
> > dumps, like datasets.wikimedia.org.
> > And it'll also make it easier to add links and references to how this
> > work is being done and where people can interact with us or help us.
> >
> > I hope I captured what everyone was saying. If there aren't any
> > objections, I'll send a list of next steps needed to accomplish this,
> > and get to work :)
> >
> > [1] Today is Be Electrific Day, Get Out Your Guitar Day, Grandmother
> > Achievement Day, National Don't Cry Over Spilled Milk Day, National
> > Inventors' Day, National Make a Friend Day, National Peppermint Patty
> > Day, National Shut-in Visitation Day, Pro Sports Wives Day, Promise
> > Day, Satisfied Staying Single Day, White Shirt Day
> >
> > On Wed, Jan 6, 2016 at 7:13 PM, Dario Taraborelli
> > <dtarabore...@wikimedia.org> wrote:
> >>
> >> Erik's proposal sounds very reasonable.
> >>
> >> There might be some confusion about what we mean by "keeping the old
> >> datasets for longitudinal analysis". No one is planning to remove the
> >> old static dumps, just stop generating them/maintaining them going
> >> forward.
> >>
> >> I also want to echo Nuria regarding the human cost of maintaining
> >> multiple definitions. I just finished preparing a response to a
> >> reporter who was asking about project-level mobile PV data and I was
> >> not immediately able to answer if a specific data source I wanted to
> >> cite was using the old or new definition (until I talked to Dan and we
> >> looked up together a gerrit patch).
> >>
> >> How do people feel about turning off the generation of old dumps by
> >> May 2016, i.e. one year after having the two series of data available
> >> in parallel?
> >>
> >> On Wed, Jan 6, 2016 at 10:17 AM, Nuria Ruiz <nu...@wikimedia.org>
> >> wrote:
> >>>
> >>> > As I just mentioned to Dan in a private email conversation, keeping
> >>> > datasets even with imperfect measurements is important. Particularly
> >>> > for longitudinal analysis.
> >>> Keep in mind that maintaining these old dumps is not "free": it causes
> >>> a lot of confusion and maintenance costs to have several pageview
> >>> definitions around. We get a lot of questions about the spikiness of
> >>> the old definition, and we need to maintain the software that
> >>> generates the old files. Thus, we think it is reasonable to ask our
> >>> users to transition to the new definition and eventually (in a period
> >>> of months) turn off the old dumps.
> >>>
> >>> On Thu, Dec 24, 2015 at 6:12 AM, Maurice Vergeer <m.verg...@maw.ru.nl>
> >>> wrote:
> >>>>
> >>>> Dear all,
> >>>>
> >>>> As I just mentioned to Dan in a private email conversation, keeping
> >>>> datasets even with imperfect measurements is important. Particularly
> >>>> for longitudinal analysis.
> >>>>
> >>>> Also, from what I understand - me being a newbie here - the data are
> >>>> stored in separate files. Dan suggested reordering the page into
> >>>> categories. Maybe another option is to create more extensive datasets
> >>>> with more measurements in a single data file. On the other hand, the
> >>>> files would become even bigger. Not an issue for me, but for users in
> >>>> the field, accessibility (download bandwidth) could become an issue.
> >>>>
> >>>> my two cents
> >>>> Maurice
> >>>>
> >>>> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.d...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Nothing against this approach!
> >>>>>
> >>>>> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu
> >>>>> <dandree...@wikimedia.org> wrote:
> >>>>>>
> >>>>>> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.d...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi Dan,
> >>>>>>> Happy holidays!
> >>>>>>> Good idea to combine these datasets! However, we have one more
> >>>>>>> dataset by Erik Zachte:
> >>>>>>> http://dumps.wikimedia.org/other/pagecounts-ez/
> >>>>>>
> >>>>>> And that's an important one!
> >>>>>> But I was thinking we could re-organize the page into categories.
> >>>>>> Erik's dataset could go into a "processed data" category or
> >>>>>> something like that. The three I wanted to talk about on this
> >>>>>> thread are just the raw data.
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Analytics mailing list
> >>>>>> Analytics@lists.wikimedia.org
> >>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>>
> >>>>> --
> >>>>> Thank you.
> >>>>>
> >>>>> Alex Druk
> >>>>> alex.d...@gmail.com
> >>>>> (775) 237-8550 Google voice
> >>>>
> >>>> --
> >>>> ________________________________________________
> >>>> Maurice Vergeer
> >>>> To contact me, see http://mauricevergeer.nl/node/5
> >>>> To see my publications, see http://mauricevergeer.nl/node/1
> >>>> ________________________________________________
> >>
> >> --
> >>
> >> Dario Taraborelli Head of Research, Wikimedia Foundation
> >> wikimediafoundation.org • nitens.org • @readermeter
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation