It's also the International Day of Women and Girls in Science! Sounds like a good summary.
On 11 February 2016 at 07:31, Dan Andreescu <dandree...@wikimedia.org> wrote:
> I almost revived this thread on Mardi Gras, but I didn't want to be known as
> The Holiday Crusher, so I waited. Today is relatively safe [1] :)
>
> OK, there are three main points being made:
>
> 1. deprecating the old datasets
> 2. liberating ourselves from the old format
> 3. reorganizing the dumps page
>
> My thoughts on each:
>
> 1. I agree with Dario and Erik's points. Let's keep the old files around,
> but stop generating new files in May 2016. To explain this, we'll make a
> new section called "Deprecated" and put links to the pagecounts-* datasets
> there.
>
> 2. I wasn't expecting to talk about format, but it makes sense because, for
> example, Erik's dataset is just a pivoted format. So, we could have a
> section for the Pageview datasets, with links for each format we already
> have: Domasz archive format, Erik Z compressed format. We could then add a
> new format that's easier to understand and could even include some of the
> data we expose via the pageview API. But from an organizational point of
> view, treating "format" as a separate concept from "dataset" will be an
> improvement.
>
> 3. I think it's time we had our own page instead of just being under
> dumps.wikimedia.org/other. Let's have dumps.wikimedia.org/analytics and
> link to it from both the main dumps page and /other. The separation will
> make it easier to reference other places where we have static data file
> dumps, like datasets.wikimedia.org. It'll also make it easier to add links
> and references to how this work is being done and where people can interact
> with us or help us.
>
>
> I hope I captured what everyone was saying. If there aren't any objections,
> I'll send a list of next steps needed to accomplish this, and get to work :)
>
>
>
> [1] Today is Be Electrific Day, Get Out Your Guitar Day, Grandmother
> Achievement Day, National Don't Cry Over Spilled Milk Day, National
> Inventors' Day, National Make a Friend Day, National Peppermint Patty Day,
> National Shut-in Visitation Day, Pro Sports Wives Day, Promise Day,
> Satisfied Staying Single Day, White Shirt Day
>
>
> On Wed, Jan 6, 2016 at 7:13 PM, Dario Taraborelli
> <dtarabore...@wikimedia.org> wrote:
>>
>> Erik's proposal sounds very reasonable.
>>
>> There might be some confusion about what we mean by "keeping the old
>> datasets for longitudinal analysis". No one is planning to remove the old
>> static dumps, just stop generating them/maintaining them going forward.
>>
>> I also want to echo Nuria regarding the human cost of maintaining multiple
>> definitions. I just finished preparing a response to a reporter who was
>> asking about project-level mobile PV data, and I was not immediately able
>> to answer whether a specific data source I wanted to cite was using the old
>> or new definition (until I talked to Dan and we looked up a Gerrit patch
>> together).
>>
>> How do people feel about turning off the generation of old dumps by May
>> 2016, i.e., one year after having the two series of data available in
>> parallel?
>>
>>
>>
>> On Wed, Jan 6, 2016 at 10:17 AM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>>>
>>> >As I just mentioned to Dan in a private email conversation, keeping
>>> > datasets even with imperfect measurements is important. Particularly for
>>> > longitudinal analysis.
>>> Keep in mind that maintaining these old dumps is not "free": it causes a
>>> lot of confusion and maintenance costs to have several pageview definitions
>>> around. We get a lot of questions about the spikiness of the old
>>> definition, and we need to maintain the software that generates the old
>>> files. Thus, we think it is reasonable to ask our users to transition to
>>> the new definition and eventually (in a period of months) turn off the old
>>> dumps.
>>>
>>> On Thu, Dec 24, 2015 at 6:12 AM, Maurice Vergeer <m.verg...@maw.ru.nl>
>>> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> As I just mentioned to Dan in a private email conversation, keeping
>>>> datasets even with imperfect measurements is important. Particularly for
>>>> longitudinal analysis.
>>>>
>>>> Also, from what I understand (me being a newbie here), the data are
>>>> stored in separate files. Dan suggested reordering the page into
>>>> categories. Maybe another option is to create more extensive datasets
>>>> that combine more measurements in a single data file. On the other hand,
>>>> the files would become even bigger. Not an issue for me, but for users in
>>>> the field, accessibility (download bandwidth) could become an issue.
>>>>
>>>> My two cents,
>>>> Maurice
>>>>
>>>>
>>>> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.d...@gmail.com> wrote:
>>>>>
>>>>> Nothing against this approach!
>>>>>
>>>>> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu
>>>>> <dandree...@wikimedia.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.d...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Dan,
>>>>>>> Happy holidays!
>>>>>>> Good idea to combine these datasets! However, we have one more dataset
>>>>>>> by Erik Zachte: http://dumps.wikimedia.org/other/pagecounts-ez/
>>>>>>
>>>>>>
>>>>>> And that's an important one! But I was thinking we could reorganize
>>>>>> the page into categories. Erik's dataset could go into a "processed
>>>>>> data" category or something like that. The three I wanted to talk about
>>>>>> on this thread are just the raw data.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> Analytics@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>
>>>>>
>>>>> --
>>>>> Thank you.
>>>>>
>>>>> Alex Druk
>>>>> alex.d...@gmail.com
>>>>> (775) 237-8550 Google voice
>>>>
>>>>
>>>> --
>>>> ________________________________________________
>>>> Maurice Vergeer
>>>> To contact me, see http://mauricevergeer.nl/node/5
>>>> To see my publications, see http://mauricevergeer.nl/node/1
>>>> ________________________________________________
>>
>>
>> --
>>
>> Dario Taraborelli Head of Research, Wikimedia Foundation
>> wikimediafoundation.org • nitens.org • @readermeter

--
Oliver Keyes
Count Logula
Wikimedia Foundation