[Reviving old thread]

I was looking at our EventLogging data today, and discovered that
Schema:Echo contains no useful information that isn't already in the
database apart from which button people use to thank each other, and if we
really care about that we can measure it separately without producing nine
gigs of unused data.

Feel free to delete the data associated with Schema:Echo (but not
Schema:EchoInteraction! We do use that one) with extreme prejudice. I've
also written a config patch to stop us from producing these events (
https://gerrit.wikimedia.org/r/#/c/274345/ ) which I will deploy in the
SWAT on Thursday.

I also found that a long-standing issue with duplicate events in
Schema:EchoInteraction wasn't fixed yet, so I wrote a patch for that too:
https://gerrit.wikimedia.org/r/274342

On Tue, Dec 15, 2015 at 11:16 AM, Jonathan Morgan <jmor...@wikimedia.org>
wrote:

> Hi Nuria!
>
> Speaking for *my own particular scenario*, that solution sounds like it
> will be fine, since I don't plan on immediately performing research with
> these data.
>
> But it's obviously still the Collab team's call here--they likely have
> needs I know nothing about. Cc'ing Joe Matazzoni in case he's not following
> this already...
>
> J
>
>
>
> On Tue, Dec 15, 2015 at 9:50 AM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>
>>
>> >We could blacklist this schema from the mysql database, and still keep
>> producing it.  It would be available in Hadoop either way.
>>
>> Right, but I would also like to drop the table if it is not being used. If
>> the data is not going to be looked at soonish, there is no point in storing
>> it, as it will likely be deleted before it gets looked at.
>>
>> Thanks,
>>
>> Nuria
>>
>> On Tue, Dec 15, 2015 at 9:35 AM, Andrew Otto <ao...@wikimedia.org> wrote:
>>
>>> We could blacklist this schema from the mysql database, and still keep
>>> producing it.  It would be available in Hadoop either way.
>>>
>>>
>>> On Dec 15, 2015, at 12:22, Jonathan Morgan <jmor...@wikimedia.org>
>>> wrote:
>>>
>>> Hi Nuria,
>>>
>>> FWIW: Although I'm not using this right now, I could see it being
>>> useful for understanding the impact of new notification updates that are
>>> coming down the pike.[1][2]
>>>
>>> What are the costs involved in keeping this schema up?
>>>
>>> Best,
>>> J
>>>
>>> 1.
>>> https://meta.wikimedia.org/wiki/Research:Cross-wiki_notifications_user_research
>>> 2. https://phabricator.wikimedia.org/T116741
>>>
>>> On Tue, Dec 15, 2015 at 8:22 AM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>>>
>>>> Roan:
>>>>
>>>> The data for the Echo schema (https://meta.wikimedia.org/wiki/Schema:Echo)
>>>> is quite large, and we are not sure it is even used.
>>>>
>>>> Can you confirm either way? If it is no longer used we will stop
>>>> collecting it.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Nuria
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>
>>>
>>> --
>>> Jonathan T. Morgan
>>> Senior Design Researcher
>>> Wikimedia Foundation
>>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>
>
>
> --
> Jonathan T. Morgan
> Senior Design Researcher
> Wikimedia Foundation
> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>
>