Just spoke with Jaime Crespo and he confirmed that:

   - m4-master (the master EventLogging database) only holds events for the
   last 45 days to avoid space problems. That applies to all tables,
   including Echo.

   - analytics-storage is the replica that keeps the historical data and is
   meant to apply the specific purging strategy agreed upon on the schema's
   talk page. This database does not have space problems (yet).


On Wed, Dec 16, 2015 at 2:14 AM, Aaron Halfaker <ahalfa...@wikimedia.org>
wrote:

> No!  Please do not nuke old data.  +1 to J-Mo.  This will probably be
> useful for long-term studies of notifications.  If I had the time, I'd pick
> it up right now based on this reminder!
>
> I'm happy with having historical data preserved (please make sure that it
> is) and the MySQL table dropped until a recent point.  It will be important
> that we can come back to this later and either restore the data or query it
> in its entirety from Hadoop.
>
> -Aaron
>
> On Tue, Dec 15, 2015 at 1:34 PM, Madhumitha Viswanathan <
> mviswanat...@wikimedia.org> wrote:
>
>> I want to mention that data in Hadoop is only available from Aug 27th,
>> 2015. Older data is only available in MySQL.
>>
>> On Tue, Dec 15, 2015 at 11:27 AM, Roan Kattouw <rkatt...@wikimedia.org>
>> wrote:
>>
>>> If the data is going to be retained but would just become harder to
>>> query (i.e. still in Hadoop but not in MySQL), maybe we could nuke data
>>> that's more than a year old (or 6 months old or something) from MySQL?
>>>
>>> On Tue, Dec 15, 2015 at 9:35 AM, Andrew Otto <ao...@wikimedia.org>
>>> wrote:
>>>
>>>> We could blacklist this schema from the MySQL database, and still keep
>>>> producing it.  It would be available in Hadoop either way.
>>>>
>>>>
>>>> On Dec 15, 2015, at 12:22, Jonathan Morgan <jmor...@wikimedia.org>
>>>> wrote:
>>>>
>>>> Hi Nuria,
>>>>
>>>> FWIW: Although I'm not using this right now, I could see it being
>>>> useful for understanding the impact of new notification updates that are
>>>> coming down the pike.[1][2]
>>>>
>>>> What are the costs involved in keeping this schema up?
>>>>
>>>> Best,
>>>> J
>>>>
>>>> 1.
>>>> https://meta.wikimedia.org/wiki/Research:Cross-wiki_notifications_user_research
>>>> 2. https://phabricator.wikimedia.org/T116741
>>>>
>>>> On Tue, Dec 15, 2015 at 8:22 AM, Nuria Ruiz <nu...@wikimedia.org>
>>>> wrote:
>>>>
>>>>> Roan:
>>>>>
>>>>> The data for the Echo schema
>>>>> (https://meta.wikimedia.org/wiki/Schema:Echo) is quite large and we
>>>>> are not sure it is even used.
>>>>>
>>>>> Can you confirm either way? If it is no longer used we will stop
>>>>> collecting it.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Nuria
>>>>>
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> Analytics@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jonathan T. Morgan
>>>> Senior Design Researcher
>>>> Wikimedia Foundation
>>>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>>>
>>>
>>
>>
>> --
>> --Madhu :)
>


-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation