> The motivation behind your proposal is (I think) a desire to have a unified 
> configuration interface for data collection jobs. This makes total sense and 
> it's worth pursuing. I just don't think we should stuff everything into the 
> schema. The schema is just that: a schema. It's a data model.

> Much agree with ori here. We would be bloating schema with properties that 
> have nothing to do with data definition.


I agree with both of you: these are data collection settings that do not 
necessarily belong in the schema itself if its job is to represent the data 
model.

As you know, we don’t have a solution for representing schema metadata (other 
than the dirty hack of schema talk pages) or data collection options. As a 
customer, I would value the ability to specify schema ownership (who should be 
contacted if something goes wrong), sampling rates (should the data be 
collected sampled or unsampled?), retention and privacy options (should the data 
be retained indefinitely? should the whole log be pruned after the retention 
window? are there fields that include PII that should be stripped?), as well as 
the ability to monitor where a specific <schema, rev_id> is deployed.
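
To make the list above concrete, here is a rough sketch of what such settings 
could look like if they lived next to the schema rather than inside it. This is 
only an illustration: the class name, field names, the example schema name, and 
the owner address are all hypothetical and do not correspond to anything we 
have today.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectionSettings:
    """Hypothetical per-<schema, rev_id> settings, kept outside the schema."""

    schema: str                            # schema name (data model lives elsewhere)
    rev_id: int                            # schema revision these settings apply to
    owner: str                             # who to contact if something goes wrong
    sampling_rate: float = 1.0             # 1.0 = unsampled, 0.01 = 1 in 100 events
    retention_days: Optional[int] = None   # None = retain indefinitely
    prune_whole_log: bool = False          # drop the whole log after the window?
    pii_fields: List[str] = field(default_factory=list)  # fields to strip

    def __post_init__(self) -> None:
        # Reject settings that cannot be honoured by a collection job.
        if not 0.0 < self.sampling_rate <= 1.0:
            raise ValueError("sampling_rate must be in (0, 1]")
        if self.retention_days is not None and self.retention_days <= 0:
            raise ValueError("retention_days must be positive or None")


def strip_pii(event: dict, settings: CollectionSettings) -> dict:
    """Return a copy of the event without the fields flagged as PII."""
    return {k: v for k, v in event.items() if k not in settings.pii_fields}


# Example values are made up for illustration only.
settings = CollectionSettings(
    schema="ExampleSchema",
    rev_id=1,
    owner="owner@example.org",
    sampling_rate=0.01,
    retention_days=90,
    pii_fields=["ip"],
)
cleaned = strip_pii({"action": "click", "ip": "203.0.113.7"}, settings)
```

The point of the sketch is only that ownership, sampling, retention, and PII 
handling can be keyed on <schema, rev_id> without touching the data model.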

Dario
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
