On November 24, 2017 10:41 am, Klérisson Paixão wrote:
> > Speaking of which, I think it's important to curate a dataset of
> > success/failure logs with the expected anomalies to be found. Those
> > will be super useful to prevent regression when trying out new
> > settings or models. How to store and manage the dataset remains to be
> > defined too. To give you an idea, fwiw, you can find my original
> > dataset here: git clone https://softwarefactory-project.io/r/logreduce-tests
>
> How did you collect and curate the original dataset? And how do you
> expect the new dataset to look?
>
> Cheers,
> Klérisson
This dataset was created manually, mostly out of failed jobs from the openstack-infra CI. I tried to pick logs with unusual formats, and I just referenced the expected anomalies to be found in the inf.yaml files.

Perhaps we could annotate the error_pr score out of the current log-classify.crm, at least for the obvious anomalies. The dataset attribute would then be a list of (error_pr, log-line) tuples. Though, instead of looking for high error_pr scores, we might want to only report the error_pr scores that deviate strongly from the mean, in which case it would be better to store (deviation, log-line) tuples.

Regards,
-Tristan
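To illustrate the deviation-based idea above, here is a minimal sketch (not logreduce's actual implementation) of filtering scored log lines by how far their score deviates from the mean; the function name, the (score, line) tuple layout, and the 1.5-sigma cutoff are all illustrative assumptions:

```python
# Hypothetical sketch: instead of a fixed "high error_pr" cutoff,
# keep only lines whose score deviates strongly from the mean,
# and store (deviation, log-line) tuples as suggested above.
import statistics


def deviating_lines(scored_lines, n_sigma=1.5):
    """scored_lines: list of (error_pr, log_line) tuples.

    Returns (deviation, log_line) tuples for lines whose score
    lies more than n_sigma standard deviations from the mean.
    The n_sigma threshold is an assumption, not a logreduce default.
    """
    scores = [score for score, _ in scored_lines]
    mean = statistics.mean(scores)
    stdev = statistics.pstdev(scores)
    if stdev == 0:
        return []  # all scores identical: nothing deviates
    return [
        ((score - mean) / stdev, line)
        for score, line in scored_lines
        if abs(score - mean) > n_sigma * stdev
    ]


scored = [
    (0.10, "INFO starting job"),
    (0.12, "INFO fetching repo"),
    (0.11, "INFO running tests"),
    (0.90, "Traceback (most recent call last):"),
]
anomalies = deviating_lines(scored)
# Only the traceback line stands out from the baseline noise.
```

One benefit of storing deviations rather than raw scores is that the same threshold can be reused across logs whose baseline score levels differ.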
_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
