On November 24, 2017 10:41 am, Klérisson Paixão wrote:
> > Speaking of which, I think it's important to curate a dataset of
> > success/failure logs with the expected anomalies to be found. Those
> > will be super useful to prevent regressions when trying out new
> > settings or models.
> > How to store and manage the dataset remains to be defined too.
> > To give you an idea, fwiw, you can find my original dataset here:
> >  git clone https://softwarefactory-project.io/r/logreduce-tests
>
> How did you collect and curate the original dataset?
> And what do you expect the new set to look like?
>
> Cheers,
> Klérisson

This dataset has been created manually, mostly out of failed jobs from
the openstack-infra CI. I tried to pick logs with unusual formats, and
I referenced the expected anomalies to be found in the inf.yaml files.
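
For example, a regression check over one such test case could look like
the sketch below. Note that the inf.yaml field names (log_file,
anomalies) and the sample lines are assumptions for illustration, not
the repository's actual schema:

  # check_case.py: a sketch assuming each test case ships an inf.yaml
  # listing a log file and the anomaly lines expected to be found in it.
  import yaml  # PyYAML

  SAMPLE_INF_YAML = """
  log_file: job-output.txt
  anomalies:
    - "ERROR: tempest run failed"
    - "FATAL: unable to contact nodepool"
  """

  def load_case(text):
      # Parse a test case into (log file name, expected anomaly set).
      case = yaml.safe_load(text)
      return case["log_file"], set(case["anomalies"])

  def missed(reported, expected):
      # Expected anomalies the model failed to report, i.e. regressions.
      return expected - set(reported)

  log_file, expected = load_case(SAMPLE_INF_YAML)
  reported = ["ERROR: tempest run failed"]  # stand-in for a real model run
  print("missed anomalies:", missed(reported, expected))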

Perhaps we could annotate the dataset with the error_pr score produced
by the current log-classify.crm, at least for the obvious anomalies.
The dataset attribute would then be a list of (error_pr, log-line)
tuples.
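
A minimal sketch of that attribute, where score_line stands in for
whatever probability the current log-classify.crm assigns to a line
(the name is a placeholder, not the real interface):

  def annotate(lines, score_line):
      # Dataset attribute: one (error_pr, log-line) tuple per log line.
      return [(score_line(line), line) for line in lines]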

Though, instead of looking for high error_pr scores, we might want to
report only the scores that deviate strongly from the mean, in which
case we had better store (deviation, log-line) tuples.
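
For instance, assuming the scores are roughly normally distributed and
taking an arbitrary two-sigma cutoff, that filtering could be sketched
as:

  import statistics

  def deviating(annotated, threshold=2.0):
      # From (error_pr, log-line) tuples, keep only the lines whose score
      # deviates strongly from the mean, stored as (deviation, log-line).
      scores = [pr for pr, _ in annotated]
      mean = statistics.mean(scores)
      std = statistics.pstdev(scores) or 1.0  # guard against zero spread
      return [((pr - mean) / std, line)
              for pr, line in annotated
              if abs(pr - mean) >= threshold * std]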

Regards,
-Tristan
