Sounds quite promising, Marcel, not to mention cool!  :) 

One caution, though, about one specific item you mentioned several times: 
"... users entering a non-existing update site url ..." 

Don't be too quick to write all those off as "user error". Believe it or 
not, some projects' repositories include incorrect or obsolete "reference 
sites". That's one reason we stopped "copying them" to the Simultaneous 
Release repo. (That, and there were just too many to be meaningful to users.) 

I do not think there is a way to detect which were entered by the user and 
which were "data" provided by a project's repo.

Perhaps you are talking about obvious typos or something. But, if not, 
you might want to "keep a list" of the non-existing sites you find in the 
reports, in a common "cross-project" bug, or something similar? (In the past, 
when I opened bugs on invalid ones, projects were very slow to fix them, if 
they fixed them at all.)

Good luck, 





From:   Marcel Bruch <marcel.br...@codetrails.com>
To:     Cross project issues <cross-project-issues-dev@eclipse.org>, 
Date:   10/09/2014 01:36 PM
Subject:        [cross-project-issues-dev] Updates on Automated Error Reporting: Started with Mars M2
Sent by:        cross-project-issues-dev-boun...@eclipse.org



Hi,

I’d like to give an update on the current status of the automated error 
reporting that started with Eclipse Mars M2 last week (after some earlier 
iterations). The error reporting tool is now part of the Eclipse Mars M2 
Committer Edition and the Modeling Package.


In the past ~4 weeks, we received 1500 error reports, of which 650 were 
distinct. These reports have been sorted into 412 groups of very similar 
reports, using automated grouping recommendations that I then reviewed 
manually.

Out of these 412 report groups,

* 48 reports (~12%) have been marked as "not eclipse", meaning that 
they are caused by other, external plugins.
* 41 reports (~10%) have been marked as "invalid" or "won’t fix", meaning 
that they are likely caused by user errors like entering a non-existing 
update site url or similar.
* 55 reports (~13%) have been moved to other projects for further 
investigation (I think they are likely bugs).
* 10 new reports have been marked as fixed as of today.
* approximately 15 reports have been marked as duplicates of existing, 
manually created bug reports.

The remaining ~250 reports currently do not provide enough information for 
me to judge them. There are several reasons why I cannot tell whether 
they are bugs or not. The obvious one is that neither the error message nor 
the stack trace gives me a clear indication. The technical ones are (i) an 
issue with the m2e SLF4J log appender swallowing exceptions, and (ii) a 
mistake on my end: we dropped some stack traces hidden in 
CoreException.getStatus.

These technical issues are fixed now and should lead to better results for M3. 


So far, we have had roughly 350 reporters in the past 4 weeks, 75 of them in 
the last 7 days alone. Per day, we receive between 50 and 120 error reports, 
of which ~40 are "new" (meaning distinct). For each of these distinct error 
reports, a new bug report is created. After duplicate detection and a first 
classification, this number goes down to 20-30 per day.

I hope and expect that the number of error reports per day will keep going 
down over time. How many distinct error traces can there be in Eclipse, 
eh? :-?



Just to make it clear again:
Not all logged error reports are bugs. Actually, I estimate that at some 
point, say, 80% of the reports will be user errors (e.g. users entering a 
non-existing update site url etc.). Only a fraction will actually be bugs. 
However, when starting from zero, we need to wade through that river 
first…


Of course it’s a bit too early to draw conclusions. But what I can say at 
the moment is:

(i) we observe a noticeable number of errors/bugs in Luna that weren’t 
reported in the past 6 months.
(ii) projects like MPC, JDT, and PDE are quite responsive, i.e., they comment 
on and discuss these reports or mark them as duplicates of bugs reported 
earlier. The way these projects handle the reports is greatly appreciated 
and makes this a worthwhile investment for me.
(iii) there are some new feature requests coming up from committers. 
Please let me know what ideas you have for making error reporting more 
useful.
(iv) Bugzilla is not a good front-end for managing duplicate detection. At 
some point, we’ll have to improve this.
(v) reviewing errors takes me roughly an hour per day. I’d welcome 
committers reviewing the reports for their projects. Below you will find a 
set of example Bugzilla queries that list the latest bugs for your projects. 




If you don’t have time to review error reports, there are a few things you 
could do in your code to make error reporting and duplicate detection more 
effective:


1. Use error codes:

We use error codes for duplicate detection (if two reports have the same 
error code, they are more likely to be duplicates).
We have 2144 error reports in the dashboard. Of these, 1114 have an error 
code of '0', 432 have error code '2', and 403 use error code '4'. Together 
that is about 90 % of all reports, so for most reports the code tells us 
next to nothing. 
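The "90 %" figure is easy to double-check: 1114 + 432 + 403 = 1949 of 2144 reports, roughly 91 %. The sketch below does that arithmetic in plain Java (no Eclipse dependencies) and, purely as a hypothetical illustration, shows the kind of distinct, non-zero codes a plug-in could define and pass as the `code` argument of `new Status(severity, pluginId, code, message, exception)`. The names `MyPluginErrorCodes`, `INDEX_CORRUPTED` and `REMOTE_FETCH_FAILED` are invented for this example. Side note, not from the original post: 0, 2 and 4 are also the values of `IStatus.OK`, `IStatus.WARNING` and `IStatus.ERROR`, which may hint that some plug-ins pass a severity or a default where a specific code was intended.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch: reproduces the share of reports that use the three
// most common error codes in the dashboard.
final class ErrorCodeShare {

    // Hypothetical plug-in side: one distinct, non-zero code per failure
    // mode, to be used as the 'code' of an IStatus.
    interface MyPluginErrorCodes {
        int INDEX_CORRUPTED = 101;
        int REMOTE_FETCH_FAILED = 102;
    }

    // Error code -> number of dashboard reports using it (figures from the post).
    static Map<Integer, Integer> dashboardCounts() {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        counts.put(0, 1114);
        counts.put(2, 432);
        counts.put(4, 403);
        return counts;
    }

    // Fraction of all reports covered by the given per-code counts.
    static double share(Map<Integer, Integer> countsByCode, int totalReports) {
        int covered = 0;
        for (int count : countsByCode.values()) {
            covered += count;
        }
        return (double) covered / totalReports;
    }
}
```

`ErrorCodeShare.share(ErrorCodeShare.dashboardCounts(), 2144)` evaluates to 1949 / 2144, about 0.909.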


2. Use (single) quotes for placeholders in messages:

We use the similarity of error messages to determine whether two error 
reports are similar.
We currently split the messages on whitespace. If we could safely 
identify the placeholders in your messages, this would work even better. 
Thus, if you put the variable parts of your error messages in single 
quotes (''), this would be a great improvement. If you use longer messages, 
you may consider putting the details after a ":". Then we could cut off your 
messages after the colon, or at least weight that part much lower.
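To illustrate why quoted placeholders help, here is a minimal sketch that compares two messages with Jaccard similarity over their tokens, once with plain whitespace splitting and once after masking single-quoted placeholders and cutting the message at the first ": ". This is an assumed stand-in for illustration, not the reporter's actual similarity measure; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MessageSimilarity {

    // Current approach as described: split the message on whitespace.
    static Set<String> whitespaceTokens(String message) {
        return new HashSet<>(Arrays.asList(message.trim().split("\\s+")));
    }

    // Assumed improvement: drop everything after the first ": " (so "://" in
    // URLs is not mistaken for the separator), then replace each
    // single-quoted placeholder, spaces included, with a fixed marker.
    static Set<String> normalizedTokens(String message) {
        int colon = message.indexOf(": ");
        String head = colon >= 0 ? message.substring(0, colon) : message;
        String masked = head.replaceAll("'[^']*'", "'?'");
        return whitespaceTokens(masked);
    }

    // Jaccard similarity: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
    }
}
```

For "Could not open file 'a.txt'" and "Could not open file 'b.txt'", the whitespace tokens share 4 of 6 distinct tokens (similarity 2/3), while the masked tokens are identical (similarity 1.0). A stray apostrophe such as "can't" would confuse the naive regex, which is one reason consistent quoting matters.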


3. Log your exceptions:

Sounds obvious, but isn't always the case.
We use the exception types to judge whether two error reports may be 
duplicates. The more specific your log/status code and exception type 
are, the better we can detect duplicates.
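A rough sketch of why the logged exception matters: below, a hypothetical duplicate key is built from the error code, the logged exception type and its root-cause type. If a plug-in logs only a message (no `Throwable` as the last argument of the `Status` constructor), every report with the same code collapses onto one uninformative key. The key format and class name are assumptions for illustration, not what the error reporter actually computes.

```java
final class ReportFingerprint {

    // Guard against pathological cyclic cause chains.
    private static final int MAX_CAUSE_DEPTH = 32;

    // Hypothetical duplicate key: code | exception type | root-cause type.
    static String of(int code, Throwable exception) {
        if (exception == null) {
            // Nothing logged: all such reports with this code look alike.
            return code + "|<no exception>";
        }
        Throwable root = exception;
        for (int depth = 0; depth < MAX_CAUSE_DEPTH && root.getCause() != null; depth++) {
            root = root.getCause();
        }
        return code + "|" + exception.getClass().getName() + "|" + root.getClass().getName();
    }
}
```

With this key, a wrapped `java.io.FileNotFoundException` and an `IllegalStateException` logged under the same code 0 end up in different groups, whereas two reports logged without exceptions cannot be told apart.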


4. Let your plugin name and packages follow the Eclipse naming 
conventions:

We guess which bundles are participating in a stack trace by mapping 
class names to bundles resolved in the active system. If your plugin uses 
packages named com.some.thing but the plugin itself is called my.other, we 
don’t put your bundle on the list of present bundles.
We use this list of bundles to guess which Bugzilla product to file new 
issues against, and to guess the version from the report.
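The mapping can be pictured as a longest-prefix match between a class's fully qualified name and the symbolic names of the resolved bundles. The sketch below assumes exactly that simplified rule (the real reporter may do more); the class name and the sample bundle names are illustrative only.

```java
import java.util.List;
import java.util.Optional;

final class BundleGuesser {

    // Returns the bundle whose symbolic name is the longest package prefix
    // of the fully qualified class name, or empty if none matches.
    static Optional<String> bundleFor(String className, List<String> bundleNames) {
        String best = null;
        for (String bundle : bundleNames) {
            boolean matches = className.startsWith(bundle + ".");
            if (matches && (best == null || bundle.length() > best.length())) {
                best = bundle;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Given the bundles `org.eclipse.jdt`, `org.eclipse.jdt.core` and `com.some.thing`, the class `org.eclipse.jdt.core.dom.ASTParser` maps to `org.eclipse.jdt.core`, while a class in package `my.other` maps to nothing, which is exactly the case described above.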


5. Use the Bugzilla "whiteboard" field and the "needinfo" keyword in bug reports:

The error reporter reads the values stored in the keyword and whiteboard 
fields and presents them to the (next) reporter. For example, if you get an 
error message like the well-known 'Resource is out of sync', the automated 
error reporter would send this error to eclipse.org and in turn present 
the user a message like "Tip: Eclipse can keep track of resource 
changes automatically. To enable this, go to Preferences > General > 
Workspace and enable 'use native polling'".

Done right, users who experience such a problem may get immediate 
feedback on how to solve it for good.

In case the error report does not provide enough details, you may set 
the "needinfo" keyword. In that case, the next reporter gets notified 
that a committer requested additional information and is pointed to the 
bug report.
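On the client side, the behaviour described above roughly amounts to turning two bug fields into a message for the next user. The sketch below is a hypothetical model: `BugInfo` and `hintFor` are invented names, and only the whiteboard text and the "needinfo" keyword are taken from the description; how the real client fetches and displays these fields is not covered here.

```java
import java.util.Set;

final class ReporterHint {

    // Hypothetical subset of a Bugzilla bug, as the client might see it.
    static final class BugInfo {
        final int id;
        final Set<String> keywords;
        final String whiteboard;

        BugInfo(int id, Set<String> keywords, String whiteboard) {
            this.id = id;
            this.keywords = keywords;
            this.whiteboard = whiteboard;
        }
    }

    // Builds the text shown to the next user who hits the same error:
    // the whiteboard content as a tip, plus a pointer to the bug when a
    // committer has set the "needinfo" keyword.
    static String hintFor(BugInfo bug) {
        StringBuilder hint = new StringBuilder();
        if (bug.whiteboard != null && !bug.whiteboard.trim().isEmpty()) {
            hint.append("Tip: ").append(bug.whiteboard.trim());
        }
        if (bug.keywords.contains("needinfo")) {
            if (hint.length() > 0) {
                hint.append('\n');
            }
            hint.append("A committer requested more information, see bug ").append(bug.id);
        }
        return hint.toString();
    }
}
```

A bug with an empty whiteboard and no keywords yields an empty string, i.e. nothing extra is shown to the user.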

We may extend this approach in the near future depending on your feedback 
and if you actually make use of it. So, let us know what you think.



Some notes on our plans for M3:
First priority is improving duplicate detection. As of today, we correctly 
classify 50% of the bugs with a false positive rate of 1%. For the 
remaining 50%, we could do better…

Second priority is improving the Eclipse client. If you have usability 
suggestions, please open a bug against the Recommenders.Incubator product, 
Stacktraces component.

Using Bugzilla as the "UI" is not perfect, and we will need to replace it 
at some point. For the time being, however, we’ll stay with Bugzilla until 
either the webmasters cut us off because of too much traffic or we have 
time to build a front-end more suitable for analyzing automated error 
reports.


And sorry for the broken (old) links.
Shortly before M2, we changed the URL scheme so that all resources are now 
well protected; potentially private data is accessible to committers 
only. This, however, required a new (breaking) naming scheme. All new bug 
reports use the new URLs, but we did not fix the old ones. Sorry for any 
inconvenience this may have caused.

The error reports dashboard is now available at: 
https://dev.eclipse.org/recommenders/committers/dashboard/



Finally,
in case you’d like to integrate the error reporting tool into your EPP 
package, please contact your package maintainer. We’d be more than happy 
to receive many more error reports.


Best,
Marcel




Below are example links for reviewing project-specific reports. You may use 
them as templates for your own queries, or contact me for custom 
solutions:

[m2e]   http://eclip.se/2P
[oomph] http://eclip.se/2S
[mpc]   http://eclip.se/2R
[jdt]   http://goo.gl/ROkLsS
[all open last 7d] http://goo.gl/uN03c7


-- 
Codetrails GmbH
The knowledge transfer company

Robert-Bosch-Str. 7, 64293 Darmstadt
Phone: +49-6151-276-7092
Mobile: +49-179-131-7721
http://www.codetrails.com/

Managing Director: Dr. Marcel Bruch
Handelsregister: Darmstadt HRB 91940

[attachment "signature.asc" deleted by David M Williams/Raleigh/IBM] 
_______________________________________________
cross-project-issues-dev mailing list
cross-project-issues-dev@eclipse.org
To change your delivery options, retrieve your password, or unsubscribe 
from this list, visit
https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev
