Sounds like a good idea. But what advantage would this provide other
than reducing redundant information? After all, CodeIssue,
FileMetric, Coverage, UnitTest, and Dependency data all have
much more redundant information. On the other hand, I'm all for it
if the exploration becomes easier or if linking Issue data
with other data becomes easier. I think that is the key.
Maybe we can make the Jira sensor smarter? For example, maybe we have
to give in and actually hook into the database to send 'Age' data
instead of trying to calculate it on the server side.
CSDL's use of the Jira-Subversion sensor and the process that we are
following opens up a lot of possibilities for linking ActiveTime,
Commits, and Issues. One of the problems that I see with the
event-based Issue SDT is that a commit for an Issue (specified with the
[HACK-1] convention) is not represented in the XML issue file. Thus,
our Jira sensor would not know to send Issue data, and this link might
not be easy to make.
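For illustration, here is a minimal sketch of how the [HACK-1] convention could be used to link commits to issues. It assumes the key always appears in square brackets in the commit message; the function names and the (revision, message) input shape are hypothetical and are not taken from the Jira-Subversion sensor.

```python
import re

# Assumption: issue keys appear in square brackets, e.g. "[HACK-1]".
ISSUE_KEY = re.compile(r"\[([A-Z][A-Z0-9]*-\d+)\]")


def issue_keys(commit_message):
    """Return the issue keys referenced in a commit message, in order."""
    return ISSUE_KEY.findall(commit_message)


def link_commits_to_issues(commits):
    """Map each issue key to the revisions whose messages mention it.

    `commits` is an iterable of (revision, message) pairs. This shape is
    hypothetical, chosen only to keep the sketch self-contained.
    """
    links = {}
    for revision, message in commits:
        for key in issue_keys(message):
            links.setdefault(key, []).append(revision)
    return links
```

With a mapping like this, a commit could trigger Issue data for the referenced key even when the XML issue file shows no change.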
thanks, aaron
At 04:04 PM 1/27/2006, you wrote:
--On Friday, January 27, 2006 2:19 AM -1000 "(Cedric) Qin ZHANG"
<[EMAIL PROTECTED]> wrote:
The sensor is deployed and the first batch of data has been sent. If you
want a distraction from your coding activity, please log into your
Hackystat account and check today's raw issue data and let me know if
there is any error.
Good work, Cedric!
When I get distracted, I like to send long emails, so here goes. :-)
First, I note that we need to evolve the Issue SDT to use pMaps
instead of the "data" field. This doesn't need to be done
immediately. Maybe I should first write the SDT chapter and then
see if someone can do this evolution by reading the documentation.
Second, the data (pMap) field should include something like a
'fixRelease' field; that will make the telemetry much more useful.
The last issue in my mind is the big one: snapshots vs. incremental
data. As everyone knows, I have been requiring a "snapshot" approach
to the Jira sensor in which it sends a summary of all relevant Jira
issues every day. This produces a tremendous amount of redundant
information and is not scalable as the issue repository grows. The
reason for my insanity is as follows:
- The Jira issue sensor was "volatile" code. Doing a snapshot was a
simple way to improve the reliability of the data.
- Incremental issue data is inherently vulnerable. If your sensor
drops out for a day or more, you've permanently lost data.
Furthermore, how do you distinguish between a busted sensor and
simply no Jira issue activity?
- Snapshot data makes the higher level analyses easier to
implement. Because we were still actively exploring what use to
make of this data, I wanted to make that exploration process easier.
Eventually, of course, we have to move to a more incremental
approach. Here's my proposal on how to do it safely:
(1) Create a new SDT called "IssueSnapshot". What this SDT provides
is summary statistics about the state of the repository---how many
issues were open, closed, etc. We have to do more investigation to
decide what we need to send for this thing, but it will be a very
small amount of data compared to the current snapshot. The
IssueSnapshot is sent every day by the sensor.
(2) Use the current Issue SDT to send only changes that occurred
during the current day. If there is no Issue activity, then no Issue
data is sent.
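A minimal sketch of what the sensor might send each day under this proposal. The issue fields ("status", "updated") and the payload layout are assumptions made for illustration; the actual contents of IssueSnapshot still need the investigation mentioned above.

```python
from collections import Counter
from datetime import date


def issue_snapshot(issues):
    """Summarize the repository state as a count of issues per status."""
    return dict(Counter(issue["status"] for issue in issues))


def changed_on(issues, day):
    """Select only the issues whose last update falls on `day`."""
    return [issue for issue in issues if issue["updated"] == day]


def daily_payload(issues, day):
    """Build the two parts sent for `day`.

    The IssueSnapshot part is always present; the Issue part is empty
    when there was no issue activity on that day.
    """
    return {
        "IssueSnapshot": issue_snapshot(issues),
        "Issue": changed_on(issues, day),
    }
```

The snapshot part stays small however large the repository grows, because it holds one count per status rather than one entry per issue.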
This approach solves two out of three problems:
(1) The amount of data sent to Hackystat becomes scalable, and the
redundant daily issue data is eliminated.
(2) We can now distinguish between "no data" and "sensor is
busted". When there is "no data", there is still an
IssueSnapshot. When there is no IssueSnapshot, then "sensor is busted".
The approach does not solve what to do when the sensor busts and we
miss data. I guess in that case we simply have to manually resend
the data from the days that were missed.
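On the server side, the distinction between "no data" and "sensor is busted" could be checked roughly as follows. This is only a sketch: the function names and the idea of passing in a set of days that have an IssueSnapshot are assumptions, not existing Hackystat code.

```python
from datetime import date, timedelta


def classify_day(has_snapshot, issue_count):
    """Interpret one day of sensor data.

    A missing IssueSnapshot means the sensor did not run, regardless of
    whatever Issue data may be present for that day.
    """
    if not has_snapshot:
        return "sensor busted"
    return "activity" if issue_count > 0 else "no activity"


def days_to_resend(start, end, snapshot_days):
    """List the days in [start, end] that have no IssueSnapshot."""
    missing = []
    day = start
    while day <= end:
        if day not in snapshot_days:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

The list returned by days_to_resend gives the days whose data would have to be resent manually.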
Thoughts welcomed.
Cheers,
Philip