--On Friday, January 27, 2006 2:19 AM -1000 "(Cedric) Qin ZHANG" <[EMAIL PROTECTED]>
wrote:
The sensor is deployed and the first batch of data are sent. If you want a
distraction from your coding activity, please log into your Hackystat account
and check today's raw issue data and let me know if there is any error.
Good work, Cedric!
When I get distracted, I like to send long emails, so here goes. :-)
First, I note that we need to evolve the Issue SDT to use pMaps instead of the "data"
field. This doesn't need to be done immediately. Maybe I should first write the SDT
chapter and then see if someone can do this evolution by reading the documentation.
Second, the data (pMap) field should include something like a 'fixRelease' field; that
will make the telemetry much more useful.
The last issue in my mind is the big one: snapshots vs. incremental data. As everyone
knows, I have been requiring a "snapshot" approach to the Jira sensor in which it sends a
summary of all relevant Jira issues every day. This produces a tremendous amount of
redundant information and is not scalable as the issue repository grows. The reason for
my insanity is as follows:
- The Jira issue sensor was "volatile" code. Doing a snapshot was a simple way to
improve the reliability of the data.
- Incremental issue data is inherently vulnerable. If your sensor drops out for a day or
more, you've permanently lost data. Furthermore, how do you distinguish between a busted
sensor and simply no Jira issue activity?
- Snapshot data makes the higher level analyses easier to implement. Because we were
still actively exploring what use to make of this data, I wanted to make that exploration
process easier.
Eventually, of course, we have to move to a more incremental approach. Here's my proposal
on how to do it safely:
(1) Create a new SDT called "IssueSnapshot". What this SDT provides is summary
statistics about the state of the repository---how many issues were open, closed, etc.
We have to do more investigation to decide what we need to send for this thing, but it
will be a very small amount of data compared to the current snapshot. The IssueSnapshot
is sent every day by the sensor.
(2) Use the current Issue SDT to send only changes that occurred during the current day.
If there is no Issue activity, then no Issue data is sent.
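To make (1) and (2) concrete, here is a rough sketch of what the sensor might assemble each day, written in Python for brevity rather than as actual sensor code. Everything named here is hypothetical: the `IssueChange` and `IssueSnapshot` shapes, the `open_count` and `closed_count` fields, and the `build_daily_payload` helper. What the IssueSnapshot really needs to carry is still to be decided.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IssueChange:
    """One issue that changed on a given day (hypothetical shape)."""
    issue_id: str
    status: str
    changed_on: date


@dataclass
class IssueSnapshot:
    """Small daily summary of the repository state (hypothetical fields)."""
    day: date
    open_count: int
    closed_count: int


def build_daily_payload(day, all_issues, changes):
    """Return (snapshot, todays_changes) for one day.

    all_issues: dict mapping issue id -> current status string.
    changes: list of IssueChange entries, possibly covering many days.
    """
    statuses = list(all_issues.values())
    # The snapshot is always produced, even on days with no activity.
    snapshot = IssueSnapshot(
        day=day,
        open_count=sum(1 for s in statuses if s == "open"),
        closed_count=sum(1 for s in statuses if s == "closed"),
    )
    # Only the changes that occurred on this day are sent as Issue data.
    todays_changes = [c for c in changes if c.changed_on == day]
    return snapshot, todays_changes
```

On a quiet day the second element is simply an empty list, while the snapshot is still produced.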
This approach solves two out of three problems:
(1) The amount of data sent to Hackystat becomes scalable, and the redundant daily issue
data is eliminated.
(2) We can now distinguish between "no data" and "sensor is busted". When there is "no
data", there is still an IssueSnapshot. When there is no IssueSnapshot, then "sensor is
busted".
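The rule in (2) is small enough to write down directly. This is only an illustrative sketch in Python; the `classify_day` function and its arguments are made up for the example and are not part of Hackystat.

```python
def classify_day(has_snapshot, issue_count):
    """Classify one day from what the server received (hypothetical helper).

    has_snapshot: True if an IssueSnapshot arrived for the day.
    issue_count: number of Issue entries that arrived for the day.
    """
    if not has_snapshot:
        # No IssueSnapshot at all: the sensor did not run or failed.
        return "sensor is busted"
    if issue_count == 0:
        # Snapshot present but no Issue entries: a genuinely quiet day.
        return "no data"
    return "activity"
```

The snapshot check comes first, so Issue entries that arrive without an IssueSnapshot are still treated as a sensor problem in this sketch.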
The approach does not solve what to do when the sensor busts and we miss data. I guess
in that case we simply have to manually resend the data from the days that were missed.
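One small thing the IssueSnapshot does give us here is a way to find out which days need resending. A rough Python sketch, assuming we can get the list of days that have a snapshot; the `missed_days` helper is hypothetical:

```python
from datetime import date, timedelta


def missed_days(first_day, last_day, snapshot_days):
    """List every day in [first_day, last_day] that has no IssueSnapshot."""
    have = set(snapshot_days)
    missed = []
    day = first_day
    while day <= last_day:
        if day not in have:
            missed.append(day)
        day += timedelta(days=1)
    return missed
```

The resend itself would still be manual; this only tells us which days to look at.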
Thoughts welcomed.
Cheers,
Philip