Re: [HACKYSTAT-DEV-L] Jira2 sensor comments and from snapshot->incremental issue collection

(Cedric) Qin ZHANG Sun, 29 Jan 2006 04:33:11 -0800

Hi, Philip,

The biggest problem with incremental data is that you don't know how many days you need to scan back (while doing analysis) when you want to have all open issues for a particular day. This is necessary if you want to calculate the number of issues satisfying certain criteria (e.g. open, major issues in the 7.3 release assigned to Philip). Unless "IssueSnapshot" can anticipate all criteria we might need in the future, we have to either (1) scan all the way back to day 0, or (2) make an arbitrary decision regarding how many days we need to go back.
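
To make the scan-back problem concrete, here is a minimal Python sketch. The record layout (day, issue id, changed fields), the issue ids, and the function names are all assumptions for illustration; this is not the real Issue SDT.

```python
from datetime import date

# Hypothetical incremental records: (day, issue id, fields changed that day).
EVENTS = [
    (date(2005, 6, 1), "HACK-1",
     {"status": "open", "priority": "major", "assignee": "Philip"}),
    (date(2006, 1, 10), "HACK-2",
     {"status": "open", "priority": "minor", "assignee": "Cedric"}),
    (date(2006, 1, 20), "HACK-2", {"status": "closed"}),
]


def issues_matching(events, day, since, predicate):
    """Replay changes from `since` through `day`; return matching issue ids."""
    state = {}
    for event_day, issue_id, changes in sorted(events, key=lambda e: e[0]):
        if since <= event_day <= day:
            state.setdefault(issue_id, {}).update(changes)
    return sorted(i for i, fields in state.items() if predicate(fields))


def is_open_major(fields):
    return fields.get("status") == "open" and fields.get("priority") == "major"


target = date(2006, 1, 29)
# Option (1): scan all the way back to day 0.
full = issues_matching(EVENTS, target, date.min, is_open_major)
# Option (2): an arbitrary window, here starting January 1.
windowed = issues_matching(EVENTS, target, date(2006, 1, 1), is_open_major)
```

With this sample data, the full scan finds HACK-1, while the windowed scan misses it because the issue was opened before the window starts and never touched since.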

I have a partial solution to reduce the amount of server-side data. We know that some SDTs are "latest snapshot" types, such as Issue and Coverage. For those types of data, we only need the latest batch. There could be a task on the server side that wakes up daily to delete "non-latest" batches of snapshot data.
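
A rough sketch of what such a cleanup task might do, in Python. The batch dictionaries and their "sdt", "project", and "day" keys are hypothetical, as is grouping batches per project; the real server storage may look quite different.

```python
from datetime import date

# SDTs treated as "latest snapshot" types (the two named above).
LATEST_SNAPSHOT_SDTS = {"Issue", "Coverage"}


def prune_non_latest(batches):
    """Return the batches to keep.

    For "latest snapshot" SDTs only the newest batch per (sdt, project)
    survives; batches of every other SDT are kept untouched.
    """
    newest = {}
    for batch in batches:
        if batch["sdt"] in LATEST_SNAPSHOT_SDTS:
            key = (batch["sdt"], batch["project"])
            if key not in newest or batch["day"] > newest[key]:
                newest[key] = batch["day"]
    return [
        batch for batch in batches
        if batch["sdt"] not in LATEST_SNAPSHOT_SDTS
        or batch["day"] == newest[(batch["sdt"], batch["project"])]
    ]
```

A daily job would call this once and delete whatever is not in the returned list.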


Cheers,

Cedric


Philip Johnson wrote:

--On Friday, January 27, 2006 2:19 AM -1000 "(Cedric) Qin ZHANG" <[EMAIL PROTECTED]> wrote:
The sensor is deployed and the first batch of data has been sent. If you want a distraction from your coding activity, please log into your Hackystat account and check today's raw issue data and let me know if there is any error.
Good work, Cedric!

When I get distracted, I like to send long emails, so here goes. :-)
First, I note that we need to evolve the Issue SDT to use pMaps instead of the "data" field. This doesn't need to be done immediately. Maybe I should first write the SDT chapter and then see if someone can do this evolution by reading the documentation.
Second, the data (pMap) field should include something like a 'fixRelease' field; that will make the telemetry much more useful.
The last issue in my mind is the big one: snapshots vs. incremental data. As everyone knows, I have been requiring a "snapshot" approach to the Jira sensor in which it sends a summary of all relevant Jira issues every day. This produces a tremendous amount of redundant information and is not scalable as the issue repository grows. The reason for my insanity is as follows:
- The Jira issue sensor was "volatile" code. Doing a snapshot was a simple way to improve the reliability of the data.
- Incremental issue data is inherently vulnerable. If your sensor drops out for a day or more, you've permanently lost data. Furthermore, how do you distinguish between a busted sensor and simply no Jira issue activity?
- Snapshot data makes the higher level analyses easier to implement. Because we were still actively exploring what use to make of this data, I wanted to make that exploration process easier.
Eventually, of course, we have to move to a more incremental approach. Here's my proposal on how to do it safely:
(1) Create a new SDT called "IssueSnapshot". What this SDT provides is summary statistics about the state of the repository: how many issues were open, closed, etc. We have to do more investigation to decide what we need to send for this thing, but it will be a very small amount of data compared to the current snapshot. The IssueSnapshot is sent every day by the sensor.
(2) Use the current Issue SDT to send only changes that occurred during the current day. If there is no Issue activity, then no Issue data is sent.
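
On the sensor side, the two steps above could be sketched roughly as follows in Python. The issue fields "status" and "updated" and the function name are hypothetical, and the per-status counts are only a placeholder, since the contents of IssueSnapshot are still to be decided.

```python
from collections import Counter
from datetime import date


def build_daily_payload(all_issues, day):
    """Return (issue_snapshot, issue_changes) for one day.

    issue_snapshot is a small summary that is sent every day.
    issue_changes holds only issues modified on `day`; it may be empty.
    """
    counts = Counter(issue["status"] for issue in all_issues)
    issue_snapshot = {"day": day, "counts": dict(counts)}
    issue_changes = [issue for issue in all_issues if issue["updated"] == day]
    return issue_snapshot, issue_changes
```

On a quiet day issue_changes is empty and nothing is sent for the Issue SDT, but the summary still goes out.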
This approach solves two out of three problems:
(1) The amount of data sent to Hackystat becomes scalable, and the redundant daily issue data is eliminated.
(2) We can now distinguish between "no data" and "sensor is busted". When there is "no data", there is still an IssueSnapshot. When there is no IssueSnapshot, then "sensor is busted".
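
The distinction in (2) amounts to a simple check on the server side. A minimal Python sketch, assuming we can look up the set of days that have an IssueSnapshot and the set of days that have Issue data (both sets and the function name are hypothetical):

```python
from datetime import date


def classify_day(day, snapshot_days, issue_days):
    """Classify one day from which kinds of data arrived."""
    if day not in snapshot_days:
        # No IssueSnapshot at all: the sensor did not run.
        return "sensor is busted"
    if day in issue_days:
        return "issue activity"
    # An IssueSnapshot arrived but no Issue data: a quiet day.
    return "no data"
```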
The approach does not solve what to do when the sensor busts and we miss data. I guess in that case we simply have to manually resend the data from the days that were missed.
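
Finding the days to resend could at least be automated. A small Python sketch, assuming the days with an IssueSnapshot are available as a set (the function name and inputs are hypothetical):

```python
from datetime import date, timedelta


def days_to_resend(first_day, last_day, snapshot_days):
    """List every day in [first_day, last_day] that has no IssueSnapshot."""
    missing = []
    day = first_day
    while day <= last_day:
        if day not in snapshot_days:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

The resend itself would still be manual; this only tells you which days to look at.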
Thoughts welcomed.

Cheers,
Philip
