Re: Feedback of my Phd work in Cloudstack Project

Patrick Dube Thu, 10 Dec 2015 12:27:07 -0800

For a new file, the relevant history isn't that of the file itself, but that
of the directory/package it would be in.


Cheers,

On Thu, Dec 10, 2015 at 3:01 PM, Igor Wiese <igor.wi...@gmail.com> wrote:

> Hi Patrick
>
> The problem with new files is the absence of history to build the
> prediction models. I need at least some commits (10 commits for example).
> Yes, the link between files is what we are predicting. We can predict
> changes involving commands.properties, XML files in general, .txt files, or
> any source code extension :-)
>
> Thanks for the feedback.
>
>
> 2015-12-10 17:40 GMT-02:00 Patrick Dube <patrickdub...@gmail.com>:
>
> > Are you handling new files as well, or the links between sets of files
> > (or packages)? As an example, if a user creates a new API cmd, then he will
> > update the "commands.properties" file. Another example, if a VO file is
> > updated, then there will be a db migration file added as well.
> > Cool work,
> >
> > On Thu, Dec 10, 2015 at 9:21 AM, Igor Wiese <igor.wi...@gmail.com> wrote:
> >
> > > Hi Sebastien.
> > >
> > > We used only 141 commits because we needed data from the issues. Since my
> > > hypothesis relies on contextual information from issues and on social
> > > aspects, we need to combine commits with issues.
> > >
> > > First, I collected the issues from JIRA and then tried to aggregate the
> > > commits that explicitly mention one of the collected issues. I also used
> > > only closed issues, to be confident that the code used to build my models
> > > has been merged and checked by the community.
> > >
> > > That is the weak point of my approach: I need historical data from the
> > > issues, and sometimes it is not available for older periods.
> > > I also plan to use data from GitHub to make the dataset more complete.
> > >
> > > All the best,
> > >
> > > 2015-12-10 11:22 GMT-02:00 sebgoa <run...@gmail.com>:
> > >
> > > >
> > > > On Dec 10, 2015, at 12:31 AM, Igor Wiese <igor.wi...@gmail.com> wrote:
> > > >
> > > > > Hi, Cloudstack Community.
> > > > >
> > > > > My name is Igor Wiese, and I am a PhD student from Brazil. In my
> > > > > research, I am investigating two important questions: What makes two
> > > > > files change together? Can we predict when they are going to co-change
> > > > > again?
> > > > >
> > > > > I've tried to investigate these questions on the CloudStack project.
> > > > > I've collected data from issue reports, discussions and commits, and
> > > > > used some machine learning techniques to build a prediction model.
> > > > >
> > > > > I collected a total of 141 commits in which a pair of files changed
> > > > > together and could correctly predict 60% of the commits.
> > > >
> > > >
> > > > Hi Igor, why 141 commits? Are those the only commits you found in which
> > > > just a pair of files changed?
> > > >
> > > > My gut feeling is that you could check the entire history of the
> > > > CloudStack repo (~5 years worth of data) and work on different types of
> > > > tuples.
> > > >
> > > > 141 commits seems like a really small dataset.
> > > >
> > > > -Sebastien
> > > >
> > > > > These were the most useful pieces of information for predicting
> > > > > co-changes of files:
> > > > >
> > > > > - sum of number of lines of code added, modified and removed,
> > > > >
> > > > > - number of words used to describe and discuss the issues,
> > > > >
> > > > > - number of comments in each issue,
> > > > >
> > > > > - median value of closeness, a social network measure obtained from
> > > > > issue comments, and
> > > > >
> > > > > - median value of constraint, a social network measure obtained from
> > > > > issue comments.
> > > > >
> > > > > To illustrate, consider the following example from our analysis. For
> > > > > release 4.4, the files "cloud/hypervisor/XenServerGuru.java" and
> > > > > "cloud/hypervisor/guru/VMwareGuru.java" changed together in 3 commits.
> > > > > In another 2 commits, only the first file changed, but not the second.
> > > > > By collecting contextual information for each commit made to the first
> > > > > file in the previous release (4.3), we were able to predict all 3
> > > > > commits in which both files changed together in release 4.4, with no
> > > > > false positives. For this pair of files, the most important contextual
> > > > > information was the number of lines of code added, removed and modified
> > > > > in each commit, the number of comments in each issue, and social network
> > > > > measures (closeness, density, constraint, hierarchy) obtained from
> > > > > issue comments.
> > > > >
> > > > > - Do these results surprise you? Can you think of any explanation for
> > > > > the results?
> > > > >
> > > > > - Do you think that our rate of prediction is good enough to be used
> > > > > for building tool support for the software community?
> > > > >
> > > > > - Do you have any suggestions on what can be done to improve the
> > > > > change recommendation?
> > > > >
> > > > > You can visit our webpage to inspect the results in detail:
> > > > > http://flosscoach.com/index.php/17-cochanges/67-cloudstack
> > > > >
> > > > > All the best,
> > > > > Igor Wiese
> > > > > PhD Candidate
> > > >
> > > >
> > >
> > >
> > > --
> > > =================================
> > > Igor Scaliante Wiese
> > > PhD Candidate - Computer Science @ IME/USP
> > > Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
> > >
> >
>
>
>
> --
> =================================
> Igor Scaliante Wiese
> PhD Candidate - Computer Science @ IME/USP
> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
>
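
For reference, the data-collection step described in the thread (keep only commits that mention a closed JIRA issue, then count the file pairs that change together in those commits) could be sketched roughly as below. This is only an illustration and not the code used in the study: the `CLOUDSTACK-<number>` key pattern, the `(message, files)` commit shape, and the function name `cochange_counts` are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: issue keys appear in commit messages as "CLOUDSTACK-<number>".
ISSUE_KEY = re.compile(r"\bCLOUDSTACK-\d+\b")


def cochange_counts(commits, closed_issues):
    """Count how often each pair of files changes in the same commit.

    commits: iterable of (message, list_of_file_paths) tuples (hypothetical shape).
    closed_issues: set of issue keys known to be closed.
    Commits that do not mention at least one closed issue are skipped.
    """
    counts = Counter()
    for message, files in commits:
        mentioned = set(ISSUE_KEY.findall(message))
        if not mentioned & closed_issues:
            continue  # no link to a closed issue, so no contextual data
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data, only to show the call shape.
sample_commits = [
    ("CLOUDSTACK-1: fix guru", ["A.java", "B.java"]),
    ("CLOUDSTACK-2: still open", ["A.java", "B.java"]),
    ("cleanup, no issue mentioned", ["A.java", "B.java"]),
    ("CLOUDSTACK-1: follow-up", ["A.java"]),
]
sample_counts = cochange_counts(sample_commits, {"CLOUDSTACK-1"})
```

Pairs with enough counted commits could then serve as candidates for a prediction model; how many commits count as "enough" is a choice for whoever builds the model.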
