Re: Can you help us Hadoop Community?

2015-12-15 Thread Igor Wiese
For sure, Zhe. We are also planning to study co-changes at the method level.

Thanks for your suggestion.
All the best,
Igor Wiese



2015-12-15 17:26 GMT-02:00 Zhe Zhang:
>>
>> it is difficult to find files to change in a
>> specific issue.
>
> I guess this can be a useful reminder: "you might also want to update file
> Y". Maybe richer insights can be found at the method level.
>
> ---
> Zhe Zhang

Re: Can you help us Hadoop Community?

2015-12-14 Thread Igor Wiese
Hi Zhe! Thanks for your answer.

In fact, we are predicting the "co-change" based on contextual
information collected from issues, commits and developer
communication. For the files that I described in the example
("/ipc/Client.java" and "security/SecurityUtil.java"), I collected
metrics for each issue and commit involving Client.java to predict
when Client.java is prone to change together with SecurityUtil.java.

We are thinking of building a web service to help newcomers during
their first contributions. Our research group interviewed some
newcomers, and they told us that it is difficult to find which files
to change for a specific issue. We can recommend files to be checked.
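
A minimal sketch of how such a file recommendation could look. It is
not the prediction model from this study: it swaps in a much simpler
technique, plain co-change frequency (how often another file changed
in the same commits as the file being edited). The commit history,
the extra file "ipc/Server.java", the function names and the 0.5
threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cochange_counts(commits):
    """Count commits per file and per unordered pair of files."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return file_counts, pair_counts


def recommend(changed_file, file_counts, pair_counts, min_confidence=0.5):
    """Return (file, confidence) pairs for files that often change with changed_file.

    Confidence is the share of commits touching changed_file that also
    touched the other file.
    """
    total = file_counts[changed_file]
    if total == 0:
        return []
    result = []
    for (a, b), together in pair_counts.items():
        if changed_file in (a, b):
            other = b if a == changed_file else a
            confidence = together / total
            if confidence >= min_confidence:
                result.append((other, confidence))
    return sorted(result, key=lambda item: (-item[1], item[0]))


# Hypothetical history: Client.java changes in four commits,
# three of them together with SecurityUtil.java.
history = [
    ["ipc/Client.java", "security/SecurityUtil.java"],
    ["ipc/Client.java", "security/SecurityUtil.java"],
    ["ipc/Client.java", "security/SecurityUtil.java", "ipc/Server.java"],
    ["ipc/Client.java"],
]
file_counts, pair_counts = build_cochange_counts(history)
recommendations = recommend("ipc/Client.java", file_counts, pair_counts)
print(recommendations)
```

With this made-up history, a newcomer editing Client.java would be
pointed at SecurityUtil.java, while Server.java falls below the
threshold and is not suggested.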

From the committer's perspective, we could help with code review tasks.

What do you think?

Our idea

2015-12-14 22:16 GMT-02:00 Zhe Zhang:
> Hi Igor,
>
> It's an interesting direction to study tickets/commits in the Hadoop
> community.
>
> A research group from Univ. Wisconsin did a similar study on Linux file
> systems and I found it quite insightful:
> http://research.cs.wisc.edu/wind/Publications/fsstudy-tos14.pdf
>
> For your results, could you elaborate why you picked "co-change" as the
> metric, and how to improve software tools from the "co-change" predictions?
>
> Thanks,
> Zhe



-- 
=
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná


Can you help us Hadoop Community?

2015-12-14 Thread Igor Wiese
Hi, Hadoop Community.

My name is Igor Wiese, a PhD student from Brazil. I sent an email a week
ago about my research. The results page received some visits, but no
feedback was provided.

I am investigating two important questions: What makes two files
change together? Can we predict when they are going to co-change
again?

I've investigated these questions on the Hadoop project. I collected
data from issue reports, discussions and commits, and used machine
learning techniques to build a prediction model.
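
As a rough sketch of the data such a model would be trained on, the
snippet below labels every commit that touches one file with 1 if a
second file changed in the same commit and 0 otherwise. The Commit
record, the issue keys and the third file name are hypothetical, and
the study's actual data pipeline is not described in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    commit_id: str
    issue_key: str      # hypothetical link from the commit to its issue report
    files: frozenset    # paths changed by the commit


def label_commits(commits, first_file, second_file):
    """Label each commit touching first_file: 1 if second_file also changed, else 0."""
    return [
        (c.commit_id, 1 if second_file in c.files else 0)
        for c in commits
        if first_file in c.files
    ]


# Made-up commits for illustration only.
commits = [
    Commit("c1", "ISSUE-1", frozenset({"ipc/Client.java", "security/SecurityUtil.java"})),
    Commit("c2", "ISSUE-2", frozenset({"ipc/Client.java"})),
    Commit("c3", "ISSUE-3", frozenset({"fs/FileSystem.java"})),
]
labels = label_commits(commits, "ipc/Client.java", "security/SecurityUtil.java")
print(labels)
```

Commits that do not touch the first file (c3 here) are left out, since
they carry no information about when that file co-changes.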


I collected a total of 950 commits in which a pair of files changed
together, and the model correctly predicted 47% of them. The following
information was the most useful for predicting co-changes of files:

- sum of number of lines of code added, modified and removed,

- number of words used to describe and discuss the issues,

- median value of closeness, a social network measure obtained from
issue comments,

- median value of constraint, a social network measure obtained from
issue comments, and

- median value of hierarchy, a social network measure obtained from
issue comments.
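
To make the "median value of closeness" item concrete, here is a
minimal sketch using the standard definition of closeness centrality
(number of reachable nodes divided by the sum of shortest-path
distances to them). How the study builds the network from issue
comments is not stated here, so the graph below, where an edge links
two people who commented on the same issue, is an assumption, and the
names are made up.

```python
from collections import deque
from statistics import median


def closeness(graph, node):
    """Closeness centrality of node in an unweighted, undirected graph.

    Computed as (number of reachable nodes) / (sum of shortest-path
    lengths to them); returns 0.0 for an isolated node.
    """
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    reachable = len(distances) - 1
    return reachable / total if total else 0.0


# Hypothetical comment network: alice and carol each talked only to bob.
graph = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob"},
}
scores = {person: closeness(graph, person) for person in graph}
median_closeness = median(scores.values())
print(scores, median_closeness)
```

Here bob, who talks to everyone, scores 1.0, while alice and carol
score 2/3 each, so the median for this small network is 2/3.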

To illustrate, consider the following example from our analysis. For
release 0.22, the files "/ipc/Client.java" and
"security/SecurityUtil.java" changed together in 3 commits. In another
1 commit, only the first file changed, but not the second. Using
contextual information collected for each commit made to the first file
in the previous release, we were able to predict 2 commits in which both
files changed together in release 0.22, and we issued only 1 wrong
prediction. For this pair of files, the most important contextual
information was the social network metrics (density, hierarchy,
efficiency) obtained from issue comments.
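
The numbers in this example can be turned into precision and recall.
The text does not say which commit received the wrong prediction, so
the sketch below assumes one possible reading: the wrong prediction
was a co-change predicted for the commit where only Client.java
changed, and one of the 3 real co-changes was missed. Under a
different reading the figures would differ.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of booleans."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    return tp, fp, fn, tn


# Four commits to Client.java in release 0.22: 3 co-changes, 1 not.
actual = [True, True, True, False]
# Assumed predictions: 2 co-changes found, 1 missed, 1 false alarm.
predicted = [True, True, False, True]

counts = confusion_counts(actual, predicted)
tp, fp, fn, tn = counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(counts, precision, recall)
```

Under this assumed reading, both precision and recall come out at 2/3
for this pair of files.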


- Do these results surprise you? Can you think of any explanation for
the results?

- Do you think that our rate of prediction is good enough to be used
for building tool support for the software community?

- Do you have any suggestions on what can be done to improve the change
recommendation?

You can visit our webpage to inspect the results in detail:
http://flosscoach.com/index.php/17-cochanges/70-hadoop

All the best,
Igor Wiese

PhD Candidate

-- 
=
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná


Feedback of my Phd Work in Hadoop Project

2015-12-09 Thread Igor Wiese
Hi, Hadoop Community.

My name is Igor Wiese, a PhD student from Brazil. I am investigating two
important questions: What makes two files change together? Can we predict
when they are going to co-change again?

I've investigated these questions on the Hadoop project. I collected data
from issue reports, discussions and commits, and used machine learning
techniques to build a prediction model.

I collected a total of 950 commits in which a pair of files changed
together, and the model correctly predicted 47% of them. The following
information was the most useful for predicting co-changes of files:

- sum of number of lines of code added, modified and removed,

- number of words used to describe and discuss the issues,

- median value of closeness, a social network measure obtained from issue
comments,

- median value of constraint, a social network measure obtained from issue
comments, and

- median value of hierarchy, a social network measure obtained from issue
comments.
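
The first two items in the list above are simple counts. A minimal
sketch of how they might be computed per commit is shown below; the
function name, the dictionary keys and the sample issue texts are
hypothetical, and words are split on whitespace as a simplifying
assumption.

```python
def commit_features(lines_added, lines_modified, lines_removed, issue_texts):
    """Build a small feature dictionary for one commit and its issue.

    churn       -- sum of lines added, modified and removed
    issue_words -- words in the issue description and its comments
    """
    churn = lines_added + lines_modified + lines_removed
    issue_words = sum(len(text.split()) for text in issue_texts)
    return {"churn": churn, "issue_words": issue_words}


# Made-up commit: 12 lines added, 3 modified, 5 removed,
# with one description and one comment on the issue.
features = commit_features(
    12, 3, 5,
    ["Client hangs on Kerberos login", "I can reproduce this on trunk"],
)
print(features)
```

The social network measures would be added to the same dictionary as
further keys, one per measure.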

To illustrate, consider the following example from our analysis. For
release 0.22, the files "/ipc/Client.java" and "security/SecurityUtil.java"
changed together in 3 commits. In another 1 commit, only the first file
changed, but not the second. Using contextual information collected for
each commit made to the first file in the previous release, we were able
to predict 2 commits in which both files changed together in release 0.22,
and we issued only 1 wrong prediction. For this pair of files, the most
important contextual information was the social network metrics (density,
hierarchy, efficiency) obtained from issue comments.

- Do these results surprise you? Can you think of any explanation for the
results?

- Do you think that our rate of prediction is good enough to be used for
building tool support for the software community?

- Do you have any suggestions on what can be done to improve the change
recommendation?

You can visit our webpage to inspect the results in detail:
http://flosscoach.com/index.php/17-cochanges/70-hadoop

All the best,
Igor Wiese
PhD Candidate