For sure, Zhe. We are planning to study this at the method level as well.

Thanks for your suggestion.

All the best,
Igor Wiese

2015-12-15 17:26 GMT-02:00 Zhe Zhang <zhezh...@cloudera.com>:
>>
>> it is difficult to find files to change in a
>> specific issue.
>
> I guess this can be a useful reminder: "you might also want to update file
> Y". Maybe richer insights can be found at the method level.
>
> ---
> Zhe Zhang
>
> On Mon, Dec 14, 2015 at 7:07 PM, Igor Wiese <igor.wi...@gmail.com> wrote:
>
>> Hi Zhe! Thanks for your answer.
>>
>> In fact, we are predicting "co-change" based on contextual
>> information collected from issues, commits, and developer
>> communication. Considering the files that I described in the example
>> ("/ipc/Client.java" and "security/SecurityUtil.java"), I collected
>> metrics from each issue and commit involving Client.java to predict
>> when Client.java is prone to change together with SecurityUtil.java.
>>
>> We are thinking of building a web service to help newcomers during their
>> first contributions. Our research group interviewed some newcomers, and
>> they told us that it is difficult to find files to change in a
>> specific issue. We can recommend files to be checked.
>>
>> From the committer's perspective, we could help with code review tasks.
>>
>> What do you think?
>>
>> Our idea
>>
>> 2015-12-14 22:16 GMT-02:00 Zhe Zhang <z...@apache.org>:
>> > Hi Igor,
>> >
>> > It's an interesting direction to study tickets/commits in the Hadoop
>> > community.
>> >
>> > A research group from Univ. Wisconsin did a similar study on Linux file
>> > systems and I found it quite insightful:
>> > http://research.cs.wisc.edu/wind/Publications/fsstudy-tos14.pdf
>> >
>> > For your results, could you elaborate on why you picked "co-change" as the
>> > metric, and how to improve software tools from the "co-change" predictions?
>> >
>> > Thanks,
>> > Zhe
>> >
>> > On Mon, Dec 14, 2015 at 3:01 PM, Igor Wiese <igor.wi...@gmail.com> wrote:
>> >
>> >> Hi, Hadoop Community.
>> >>
>> >> My name is Igor Wiese, a PhD student from Brazil. I sent an email a week
>> >> ago about my research.
>> >> We received some visits to inspect the results,
>> >> but no feedback was provided.
>> >>
>> >> I am investigating two important questions: What makes two files
>> >> change together? Can we predict when they are going to co-change
>> >> again?
>> >>
>> >> I've tried to investigate these questions on the Hadoop project. I've
>> >> collected data from issue reports, discussions, and commits, and used
>> >> some machine learning techniques to build a prediction model.
>> >>
>> >>
>> >> I collected a total of 950 commits in which a pair of files changed
>> >> together and could correctly predict 47% of those commits. These were
>> >> the most useful pieces of information for predicting co-changes of files:
>> >>
>> >> - sum of the number of lines of code added, modified, and removed,
>> >>
>> >> - number of words used to describe and discuss the issues,
>> >>
>> >> - median value of closeness, a social network measure obtained from
>> >> issue comments,
>> >>
>> >> - median value of constraint, a social network measure obtained from
>> >> issue comments, and
>> >>
>> >> - median value of hierarchy, a social network measure obtained from
>> >> issue comments.
>> >>
>> >> To illustrate, consider the following example from our analysis. For
>> >> release 0.22, the files "/ipc/Client.java" and
>> >> "security/SecurityUtil.java" changed together in 3 commits. In 1 other
>> >> commit, only the first file changed, but not the second. By collecting
>> >> contextual information for each commit made to the first file in the
>> >> previous release, we were able to predict 2 commits in which both
>> >> files changed together in release 0.22, and we issued only 1 wrong
>> >> prediction. For this pair of files, the most important contextual
>> >> information was the social network metrics (density, hierarchy,
>> >> efficiency) obtained from issue comments.
>> >>
>> >>
>> >> - Do these results surprise you? Can you think of any explanation for
>> >> the results?
>> >>
>> >> - Do you think that our prediction rate is good enough to be used
>> >> for building tool support for the software community?
>> >>
>> >> - Do you have any suggestions on what can be done to improve the change
>> >> recommendation?
>> >>
>> >> You can visit our webpage to inspect the results in detail:
>> >> http://flosscoach.com/index.php/17-cochanges/70-hadoop
>> >>
>> >> All the best,
>> >> Igor Wiese
>> >>
>> >> PhD Candidate
>> >>
>> >> --
>> >> =================================
>> >> Igor Scaliante Wiese
>> >> PhD Candidate - Computer Science @ IME/USP
>> >> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
>> >>
>> >>
>>
>> --
>> =================================
>> Igor Scaliante Wiese
>> PhD Candidate - Computer Science @ IME/USP
>> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
>>

--
=================================
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
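
For readers following the Client.java / SecurityUtil.java example in the thread, here is a minimal sketch of how co-changes could be counted and how the quoted numbers might translate into precision and recall. It is not the research group's actual pipeline. The commit list and the prediction vector are hypothetical reconstructions, and treating the "1 wrong prediction" as a false positive on the commit where only Client.java changed is an assumption; the thread does not say which commit was mispredicted.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how many commits touched each unordered pair of files."""
    counts = Counter()
    for files in commits:
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts


def precision_recall(actual, predicted):
    """Compute precision and recall for boolean co-change labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


CLIENT = "/ipc/Client.java"
SECURITY = "security/SecurityUtil.java"

# Hypothetical reconstruction of the four release-0.22 commits to Client.java:
# three co-changes with SecurityUtil.java and one commit to Client.java alone.
commits = [
    [CLIENT, SECURITY],
    [CLIENT, SECURITY],
    [CLIENT, SECURITY],
    [CLIENT],
]

# Ground truth: did SecurityUtil.java change in the same commit?
actual = [SECURITY in files for files in commits]

# Assumed predictions: two co-changes predicted correctly, one missed,
# and one wrong prediction on the commit where only Client.java changed.
predicted = [True, True, False, True]

pair_counts = co_change_counts(commits)
precision, recall = precision_recall(actual, predicted)

print("co-changes:", pair_counts[(CLIENT, SECURITY)])
print("precision: %.2f, recall: %.2f" % (precision, recall))
```

Under these assumptions the pair co-changes in 3 commits, and both precision and recall come out to 2/3. A different reading of the "wrong prediction" would change those figures.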