Re: Can you help us Hadoop Community?
For sure, Zhe. We are planning to study this at the method level as well. Thanks for your suggestion.

All the best,
Igor Wiese

2015-12-15 17:26 GMT-02:00 Zhe Zhang:
>> it is difficult to find files to change in a
>> specific issue.
>
> I guess this can be a useful reminder "you might also want to update file
> Y". Maybe richer insights can be found on method level.
>
> ---
> Zhe Zhang
Re: Can you help us Hadoop Community?
Hi Zhe! Thanks for your answer.

In fact, we are predicting the "co-change" based on contextual information collected from issues, commits and developer communication. Considering the files that I described in the example ("/ipc/Client.java" and "security/SecurityUtil.java"), I collected metrics for each issue and commit involving Client.java to predict when Client.java is prone to change together with SecurityUtil.java.

We are thinking of building a web service to help newcomers during their first contributions. Our research group interviewed some newcomers, and they told us that it is difficult to find the files to change for a specific issue. We can recommend files to be checked.

From the committer perspective, we could help in code review tasks.

What do you think?

Our idea

2015-12-14 22:16 GMT-02:00 Zhe Zhang:
> Hi Igor,
>
> It's an interesting direction to study tickets/commits in the Hadoop
> community.
>
> A research group from Univ. Wisconsin did a similar study on Linux file
> systems and I found it quite insightful:
> http://research.cs.wisc.edu/wind/Publications/fsstudy-tos14.pdf
>
> For your results, could you elaborate why you picked "co-change" as the
> metric, and how to improve software tools from the "co-change" predictions?
>
> Thanks,
> Zhe

--
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
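The file recommendation idea discussed in this message can be illustrated with a very simple baseline. The sketch below is not the prediction model from the thread (which uses issue, commit and communication metrics); it only counts how often files changed in the same commit and suggests the frequent companions of a given file. The function names, the `min_confidence` threshold, the commit history and the file "ipc/Server.java" are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cochange_counts(commits):
    """Count commits per file and commits per unordered pair of files."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return file_counts, pair_counts


def recommend(changed_file, commits, min_confidence=0.5):
    """Return (other_file, confidence) pairs, most confident first.

    confidence = commits touching both files / commits touching changed_file
    """
    file_counts, pair_counts = cochange_counts(commits)
    total = file_counts[changed_file]
    if total == 0:
        return []
    scored = []
    for (a, b), together in pair_counts.items():
        if changed_file in (a, b):
            other = b if a == changed_file else a
            confidence = together / total
            if confidence >= min_confidence:
                scored.append((other, confidence))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical history shaped like the example in the thread: Client.java
# changes in four commits, three of them together with SecurityUtil.java.
history = [
    ["ipc/Client.java", "security/SecurityUtil.java"],
    ["ipc/Client.java", "security/SecurityUtil.java"],
    ["ipc/Client.java", "security/SecurityUtil.java", "ipc/Server.java"],
    ["ipc/Client.java"],
]
suggestions = recommend("ipc/Client.java", history)
print(suggestions)
```

A tool for newcomers could show such suggestions as a "you might also want to check file Y" hint; a learned model like the one described in the thread would replace the plain frequency score.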
Can you help us Hadoop Community?
Hi, Hadoop Community.

My name is Igor Wiese, a PhD student from Brazil. I sent an email a week ago about my research. We received some visits to inspect the results, but no feedback was provided.

I am investigating two important questions: What makes two files change together? Can we predict when they are going to co-change again?

I've tried to investigate these questions on the Hadoop project. I've collected data from issue reports, discussions and commits, and used some machine learning techniques to build a prediction model.

I collected a total of 950 commits in which a pair of files changed together and could correctly predict 47% of the commits. These were the most useful pieces of information for predicting co-changes of files:

- sum of the number of lines of code added, modified and removed,
- number of words used to describe and discuss the issues,
- median value of closeness, a social network measure obtained from issue comments,
- median value of constraint, a social network measure obtained from issue comments, and
- median value of hierarchy, a social network measure obtained from issue comments.

To illustrate, consider the following example from our analysis. For release 0.22, the files "/ipc/Client.java" and "security/SecurityUtil.java" changed together in 3 commits. In 1 other commit, only the first file changed. Collecting contextual information for each commit made to the first file in the previous release, we were able to predict 2 commits in which both files changed together in release 0.22, and we issued only 1 wrong prediction. For this pair of files, the most important contextual information was the social network metrics (density, hierarchy, efficiency) obtained from issue comments.

- Do these results surprise you? Can you think of any explanation for the results?
- Do you think that our rate of prediction is good enough to be used for building tool support for the software community?
- Do you have any suggestions on what can be done to improve the change recommendation?

You can visit our webpage to inspect the results in detail:
http://flosscoach.com/index.php/17-cochanges/70-hadoop

All the best,
Igor Wiese
PhD Candidate

--
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
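To judge whether a prediction rate is "good enough", it helps to separate precision, recall and accuracy. The message does not say which of these the 47% figure refers to, nor whether the one wrong prediction in the Client.java/SecurityUtil.java example was a missed co-change or a false alarm. The sketch below therefore assumes one possible reading (the third co-change was missed, and the commit where only Client.java changed was classified correctly); the labels are illustrative, not data from the study.

```python
def evaluate(actual, predicted):
    """Return confusion counts and scores for boolean co-change labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # co-change, predicted
    fp = sum(1 for a, p in pairs if not a and p)      # false alarm
    fn = sum(1 for a, p in pairs if a and not p)      # missed co-change
    tn = sum(1 for a, p in pairs if not a and not p)  # correctly ignored
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision, "recall": recall, "accuracy": accuracy,
    }


# Assumed reading of the example: four commits to Client.java, three of
# which also changed SecurityUtil.java; two co-changes were predicted
# and one was missed.
actual = [True, True, True, False]
predicted = [True, True, False, False]
scores = evaluate(actual, predicted)
print(scores)
```

Under the other reading (a false alarm instead of a miss) the counts shift from `fn` to `fp`, which is why reporting precision and recall separately makes the result easier to interpret.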
Feedback on my PhD Work in the Hadoop Project
Hi, Hadoop Community.

My name is Igor Wiese, a PhD student from Brazil.

I am investigating two important questions: What makes two files change together? Can we predict when they are going to co-change again?

I've tried to investigate these questions on the Hadoop project. I've collected data from issue reports, discussions and commits, and used some machine learning techniques to build a prediction model.

I collected a total of 950 commits in which a pair of files changed together and could correctly predict 47% of the commits. These were the most useful pieces of information for predicting co-changes of files:

- sum of the number of lines of code added, modified and removed,
- number of words used to describe and discuss the issues,
- median value of closeness, a social network measure obtained from issue comments,
- median value of constraint, a social network measure obtained from issue comments, and
- median value of hierarchy, a social network measure obtained from issue comments.

To illustrate, consider the following example from our analysis. For release 0.22, the files "/ipc/Client.java" and "security/SecurityUtil.java" changed together in 3 commits. In 1 other commit, only the first file changed. Collecting contextual information for each commit made to the first file in the previous release, we were able to predict 2 commits in which both files changed together in release 0.22, and we issued only 1 wrong prediction. For this pair of files, the most important contextual information was the social network metrics (density, hierarchy, efficiency) obtained from issue comments.

- Do these results surprise you? Can you think of any explanation for the results?
- Do you think that our rate of prediction is good enough to be used for building tool support for the software community?
- Do you have any suggestions on what can be done to improve the change recommendation?

You can visit our webpage to inspect the results in detail:
http://flosscoach.com/index.php/17-cochanges/70-hadoop

All the best,
Igor Wiese
PhD Candidate
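For readers unfamiliar with the social network measures listed above, the following sketch shows how a median closeness value could be derived from the comments on one issue. The message does not describe how the communication network was built, so the construction here is an assumption: an undirected graph with one edge per (commenter, replied-to) pair, closeness computed as the number of reachable participants divided by the sum of shortest-path distances to them, and the median taken over all participants. The names in `replies` are hypothetical.

```python
from collections import deque
from statistics import median


def build_graph(replies):
    """Build an undirected adjacency map from (commenter, replied_to) pairs."""
    graph = {}
    for a, b in replies:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness(graph, source):
    """Closeness of source: reachable nodes / sum of shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in hops
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical issue discussion: three developers each reply to "bob".
replies = [("alice", "bob"), ("carol", "bob"), ("dave", "bob")]
graph = build_graph(replies)
scores = {dev: closeness(graph, dev) for dev in graph}
issue_closeness = median(scores.values())
print(scores, issue_closeness)
```

In this star-shaped discussion the central participant has the highest closeness, while the median reflects the more peripheral commenters. Constraint and hierarchy are structural-hole measures and need a separate computation that is not sketched here.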