Dear Andy and All, Thank you very much for the information that you have provided to me. You really helped me a lot. Below I list some of the most frequent issue types that I have found while I was examining the evolution of Jena.
Issue Name Issues over time Fixed Currently Open The members of an interface declaration or class should appear in a pre-defined order 52875 47530 5345 Sections of code should not be "commented out" 35465 31905 3560 Method names should comply with a naming convention 32809 29448 3361 Constant names should comply with a naming convention 18397 16572 1825 String literals should not be duplicated<https://sbforge.org/sonar/rules/show/squid:S1192?layout=false> 18106 16390 1716 Standard outputs should not be used directly to log anything<https://sbforge.org/sonar/rules/show/squid:S106?layout=false> 16174 14506 1668 Exception handlers should preserve the original exceptions<https://sbforge.org/sonar/rules/show/squid:S1166?layout=false> 10763 9747 1016 Source files should not have any duplicated blocks 9141 8078 1063 <https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>Methods should not be empty<https://sbforge.org/sonar/rules/show/squid:S1186?layout=false> 8653 7796 857 switch case clauses should not have too many lines of code<https://sbforge.org/sonar/rules/show/squid:S1151?layout=false> 8280 7491 789 throws declarations should not be superfluous<https://sbforge.org/sonar/rules/show/squid:RedundantThrowsDeclarationCheck?layout=false> 7012 6138 874 Class variable fields should not have public accessibility<https://sbforge.org/sonar/rules/show/squid:ClassVariableVisibilityCheck?layout=false> 6593 5954 639 1. The issue that appears the most frequently is: The members of an interface declaration or class should appear in a pre-defined order. "According to the Java Code Conventions as defined by Oracle, the members of a class or interface declaration should appear in the following order in the source files: 1. Class and instance variables, 2.Constructors, and 3. Methods." 2. The second most frequent is: Sections of code should not be "commented out". "Programmers should not comment out code as it bloats programs and reduces readability. Unused code should be deleted and can be retrieved from source control history if required." 3. Method names should comply with a naming convention. "Shared naming conventions allow teams to collaborate efficiently. This rule checks that all method names match the default provided regular expression ^[a-z][a-zA-Z0-9]*$" 4. Constant names should comply with a naming convention. "Shared coding conventions allow teams to collaborate efficiently. This rule checks that all constant names match the default regular expression ^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$" 5. String literals should not be duplicated. "Duplicated string literals make the process of refactoring error-prone, since you must be sure to update all occurrences. On the other hand, constants can be referenced from many places, but only need to be updated in a single place." 6. Standard outputs should not be used directly to log anything. "When logging a message there are several important requirements which must be fulfilled: * The user must be able to easily retrieve the logs * The format of all logged message must be uniform to allow the user to easily read the log * Logged data must actually be recorded * Sensitive data must only be logged securely If a program directly writes to the standard outputs, there is absolutely no way to comply with those requirements. That's why defining and using a dedicated logger is highly recommended." 7. Exception handlers should preserve the original exceptions. "When handling a caught exception, the original exception's message and stack trace should be logged or passed forward." 8. Source files should not have any duplicated blocks. "An issue is created on a file as soon as there is at least one block of duplicated code on this file" 9. Methods should not be empty. "There are several reasons for a method not to have a method body: * It is an unintentional omission, and should be fixed to prevent an unexpected behavior in production. * It is not yet, or never will be, supported. In this case an UnsupportedOperationException should be thrown. * The method is an intentionally-blank override. In this case a nested comment should explain the reason for the blank override." 10. switch case clauses should not have too many lines of code. "The switch statement should be used only to clearly define some new branches in the control flow. As soon as a case clause contains too many statements this highly decreases the readability of the overall control flow statement. In such case, the content of the case clause should be extracted into a dedicated method." 11. "throws" declarations should not be superfluous. "An exception in a throws declaration in Java is superfluous if it is: * listed multiple times * a subclass of another listed exception * a RuntimeException, or one of its descendants * completely unnecessary because the declared exception type cannot actually be thrown" 12. Class variable fields should not have public accessibility. "Public class variable fields do not respect the encapsulation principle and has three main disadvantages: * Additional behavior such as validation cannot be added. * The internal representation is exposed, and cannot be changed afterwards. * Member values are subject to change from anywhere in the code and may not meet the programmer's assumptions. By using private attributes and accessor methods (set and get), unauthorized modifications are prevented." I would like to ask you: * If you agree/disagree (and why) * Are you aware of those? * Why do you think that they introduced? * Do you think that all those introduced because Jena does not enforce a "squash-before-merging" policy? * Are you planing to fix some issues of the above categories? * All the currently open issues are listed on the sheet: Jena: Open Issues - October 7, 2017 of this<https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing> spreadsheet Thank you in advance! With kind regards, George Digkas ________________________________ From: Andy Seaborne <a...@apache.org> Sent: Saturday, November 18, 2017 6:43 PM To: Γεώργιος Δίγκας; dev@jena.apache.org Subject: Re: Issues fixed in Apache Jena Jena was registered at SourceForge 2001-11-20 I found this in ASF SVN: [[ Added Mon Nov 26 17:41:44 2001 UTC (15 years, 11 months ago) by der ]] on copyright.txt. so it looks like we have full history somewhere in the Apache infrastructure. CVS:SF->SVN:SF->SVN:ASF ASF git does not include the pre-git history. Andy On 16/11/17 11:55, Andy Seaborne wrote: > Do not take git as complete! > > Jena started in 2000. > https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html > > Jena 2.0 was released 2003-08-28. > A whole 40M including dependencies! A 14.7M zip file! > https://sourceforge.net/projects/jena/files/ > > The whole of SF SVN history was imported by the Apache infrastructure > team (a herculean effort) into Apache SVN. I don't know how to get to it > from git, it may not be there and only in SVN. > > The earliest git root commit is for the move to Apache from SF > [4298106f1e], 6 years ago. (There are 4 root commits due to merges) > > --- > > It's an interesting start and to make the analysis usefully inform the > reader as to the state of the project I suggest treating different kinds > of issues different, not uniformly important. > > There are many (, many) minor things and they outweigh the major > problems. Calling them all "issues" gives them equal weight. Some are > about canonicalization of the code. > > Yet reformatting the whole code base (if practical, which it arguable) > then greatly decreases the usefulness of git history. That would be a > huge loss. > > (NB the "issue" word has a specific meaning for JIRA which a lot of > Apache projects use. Jena's current total, now, is 1424.) > > Andy > >> >> Thank you in advance! >> >> >> With kind regards, >> >> George Digkas