Hi, folks. Our contribution guide is currently split across several places ([1] and [2] came to mind, and Chris also mentioned [3]). Each one covers a different facet of contributing to Apache Tika.
One thing that bothers me is how inconsistent our codebase is in style, formatting and dependency management. This seems inevitable at some stage of any popular open source project with many contributors. Still, we can make it more consistent with moderate effort and then keep it that way. I propose to:

1. Make one source of truth for the contribution guide and automatically mirror it to README.md/CONTRIBUTING.md on GitHub, publish it on tika.a.o, etc.
2. Add information about logging in tika-core and the other modules to this contribution guide, so that all contributions follow the current policy. It should include examples of how logging should be used in each kind of module:
   1. JUL in tika-core;
   2. SLF4J, through a `private static final Logger LOG` field, in all other modules;
   3. allow a logging backend (log4j) in tests (e.g. to tune log levels for upstream libraries) and in standalone applications (e.g. to support the `--quiet` and `--verbose` CLI options);
   4. document how to configure logging when the OSGi bundle is used.
3. Add information about dependency handling (e.g. the policy of no additional dependencies in tika-core, excluding commons-logging/commons-logging-api/log4j from dependencies, etc.).
4. Integrate the Checkstyle plugin [5], [6] into the Maven build so contributors can easily check that their code conforms to a simple starting policy: 4-space indent, no tabs, a space before opening braces, a space after if/else/try/catch/finally, and Egyptian-style braces.
5. Add documentation on configuring Checkstyle [5] in IDEs to make it easier to use (I can write one for JetBrains IDEA at least).

The main goals are to bring the Tika codebase to a more consistent and clear state, simplify its maintenance, and make it easier for contributors to produce clean, pretty patches. The Checkstyle configuration should be as simple as possible so that refactoring the existing code to match it stays realistic. These items should also be integrated gradually, step by step. What do you think, folks? Would it be a good thing for Tika and its community?
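To make items 2 and 4 more concrete, here is a minimal sketch of a tika-core class that follows the proposed conventions. `ExampleDetector` and its methods are hypothetical and not part of Tika. Naming the JUL field `LOG` in tika-core is my assumption, chosen to mirror the SLF4J convention. In any other module the field would instead be an SLF4J logger, `private static final Logger LOG = LoggerFactory.getLogger(ExampleDetector.class);`, with `org.slf4j.Logger` and `org.slf4j.LoggerFactory` imported. The formatting follows the simple Checkstyle policy: 4-space indent, no tabs, Egyptian braces, and a space after `if`/`else`/`try`/`catch`/`finally` and before `{`.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical tika-core class illustrating the proposed conventions:
 * JUL only (no extra dependencies in tika-core) and the simple style rules.
 */
public class ExampleDetector {

    // Assumed convention: one JUL logger per class in a private static final field named LOG.
    private static final Logger LOG = Logger.getLogger(ExampleDetector.class.getName());

    // Utility class: static methods only, so no instances.
    private ExampleDetector() {
    }

    /** Returns the lower-cased extension of a file name, or "unknown" if there is none. */
    public static String extensionOf(String name) {
        if (name == null || name.isEmpty()) {
            // FINE is below JUL's default INFO threshold, so this is normally not printed.
            LOG.fine("No file name given");
            return "unknown";
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return "unknown";
        } else {
            return name.substring(dot + 1).toLowerCase(Locale.ROOT);
        }
    }

    /** Parses a size value, logging a warning and returning -1 if it is not a number. */
    public static int parseSize(String value) {
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            // JUL substitutes {0} with the parameter when the record is formatted.
            LOG.log(Level.WARNING, "Could not parse size: {0}", value);
            return -1;
        } finally {
            // Present only to show the "space after finally" rule; FINEST is off by default.
            LOG.finest("parseSize finished");
        }
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("report.PDF")); // prints "pdf"
        // Prints -1 to stdout; the WARNING record goes to stderr via the default console handler.
        System.out.println(parseSize("oops"));
    }
}
```

The same class in a non-core module would differ only in the logger import and field initializer. That is part of why I think short per-module examples in the guide would be enough.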
Would it bring any serious challenges that I've forgotten?

[1]: http://tika.apache.org/contribute.html
[2]: https://wiki.apache.org/tika/DeveloperResources
[3]: https://github.com/apache/tika/#contributing-via-github
[4]: https://issues.apache.org/jira/browse/TIKA-2316 (tracking issue)
[5]: http://checkstyle.sourceforge.net/
[6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/

--
Best regards,
Konstantin Gribov