Hi, folks.

Currently our contribution guidance is scattered across several places
([1], [2], and Chris also mentioned [3]), each covering different facets
of contributing to Apache Tika.

One thing that bothers me is that our codebase is very inconsistent in
style, formatting, and dependency management. That seems inevitable at
some stage for any popular open source project developed by many
contributors. But we can make it more consistent, with only moderate
effort needed to maintain that state afterwards.

I propose:

   1. establish a single source of truth for the contribution guide and
   then automatically mirror it to README.md/CONTRIBUTING.md for GitHub,
   publish it on tika.a.o, etc.;
   2. add info about logging in tika-core and other packages to this
   contribution guide so that all contributions are consistent with the
   current policy (with examples of how logging should be used in different
   modules):
      1. JUL in tika-core;
      2. SLF4J via a `private static final Logger LOG` field in all other
      modules;
      3. allow using a logging backend (log4j) in tests (e.g. for tuning log
      levels of upstream libraries) and in the standalone application (e.g.
      to support the `--quiet` and `--verbose` CLI options);
      4. document logging configuration for the case when the OSGi bundle
      is used;
   3. add info about dependency handling (e.g. the policy of no additional
   deps in tika-core, exclusion of commons-logging/commons-logging-api/log4j
   from dependencies, etc.);
   4. integrate the checkstyle plugin [5], [6] into the Maven build so that
   contributors can easily check that their code conforms to a simple
   policy to start with (4-space indent, no TABs, spaces before opening
   braces, spaces after if/else/try/catch/finally, Egyptian-style braces);
   5. add documentation about configuring checkstyle [5] in IDEs to
   simplify its usage (I can write one for JetBrains IDEA at least).
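
To make items 2 and 4 a bit more concrete, here is a rough sketch of what
a guide example for tika-core might look like. The class and method names
are made up for illustration and are not existing Tika code; the snippet
only assumes the JDK and tries to follow the formatting rules listed in
item 4.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class, not part of Tika; it only illustrates the conventions.
final class LoggingStyleExample {
    // Item 2.1: tika-core uses java.util.logging, so no extra dependency is needed.
    private static final Logger LOG = Logger.getLogger(LoggingStyleExample.class.getName());

    private LoggingStyleExample() {
    }

    // Returns the trimmed name, or "unknown" (with a warning) if it is null or blank.
    static String normalizeName(String name) {
        if (name == null || name.trim().isEmpty()) {
            LOG.log(Level.WARNING, "Empty resource name, falling back to {0}", "unknown");
            return "unknown";
        } else {
            LOG.fine("Resource name accepted");
            return name.trim();
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizeName("  report.pdf "));
        System.out.println(normalizeName(null));
    }
}
```

In modules other than tika-core the same field would presumably be
declared through SLF4J instead, i.e. with `org.slf4j.Logger` and
`org.slf4j.LoggerFactory.getLogger(...)`, while the rest of the code would
look the same.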

The main point is to bring the Tika codebase to a more consistent and clear
state, simplify its maintenance, and make it easier for contributors to
produce clean and pretty patches. The checkstyle configuration should be as
simple as possible, so that the refactoring it requires stays realistic.

Also, these items should be integrated gradually, step by step.

What do you think, folks?
Would it be a good thing for Tika and its community?
Would it bring any serious challenges that I've forgotten?

[1]: http://tika.apache.org/contribute.html
[2]: https://wiki.apache.org/tika/DeveloperResources
[3]: https://github.com/apache/tika/#contributing-via-github
[4]: https://issues.apache.org/jira/browse/TIKA-2316 (tracking issue)
[5]: http://checkstyle.sourceforge.net/
[6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/



-- 

Best regards,
Konstantin Gribov
