Hello All,

Apologies if this is a repost; my earlier email did not seem to go through.

I have run into a dilemma and would like to know what our policy is for
dealing with the following situation.

As part of the implementation of the Kudu Input operator (
https://issues.apache.org/jira/browse/APEXMALHAR-2472), I will be using
Antlr4 to parse a string as an SQL-like statement that describes the set
of tuples to be streamed out of the Kudu store to the downstream
operators.

I will post the design in a separate mailing thread once a few things are
finalised; this mail is about Checkstyle and auto-generated code from
tools like Antlr4.

The design involves writing a grammar file and letting Maven generate the
parser and related code as .java files as part of the build process. Only
the grammar ".g4" file is checked into the repository and maintained as
Kudu functionality evolves. However, this brings me to the situation where
Checkstyle fails for the autogenerated classes. Below are the three
options that I think we have; I would like to get thoughts on the best way
forward.

*Option 1:* We let Antlr4 generate code under the
"target/generated-sources" path. This is the default for the Antlr4 Maven
plugin. However, this does not pass the Checkstyle Maven plugin, because
that plugin checks auto-generated code as well. The fix is to configure
the Checkstyle plugin to look only at "src/" folder paths, as opposed to
all compiled sources. This works from a build perspective, but the
drawback is that IDEs will not include "target/generated-sources" for
class resolution. IDEs do have plugins to resolve this error, but the
developer community might find that irksome.
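
As a rough sketch of that fix, the pom.xml change might look something
like the fragment below. This assumes a maven-checkstyle-plugin version
that supports the sourceDirectories parameter; the rest of our existing
plugin configuration is omitted, and I have not verified it against our
build.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <configuration>
    <!-- Check only the hand-written sources (src/main/java by default),
         rather than every compile source root, which would include
         target/generated-sources. -->
    <sourceDirectories>
      <sourceDirectory>${project.build.sourceDirectory}</sourceDirectory>
    </sourceDirectories>
  </configuration>
</plugin>
```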

*Option 2:* We let the Antlr4 code-gen generate code in the Kudu package
path, and of course Checkstyle would fail this as well. The fix is to give
Checkstyle an "excludes" pattern so that it ignores all Java files
matching the pattern of files generated by the Antlr4 code-gen tool. One
issue remains even if the community approves this approach: the tool
generates a couple of ".tokens" files that are always placed in the root
of the class path and not under the package structure, which clutters the
layout a bit. I am still working on this, as it needs to be resolved.
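
To illustrate, a minimal sketch of such an exclusion is below. The package
path in the pattern is purely hypothetical (the real one is not decided
yet), and the pattern would need adjusting to wherever the generated
classes actually land.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <configuration>
    <!-- Hypothetical pattern: skip everything under an assumed package
         reserved for Antlr4-generated classes. -->
    <excludes>**/kudu/sqlparser/**/*</excludes>
  </configuration>
</plugin>
```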

*Option 3:* Perhaps the ideal is to have a separate top-level module for
Kudu, which would resolve all of these issues (i.e. token files are
generated in the Kudu module root along with the Java sources in the
correct package structure). I guess that is a separate discussion that
Thomas/Vlad and others are planning to take up in a separate thread on the
mailing list.

Could you please let me know what you think is the ideal path to pursue
(or if there are other alternatives for the use case above)?

Regards,
Ananth
