Hello All,

I have run into a dilemma and would like to know what our policy is for
dealing with the following situation.

As part of the implementation of the Kudu Input operator
(https://issues.apache.org/jira/browse/APEXMALHAR-2472), I will be using
Antlr4 as the parser tool to parse a single-line string as an SQL-like
statement that represents the set of tuples to be streamed out of the Kudu
store to the downstream operators.

I will post the design in a separate mailing thread once a few things are
finalised; this mail is about Checkstyle and auto-generated code from tools
like Antlr4.

The design involves writing a grammar file and letting Maven generate the
parser and related code as .java files as part of the build process. Only the
grammar ".g4" file is checked into the repository and maintained as the Kudu
functionality evolves. However, this brings me to the situation where
Checkstyle fails for the auto-generated classes. Following are the three
options that I think we have, and I would like to get thoughts on the best way
forward.

Option 1: We let the Antlr4 code-gen generate code in the
"target/generated-sources" path. This is the default for the Maven Antlr4
plugin. However, this does not pass the Checkstyle Maven plugin, as the plugin
checks the auto-generated code as well. The fix is to configure the Checkstyle
plugin to look only at "src/" folder paths as opposed to all compiled source
roots. This works from a build perspective, but the drawback is that IDEs will
not include "target/generated-sources" for class resolution. IDEs do have
plugins to resolve this error, but this might be considered irksome by the
developer community.
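
For illustration, a rough sketch of what Option 1 could look like in the
pom.xml. This is untested and rests on assumptions: the "antlr4.version"
property is hypothetical, the grammar is assumed to live in the plugin's
default "src/main/antlr4" folder, and I am assuming the "sourceDirectories"
parameter is available in the Checkstyle plugin version we use.

```xml
<build>
  <plugins>
    <!-- Generates the parser from the .g4 grammar. With no outputDirectory
         configured, the sources go under target/generated-sources. -->
    <plugin>
      <groupId>org.antlr</groupId>
      <artifactId>antlr4-maven-plugin</artifactId>
      <!-- hypothetical property, not defined in our poms today -->
      <version>${antlr4.version}</version>
      <executions>
        <execution>
          <goals>
            <goal>antlr4</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

    <!-- Restricts Checkstyle to hand-written sources so that the generated
         source root is never scanned. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-checkstyle-plugin</artifactId>
      <configuration>
        <sourceDirectories>
          <sourceDirectory>${project.basedir}/src/main/java</sourceDirectory>
        </sourceDirectories>
      </configuration>
    </plugin>
  </plugins>
</build>
```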

Option 2: We let the Antlr4 code-gen generate code in the Kudu package path,
and of course Checkstyle would fail this as well. The fix is to give Checkstyle
an "excludes" pattern so that it ignores all Java files matching the pattern of
files generated by the Antlr4 code-gen tool. One issue remains to be resolved
even if this approach is approved by the community: the tool generates a couple
of ".tokens" files that are always placed in the root class path and not under
the package structure, which clutters the source tree a bit. I am still working
on this, as it needs to be resolved.
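
A rough sketch of the Checkstyle side of Option 2, again untested. The package
path in the pattern is hypothetical; it assumes the generated classes end up
in a dedicated sub-package so that a single pattern can match them without
also hiding hand-written code.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <configuration>
    <!-- Hypothetical pattern: skip every Java file in the sub-package that
         holds the Antlr4-generated lexer, parser and listener classes. -->
    <excludes>**/kudu/sqlparser/**/*.java</excludes>
  </configuration>
</plugin>
```

This does not address the token files mentioned above; it only keeps
Checkstyle from flagging the generated Java classes.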

Option 3: Perhaps the ideal is to have a separate top-level module for Kudu,
which would resolve all of these issues (i.e. token files are generated in the
Kudu module root along with the Java sources in the correct package
structure). I guess that is a separate discussion that Thomas/Vlad and others
are planning to take up in a separate thread on the mailing list.

Could you please let me know what you think is the ideal path to pursue (or
whether there are other alternatives for the use case above)?

Regards,
Ananth 
