Hi,

Thanks a lot for this, I think trimming down the dependencies of Calcite
will be of great help for its adoption.

> So, the easiest way to reduce dependencies would be to make certain
> classes of SQL functions optional (i.e. move them out of core).

That sounds like a good idea.

> commons-lang3, commons-codec, commons-io are probably only used in one or
> two places each;

To make some progress there, I've created PR
https://github.com/apache/calcite/pull/2672, which removes the dependency on
commons-lang3 from the entire code base. Any feedback on that PR would
be appreciated (I still need to log an issue, but wanted to share quickly
what I had). I can take a look at the other ones too, if there's
interest in this.

Re Janino, is there any reason for not using the compiler implementation
that comes with the JDK? Alternatively, one could also consider generating
byte code directly using ASM, which wouldn't be beneficial dependency-wise,
but may improve the performance of this generation step (I still lack
insight into why this is done in the first place).
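
To make the JDK compiler option concrete, here is a minimal sketch of
compiling generated source at runtime via the standard javax.tools API. It is
not taken from Calcite; the class names JdkCompilerSketch, StringSource and
Adder are hypothetical, and it says nothing about how this compares with
Janino in speed or footprint. One known caveat: ToolProvider.getSystemJavaCompiler()
returns null when the application runs on a JRE without the compiler module.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical illustration only; not how Calcite compiles generated code.
public class JdkCompilerSketch {

  /** Source text held in memory, so no .java file is written to disk. */
  static final class StringSource extends SimpleJavaFileObject {
    private final String code;

    StringSource(String className, String code) {
      super(URI.create("string:///" + className.replace('.', '/') + ".java"),
          Kind.SOURCE);
      this.code = code;
    }

    @Override
    public CharSequence getCharContent(boolean ignoreEncodingErrors) {
      return code;
    }
  }

  /** Compiles one class with the JDK compiler and loads it from a temp dir. */
  public static Class<?> compile(String className, String code) throws Exception {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    if (compiler == null) {
      throw new IllegalStateException("No system Java compiler available");
    }
    // Class files go to a fresh temporary directory (never cleaned up here).
    Path outDir = Files.createTempDirectory("generated-classes");
    List<String> options = List.of("-d", outDir.toString());
    List<JavaFileObject> units = List.of(new StringSource(className, code));
    boolean ok = compiler.getTask(null, null, null, options, null, units).call();
    if (!ok) {
      throw new IllegalStateException("Compilation failed for " + className);
    }
    // The loader is intentionally left open so the class stays usable.
    URLClassLoader loader = new URLClassLoader(
        new URL[] {outDir.toUri().toURL()},
        JdkCompilerSketch.class.getClassLoader());
    return loader.loadClass(className);
  }

  public static void main(String[] args) throws Exception {
    String code =
        "public class Adder { public static int add(int a, int b) { return a + b; } }";
    Class<?> adder = compile("Adder", code);
    Object result = adder.getMethod("add", int.class, int.class).invoke(null, 2, 3);
    System.out.println(result); // prints 5
  }
}
```

This variant writes class files to a temporary directory for simplicity; a
custom JavaFileManager could keep them in memory instead.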

Thanks,

--Gunnar

On Fri, Dec 31, 2021 at 00:56, Julian Hyde <jhyde.apa...@gmail.com> wrote:

> Regarding dependencies. Here are the runtime dependencies from
> core/build.gradle.kts (ignoring test and annotation libraries):
>
>  * api("com.esri.geometry:esri-geometry-api")
>  * api("com.fasterxml.jackson.core:jackson-annotations")
>  * api("com.google.guava:guava")
>  * api("org.apache.calcite.avatica:avatica-core")
>  * api("org.slf4j:slf4j-api")
>  * implementation("com.fasterxml.jackson.core:jackson-core")
>  * implementation("com.fasterxml.jackson.core:jackson-databind")
>  * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
>  * implementation("com.google.uzaygezen:uzaygezen-core")
>  * implementation("com.jayway.jsonpath:json-path")
>  * implementation("com.yahoo.datasketches:sketches-core")
>  * implementation("commons-codec:commons-codec")
>  * implementation("net.hydromatic:aggdesigner-algorithm")
>  * implementation("org.apache.commons:commons-dbcp2")
>  * implementation("org.apache.commons:commons-lang3")
>  * implementation("commons-io:commons-io")
>  * implementation("org.codehaus.janino:commons-compiler")
>  * implementation("org.codehaus.janino:janino")
>
> A few libraries are used only for a narrow range of functionality:
>  * esri-geometry and uzaygezen-core are used by geospatial functions;
>  * sketches-core is used by the HLL aggregate functions;
>  * json-path is used by some JSON functions;
>  * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
> load models, and to serialize RelNodes to and from JSON;
>  * commons-lang3, commons-codec, commons-io are probably only used in one
> or two places each;
>  * aggdesigner-algorithm is used for recommending materialized views.
>
> So, the easiest way to reduce dependencies would be to make certain
> classes of SQL functions optional (i.e. move them out of core).
>
> Julian
>
>
>
> > On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > WRT SBOM (Julian): My general experience is that most large orgs use
> > scanners now (either open or closed) and they will scan whether you have a
> > bill of materials or not. I wouldn't worry about adding something
> > additional.
> >
> > WRT too many dependencies (Gunnar): I completely agree with the general
> > feeling of too many (and with Guava, jackson less so). I think the core
> > challenge (no pun intended) is that calcite-core is really a lot of
> > different components. For example, I have frequently wished that parser,
> > planner and enumerable were separate modules. And if they were, I'd guess
> > that each would have a narrower dependency range. I've also wished many
> > times that runtime compilation was an optional addon as opposed to
> > required/coupled in the core...
> >
> > When I've thought about how to dissect in the past, I think the big
> > challenge would be tests, where things are sometimes mixed together.
> > Breaking change possibilities could be at least somewhat mitigated by
> > moving classes but not packages.
> >
> > On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
> > <gunnar.morl...@googlemail.com.invalid> wrote:
> >
> >> Hi,
> >>
> >> In a way, Calcite's build configuration as well as the published POM could
> >> be considered as such an SBOM? In particular when looking at the latter
> >> through services like mvnrepository [1], you get quite a good view of the
> >> dependency versions, licenses, any potential CVEs, etc. I think this should
> >> satisfy most user needs around this? Or are you referring to the notion of
> >> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM with
> >> all the Calcite component versions which people can then use with Maven's
> >> import scope (there should be something comparable for Gradle)? If so, that
> >> could be useful for users working with multiple Calcite components, though
> >> I think the usability improvement provided by such a BOM POM wouldn't be
> >> huge.
> >>
> >> I wanted to bring up a related matter though. Coming to Calcite as a user
> >> just recently (loving the possibilities it provides!), I was surprised by
> >> the large number of dependencies of the project. It looks like 1.29
> >> improves that a little bit (no more kotlin-stdlib, no more transitive
> >> dependency on log4j 1.x), but the transitive hull of all dependencies of
> >> calcite-core still is quite big. I lack insight about what the different
> >> dependencies are used for; but as an application developer, Guava for
> >> instance is a dependency which I'd prefer to not get pushed onto the
> >> classpath transitively. Jackson is another heavy one; depending on how it's
> >> used, perhaps this could be pushed into some separate module which users
> >> could optionally pull in? That'd help to avoid having it around when users
> >> work with other JSON libs themselves and don't require JSON support in
> >> Calcite.
> >>
> >> From a supply chain perspective, the fewer transitive dependencies a library
> >> like Calcite introduces to my project, the better IMHO. Less potential for
> >> version conflicts with my own (or other transitive) dependencies, and also
> >> less potential for introducing CVEs to the dependency graph, as e.g. in the
> >> case of the Guava version currently used by Calcite; I suppose it does not
> >> impact the usage in Calcite, but these things tend to be tricky to reason
> >> about, and typical CVE reporting tooling will now create a warning for a
> >> project using Calcite, no matter whether that specific issue actually is a
> >> problem or not.
> >>
> >> Best,
> >>
> >> --Gunnar
> >>
> >> [1] https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
> >> [2] https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
> >>
> >>
> >> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>
> >>> In the wake of the log4j CVEs [1], people are asking how to improve the
> >>> security of open source projects, and one idea is to provide an SBOM
> >>> (Software Bill of Materials) [2] along with each release.
> >>>
> >>> I had not heard of SBOM until a couple of days ago. Is anyone on this list
> >>> familiar with SBOMs and their use? Should Calcite be providing an SBOM? Are
> >>> people aware of SBOM initiatives in other projects? What, in your opinion,
> >>> is the priority of this issue?
> >>>
> >>> Julian
> >>>
> >>> [1] https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
> >>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
> >>>
> >>
>
>
