TheNeuralBit commented on pull request #12202:
URL: https://github.com/apache/beam/pull/12202#issuecomment-656932229


   I looked into the test failure. Changing the dependency configuration from 
`provided` to `compile` here fixes the test:
   
https://github.com/apache/beam/blob/65297802aaaddda66b3fda4bafb15640f0fc3530/sdks/java/extensions/sql/build.gradle#L61
   
   From the stacktrace:
   ```
   java.util.ServiceConfigurationError: org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider: Provider org.apache.beam.sdk.extensions.sql.meta.provider.parquet.ParquetTableProvider could not be instantiated
        at java.util.ServiceLoader.fail(ServiceLoader.java:232)
        at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at org.apache.beam.sdk.extensions.sql.impl.BeamCalciteSchemaFactory$AllProviders.getTableProvider(BeamCalciteSchemaFactory.java:86)
   ...
   Caused by: java.lang.NoClassDefFoundError: org/apache/beam/sdk/io/parquet/ParquetSchemaCapableIOProvider
        at org.apache.beam.sdk.extensions.sql.meta.provider.parquet.ParquetTableProvider.<init>(ParquetTableProvider.java:47)
   ```
   
   You can see the error is occurring when we try to instantiate a class from 
the parquet package at runtime, because the class can't be found. It looks like 
this may have been a problem before your PR, but it didn't come up because we 
just weren't exercising code that called the parquet package.
   
   TBH I don't have a great handle on the difference between these dependency 
configurations. My understanding of `compile` vs. `provided` is that `compile` 
will include the compiled java in the artifact, but `provided` assumes that it 
will be provided by some other jar on the classpath (useful SO answer: 
https://stackoverflow.com/questions/30731084/provided-dependency-in-gradle). So 
it seems what's happening is the parquet package is there when we compile, but 
nothing is adding it to the classpath when we run JdbcJarTest.
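   
   To make the compile-time vs. runtime distinction concrete, here is a minimal 
sketch of a check one could drop into a test or a scratch `main` to see whether 
a class is actually visible at runtime. `ClasspathCheck` and `isOnClasspath` 
are hypothetical names made up for this illustration; they are not part of 
Beam. The only real API used is `Class.forName`.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by this class's
     * class loader at runtime, false otherwise.
     * (Hypothetical helper for illustration; not a Beam API.)
     */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is missing.
            // LinkageError (includes NoClassDefFoundError): the class or one of
            // its supertypes could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the "Caused by" line of the stacktrace above.
        String cls = "org.apache.beam.sdk.io.parquet.ParquetSchemaCapableIOProvider";
        System.out.println(cls + " on classpath: " + isOnClasspath(cls));
    }
}
```

   If my reading is right, running this with the same runtime classpath as 
JdbcJarTest would report the parquet class as missing while the dependency is 
`provided`, and as present once it is `compile`. I haven't verified that 
directly.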
   
   I'm not sure why these IO dependencies are `provided` in the first place. I 
think the intention may be to let users include only the IOs that they intend 
to use, but this seems problematic when BeamCalciteSchemaFactory loads every 
TableProvider implementation through ServiceLoader: 
https://github.com/apache/beam/blob/65297802aaaddda66b3fda4bafb15640f0fc3530/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java#L86
   
   My suggestion would be to just make parquet a `compile` dependency, like 
we've already done for mongo (cc @lukecwik and @kennknowles in case they think 
this is a bad idea).



