Looks like calcite isn't easily repackageable. This issue can be fixed either in our shading (by also rewriting the "jdbc:calcite:" string when we shade calcite) or in calcite (by not using the driver manager to connect between calcite modules).
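To illustrate why the URL prefix matters, here is a minimal sketch of how `java.sql.DriverManager` picks a driver when two registered drivers claim the same URL. The `PrefixDriver` class, the driver labels, and the rewritten prefix `jdbc:beam-repackaged-calcite:` are all hypothetical stand-ins, not Calcite or Beam code; only the `DriverManager` calls are real JDK APIs.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverPrefixDemo {
    /** Hypothetical stand-in driver that claims every URL starting with a given prefix. */
    static class PrefixDriver implements Driver {
        final String label;
        final String prefix;

        PrefixDriver(String label, String prefix) {
            this.label = label;
            this.prefix = prefix;
        }

        @Override public boolean acceptsURL(String url) {
            return url != null && url.startsWith(prefix);
        }

        @Override public Connection connect(String url, Properties info) throws SQLException {
            if (!acceptsURL(url)) {
                return null; // JDBC contract: return null for URLs this driver does not handle
            }
            throw new SQLException("stand-in driver, no real connection");
        }

        @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }

        @Override public int getMajorVersion() { return 1; }
        @Override public int getMinorVersion() { return 0; }
        @Override public boolean jdbcCompliant() { return false; }
        @Override public Logger getParentLogger() { return Logger.getLogger("demo"); }
    }

    // Assumption: Spark's old Calcite driver is registered before Beam's shaded copy.
    static final PrefixDriver SPARK = new PrefixDriver("spark-calcite", "jdbc:calcite:");
    static final PrefixDriver BEAM_SAME_PREFIX = new PrefixDriver("beam-calcite", "jdbc:calcite:");
    // Hypothetical prefix that shading could rewrite the URL string to.
    static final PrefixDriver BEAM_REWRITTEN =
        new PrefixDriver("beam-calcite-rewritten", "jdbc:beam-repackaged-calcite:");

    /** Registers the three stand-in drivers; repeated calls are harmless. */
    static void register() {
        try {
            DriverManager.registerDriver(SPARK);
            DriverManager.registerDriver(BEAM_SAME_PREFIX);
            DriverManager.registerDriver(BEAM_REWRITTEN);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the label of the driver that DriverManager selects for the URL. */
    static String pick(String url) {
        try {
            Driver d = DriverManager.getDriver(url);
            return d instanceof PrefixDriver ? ((PrefixDriver) d).label : d.getClass().getName();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        register();
        System.out.println(pick("jdbc:calcite:"));
        System.out.println(pick("jdbc:beam-repackaged-calcite:"));
    }
}
```

With both drivers accepting `jdbc:calcite:`, the one registered first is selected, so the shaded driver is never reached through that URL. A distinct prefix removes the ambiguity.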
Andrew

On Mon, Jul 23, 2018 at 11:18 PM Kai Jiang <jiang...@gmail.com> wrote:

> Hi all,
>
> I ran into an issue when running Beam SQL on Spark, and I want to check
> whether anyone else has hit the same problem. I believe getting Beam SQL
> running on Spark is important. If you have encountered the same problem,
> any input would be really helpful.
>
> Context:
> I set up a TPC framework to run SQL on Spark. The code
> <https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/src/main/java/org/apache/beam/sdk/extensions/tpc/BeamTpc.java>
> is simple: it just ingests CSV data and applies SQL to it. The Gradle
> <https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/build.gradle>
> settings include `runner-spark` and the necessary libraries. The exception
> stack trace
> <https://gist.github.com/vectorijk/849cbcd5bce558e5e7c97916ca4c793a>
> shows some details. However, the same code runs successfully on Flink and
> Dataflow.
>
> Investigations:
> BEAM-3386 <https://issues.apache.org/jira/browse/BEAM-3386> describes a
> similar issue. After spending some time investigating it, my guess is that
> there is a version conflict between the Calcite library in Spark and Beam
> SQL's repackaged Calcite. The version of Calcite that Spark ( * - 2.3.1)
> uses is very old (1.2.0-incubating).
>
> After packaging a fat jar and submitting it to Spark, Spark registered both
> the old Calcite JDBC driver and Beam's repackaged JDBC driver in
> registeredDrivers (DriverManager.java#L294
> <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L294>).
> JDBC's DriverManager always connects to the old Calcite JDBC driver in Spark
> instead of Beam's repackaged Calcite.
>
> Looking at DriverManager.java#L556
> <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>
> with a breakpoint set on
> aClass = Class.forName(driver.getClass().getName(), true, classLoader);
>
> driver.getClass().getName() -> "org.apache.calcite.jdbc.Driver"
> classLoader only has classes under 'org.apache.beam.**' and
> 'org.apache.beam.repackaged.beam_***'. (There is no path for class
> 'org.apache.calcite.*'.)
>
> Oddly, aClass is assigned the Class "org.apache.calcite.jdbc.Driver". I
> think it should raise an exception and be skipped, but it did not. So
> Spark's Calcite JDBC driver gets connected, and all logic afterwards
> goes to Spark's Calcite classpath. I believe that is the pivotal point.
>
> Potential solutions:
>
> 1. Figure out why DriverManager.java#L556
> <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>
> does not throw an exception.
>
> I guess this is the best option.
>
> 2. Upgrade Spark's Calcite.
>
> This is not a good option because the old Calcite version affects many
> Spark versions.
>
> 3. Do not repackage the Calcite library.
>
> I tried this: I built a fat jar with non-repackaged Calcite, but Spark
> still used its own Calcite.
>
> Also, I am curious whether there is any specific reason we need to
> repackage Calcite. @Mingmin Xu <mingm...@gmail.com>
>
> Thanks for reading!
>
> Best,
> Kai
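
On the question of why the `Class.forName` call at DriverManager.java#L556 does not throw: one possible explanation, offered as an assumption rather than a verified diagnosis of the Spark setup, is ordinary parent delegation. A class loader that holds no matching classes itself still resolves a name through its parent, and `DriverManager` then treats the driver as allowed if the resolved class is identical to the driver's class. The sketch below uses a hypothetical `Marker` class in place of `org.apache.calcite.jdbc.Driver` and an empty `URLClassLoader` in place of the application class loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ParentDelegationDemo {
    /** Hypothetical stand-in for a driver class that lives on the parent's class path. */
    public static class Marker {}

    /**
     * Loads Marker through a child loader that has no URLs of its own and
     * reports whether the result is the very same Class object, which is the
     * comparison DriverManager makes after its Class.forName call.
     */
    static boolean sameClassViaChild() {
        ClassLoader parent = ParentDelegationDemo.class.getClassLoader();
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            // The child cannot define Marker itself; the lookup succeeds via the parent.
            Class<?> found = Class.forName(Marker.class.getName(), true, child);
            return found == Marker.class;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameClassViaChild());
    }
}
```

If the same thing happens on Spark, the loader seen in the debugger would resolve `org.apache.calcite.jdbc.Driver` through a parent that has Spark's Calcite jar, so no exception is raised and the old driver passes the check.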