Hi Andrew,

I tried replacing "jdbc:calcite" with "jdbc:beam" in Calcite and re-shaded it. With that change, Beam SQL now runs on Spark. However, I could not find a way to modify the code as part of shading the Calcite library. I think the second method you mentioned is feasible. I'll forward this thread to dev@calcite to see if we can connect between Calcite modules without using the DriverManager.
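A minimal sketch of why both approaches avoid the clash, using stub drivers rather than the real Calcite or Beam classes. The `StubDriver` class and the labels `spark-calcite`, `beam-calcite`, and `beam-rewritten` are made up for illustration. It assumes only the documented behavior that DriverManager tries registered drivers in order and returns the first non-null connection.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.logging.Logger;

public class PrefixClashDemo {

  /** Hypothetical stub driver: it claims one URL prefix and reports its label. */
  static class StubDriver implements Driver {
    private final String label;
    private final String prefix;

    StubDriver(String label, String prefix) {
      this.label = label;
      this.prefix = prefix;
    }

    @Override public boolean acceptsURL(String url) {
      return url != null && url.startsWith(prefix);
    }

    @Override public Connection connect(String url, Properties info) {
      if (!acceptsURL(url)) {
        return null; // JDBC contract: return null for URLs this driver does not handle
      }
      // Fake connection: getCatalog() returns the label, every other call returns null.
      return (Connection) Proxy.newProxyInstance(
          StubDriver.class.getClassLoader(),
          new Class<?>[] {Connection.class},
          (proxy, method, args) -> "getCatalog".equals(method.getName()) ? label : null);
    }

    @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
      return new DriverPropertyInfo[0];
    }
    @Override public int getMajorVersion() { return 1; }
    @Override public int getMinorVersion() { return 0; }
    @Override public boolean jdbcCompliant() { return false; }
    @Override public Logger getParentLogger() { return Logger.getLogger("stub"); }
  }

  /** Asks DriverManager for a connection and reports which stub answered. */
  static String viaDriverManager(String url) throws SQLException {
    try (Connection c = DriverManager.getConnection(url)) {
      return c.getCatalog();
    }
  }

  /** Runs the three lookups and returns the label of the driver that answered each. */
  static List<String> run() {
    try {
      Driver spark = new StubDriver("spark-calcite", "jdbc:calcite:");
      Driver beam = new StubDriver("beam-calcite", "jdbc:calcite:");
      Driver beamRewritten = new StubDriver("beam-rewritten", "jdbc:beam:");
      DriverManager.registerDriver(spark); // registered first, so it is tried first
      DriverManager.registerDriver(beam);
      DriverManager.registerDriver(beamRewritten);

      List<String> out = new ArrayList<>();
      // Shared prefix: the first registered driver that accepts the URL wins.
      out.add(viaDriverManager("jdbc:calcite:"));
      // Option 1: a rewritten prefix that only the repackaged driver claims.
      out.add(viaDriverManager("jdbc:beam:"));
      // Option 2: call the driver instance directly, bypassing DriverManager.
      try (Connection c = beam.connect("jdbc:calcite:", new Properties())) {
        out.add(c.getCatalog());
      }
      return out;
    } catch (SQLException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(run()); // [spark-calcite, beam-rewritten, beam-calcite]
  }
}
```

The first lookup reproduces the clash: the driver registered first answers for the shared prefix. The other two reach the intended driver without touching the global registry order.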
Best,
Kai

On Tue, Jul 24, 2018 at 1:04 PM Kai Jiang <jiang...@gmail.com> wrote:

> Thank you Andrew! I will look into whether it is feasible to rewrite
> "jdbc:calcite:" in Beam's repackaged Calcite.
>
> Best,
> Kai
>
> On 2018/07/24 19:08:17, Andrew Pilloud <apill...@google.com> wrote:
> > I don't really think this is something that involves changes to
> > DriverManager. Beam is causing the problem by relocating calcite's path
> > but not also modifying the global state it creates.
> >
> > Andrew
> >
> > On Tue, Jul 24, 2018 at 12:03 PM Kai Jiang <jiang...@gmail.com> wrote:
> >
> > > Thanks Andrew! That's really helpful. I'll try shading Calcite while
> > > rewriting the "jdbc:calcite" string.
> > > I also had a look at the DriverManager docs. Do you think including
> > > all repackaged JDBC drivers in a property setting like the one below
> > > would be helpful?
> > > jdbc.drivers=org.apache.beam.repackaged.beam.
> > >
> > > Best,
> > > Kai
> > >
> > > On 2018/07/24 16:56:50, Andrew Pilloud <apill...@google.com> wrote:
> > > > Looks like calcite isn't easily repackageable. This issue can be
> > > > fixed either in our shading (by also rewriting the "jdbc:calcite:"
> > > > string when we shade calcite) or in calcite (by not using the
> > > > driver manager to connect between calcite modules).
> > > >
> > > > Andrew
> > > >
> > > > On Mon, Jul 23, 2018 at 11:18 PM Kai Jiang <jiang...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I ran into an issue when running Beam SQL on Spark, and I want to
> > > > > check whether anyone else has seen the same issue. I believe
> > > > > getting Beam SQL to run on Spark is important. If you have
> > > > > encountered the same problem, any input would be really helpful.
> > > > >
> > > > > Context:
> > > > > I set up a TPC framework to run SQL on Spark. The code
> > > > > <https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/src/main/java/org/apache/beam/sdk/extensions/tpc/BeamTpc.java>
> > > > > is simple: it just ingests CSV data and applies SQL to it. The Gradle
> > > > > <https://github.com/vectorijk/beam/blob/tpch/sdks/java/extensions/tpc/build.gradle>
> > > > > setting includes `runner-spark` and the necessary libraries. The
> > > > > exception stack trace
> > > > > <https://gist.github.com/vectorijk/849cbcd5bce558e5e7c97916ca4c793a>
> > > > > shows some details. However, the same code runs successfully on
> > > > > Flink and Dataflow.
> > > > >
> > > > > Investigations:
> > > > > BEAM-3386 <https://issues.apache.org/jira/browse/BEAM-3386>
> > > > > describes a similar issue. I spent some time investigating it. I
> > > > > suspect a version conflict between the Calcite library in Spark
> > > > > and Beam SQL's repackaged Calcite. The version of Calcite that
> > > > > Spark (* - 2.3.1) uses is very old (1.2.0-incubating).
> > > > >
> > > > > After packaging a fat jar and submitting it to Spark, Spark
> > > > > registered both the old version's Calcite JDBC driver and Beam's
> > > > > repackaged JDBC driver in registeredDrivers (DriverManager.java#L294
> > > > > <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L294>).
> > > > > JDBC's DriverManager always connects to the old Calcite JDBC
> > > > > driver in Spark instead of Beam's repackaged Calcite.
> > > > >
> > > > > Looking at DriverManager.java#L556
> > > > > <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>
> > > > > with a breakpoint inserted:
> > > > > aClass = Class.forName(driver.getClass().getName(), true, classLoader);
> > > > >
> > > > > driver.getClass().getName() -> "org.apache.calcite.jdbc.Driver"
> > > > > classLoader only has classes 'org.apache.beam.**' and
> > > > > 'org.apache.beam.repackaged.beam_***'. (There is no path for class
> > > > > 'org.apache.calcite.*'.)
> > > > >
> > > > > Oddly, aClass is assigned the Class "org.apache.calcite.jdbc.Driver".
> > > > > I think it should raise an exception and be skipped. Actually, it
> > > > > did not. So Spark's Calcite JDBC driver gets connected, and all
> > > > > logic afterwards goes to Spark's Calcite classpath. I believe that
> > > > > is the pivotal point.
> > > > >
> > > > > Potential solutions:
> > > > > 1. Figure out why DriverManager.java#L556
> > > > > <https://github.com/JetBrains/jdk8u_jdk/blob/master/src/share/classes/java/sql/DriverManager.java#L556>
> > > > > does not throw an exception.
> > > > >
> > > > > I guess this is the best option.
> > > > >
> > > > > 2. Upgrade Spark's Calcite.
> > > > >
> > > > > This is not a good option because the old Calcite version affects
> > > > > many Spark versions.
> > > > >
> > > > > 3. Do not repackage the Calcite library.
> > > > >
> > > > > I tried this. I built a fat jar with non-repackaged Calcite, but
> > > > > Spark still uses its own Calcite.
> > > > >
> > > > > Also, I am curious whether there is any specific reason we need to
> > > > > use the repackaging strategy for Calcite. @Mingmin Xu <mingm...@gmail.com>
> > > > >
> > > > > Thanks for reading!
> > > > >
> > > > > Best,
> > > > > Kai
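The registration state described in the original report can be inspected with a small diagnostic. This is a sketch only: the class name `DriverProbe` and the method `claimants` are hypothetical, and it uses just the standard `DriverManager.getDrivers()` and `Driver.acceptsURL()` calls. Note that `getDrivers()` returns only drivers the caller's class loader is allowed to see, so it should be run from the same class loader as the failing code.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DriverProbe {

  /** Lists every visible registered driver that accepts the URL, in registration order. */
  static List<String> claimants(String url) {
    List<String> result = new ArrayList<>();
    for (Driver d : Collections.list(DriverManager.getDrivers())) {
      try {
        if (d.acceptsURL(url)) {
          // The class loader shows whether the driver comes from Spark's jars or the fat jar.
          result.add(d.getClass().getName() + " loaded by " + d.getClass().getClassLoader());
        }
      } catch (SQLException e) {
        // Report a driver that fails on this URL instead of hiding it.
        result.add(d.getClass().getName() + " failed: " + e.getMessage());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String url = args.length > 0 ? args[0] : "jdbc:calcite:";
    for (String line : claimants(url)) {
      System.out.println(line);
    }
  }
}
```

If more than one entry is printed for "jdbc:calcite:", the first one is the driver that DriverManager will hand back.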