Cheng Lian created SPARK-9974:
---------------------------------

             Summary: SBT build: com.twitter:parquet-hadoop-bundle:1.6.0 is not packaged into the assembly jar
                 Key: SPARK-9974
                 URL: https://issues.apache.org/jira/browse/SPARK-9974
             Project: Spark
          Issue Type: Bug
          Components: Build, SQL
    Affects Versions: 1.5.0
            Reporter: Cheng Lian
            Priority: Blocker


Git commit: [69930310115501f0de094fe6f5c6c60dade342bd|https://github.com/apache/spark/commit/69930310115501f0de094fe6f5c6c60dade342bd]

Build with SBT and check the assembly jar for 
{{parquet.hadoop.api.WriteSupport}}:
{noformat}
$ ./build/sbt -Phive -Phive-thriftserver -Phadoop-1 -Dhadoop.version=1.2.1 clean assembly/assembly
...
$ jar tf assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar | fgrep "parquet/hadoop/api"
org/apache/parquet/hadoop/api/
org/apache/parquet/hadoop/api/DelegatingReadSupport.class
org/apache/parquet/hadoop/api/DelegatingWriteSupport.class
org/apache/parquet/hadoop/api/InitContext.class
org/apache/parquet/hadoop/api/ReadSupport$ReadContext.class
org/apache/parquet/hadoop/api/ReadSupport.class
org/apache/parquet/hadoop/api/WriteSupport$FinalizedWriteContext.class
org/apache/parquet/hadoop/api/WriteSupport$WriteContext.class
org/apache/parquet/hadoop/api/WriteSupport.class
{noformat}
Only classes from {{org.apache.parquet:parquet-mr:1.7.0}} show up. Note that classes in {{com.twitter:parquet-hadoop-bundle:1.6.0}} are not under the {{org.apache}} namespace, so none of them made it into the SBT-built assembly.
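
As a minimal additional check (not part of the original reproduction), grepping the SBT-built assembly for the exact unshaded entry prints nothing, whereas the Maven-built assembly below does contain it:
{noformat}
$ jar tf assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar | grep '^parquet/hadoop/api/WriteSupport.class$'
{noformat}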

Build with Maven and check the assembly jar for 
{{parquet.hadoop.api.WriteSupport}}:
{noformat}
$ ./build/mvn -Phive -Phive-thriftserver -Phadoop-1 -Dhadoop.version=1.2.1 -DskipTests clean package
$ jar tf assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar | fgrep "parquet/hadoop/api"
org/apache/parquet/hadoop/api/
org/apache/parquet/hadoop/api/DelegatingReadSupport.class
org/apache/parquet/hadoop/api/DelegatingWriteSupport.class
org/apache/parquet/hadoop/api/InitContext.class
org/apache/parquet/hadoop/api/ReadSupport$ReadContext.class
org/apache/parquet/hadoop/api/ReadSupport.class
org/apache/parquet/hadoop/api/WriteSupport$FinalizedWriteContext.class
org/apache/parquet/hadoop/api/WriteSupport$WriteContext.class
org/apache/parquet/hadoop/api/WriteSupport.class
parquet/hadoop/api/
parquet/hadoop/api/DelegatingReadSupport.class
parquet/hadoop/api/DelegatingWriteSupport.class
parquet/hadoop/api/InitContext.class
parquet/hadoop/api/ReadSupport$ReadContext.class
parquet/hadoop/api/ReadSupport.class
parquet/hadoop/api/WriteSupport$FinalizedWriteContext.class
parquet/hadoop/api/WriteSupport$WriteContext.class
parquet/hadoop/api/WriteSupport.class
{noformat}
The expected classes from both Parquet artifacts are packaged properly.

One consequence of this issue is that Parquet tables created in Hive are not accessible from Spark SQL when Spark is built with SBT. To reproduce, first create a Parquet table with Hive (say 0.13.1):
{noformat}
hive> CREATE TABLE parquet_test STORED AS PARQUET AS SELECT 1;
{noformat}
Then build the Spark assembly jar with the SBT command above, start {{spark-shell}}, and query the table:
{noformat}
scala> sqlContext.table("parquet_test").show()
15/08/14 17:52:50 INFO HiveMetaStore: 0: get_table : db=default tbl=parquet_test
15/08/14 17:52:50 INFO audit: ugi=lian  ip=unknown-ip-addr      cmd=get_table : 
db=default tbl=parquet_test
java.lang.NoClassDefFoundError: parquet/hadoop/api/WriteSupport
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at 
org.apache.hadoop.hive.ql.metadata.Table.getOutputFormatClass(Table.java:328)
        at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:320)
        at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:303)
        at scala.Option.map(Option.scala:145)
        at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:303)
        at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:298)
        at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
        at 
org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
        at 
org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
        at 
org.apache.spark.sql.hive.client.ClientWrapper.getTableOption(ClientWrapper.scala:298)
        at 
org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:123)
        at 
org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60)
        at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:397)
        at 
org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:403)
        at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:170)
        at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:170)
        at scala.Option.getOrElse(Option.scala:120)
        at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:170)
        at 
org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:403)
        at org.apache.spark.sql.SQLContext.table(SQLContext.scala:806)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC.<init>(<console>:33)
        at $iwC$$iwC.<init>(<console>:35)
        at $iwC.<init>(<console>:37)
        at <init>(<console>:39)
        at .<init>(<console>:43)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
        at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: parquet.hadoop.api.WriteSupport
        at 
scala.tools.nsc.interpreter.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:83)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:169)
        at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:153)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 68 more
{noformat}
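
Until the SBT build packages the bundle, one possible (unverified) workaround is to put the missing jar on the driver classpath manually, assuming it is available in the local Maven repository:
{noformat}
$ ./bin/spark-shell --driver-class-path \
    ~/.m2/repository/com/twitter/parquet-hadoop-bundle/1.6.0/parquet-hadoop-bundle-1.6.0.jar
{noformat}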


