[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723768#comment-14723768
 ] 

Matt Cheah commented on SPARK-10374:
------------------------------------

I intend to create a smaller standalone program that reproduces the issue and 
for which I can paste the full dependency graph. The application where I first 
saw the issue is fairly large, and viewing its whole graph would be 
intractable.

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on 
> protobuf-2.4
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-10374
>                 URL: https://issues.apache.org/jira/browse/SPARK-10374
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that 
> depends, via Gradle, on the Spark 1.5.0 libraries and the Hadoop 2.0.0 
> libraries. When I run the driver application, I hit the following error:
> {code}
> <redacted other messages>… java.lang.UnsupportedOperationException: This is 
> supposed to be overridden by subclasses.
>         at 
> com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
>         at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
> This application used to work when pulling in Spark 1.4.1 dependencies, and 
> thus this is a regression.
> I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark 
> 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf 
> 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark 
> modules. It appears that Spark used to shade its protobuf dependencies, so 
> Spark’s and Hadoop’s protobuf dependencies did not collide. However, when I 
> ran dependencyInsight again against Spark 1.5, it looks like protobuf is no 
> longer shaded in the Spark modules.
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.4.1
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.4.1
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |                   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>      \--- org.apache.spark:spark-core_2.10:1.4.1
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.4.1
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>                \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>      \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>                \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |                   \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> {code}
> Clearly we can't force the version to be one way or the other. If I force 
> protobuf to use 2.5.0, then invoking Hadoop code from my application will 
> break as Hadoop 2.0.0 jars are compiled against protobuf-2.4. On the other 
> hand, forcing protobuf to use version 2.4 breaks spark-core code that is 
> compiled against protobuf-2.5. Note that protobuf-2.4 and protobuf-2.5 are 
> not binary compatible.
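
To confirm at runtime which protobuf jar actually wins, one option is to print 
the location a class was loaded from. The following is only a minimal sketch: 
the class name ClassOrigin and its describe helper are hypothetical names 
introduced for illustration, and the only protobuf class it refers to is the 
one that appears in the stack trace above.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Spark, Hadoop or protobuf.
final class ClassOrigin {

    /** Describes where the named class was loaded from, without initializing it. */
    public static String describe(String className) {
        try {
            // initialize=false: we only want to locate the class, not run its static init.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            if (location == null) {
                // Classes shipped with the JDK usually report no code source.
                return className + " <- (no code source)";
            }
            return className + " <- " + location;
        } catch (ClassNotFoundException e) {
            return className + " <- (not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class taken from the stack trace in the issue description.
        System.out.println(describe("com.google.protobuf.GeneratedMessage"));
    }
}
```

Run with the same classpath as the driver, the printed jar path should show 
whether protobuf-java 2.4.0a or 2.5.0 was picked up.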



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

