[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-10374.
-------------------------------
    Resolution: Not A Problem

Thanks, all; everyone else had much more useful things to say there. The cause 
was, more or less, mismatched dependency versions being pulled in from Maven. I 
think this JIRA itself serves as good documentation of the issue.

I also tend to believe that supporting Hadoop 1 and 2.0/2.1 is becoming 
difficult and occasionally breaks. For example, a recent change had to use 
reflection to access some Hadoop 1 APIs, which means Spark 1.4 was slightly 
broken with Hadoop 1.x. Hadoop 2.0.0 gets even less attention. Until support 
for these versions formally goes away, it may take some footwork to get recent 
releases to build and work fully with 2.0.0. Anything more than small patches 
to keep them working may not be worth it.

So that's a long way of saying that, yes, I don't think this ends in a 
particular change, but it serves as a good reminder about the Akka dependency 
issue.

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on 
> protobuf-2.4
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-10374
>                 URL: https://issues.apache.org/jira/browse/SPARK-10374
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that 
> depends on the Spark 1.5.0 libraries via Gradle, and Hadoop 2.0.0 libraries. 
> When I run the driver application, I can hit the following error:
> {code}
> <redacted other messages>… java.lang.UnsupportedOperationException: This is 
> supposed to be overridden by subclasses.
>         at 
> com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
>         at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
> This application used to work when pulling in Spark 1.4.1 dependencies, and 
> thus this is a regression.
> I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark 
> 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf 
> 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark 
> modules. It appears that Spark used to shade its protobuf dependencies and 
> hence Spark’s and Hadoop’s protobuf dependencies wouldn’t collide. However, 
> when I ran dependencyInsight again against Spark 1.5, it looks like 
> protobuf is no longer shaded from the Spark module.
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.4.1
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.4.1
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |                   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>      \--- org.apache.spark:spark-core_2.10:1.4.1
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.4.1
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>                \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>      \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>                \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |                   \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> {code}
> Clearly we can't force the version to be one way or the other. If I force 
> protobuf to use 2.5.0, then invoking Hadoop code from my application will 
> break as Hadoop 2.0.0 jars are compiled against protobuf-2.4. On the other 
> hand, forcing protobuf to use version 2.4 breaks spark-core code that is 
> compiled against protobuf-2.5. Note that protobuf-2.4 and protobuf-2.5 are 
> not binary compatible.
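
One way to spot the forced upgrade in reports like the ones quoted above is to 
scan for root lines of the form "requested -> selected". The following is only a 
rough sketch in Python, not part of Gradle or Spark. The function name 
find_forced_versions and the regular expression are assumptions made for 
illustration, and the sketch assumes the report layout shown in this issue.

```python
import re

# Root lines such as "group:name:2.4.0a -> 2.5.0" mark a requested version
# that dependency resolution replaced with a different one.
# (Pattern is an assumption based on the report layout quoted in this issue.)
FORCED = re.compile(
    r"^(?P<module>[\w.\-]+:[\w.\-]+):(?P<requested>\S+) -> (?P<selected>\S+)"
)


def find_forced_versions(report):
    """Return (module, requested, selected) for each replaced root dependency."""
    hits = []
    for line in report.splitlines():
        # Drop an optional mail-quote prefix so quoted reports can be pasted in.
        text = line[2:] if line.startswith("> ") else line
        match = FORCED.match(text)
        if match:
            hits.append(
                (match["module"], match["requested"], match["selected"])
            )
    return hits
```

Run against the 1.5.0-rc2 report above, this should flag only the 
protobuf-java line where 2.4.0a was replaced by 2.5.0; the 1.4.1 report has no 
such line because the two protobuf artifacts use different coordinates.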


