[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723768#comment-14723768
 ] 

Matt Cheah commented on SPARK-10374:


I intend to create a smaller standalone program that reproduces the issue and 
lets me paste the full dependency graph. The application where I first saw the 
issue is quite large, so viewing its whole dependency graph would be pretty 
much intractable.

> Spark-core 1.5.0-RC2 can create version conflicts with apps depending on 
> protobuf-2.4
> -
>
> Key: SPARK-10374
> URL: https://issues.apache.org/jira/browse/SPARK-10374
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Matt Cheah
>Priority: Blocker
> Fix For: 1.5.0
>
>
> My Hadoop cluster is running 2.0.0-CDH4.7.0, and I have an application that 
> depends on the Spark 1.5.0 libraries via Gradle, and Hadoop 2.0.0 libraries. 
> When I run the driver application, I can hit the following error:
> {code}
> … java.lang.UnsupportedOperationException: This is 
> supposed to be overridden by subclasses.
> at 
> com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto.getSerializedSize(ClientNamenodeProtocolProtos.java:30108)
> at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.constructRpcRequest(ProtobufRpcEngine.java:149)
> {code}
> This application used to work when pulling in Spark 1.4.1 dependencies, and 
> thus this is a regression.
> I used Gradle’s dependencyInsight task to dig a bit deeper. Against our Spark 
> 1.4.1-backed project, it shows that dependency resolution pulls in Protobuf 
> 2.4.0a from the Hadoop CDH4 modules and Protobuf 2.5.0-spark from the Spark 
> modules. It appears that Spark used to shade its protobuf dependency, so 
> Spark’s and Hadoop’s protobuf dependencies wouldn’t collide. However, when I 
> ran dependencyInsight against Spark 1.5, it looks like protobuf is no longer 
> shaded in the Spark modules.
> 1.4.1 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.4.0a
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.4.1
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.4.1
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.4.1
> |                   \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
>
> org.spark-project.protobuf:protobuf-java:2.5.0-spark
> \--- org.spark-project.akka:akka-remote_2.10:2.3.4-spark
>      \--- org.apache.spark:spark-core_2.10:1.4.1
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.4.1
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.4.1
>                \--- org.apache.spark:spark-sql_2.10:1.4.1 (*)
> {code}
> 1.5.0-rc2 dependencyInsight:
> {code}
> com.google.protobuf:protobuf-java:2.5.0 (conflict resolution)
> \--- com.typesafe.akka:akka-remote_2.10:2.3.11
>      \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
>           +--- compile
>           +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
>           |    \--- compile
>           \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
>                \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
>
> com.google.protobuf:protobuf-java:2.4.0a -> 2.5.0
> +--- org.apache.hadoop:hadoop-common:2.0.0-cdh4.6.0
> |    \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0
> |         +--- compile
> |         \--- org.apache.spark:spark-core_2.10:1.5.0-rc2
> |              +--- compile
> |              +--- org.apache.spark:spark-sql_2.10:1.5.0-rc2
> |              |    \--- compile
> |              \--- org.apache.spark:spark-catalyst_2.10:1.5.0-rc2
> |                   \--- org.apache.spark:spark-sql_2.10:1.5.0-rc2 (*)
> \--- org.apache.hadoop:hadoop-hdfs:2.0.0-cdh4.6.0
>      \--- org.apache.hadoop:hadoop-client:2.0.0-mr1-cdh4.6.0 (*)
> {code}
> Clearly we can't force the version to be one way or the other. If I force 
> protobuf to use 2.5.0, then invoking Hadoop code from my application will 
> break as Hadoop 2.0.0 jars are compiled against protobuf-2.4. On the other 
> hand, forcing protobuf to use version 2.4 breaks spark-core code that is 
> compiled against protobuf-2.5. Note that protobuf-2.4 and protobuf-2.5 are 
> not binary compatible.
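> For reference, forcing a single version in Gradle would look roughly like the 
> following hypothetical snippet (a sketch only; either choice resolves the 
> dependency graph but still breaks one side at runtime, as described above):
> {code}
> configurations.all {
>     resolutionStrategy {
>         // Forcing 2.5.0 breaks the CDH4 Hadoop client; forcing 2.4.0a breaks spark-core.
>         force 'com.google.protobuf:protobuf-java:2.5.0'
>     }
> }
> {code}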



[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723792#comment-14723792
 ] 

Patrick Wendell commented on SPARK-10374:

Hey Matt,

I think the only thing that could have influenced you is that we changed our 
default advertised akka dependency. We used to advertise an older version of 
akka that shaded protobuf. What happens if you manually coerce that version of 
akka in your application?

Spark itself doesn't directly use protobuf. But some of our dependencies do, 
including both akka and Hadoop. My guess is that you are now in a situation 
where you can't reconcile the akka and hadoop protobuf versions and make them 
both happy. This would be consistent with the changes we made in 1.5 in 
SPARK-7042.

The fix would be to exclude all com.typesafe.akka artifacts from Spark and 
manually add org.spark-project.akka to your build.
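Something along these lines in Gradle (a rough, untested sketch; the exact akka 
modules you need beyond akka-remote may differ):
{code}
dependencies {
    compile('org.apache.spark:spark-core_2.10:1.5.0-rc2') {
        // Drop the com.typesafe.akka artifacts that pull in unshaded protobuf 2.5.0.
        exclude group: 'com.typesafe.akka'
    }
    // Re-add the Spark-project build of akka, which shades its protobuf dependency.
    compile 'org.spark-project.akka:akka-remote_2.10:2.3.4-spark'
}
{code}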

However, since you didn't post a full stack trace, I can't know for sure 
whether it is akka that complains when you try to fix the protobuf version at 
2.4.

[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723797#comment-14723797
 ] 

Marcelo Vanzin commented on SPARK-10374:


This is actually caused by the akka version change. In 1.4, Spark depends on a 
custom build of akka ({{2.3.4-spark}}) that has a shaded protobuf dependency; 
1.5 depends on {{2.3.11}}, which depends on the unshaded protobuf 2.5.0.



[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723802#comment-14723802
 ] 

Marcelo Vanzin commented on SPARK-10374:


BTW, since akka depends on protobuf, one cannot simply override the dependency, 
since then akka might break. Does anyone know the extent of akka's use of 
protobuf?

This does sound pretty bad (it may make Spark's hadoop-1 builds unusable, at 
least in certain situations).


[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723817#comment-14723817
 ] 

Marcelo Vanzin commented on SPARK-10374:


Never mind; Patrick pointed out that the hadoop-1 binaries actually build with 
a different version of akka, so this is restricted to the published maven 
artifacts. Adding a dependency exclusion for {{protobuf-java}} in the 
spark-core dependency should fix this.
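In Gradle, that exclusion would look something like this (untested sketch):
{code}
dependencies {
    compile('org.apache.spark:spark-core_2.10:1.5.0-rc2') {
        // Let the Hadoop CDH4 modules supply protobuf 2.4.0a instead of pulling in 2.5.0.
        exclude group: 'com.google.protobuf', module: 'protobuf-java'
    }
}
{code}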


[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723826#comment-14723826
 ] 

Matt Cheah commented on SPARK-10374:


I'll try switching the Akka version pulled in by Spark and see how it goes. 
Thanks!


[jira] [Commented] (SPARK-10374) Spark-core 1.5.0-RC2 can create version conflicts with apps depending on protobuf-2.4

2015-08-31 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724568#comment-14724568
 ] 

Matt Cheah commented on SPARK-10374:


OK, it works when we switch the Akka version back for our CDH4 build. Thanks a 
lot! I suppose we can close this as "Not a bug", but it would be nice to make 
this visible to devs who might hit similar issues.
