[ https://issues.apache.org/jira/browse/DATAFU-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619286#comment-17619286 ]
Eyal Allweil commented on DATAFU-167:
-------------------------------------
Most of the projects I've seen release multiple versions for different
Spark/Scala combinations. So far DataFu hasn't needed to, but I don't know
whether we can continue this way.
Did you see that your code breaks on different minor Spark 2.x versions? (I
know we have a problem with our extension of {_}collectLimitedList{_}, which
has changed across these versions, and with matching _spark-testing-base_
versions to ours in our build script.)
I will try to update the build script so we can easily test all our supported
combinations.
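A minimal sketch of such a loop, in Python, assuming the Gradle properties used in the reproduction command in the issue below (-PscalaVersion, -PsparkVersion, -PscalaCompatVersion). The version list is illustrative only (taken from the versions named in this issue) and is not DataFu's actual supported matrix; the function names are hypothetical:

{code:python}
import subprocess

# Illustrative (Spark version, Scala version) pairs taken from DATAFU-167;
# NOT an authoritative list of DataFu's supported combinations.
COMBINATIONS = [
    ("2.2.2", "2.11"),
    ("2.2.3", "2.11"),
    ("2.3.2", "2.11"),
    ("2.3.3", "2.11"),
    ("2.4.3", "2.11"),
    ("2.4.4", "2.11"),
]


def gradle_test_command(spark_version, scala_version):
    """Build the datafu-spark test command for one Spark/Scala combination.

    The reproduction command in the issue passes the same value for
    scalaVersion and scalaCompatVersion; this sketch assumes that holds.
    """
    return [
        "./gradlew",
        ":datafu-spark:test",
        f"-PscalaVersion={scala_version}",
        f"-PsparkVersion={spark_version}",
        f"-PscalaCompatVersion={scala_version}",
    ]


def run_all(combinations, dry_run=True):
    """Run (or, in dry-run mode, only print) the tests for every combination.

    Returns a dict mapping (spark, scala) to the Gradle exit code;
    in dry-run mode every code is reported as 0.
    """
    results = {}
    for spark_version, scala_version in combinations:
        cmd = gradle_test_command(spark_version, scala_version)
        if dry_run:
            print(" ".join(cmd))
            results[(spark_version, scala_version)] = 0
        else:
            # check=False so one failing combination doesn't stop the loop
            completed = subprocess.run(cmd, check=False)
            results[(spark_version, scala_version)] = completed.returncode
    return results


if __name__ == "__main__":
    failures = {k: v for k, v in run_all(COMBINATIONS).items() if v != 0}
    print("failing combinations:", failures or "none")
{code}

Collecting exit codes instead of stopping at the first failure makes it easy to see at a glance which minor versions break (e.g. the 2.2.3 / 2.3.3 / 2.4.4 failures described in this issue).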
> Fix Scala Python Bridge support in Spark 2 minor version updates
> -----------------------------------------------------------------
>
> Key: DATAFU-167
> URL: https://issues.apache.org/jira/browse/DATAFU-167
> Project: DataFu
> Issue Type: Bug
> Affects Versions: 1.6.1
> Reporter: Eyal Allweil
> Priority: Major
> Labels: up-for-grabs
> Fix For: 1.7.0
>
>
> The Scala Python Bridge works with Spark versions 2.2.2, 2.3.2, and 2.4.3, but
> doesn't work with versions 2.2.3, 2.3.3, and 2.4.4 (and up).
> This can be reproduced by running the tests with the following command:
> {code:bash}
> ./gradlew :datafu-spark:test -PscalaVersion=2.11 -PsparkVersion=2.2.3 -PscalaCompatVersion=2.11
> {code}
> The error message is:
> {noformat}
> AttributeError: 'GatewayParameters' object has no attribute 'auth_token'
>     at org.apache.spark.datafu.deploy.SparkPythonRunner.execFile(SparkPythonRunner.scala:137)
> {noformat}
>
> Currently our code relies on the PYSPARK_ALLOW_INSECURE_GATEWAY environment
> variable; using the auth_token parameter will likely both fix this problem
> and be more secure in general.
>
> Please note that in order to test Spark 2.4.4, you need to upgrade the
> scalatest version used.
>
> A description of using an auth token with py4j can be found here:
> [https://www.py4j.org/advanced_topics.html#authentication]
>
> At least some of the changes will likely need to be made in these files:
> [https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/spark/utils/overwrites/SparkPythonRunner.scala#L63]
> [https://github.com/apache/datafu/blob/master/datafu-spark/src/main/resources/pyspark_utils/bridge_utils.py#L43]
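A rough sketch of what the Python side (bridge_utils.py) of such a change might look like, following the py4j authentication docs linked above. It assumes a py4j version whose GatewayParameters accepts auth_token, and assumes the JVM side exports the gateway port and secret as PYSPARK_GATEWAY_PORT and PYSPARK_GATEWAY_SECRET (the names recent PySpark 2.x releases read; whether DataFu's SparkPythonRunner sets the secret is an assumption). The helper names are hypothetical, and the JVM-side change (starting the gateway server with the same secret) is not shown:

{code:python}
import os


def gateway_settings(env=None):
    """Read the gateway port and secret from the environment.

    ASSUMPTION: the JVM side sets PYSPARK_GATEWAY_PORT and
    PYSPARK_GATEWAY_SECRET for the Python process.
    Returns (port, secret); secret is None when it was not provided.
    """
    env = os.environ if env is None else env
    port = int(env["PYSPARK_GATEWAY_PORT"])
    secret = env.get("PYSPARK_GATEWAY_SECRET")
    return port, secret


def gateway_parameter_kwargs(port, secret):
    """Keyword arguments for py4j's GatewayParameters.

    auth_token is passed only when a secret exists; without one this falls
    back to an unauthenticated gateway, as the code does today.
    """
    kwargs = {"port": port}
    if secret:
        kwargs["auth_token"] = secret
    return kwargs


def connect_to_jvm(env=None):
    """Open a py4j gateway to the JVM, authenticated when a secret is set.

    py4j is imported lazily so the helpers above can be used without it.
    """
    from py4j.java_gateway import GatewayParameters, JavaGateway

    port, secret = gateway_settings(env)
    params = GatewayParameters(**gateway_parameter_kwargs(port, secret))
    return JavaGateway(gateway_parameters=params)
{code}

Keeping the environment parsing separate from the py4j call means the fallback logic can be unit-tested without a running JVM.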