[ https://issues.apache.org/jira/browse/SPARK-28749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908751#comment-16908751 ]
Matt Foley edited comment on SPARK-28749 at 8/16/19 5:55 AM:
-------------------------------------------------------------

Hi [~hyukjin.kwon], thanks for looking at the issue. I did try that, but it doesn't work, for the following reason. In {{python/pyspark/streaming/tests.py}}:
* {{ENABLE_KAFKA_0_8_TESTS}} is used to derive the boolean {{are_kafka_tests_enabled}}.
* The call to {{search_kafka_assembly_jar()}} is not guarded by {{are_kafka_tests_enabled}}.
* The failure exception is thrown from {{search_kafka_assembly_jar()}}.

So making {{ENABLE_KAFKA_0_8_TESTS}} properly guard the call to {{search_kafka_assembly_jar()}} would be a similar bug fix.
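For illustration only, a minimal sketch of the kind of guard described above. It assumes the Kinesis-style convention that the variable means "enabled" when set to {{1}}, and the helper body here is a simplified stand-in for the real {{search_kafka_assembly_jar()}} in {{python/pyspark/streaming/tests.py}}, not the actual patch:

{code}
# Illustrative sketch only, not the actual patch. The helper below is a
# simplified stand-in for search_kafka_assembly_jar() in
# python/pyspark/streaming/tests.py, which raises when the kafka-0-8
# assembly jar cannot be found.
import glob
import os


def search_kafka_assembly_jar():
    spark_home = os.environ.get("SPARK_HOME", ".")
    jars = glob.glob(os.path.join(
        spark_home, "external/kafka-0-8-assembly/target", "scala-*", "*assembly*.jar"))
    if not jars:
        raise Exception("Failed to find Spark Streaming kafka assembly jar")
    return jars[0]


# Assumption: "1" means enabled, mirroring how the Kinesis flag is handled.
are_kafka_tests_enabled = os.environ.get("ENABLE_KAFKA_0_8_TESTS") == "1"

if are_kafka_tests_enabled:
    # Only search for the jar when the kafka-0-8 tests were requested, so a
    # Scala-2.12 build (which cannot use -Pkafka-0-8) never hits the exception.
    extra_jars = [search_kafka_assembly_jar()]
else:
    extra_jars = []
{code}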
> Fix PySpark tests not to require kafka-0-8 in branch-2.4
> --------------------------------------------------------
>
>                 Key: SPARK-28749
>                 URL: https://issues.apache.org/jira/browse/SPARK-28749
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Tests
>    Affects Versions: 2.4.3
>            Reporter: Matt Foley
>            Priority: Minor
>
> As noted in SPARK-27550 we want to encourage testing of Spark 2.4.x with Scala-2.12, and kafka-0-8 does not support Scala-2.12.
> Currently, the PySpark tests invoked by `python/run-tests` demand the presence of kafka-0-8 libraries. If not present, this failure message will be generated:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "spark/python/pyspark/streaming/tests.py", line 1579, in <module>
>     kafka_assembly_jar = search_kafka_assembly_jar()
>   File "spark/python/pyspark/streaming/tests.py", line 1524, in search_kafka_assembly_jar
>     "You need to build Spark with "
> Exception: Failed to find Spark Streaming kafka assembly jar in spark/external/kafka-0-8-assembly. You need to build Spark with 'build/sbt -Pkafka-0-8 assembly/package streaming-kafka-0-8-assembly/assembly' or 'build/mvn -DskipTests -Pkafka-0-8 package' before running this test.
> Had test failures in pyspark.streaming.tests with spark/py_virtenv/bin/python; see logs.
> Process exited with code 255
> {code}
> This change is only targeted at branch-2.4, as most kafka-0-8 related materials have been removed in master and this problem no longer occurs there.
> PROPOSED SOLUTION
> The proposed solution is to make the kafka-0-8 stream testing optional for pyspark testing, exactly the same as the Kinesis stream testing currently is, in file `python/pyspark/streaming/tests.py`. This is only a few lines of change.
> Ideally it would be limited to the case where SPARK_SCALA_VERSION >= 2.12, but it turns out to be somewhat onerous to reliably obtain that value from within the python test env, and no other python test code currently does so.
> So my proposed solution simply makes the use of the kafka-0-8 profile optional, and leaves it to the tester to include it for Scala-2.11 test builds and exclude it for Scala-2.12 test builds.
> PR will be available in a day or so.
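For context, a sketch of the Kinesis-style optional registration the quoted description refers to: the Kafka suite is only added to the test list when its environment flag is set, so everything else still runs without the kafka-0-8 jar. The test classes below are placeholders, not the real suites in {{python/pyspark/streaming/tests.py}}:

{code}
# Sketch of conditionally registering a test suite, mirroring how the Kinesis
# suite is skipped when not enabled. The test classes here are placeholders,
# not the real suites from pyspark/streaming/tests.py.
import os
import sys
import unittest


class BasicOperationTests(unittest.TestCase):
    def test_always_runs(self):
        self.assertTrue(True)


class KafkaStreamTests(unittest.TestCase):
    def test_needs_kafka_0_8_jar(self):
        self.assertTrue(True)


are_kafka_tests_enabled = os.environ.get("ENABLE_KAFKA_0_8_TESTS") == "1"

testcases = [BasicOperationTests]
if are_kafka_tests_enabled:
    testcases.append(KafkaStreamTests)
else:
    sys.stderr.write("Skipping kafka-0-8 tests; build with -Pkafka-0-8 and set "
                     "ENABLE_KAFKA_0_8_TESTS=1 to enable them.\n")

if __name__ == "__main__":
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(tc) for tc in testcases)
    unittest.TextTestRunner(verbosity=2).run(suite)
{code}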