Sean Owen created SPARK-28903:
---------------------------------

             Summary: Fix AWS Java SDK version conflict that breaks PySpark Kinesis tests
                 Key: SPARK-28903
                 URL: https://issues.apache.org/jira/browse/SPARK-28903
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.4.3, 3.0.0
            Reporter: Sean Owen
            Assignee: Sean Owen


The PySpark Kinesis tests are failing, at least in master:
{code}
======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
    kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
        at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
        at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
        at scala.collection.Iterator.find(Iterator.scala:993)
        at scala.collection.Iterator.find$(Iterator.scala:990)
        at scala.collection.AbstractIterator.find(Iterator.scala:1429)
        at scala.collection.IterableLike.find(IterableLike.scala:81)
        at scala.collection.IterableLike.find$(IterableLike.scala:80)
        at scala.collection.AbstractIterable.find(Iterable.scala:56)
        at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
        at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...
{code}

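The non-Python Kinesis tests pass, though. The difference is that the PySpark 
tests run against the output of the Spark assembly, which pulls in 
hadoop-cloud, which in turn pulls in an old AWS Java SDK. The 
NoSuchMethodError above suggests that the older SDK's 
com.amazonaws.regions.Region class, which lacks getAvailableEndpoints(), is the 
one being loaded.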
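
One way to confirm this kind of classpath conflict is to check which jars bundle the class named in the NoSuchMethodError. The following is only a minimal sketch, not the debugging procedure used for this issue: the helper name jars_containing and the directory passed to it are hypothetical, and it simply scans a directory of jars for a given class file.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dir):
    """Return the names of jars under jar_dir that bundle class_name.

    Hypothetical helper for illustration; jars are visited in sorted
    path order, and a jar is a plain zip archive of .class entries.
    """
    # A fully qualified class name maps to a path-like entry in the jar.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar.name)
    return hits
```

Calling jars_containing("com.amazonaws.regions.Region", some_jar_dir) on whatever directory holds the jars on the test classpath (the exact location is an assumption and depends on the build) would list every jar that ships that class. More than one hit means the JVM may load either copy, depending on classpath order.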

Per [~ste...@apache.org], we can likely resolve this simply by excluding 
the aws-java-sdk dependency. See the attached PR for more detail on the 
debugging and the other options considered.


