Hi there. I am calling custom Scala code from pyspark (the interpreter). The custom Scala code is simple: it just reads a text file with *sparkContext.textFile* and returns an *RDD[String]*.
In pyspark, I am using *sc._jvm* to call the Scala code: *s_rdd = sc._jvm.package_name.class_name.method()*. It returns a *py4j.JavaObject*. I then wrap it as a pyspark RDD: *py_rdd = RDD(s_rdd, sparkSession)*. No error yet. But when I call any RDD method on *py_rdd* (e.g. *py_rdd.count()*), I get the following error:

py4j.protocol.Py4JError: An error occurred while calling o50.rdd. Trace:
py4j.Py4JException: Method rdd([]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)

Why is that? What am I doing wrong?

Using:
Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_121)
Spark 2.0.2
Hadoop 2.7.3-amzn-0

Thanks & Regards,
Shahab
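
P.S. For completeness, here is a minimal sketch of the pyspark side of what I am running. *package_name.class_name.method* is a placeholder for my actual Scala object (which builds the *RDD[String]* with *sparkContext.textFile*), and *sparkSession* is the session object in my shell:

from pyspark.rdd import RDD

# Call the Scala method through the py4j gateway. This returns a
# py4j JavaObject handle pointing at the Scala RDD[String].
s_rdd = sc._jvm.package_name.class_name.method()

# Wrap the JavaObject as a pyspark RDD -- this line raises no error.
py_rdd = RDD(s_rdd, sparkSession)

# Any action on the wrapped RDD then fails, for example:
py_rdd.count()
# py4j.protocol.Py4JError: An error occurred while calling o50.rdd.
# py4j.Py4JException: Method rdd([]) does not exist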