comet created ZEPPELIN-5916:
-------------------------------

             Summary: getNumPartitions() error in Zeppelin
                 Key: ZEPPELIN-5916
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5916
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.11.0
            Reporter: comet


I'm using the latest version of Zeppelin with Spark 3.3.0.

The error below only happens in Zeppelin. When I SSH into the Zeppelin 
interpreter host and run the pyspark shell with the configuration below, 
getNumPartitions() executes without any issue:
pyspark \
  --conf "spark.hadoop.fs.s3a.access.key=<masked>" \
  --conf "spark.hadoop.fs.s3a.secret.key=<masked>" \
  --conf "spark.hadoop.fs.s3a.endpoint=https://<masked>" \
  --conf "spark.hadoop.fs.s3a.path.style.access=true"

spark.sparkContext.textFile("s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1).getNumPartitions()
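
For reference, the equivalent setup inside a Zeppelin %pyspark paragraph would look roughly like the sketch below. This is only a sketch: it assumes the interpreter's built-in `spark` session and uses PySpark's internal `_jsc` handle to reach the Hadoop configuration; the property names mirror the --conf flags above and the values are the same masked placeholders.

# Sketch only: apply the same S3A options to the running context's Hadoop
# configuration, then repeat the textFile call. All values are placeholders.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "<masked>")
hadoop_conf.set("fs.s3a.secret.key", "<masked>")
hadoop_conf.set("fs.s3a.endpoint", "https://<masked>")
hadoop_conf.set("fs.s3a.path.style.access", "true")

spark.sparkContext.textFile(
    "s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1
).getNumPartitions()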

When I run the same textFile call in a Zeppelin paragraph, I get the error 
below. Can you advise how to troubleshoot this? I'm using Spark 3.3.0, and the 
file path exists.
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Cell In[16], line 1
----> 1 spark.sparkContext.textFile("s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1).getNumPartitions()

File /spark/python/lib/pyspark.zip/pyspark/rdd.py:599, in RDD.getNumPartitions(self)
    589 def getNumPartitions(self) -> int:
    590     """
    591     Returns the number of partitions in RDD
    592
   (...)
    597     2
    598     """
--> 599     return self._jrdd.partitions().size()

File /spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
   1315 command = proto.CALL_COMMAND_NAME +\
   1316     self.command_header +\
   1317     args_command +\
   1318     proto.END_COMMAND_PART
   1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
   1322     answer, self.gateway_client, self.target_id, self.name)
   1324 for temp_arg in temp_args:
   1325     temp_arg._detach()

File /spark/python/lib/pyspark.zip/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
    188 def deco(*a: Any, **kw: Any) -> Any:
    189     try:
--> 190         return f(*a, **kw)
    191     except Py4JJavaError as e:
    192         converted = convert_exception(e.java_exception)

File /spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o114.partitions.
: java.lang.NullPointerException
        at org.apache.hadoop.mapred.TextInputFormat.isSplitable(TextInputFormat.java:49)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:370)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
        at org.apache.spark.api.java.JavaRDDLike.partitions(JavaRDDLike.scala:61)
        at org.apache.spark.api.java.JavaRDDLike.partitions$(JavaRDDLike.scala:61)
        at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:45)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
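
As a first troubleshooting step, it may help to confirm, from the same Zeppelin paragraph that fails, that the fs.s3a.* settings actually reached the Hadoop configuration used by textFile(). A small diagnostic sketch (standard S3A property keys only, again via PySpark's internal `_jsc` handle):

# Diagnostic sketch: print the effective S3A settings seen by the Spark context.
# None for a key would mean that setting was not passed through in Zeppelin.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
for key in ("fs.s3a.endpoint",
            "fs.s3a.path.style.access",
            "fs.s3a.access.key"):
    print(key, "=", hconf.get(key))

Since the NullPointerException is thrown inside TextInputFormat.isSplitable() while HadoopRDD computes its partitions, the Hadoop job configuration that Zeppelin hands to the RDD seems like the first thing to compare against the working pyspark shell.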


