comet created ZEPPELIN-5916:
-------------------------------
Summary: getNumPartitions() error in Zeppelin
Key: ZEPPELIN-5916
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5916
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.11.0
Reporter: comet
I'm using the latest version of Zeppelin with Spark 3.3.0.
The error below only happens in Zeppelin. When I SSH into the Zeppelin
interpreter and run the pyspark command with the configuration below,
getNumPartitions() executes without any issue.
pyspark --conf "spark.hadoop.fs.s3a.access.key=<masked>" \
  --conf "spark.hadoop.fs.s3a.secret.key=<masked>" \
  --conf "spark.hadoop.fs.s3a.endpoint=https://<masked>" \
  --conf "spark.hadoop.fs.s3a.path.style.access=true"
spark.sparkContext.textFile("s3a://a_bucket/models/random_forest_zepp/bestModel/metadata",
1).getNumPartitions()
When I run the above code in Zeppelin, I get the error below. Can you advise
how to troubleshoot it? I'm using Spark 3.3.0, and the file path above exists.
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
Cell In[16], line 1
----> 1 spark.sparkContext.textFile("s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1).getNumPartitions()
File /spark/python/lib/pyspark.zip/pyspark/rdd.py:599, in RDD.getNumPartitions(self)
589 def getNumPartitions(self) -> int:
590 """
591 Returns the number of partitions in RDD
592
(...)
597 2
598 """
--> 599 return self._jrdd.partitions().size()
File /spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args)
1315 command = proto.CALL_COMMAND_NAME +\
1316 self.command_header +\
1317 args_command +\
1318 proto.END_COMMAND_PART
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1324 for temp_arg in temp_args:
1325 temp_arg._detach()
File /spark/python/lib/pyspark.zip/pyspark/sql/utils.py:190, in capture_sql_exception.<locals>.deco(*a, **kw)
188 def deco(*a: Any, **kw: Any) -> Any:
189 try:
--> 190 return f(*a, **kw)
191 except Py4JJavaError as e:
192 converted = convert_exception(e.java_exception)
File /spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o114.partitions.
: java.lang.NullPointerException
at org.apache.hadoop.mapred.TextInputFormat.isSplitable(TextInputFormat.java:49)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:370)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:208)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.api.java.JavaRDDLike.partitions(JavaRDDLike.scala:61)
at org.apache.spark.api.java.JavaRDDLike.partitions$(JavaRDDLike.scala:61)
at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
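
Since the same call works from the pyspark shell, one possible first check is whether the Zeppelin session actually sees the four fs.s3a settings passed with --conf above. The sketch below is only an illustration of that check, not a confirmed diagnosis of the NullPointerException. The names EXPECTED_KEYS and missing_settings are hypothetical helpers, and the commented usage assumes that `spark` is the session Zeppelin provides and relies on the private `_jsc` attribute of the SparkContext.

```python
# Hadoop-side names of the four settings from the pyspark command in this report.
# Spark copies "spark.hadoop.*" properties into the Hadoop configuration
# with the "spark.hadoop." prefix removed.
EXPECTED_KEYS = [
    "fs.s3a.access.key",
    "fs.s3a.secret.key",
    "fs.s3a.endpoint",
    "fs.s3a.path.style.access",
]


def missing_settings(get, keys=EXPECTED_KEYS):
    """Return the keys for which get(key) yields None or an empty string.

    `get` is any callable that maps a key to a value, so the check can be
    exercised with a plain dict as well as a live Hadoop configuration.
    Only key names are returned, never the (secret) values.
    """
    return [key for key in keys if not get(key)]


if __name__ == "__main__":
    # Stand-in configuration for a dry run outside Spark (values are made up).
    fake_conf = {
        "fs.s3a.access.key": "dummy",
        "fs.s3a.endpoint": "https://example.invalid",
    }
    print(missing_settings(fake_conf.get))

# In a Zeppelin %pyspark paragraph (assumption: `spark` already exists there):
#   hconf = spark.sparkContext._jsc.hadoopConfiguration()
#   print(missing_settings(hconf.get))
```

If this lists any keys inside Zeppelin but not in the pyspark shell, the two environments are not configured the same way. If it lists none, the difference likely lies elsewhere, for example in the jars on the interpreter classpath.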