Reynold Xin created SPARK-7981:
----------------------------------
Summary: Improve DataFrame Python exception
Key: SPARK-7981
URL: https://issues.apache.org/jira/browse/SPARK-7981
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
It would be great if most exceptions thrown were rethrown as Python exceptions,
rather than as an opaque Py4J exception with a long stack trace that is not
Python-friendly.
For example:
{code}
In [61]: df.stat.cov('id', 'uniform')
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-61-30146c89cbd6> in <module>()
----> 1 df.stat.cov('id', 'uniform')
/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
1289
1290 def cov(self, col1, col2):
-> 1291 return self.df.cov(col1, col2)
1292
1293 cov.__doc__ = DataFrame.cov.__doc__
/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
1139 if not isinstance(col2, str):
1140 raise ValueError("col2 should be a string.")
-> 1141 return self._jdf.stat().cov(col1, col2)
1142
1143 @since(1.4)
/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/java_gateway.pyc in __call__(self, *args)
535 answer = self.gateway_client.send_command(command)
536 return_value = get_return_value(answer, self.gateway_client,
--> 537 self.target_id, self.name)
538
539 for temp_arg in temp_args:
/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
298 raise Py4JJavaError(
299 'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
301 else:
302 raise Py4JError(
Py4JJavaError: An error occurred while calling o87.cov.
: java.lang.IllegalArgumentException: requirement failed: Couldn't find column with name id
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:79)
at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:78)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.execution.stat.StatFunctions$.collectStatisticalData(StatFunctions.scala:78)
at org.apache.spark.sql.execution.stat.StatFunctions$.calculateCov(StatFunctions.scala:100)
at org.apache.spark.sql.DataFrameStatFunctions.cov(DataFrameStatFunctions.scala:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:744)
{code}
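One possible shape for the requested behavior is sketched below. It is only an illustration, not Spark's implementation: `JavaCallError` is a hypothetical stand-in for `py4j.protocol.Py4JJavaError` so that the sketch runs without a JVM, and the names `JAVA_TO_PYTHON` and `rethrow_as_python` as well as the choice of `ValueError` are assumptions. It also uses Python 3 syntax (`raise ... from None`), whereas the traceback above comes from Python 2.7.

```python
import functools


class JavaCallError(Exception):
    """Hypothetical stand-in for py4j.protocol.Py4JJavaError (no JVM needed)."""

    def __init__(self, java_class, java_message, java_stack):
        super().__init__("An error occurred while calling the JVM.")
        self.java_class = java_class      # e.g. "java.lang.IllegalArgumentException"
        self.java_message = java_message  # the one line the user actually needs
        self.java_stack = java_stack      # the long JVM stack trace


# Assumed mapping from Java exception class names to Python exception types.
JAVA_TO_PYTHON = {
    "java.lang.IllegalArgumentException": ValueError,
}


def rethrow_as_python(func):
    """Convert known JVM-side errors into plain Python exceptions."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except JavaCallError as e:
            py_type = JAVA_TO_PYTHON.get(e.java_class)
            if py_type is None:
                raise  # unknown Java exception: leave it untouched
            # "from None" hides the bridge error and its JVM stack trace,
            # so only the short Python exception is shown.
            raise py_type(e.java_message) from None

    return wrapper


@rethrow_as_python
def cov(col1, col2):
    # Simulates the JVM call failing the way the traceback above does.
    raise JavaCallError(
        "java.lang.IllegalArgumentException",
        "requirement failed: Couldn't find column with name id",
        "at scala.Predef$.require(Predef.scala:233)",
    )
```

With this wrapper, calling `cov('id', 'uniform')` raises a `ValueError` carrying only the "requirement failed" message. Java exceptions that are not in the mapping propagate unchanged, so nothing is hidden for cases the mapping does not cover.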