Reynold Xin created SPARK-7981:
----------------------------------

             Summary: Improve DataFrame Python exception
                 Key: SPARK-7981
                 URL: https://issues.apache.org/jira/browse/SPARK-7981
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin


It would be great if most exceptions thrown on the JVM side were rethrown as 
Python exceptions, rather than surfacing as a Py4JJavaError with a long Java 
stack trace that is not Python friendly.

As an example:
{code}
In [61]: df.stat.cov('id', 'uniform')
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-61-30146c89cbd6> in <module>()
----> 1 df.stat.cov('id', 'uniform')

/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
   1289 
   1290     def cov(self, col1, col2):
-> 1291         return self.df.cov(col1, col2)
   1292 
   1293     cov.__doc__ = DataFrame.cov.__doc__

/scratch/rxin/spark/python/pyspark/sql/dataframe.pyc in cov(self, col1, col2)
   1139         if not isinstance(col2, str):
   1140             raise ValueError("col2 should be a string.")
-> 1141         return self._jdf.stat().cov(col1, col2)
   1142 
   1143     @since(1.4)

/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/java_gateway.pyc in __call__(self, *args)
    535         answer = self.gateway_client.send_command(command)
    536         return_value = get_return_value(answer, self.gateway_client,
--> 537                 self.target_id, self.name)
    538 
    539         for temp_arg in temp_args:

/Users/rxin/anaconda/lib/python2.7/site-packages/py4j-0.8.1-py2.7.egg/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling o87.cov.
: java.lang.IllegalArgumentException: requirement failed: Couldn't find column with name id
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:79)
        at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:78)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.execution.stat.StatFunctions$.collectStatisticalData(StatFunctions.scala:78)
        at org.apache.spark.sql.execution.stat.StatFunctions$.calculateCov(StatFunctions.scala:100)
        at org.apache.spark.sql.DataFrameStatFunctions.cov(DataFrameStatFunctions.scala:41)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:744)
{code}
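
One possible shape for this, as a rough sketch only and not what Spark actually does: wrap the calls that cross into the JVM in a decorator that catches the Py4J error and rethrows a plain Python exception carrying just the Java message. Everything named here is hypothetical: FakeJavaError stands in for Py4JJavaError so the snippet runs without a JVM, and JAVA_TO_PYTHON and rethrow_as_python are made-up names. It also assumes Python 3, since "raise ... from None" is not available on 2.7.

```python
import functools


class FakeJavaError(Exception):
    """Hypothetical stand-in for py4j.protocol.Py4JJavaError (no JVM needed)."""

    def __init__(self, java_class, java_message):
        super().__init__("{0}: {1}".format(java_class, java_message))
        self.java_class = java_class
        self.java_message = java_message


# Assumed mapping from JVM exception class names to Python exception types.
JAVA_TO_PYTHON = {
    "java.lang.IllegalArgumentException": ValueError,
}


def rethrow_as_python(func):
    """Convert a known JVM-side error into a short Python exception."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except FakeJavaError as err:
            python_type = JAVA_TO_PYTHON.get(err.java_class)
            if python_type is None:
                raise  # unknown JVM error: leave it untouched
            # "from None" hides the chained JVM-side traceback (Python 3 only).
            raise python_type(err.java_message) from None

    return wrapper


@rethrow_as_python
def cov(col1, col2):
    # Simulates the JVM call failing the way it does in the traceback above.
    raise FakeJavaError(
        "java.lang.IllegalArgumentException",
        "requirement failed: Couldn't find column with name " + col1,
    )
```

With something like this, cov('id', 'uniform') would raise a ValueError whose message is only the "requirement failed: ..." text, and errors that are not in the mapping would still propagate unchanged.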


