[jira] [Created] (SPARK-12325) Inappropriate error messages in DataFrame StatFunctions

Narine Kokhlikyan (JIRA) Mon, 14 Dec 2015 12:53:50 -0800

Narine Kokhlikyan created SPARK-12325:
-----------------------------------------

Summary: Inappropriate error messages in DataFrame StatFunctions
Key: SPARK-12325
URL: https://issues.apache.org/jira/browse/SPARK-12325
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Narine Kokhlikyan
Priority: Critical

Hi there,

I raised this issue earlier in one of my pull requests for the SQL
component, but I have never received feedback on any of them.
https://github.com/apache/spark/pull/9366#issuecomment-155171975

Although this has been very frustrating, I'll try to list certain facts again:

1. I call the DataFrame correlation method and the error says that the
covariance calculation is not supported.
I do not think this is an appropriate message to show here.

scala> df.stat.corr("rating", "income")
java.lang.IllegalArgumentException: requirement failed: Covariance calculation for columns with dataType StringType not supported.
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$collectStatisticalData$3.apply(StatFunctions.scala:81)

2. The biggest issue here is not the message shown, but the design.
A class called CovarianceCounter does the computations for both correlation and
covariance. This may be convenient from a certain perspective, but it makes the
code harder to understand and extend, especially if you want to add another
algorithm, e.g. Spearman correlation.
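
To illustrate the first point, here is a minimal sketch of how a shared type check could take the calling function's name, so that corr no longer reports a covariance failure. This is not the actual StatFunctions code: ColType, StatValidation, requireNumeric and the functionName parameter are all hypothetical names used only for illustration.

```scala
// Hypothetical stand-ins for column data types; NOT the real Spark types API.
sealed trait ColType
case object NumericCol extends ColType
case object StringCol extends ColType

object StatValidation {
  // functionName is an assumed extra parameter: the caller states which
  // statistic it is computing, so the message names the right function.
  def requireNumeric(cols: Seq[(String, ColType)], functionName: String): Unit =
    cols.foreach { case (name, tpe) =>
      require(tpe == NumericCol,
        s"$functionName calculation for column $name with dataType $tpe not supported.")
    }
}
```

With this shape, a correlation call would pass "Correlation" and a covariance call would pass "Covariance", and the same check serves both.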

There are many possible solutions here, starting from:
1. just fixing the message
2. fixing the message and renaming CovarianceCounter and the corresponding methods
3. creating a CorrelationCounter and splitting the computations for correlation
and covariance

and many more.
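
As a rough sketch of one way the third option could look (an assumption on my side, not the existing Spark implementation): the running moments are accumulated once, and each statistic is a separate small object on top of them. PairMoments, PairStatistic, SampleCovariance and PearsonCorrelation are hypothetical names.

```scala
// Running means and co-moments for a pair of columns, updated one row at a time.
final class PairMoments {
  var count = 0L
  var xAvg = 0.0
  var yAvg = 0.0
  var cxy = 0.0 // running sum of (x - xAvg) * (y - yAvg)
  var mxx = 0.0 // running sum of (x - xAvg)^2
  var myy = 0.0 // running sum of (y - yAvg)^2

  def add(x: Double, y: Double): this.type = {
    count += 1
    val dx = x - xAvg // deviation from the old mean
    val dy = y - yAvg
    xAvg += dx / count
    yAvg += dy / count
    cxy += dx * (y - yAvg) // old deviation times new deviation
    mxx += dx * (x - xAvg)
    myy += dy * (y - yAvg)
    this
  }
}

// Each statistic carries its own name (usable in error messages) and formula.
trait PairStatistic {
  def name: String
  def result(m: PairMoments): Double
}

object SampleCovariance extends PairStatistic {
  val name = "Covariance"
  def result(m: PairMoments): Double = m.cxy / (m.count - 1)
}

object PearsonCorrelation extends PairStatistic {
  val name = "Correlation"
  def result(m: PairMoments): Double = m.cxy / math.sqrt(m.mxx * m.myy)
}
```

Another statistic would then be one more PairStatistic (or its own accumulator if it needs different state), without touching the covariance code.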

Since I'm not getting any response and, according to GitHub, all five of you
have been working on this, I'll try again:
[~brkyvz], [~rxin], [~davies], [~viirya], [~cloud_fan]

Could any of you please explain this behavior to me, or say more about it?
If you are planning to remove it or change it in some other way, we'd truly
appreciate it if you let us know.

In fact, I would like to open a pull request for this, but since my pull
requests in the SQL/ML components are sitting there without any response, I'll
wait for your response first.

cc: [~shivaram], [~mengxr]

Thank you,
Narine

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

