[jira] [Updated] (SPARK-31400) The catalogString doesn't distinguish Vectors in ml and mllib

2020-04-26 Thread Sean R. Owen (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-31400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen updated SPARK-31400:
-
Issue Type: Improvement  (was: Bug)
  Priority: Minor  (was: Major)

> The catalogString doesn't distinguish Vectors in ml and mllib
> -
>
> Key: SPARK-31400
> URL: https://issues.apache.org/jira/browse/SPARK-31400
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.4.5
> Environment: Ubuntu 16.04
>Reporter: Junpei Zhou
>Priority: Minor
>
> h2. Bug Description
> The `catalogString` is not detailed enough to distinguish 
> pyspark.ml.linalg.Vectors from pyspark.mllib.linalg.Vectors.
> h2. How to reproduce the bug
> [Here|https://spark.apache.org/docs/latest/ml-features#minmaxscaler] is an 
> example from the official documentation (Python code). If I keep all other 
> lines untouched and only modify the Vectors import line, as follows:
> {code:java}
> # from pyspark.ml.linalg import Vectors
> from pyspark.mllib.linalg import Vectors
> {code}
> Or you can directly execute the following code snippet:
> {code:java}
> from pyspark.ml.feature import MinMaxScaler
> # from pyspark.ml.linalg import Vectors
> from pyspark.mllib.linalg import Vectors
> dataFrame = spark.createDataFrame([
>     (0, Vectors.dense([1.0, 0.1, -1.0]),),
>     (1, Vectors.dense([2.0, 1.1, 1.0]),),
>     (2, Vectors.dense([3.0, 10.1, 3.0]),)
> ], ["id", "features"])
> scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
> scalerModel = scaler.fit(dataFrame)
> {code}
> It will raise an error:
> {code:java}
> IllegalArgumentException: 'requirement failed: Column features must be of 
> type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> 
> but was actually 
> struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'
> {code}
> However, the actual struct and the required struct render as exactly the 
> same string, which gives the programmer no useful information. I would 
> suggest making the catalogString distinguish pyspark.ml.linalg.Vectors 
> from pyspark.mllib.linalg.Vectors.
> Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31400) The catalogString doesn't distinguish Vectors in ml and mllib

2020-04-09 Thread Junpei Zhou (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-31400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junpei Zhou updated SPARK-31400:

Description: 
h2. Bug Description

The `catalogString` is not detailed enough to distinguish 
pyspark.ml.linalg.Vectors from pyspark.mllib.linalg.Vectors.
h2. How to reproduce the bug

[Here|https://spark.apache.org/docs/latest/ml-features#minmaxscaler] is an 
example from the official documentation (Python code). If I keep all other 
lines untouched and only modify the Vectors import line, as follows:
{code:java}
# from pyspark.ml.linalg import Vectors
from pyspark.mllib.linalg import Vectors
{code}
Or you can directly execute the following code snippet:
{code:java}
from pyspark.ml.feature import MinMaxScaler
# from pyspark.ml.linalg import Vectors
from pyspark.mllib.linalg import Vectors
dataFrame = spark.createDataFrame([
    (0, Vectors.dense([1.0, 0.1, -1.0]),),
    (1, Vectors.dense([2.0, 1.1, 1.0]),),
    (2, Vectors.dense([3.0, 10.1, 3.0]),)
], ["id", "features"])
scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
scalerModel = scaler.fit(dataFrame)
{code}
It will raise an error:
{code:java}
IllegalArgumentException: 'requirement failed: Column features must be of type 
struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was 
actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'
{code}
However, the actual struct and the required struct render as exactly the same 
string, which gives the programmer no useful information. I would suggest 
making the catalogString distinguish pyspark.ml.linalg.Vectors from 
pyspark.mllib.linalg.Vectors.

Thanks!

 

