[ https://issues.apache.org/jira/browse/SPARK-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-17508:
------------------------------
      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

I agree it ends up being more of an improvement, but it also seems worth fixing 
if possible. Is the right behavior that None acts the same as if it weren't set? 
Surely that's possible on the Python API side?
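
A minimal sketch of what that could look like on the Python side, assuming the 
fix is simply to drop None-valued params before they are transferred to the JVM 
(the helper name _strip_none_params is hypothetical, for illustration only):

{code}
def _strip_none_params(kwargs):
    """Hypothetical helper: drop params left at None so they behave as if
    they were never set, rather than being forwarded to the JVM as an
    explicit null (which is what triggers the NPE in this report)."""
    return {k: v for k, v in kwargs.items() if v is not None}

# Illustrative use in an estimator's keyword-argument handling:
kwargs = _strip_none_params({"maxIter": 5, "regParam": 0.0, "weightCol": None})
# -> {'maxIter': 5, 'regParam': 0.0}; weightCol is simply left unset
{code}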

> Setting weightCol to None in ML library causes an error
> -------------------------------------------------------
>
>                 Key: SPARK-17508
>                 URL: https://issues.apache.org/jira/browse/SPARK-17508
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Evan Zamir
>            Priority: Minor
>
> The following code runs without error:
> {code}
> from pyspark.ml.classification import LogisticRegression
> from pyspark.ml.linalg import Vectors
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.appName('WeightBug').getOrCreate()
> df = spark.createDataFrame(
>     [
>         (1.0, 1.0, Vectors.dense(1.0)),
>         (0.0, 1.0, Vectors.dense(-1.0))
>     ],
>     ["label", "weight", "features"])
> lr = LogisticRegression(maxIter=5, regParam=0.0, weightCol="weight")
> model = lr.fit(df)
> {code}
> My expectation from reading the documentation is that setting weightCol=None 
> should treat all weights as 1.0 (regardless of whether a weight column exists). 
> However, the same code with weightCol set to None fails; the only change is the 
> LogisticRegression constructor call:
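> {code}
> lr = LogisticRegression(maxIter=5, regParam=0.0, weightCol=None)
> model = lr.fit(df)
> {code}
> which produces the following errors: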
> {code}
> Traceback (most recent call last):
>   File "/Users/evanzamir/ams/px-seed-model/scripts/bug.py", line 32, in <module>
>     model = lr.fit(df)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 64, in fit
>     return self._fit(dataset)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 213, in _fit
>     java_model = self._fit_java(dataset)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 210, in _fit_java
>     return self._java_obj.fit(dataset._jdf)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o38.fit.
> : java.lang.NullPointerException
>       at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:264)
>       at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:259)
>       at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:159)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>       at py4j.Gateway.invoke(Gateway.java:280)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:211)
>       at java.lang.Thread.run(Thread.java:745)
> Process finished with exit code 1
> {code}



