[ 
https://issues.apache.org/jira/browse/SPARK-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-17508.
----------------------------------
    Resolution: Won't Fix

Resolving this for now unless there is more interest in the fix

> Setting weightCol to None in ML library causes an error
> -------------------------------------------------------
>
>                 Key: SPARK-17508
>                 URL: https://issues.apache.org/jira/browse/SPARK-17508
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Evan Zamir
>            Priority: Minor
>
> The following code runs without error:
> {code}
> spark = SparkSession.builder.appName('WeightBug').getOrCreate()
> df = spark.createDataFrame(
>     [
>         (1.0, 1.0, Vectors.dense(1.0)),
>         (0.0, 1.0, Vectors.dense(-1.0))
>     ],
>     ["label", "weight", "features"])
> lr = LogisticRegression(maxIter=5, regParam=0.0, weightCol="weight")
> model = lr.fit(df)
> {code}
> My expectation from reading the documentation is that setting weightCol=None 
> should treat all weights as 1.0 (regardless of whether a column exists). 
> However, the same code with weightCol set to None causes the following errors:
> Traceback (most recent call last):
>   File "/Users/evanzamir/ams/px-seed-model/scripts/bug.py", line 32, in 
> <module>
>     model = lr.fit(df)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 
> 64, in fit
>     return self._fit(dataset)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", 
> line 213, in _fit
>     java_model = self._fit_java(dataset)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", 
> line 210, in _fit_java
>     return self._java_obj.fit(dataset._jdf)
>   File 
> "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", 
> line 63, in deco
>     return f(*a, **kw)
>   File 
> "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py",
>  line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o38.fit.
> : java.lang.NullPointerException
>       at 
> org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:264)
>       at 
> org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:259)
>       at 
> org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:159)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
>       at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>       at py4j.Gateway.invoke(Gateway.java:280)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:211)
>       at java.lang.Thread.run(Thread.java:745)
> Process finished with exit code 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to