[ https://issues.apache.org/jira/browse/SPARK-12780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119912#comment-15119912 ]
Apache Spark commented on SPARK-12780:
--
User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/10950
> Inconsistency returning value of ML python models' properties
> -
>
> Key: SPARK-12780
> URL: https://issues.apache.org/jira/browse/SPARK-12780
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Reporter: Xusen Yin
> Assignee: Xusen Yin
> Priority: Minor
> Fix For: 2.0.0
>
>
> In spark/python/pyspark/ml/feature.py, StringIndexerModel has a property
> method named labels that behaves differently from the properties of other models.
> In StringIndexerModel:
> {code:title=StringIndexerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false}
> @property
> @since("1.5.0")
> def labels(self):
>     """
>     Ordered list of labels, corresponding to indices to be assigned.
>     """
>     return self._java_obj.labels
> {code}
> In CountVectorizerModel (as an example):
> {code:title=CountVectorizerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false}
> @property
> @since("1.6.0")
> def vocabulary(self):
>     """
>     An array of terms in the vocabulary.
>     """
>     return self._call_java("vocabulary")
> {code}
> In StringIndexerModel, the returned value of labels is not an array of labels
> as expected. Instead, it is a py4j JavaMember.
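The difference between the two properties can be illustrated without Spark. The sketch below uses hypothetical stand-ins (`FakeJavaObject`, `BrokenModel`, `FixedModel`) instead of real py4j objects: returning `self._java_obj.labels` hands back an uninvoked member, whereas a `_call_java`-style helper invokes it and converts the result. The helper here is a simplified assumption, not PySpark's actual implementation.

```python
# Hypothetical stand-ins for py4j objects; they only illustrate the
# difference between returning a member and invoking it.


class FakeJavaObject:
    """Mimics a Java model whose labels() method returns an array."""

    def labels(self):
        return ["a", "b", "c"]


class BrokenModel:
    def __init__(self, java_obj):
        self._java_obj = java_obj

    @property
    def labels(self):
        # Mirrors the reported behavior: the member itself (a JavaMember
        # in the real py4j case) is returned instead of being called.
        return self._java_obj.labels


class FixedModel:
    def __init__(self, java_obj):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Simplified, hypothetical version of a _call_java-style helper:
        # look up the member, invoke it, and convert the result to a list.
        result = getattr(self._java_obj, name)(*args)
        return list(result)

    @property
    def labels(self):
        return self._call_java("labels")


broken = BrokenModel(FakeJavaObject())
fixed = FixedModel(FakeJavaObject())
print(callable(broken.labels))  # True: a method object, not a list
print(fixed.labels)             # ['a', 'b', 'c']
```

Routing every property through one call-and-convert helper is what keeps the return types consistent across models.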
> Moreover, the pickler on the Python side does not deserialize Scala arrays
> consistently. In my experiments, Array[String] is converted to a tuple and
> Array[Int] to an array.array, which may lead to errors.
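One way to see why mixed container types can cause errors, and how a caller might normalize them, is sketched below. The helper `to_python_list` is hypothetical and assumes the only containers to handle are the two reported in the experiments above (tuple and array.array).

```python
from array import array


def to_python_list(value):
    """Hypothetical helper: normalize the containers observed for
    deserialized Scala arrays (tuple, array.array) into a plain list."""
    if isinstance(value, (tuple, array)):
        return list(value)
    return value


string_labels = ("a", "b", "c")     # how Array[String] was observed to arrive
int_counts = array("i", [3, 1, 2])  # how Array[Int] was observed to arrive

# Code written for lists can break on a tuple, e.g. list concatenation.
try:
    string_labels + ["d"]
except TypeError as err:
    print("tuple + list fails:", err)

print(to_python_list(string_labels))  # ['a', 'b', 'c']
print(to_python_list(int_counts))     # [3, 1, 2]
```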
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)