[ https://issues.apache.org/jira/browse/SPARK-12780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-12780:
--------------------------------------
    Fix Version/s: 1.6.1

> Inconsistency returning value of ML python models' properties
> -------------------------------------------------------------
>
>                 Key: SPARK-12780
>                 URL: https://issues.apache.org/jira/browse/SPARK-12780
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>            Reporter: Xusen Yin
>            Assignee: Xusen Yin
>            Priority: Minor
>             Fix For: 1.6.1, 2.0.0
>
>
> In spark/python/pyspark/ml/feature.py, StringIndexerModel has a property
> method named labels that is implemented differently from the properties of
> other models.
> In StringIndexerModel:
> {code:title=StringIndexerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false}
> @property
> @since("1.5.0")
> def labels(self):
>     """
>     Ordered list of labels, corresponding to indices to be assigned.
>     """
>     return self._java_obj.labels
> {code}
> In CountVectorizerModel (as an example):
> {code:title=CountVectorizerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false}
> @property
> @since("1.6.0")
> def vocabulary(self):
>     """
>     An array of terms in the vocabulary.
>     """
>     return self._call_java("vocabulary")
> {code}
> In StringIndexerModel, the value returned by labels is not the expected
> list of labels. Because labels is only looked up on the Java object and
> never called, the property returns a py4j JavaMember instead.
> Moreover, pickling on the Python side does not deserialize Scala Arrays
> consistently. According to my experiments, an Array[String] is translated
> into a tuple, while an Array[Int] becomes an array.array. This may cause
> errors.
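The two problems described in the issue can be illustrated without Spark or py4j. In the sketch below, `FakeJavaMember`, `FakeJavaModel`, `SketchModel`, and `to_list` are hypothetical stand-ins, not PySpark or py4j classes. It shows why reading a Java method as an attribute (as `labels` does) yields an uninvoked member object rather than a value, and contrasts that with the `_call_java` pattern used by `vocabulary`. The `to_list` normalization of tuple and array.array results is an assumption about one possible remedy, not the actual PySpark fix.

```python
import array


class FakeJavaMember:
    """Hypothetical stand-in for py4j's JavaMember: a reference to a
    Java method that has not been invoked yet."""

    def __init__(self, func):
        self._func = func

    def __call__(self, *args):
        return self._func(*args)


class FakeJavaModel:
    """Hypothetical Java-side model: attribute access returns an
    uninvoked member, mimicking how py4j exposes Java methods."""

    def __getattr__(self, name):
        values = {
            # Simulates the tuple observed for a Scala Array[String].
            "labels": ("a", "b", "c"),
            # Simulates the array.array observed for a Scala Array[Int].
            "indices": array.array("i", [0, 1, 2]),
        }
        if name not in values:
            raise AttributeError(name)
        return FakeJavaMember(lambda: values[name])


def to_list(value):
    """Hypothetical helper: normalize tuple / array.array results to a list."""
    if isinstance(value, (tuple, array.array)):
        return list(value)
    return value


class SketchModel:
    def __init__(self, java_obj):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Invoke the Java method (rather than only looking it up),
        # then normalize the converted result.
        return to_list(getattr(self._java_obj, name)(*args))

    @property
    def labels_inconsistent(self):
        # Mirrors the reported StringIndexerModel pattern: never called.
        return self._java_obj.labels

    @property
    def labels(self):
        # Mirrors the CountVectorizerModel.vocabulary pattern.
        return self._call_java("labels")


model = SketchModel(FakeJavaModel())
print(type(model.labels_inconsistent).__name__)  # FakeJavaMember
print(model.labels)                              # ['a', 'b', 'c']
print(model._call_java("indices"))               # [0, 1, 2]
```

Routing every property through a single `_call_java`-style helper keeps return types uniform across models, which is the consistency the issue asks for.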