[ https://issues.apache.org/jira/browse/SPARK-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Szymkiewicz updated SPARK-12006:
---------------------------------------
    Description:
Steps to reproduce :

{code}
from pyspark.mllib.clustering import GaussianMixture
from numpy import array

data = sc.textFile("data/mllib/gmm_data.txt")
parsedData = data.map(lambda line: array([float(x) for x in line.strip().split(' ')]))

gmm = GaussianMixture.train(parsedData, 2)
GaussianMixture.train(parsedData, 2, initialModel=gmm)
{code}

It looks like the source of the problem is [{{initialModelWeights}}|https://github.com/apache/spark/blob/branch-1.6/python/pyspark/mllib/clustering.py#L349] NumPy array. In 1.5 / 1.6 it leads to {{net.razorvine.pickle.PickleException}}, in 1.4 we get {{Method trainGaussianMixture(\[..., class org.apache.spark.mllib.linalg.DenseVector, class java.util.ArrayList, class java.util.ArrayList\]) does not exist}}

  was:
Steps to reproduce :

{code}
from pyspark.mllib.clustering import GaussianMixture
from numpy import array

data = sc.textFile("data/mllib/gmm_data.txt")
parsedData = data.map(lambda line: array([float(x) for x in line.strip().split(' ')]))

gmm = GaussianMixture.train(parsedData, 2)
GaussianMixture.train(parsedData, 2, initialModel=gmm)
{code}

It looks like the source of the problem is [{{initialModelWeights}}|https://github.com/apache/spark/blob/branch-1.6/python/pyspark/mllib/clustering.py#L349] NumPy array.
In 1.5 / 1.6 causes {{net.razorvine.pickle.PickleException}}, in 1.4 we get {{Method trainGaussianMixture(\[..., class org.apache.spark.mllib.linalg.DenseVector, class java.util.ArrayList, class java.util.ArrayList\]) does not exist}}


> GaussianMixture.train crashes if an initial model is not None
> -------------------------------------------------------------
>
>                 Key: SPARK-12006
>                 URL: https://issues.apache.org/jira/browse/SPARK-12006
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.4.0, 1.5.0, 1.6.0
>            Reporter: Maciej Szymkiewicz
>
> Steps to reproduce :
> {code}
> from pyspark.mllib.clustering import GaussianMixture
> from numpy import array
> data = sc.textFile("data/mllib/gmm_data.txt")
> parsedData = data.map(lambda line: array([float(x) for x in line.strip().split(' ')]))
> gmm = GaussianMixture.train(parsedData, 2)
> GaussianMixture.train(parsedData, 2, initialModel=gmm)
> {code}
> It looks like the source of the problem is
> [{{initialModelWeights}}|https://github.com/apache/spark/blob/branch-1.6/python/pyspark/mllib/clustering.py#L349]
> NumPy array. In 1.5 / 1.6 it leads to
> {{net.razorvine.pickle.PickleException}}, in 1.4 we get {{Method
> trainGaussianMixture(\[..., class org.apache.spark.mllib.linalg.DenseVector,
> class java.util.ArrayList, class java.util.ArrayList\]) does not exist}}
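
The description above points at the weights being passed along as a NumPy array. A minimal sketch of the kind of conversion that might sidestep this, assuming the failure really does come from the array type rather than from its values: turn the weights into a plain list of Python floats before they are handed over. {{to_plain_floats}} is a hypothetical helper used only for illustration and is not part of PySpark; the literal weights stand in for {{gmm.weights}}, and whether this resolves the crash has not been verified here.

```python
import numpy as np


def to_plain_floats(weights):
    """Return the values of a 1-D array-like as a list of built-in Python floats.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    return [float(w) for w in np.asarray(weights).ravel()]


# Stand-in for gmm.weights, which the report describes as a NumPy array.
weights = np.array([0.4, 0.6])

plain = to_plain_floats(weights)

# The container is now a list and every element is a built-in float,
# rather than an ndarray holding numpy.float64 values.
print(type(weights).__name__, "->", type(plain).__name__, "of", type(plain[0]).__name__)
```

The explicit {{float(w)}} call matters in this sketch because {{list(weights)}} alone would still contain {{numpy.float64}} elements.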