Maciej Szymkiewicz created SPARK-12006:
------------------------------------------
             Summary: GaussianMixture.train crashes if an initial model is not None
                 Key: SPARK-12006
                 URL: https://issues.apache.org/jira/browse/SPARK-12006
             Project: Spark
          Issue Type: Bug
          Components: MLlib, PySpark
    Affects Versions: 1.5.0, 1.4.0, 1.6.0
            Reporter: Maciej Szymkiewicz

Steps to reproduce:

{code}
from pyspark.mllib.clustering import GaussianMixture
from numpy import array

# sc is the SparkContext provided by the PySpark shell
data = sc.textFile("data/mllib/gmm_data.txt")
parsedData = data.map(lambda line: array([float(x) for x in line.strip().split(' ')]))

# Training without an initial model works
gmm = GaussianMixture.train(parsedData, 2)
# Passing the trained model back as initialModel crashes
GaussianMixture.train(parsedData, 2, initialModel=gmm)
{code}

It looks like the source of the problem is the [{{initialModelWeights}}|https://github.com/apache/spark/blob/branch-1.6/python/pyspark/mllib/clustering.py#L349] NumPy array. In 1.5 / 1.6 it causes {{net.razorvine.pickle.PickleException}}; in 1.4 we get {{Method trainGaussianMixture(\[..., class org.apache.spark.mllib.linalg.DenseVector, class java.util.ArrayList, class java.util.ArrayList\]) does not exist}}.
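If the NumPy array is indeed the culprit, one plausible direction (not confirmed by this report) is to turn the weights into a plain Python list of floats before they are handed to the JVM. The sketch below only illustrates that conversion. The helper name {{to_plain_floats}} is hypothetical and not part of PySpark, and whether this alone avoids the exception is an assumption.

```python
def to_plain_floats(weights):
    """Return `weights` as a plain list of built-in Python floats.

    Hypothetical helper, not part of PySpark. Objects that expose a
    tolist() method (NumPy arrays do) are converted through it; any
    other iterable is converted with list().
    """
    tolist = getattr(weights, "tolist", None)
    values = tolist() if callable(tolist) else list(weights)
    # float() ensures every element is a built-in float, not a NumPy scalar
    return [float(w) for w in values]


# Usage with a plain tuple standing in for the model weights
print(to_plain_floats((0.25, 0.75)))
```

Inside {{GaussianMixture.train}}, the equivalent change would be to apply such a conversion to {{initialModel.weights}} before the call into the JVM. That is an assumption about where a fix could go, not a description of an actual patch.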