If I try to use LogisticRegression with only positive training examples, it always gives me positive predictions:
Positive Only

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import org.apache.spark.sql.Row

    // Assumes `spark` is a SparkSession already in scope.
    private def positiveOnly(): Unit = {
      // Training set: every row is labeled 1.0 (positive).
      val training = spark.createDataFrame(Seq(
        (1.0, Vectors.dense(0.0, 1.1, 0.1)),
        (1.0, Vectors.dense(0.0, 1.0, -1.0)),
        (1.0, Vectors.dense(0.2, 1.3, 1.0)),
        (1.0, Vectors.dense(0.1, 1.2, -0.5))
      )).toDF("label", "features")

      val lr = new LogisticRegression()
      lr.setMaxIter(10).setRegParam(0.01)
      val model = lr.fit(training)

      // Test set: includes one row labeled 0.0 (negative).
      val test = spark.createDataFrame(Seq(
        (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
        (0.0, Vectors.dense(3.0, 2.0, -0.1)),
        (1.0, Vectors.dense(0.0, 2.2, -1.5))
      )).toDF("label", "features")

      model.transform(test)
        .select("features", "label", "probability", "prediction")
        .collect()
        .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
          println(s"($features, $label) -> prob=$prob, prediction=$prediction")
        }
    }

The results look like this:

    [info] ([-1.0,1.5,1.3], 1.0) -> prob=[0.0,1.0], prediction=1.0
    [info] ([3.0,2.0,-0.1], 0.0) -> prob=[0.0,1.0], prediction=1.0
    [info] ([0.0,2.2,-1.5], 1.0) -> prob=[0.0,1.0], prediction=1.0

On Tue, Jan 16, 2018 8:51 AM, Matt Hicks m...@outr.com wrote:

Hi Hari, I'm not sure I understand. I apologize, I'm still pretty new to Spark and Spark ML. Can you point me to some example code or documentation that would more fully represent this?

Thanks

On Tue, Jan 16, 2018 2:54 AM, hosur narahari hnr1...@gmail.com wrote:

You can make use of the probability vector from Spark classification. When you run a Spark classification model for prediction, along with classifying each row into its class, Spark also gives a probability vector (the probability that the row belongs to each individual class). So just take the probability corresponding to the donor class. It will be the same as the probability that a person will become a donor.

Best Regards,
Hari

On 15 Jan 2018 11:51 p.m., "Matt Hicks" <m...@outr.com> wrote:

I'm attempting to create a training classification, but only have positive information.
Specifically, in this case it is a donor list of users, but I want to use it as training data in order to classify new contacts and give probabilities that they will donate. Any insights or links are appreciated. I've gone through the documentation but have been unable to find any references to how I might do this.

Thanks

---
Matt Hicks
Chief Technology Officer
405.283.6887 | http://outr.com
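
Hari's suggestion of reading the donor-class entry of the probability vector could be sketched roughly as follows. This is only a sketch under assumptions that are not in the thread: it assumes some non-donor rows (label 0.0) are available for training, and the feature values, the object name DonorProbabilitySketch and the helper scoreContacts are made up for illustration.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical sketch: names and data below are illustrative only.
object DonorProbabilitySketch {

  // Trains on labeled rows and returns (features, P(donor)) for each new contact.
  def scoreContacts(spark: SparkSession): Array[(Vector, Double)] = {
    // ASSUMPTION: both donors (1.0) and non-donors (0.0) are available.
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (1.0, Vectors.dense(0.2, 1.3, 1.0)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.5, 1.2, -0.5))
    )).toDF("label", "features")

    val model = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)
      .fit(training)

    // New contacts have features but no label.
    val contacts = spark.createDataFrame(Seq(
      Tuple1(Vectors.dense(-1.0, 1.5, 1.3)),
      Tuple1(Vectors.dense(3.0, 2.0, -0.1))
    )).toDF("features")

    model.transform(contacts)
      .select("features", "probability")
      .collect()
      .map { case Row(features: Vector, prob: Vector) =>
        // The probability vector is indexed by label, so index 1 is P(label = 1.0).
        (features, prob(1))
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("donor-probability-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      scoreContacts(spark).foreach { case (features, pDonor) =>
        println(s"$features -> P(donor)=$pDonor")
      }
    } finally {
      spark.stop()
    }
  }
}
```

The sketch reads prob(1) rather than the prediction column, so each contact gets a score between 0 and 1 instead of a hard class. It does not address the positive-only case from the original question, where the model has no second class to learn from.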