[ 
https://issues.apache.org/jira/browse/SPARK-21594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wang updated SPARK-21594:
--------------------------------
    Remaining Estimate: 168h
     Original Estimate: 168h
           Description: 
Semi-supervised learning support has only just started to appear in the Spark 
machine learning library.
It is an important direction when labelled data is limited and costly.
With it, the warm-up time for supervised learning can be minimized.
A key requirement is that the existing classifiers output class 
probabilities, so that unlabelled data can be selected by probability, for 
example in self-training. Algorithms that tend to overfit are particularly 
useful for this; the multilayer perceptron classifier (MLP) is one such 
case. 
I found that this is not possible with MLP (or neural network). This is an 
inconsistent offering that needs to be improved. 
thanks
Joseph
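
The selection step described above can be sketched in plain Python. This is 
only an illustration under stated assumptions: select_confident, the 0.9 
threshold, and the sample probabilities are hypothetical names and values, 
not part of Spark or of this issue.

```python
from typing import List, Sequence, Tuple


def select_confident(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return (row_index, pseudo_label) pairs whose top probability meets the threshold."""
    selected = []
    for row_index, row in enumerate(probabilities):
        # The pseudo-label is the class with the highest predicted probability.
        pseudo_label = max(range(len(row)), key=lambda k: row[k])
        if row[pseudo_label] >= threshold:
            selected.append((row_index, pseudo_label))
    return selected


# Hypothetical per-class probabilities for three unlabelled rows.
unlabelled_probs = [[0.95, 0.05], [0.55, 0.45], [0.08, 0.92]]
confident = select_confident(unlabelled_probs, threshold=0.9)
print(confident)
```

In a self-training loop, the selected rows would be added to the labelled 
set with their pseudo-labels before the classifier is retrained. Without a 
probability output, there is no score to apply such a threshold to.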

  was:
My question is: is it possible to get not only the labels but also (or only) 
the probability for each label? That is, not just 0 or 1 for every input, but 
something like 0.95 for 0 and 0.05 for 1. If this is not possible with MLP, is 
it possible with another classifier? I have only used MLP because I know it 
should be capable of returning probabilities, but I can't find this in PySpark. 
This is an inconsistent offering that needs to be fixed: other algorithms in 
Spark MLlib provide probability output with Spark DataFrames, but MLP, which 
is related to AI, does not. 
thanks
Joseph
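
As a rough illustration of the kind of output asked for above, the sketch 
below turns two raw output-layer scores into per-class probabilities with a 
softmax. It assumes the network ends in a softmax layer; the softmax helper 
and the example scores are hypothetical and do not come from the Spark API.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for classes 0 and 1, chosen so that class 0 is
# 19 times as likely as class 1.
probs = softmax([math.log(19.0), 0.0])
print([round(p, 2) for p in probs])
```

With these scores, class 0 receives a probability of about 0.95 and class 1 
about 0.05, rather than a bare label of 0.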


> Missing probability output from MultilayerPerceptronClassifier
> --------------------------------------------------------------
>
>                 Key: SPARK-21594
>                 URL: https://issues.apache.org/jira/browse/SPARK-21594
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.2.0
>         Environment: SPARK, PySpark,Scala, SparkR
>            Reporter: Joseph Wang
>   Original Estimate: 168h
>  Remaining Estimate: 168h


