[https://issues.apache.org/jira/browse/SPARK-20802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029950#comment-16029950]

Bettadapura Srinath Sharma commented on SPARK-20802:
----------------------------------------------------

In Java (correct behavior), the code:
KolmogorovSmirnovTestResult testResult = Statistics.kolmogorovSmirnovTest(col1, "norm", mean[1], stdDev[1]);
produces:
Kolmogorov-Smirnov test summary:
degrees of freedom = 0 
statistic = 0.005983051038968901 
pValue = 0.8643736171652615 
No presumption against null hypothesis: Sample follows theoretical distribution.
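
The "statistic" in the summary above is the one-sample Kolmogorov-Smirnov distance D = sup_x |F_n(x) - F(x)|. Here F_n is the sample's empirical CDF and F is the normal CDF with the supplied mean and standard deviation. The pure-Python sketch below computes D for illustration only. It is not Spark's implementation and does not compute the p-value. The names normal_cdf and ks_statistic are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of N(mean, sd^2) evaluated at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, mean, sd):
    """One-sample KS distance between the sample's ECDF and N(mean, sd^2)."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, sd)
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

For a large sample that really is drawn from N(mean, sd^2), D shrinks toward 0. That matches the small statistics (about 0.005) reported here.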


> kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws 
> net.razorvine.pickle.PickleException when input data is normally distributed 
> (no error when data is not normally distributed)
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20802
>                 URL: https://issues.apache.org/jira/browse/SPARK-20802
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 2.1.1
>         Environment: Linux version 4.4.14-smp
> x86/fpu: Legacy x87 FPU detected.
> using command line: 
> bash-4.3$ ./bin/spark-submit ~/work/python/Features.py
> bash-4.3$ pwd
> /home/bsrsharma/spark-2.1.1-bin-hadoop2.7
> export JAVA_HOME=/home/bsrsharma/jdk1.8.0_121
>            Reporter: Bettadapura Srinath Sharma
>
> In Scala (correct behavior), the code:
> testResult = Statistics.kolmogorovSmirnovTest(vecRDD, "norm", means(j), stdDev(j))
> produces:
> 17/05/18 10:52:53 INFO FeatureLogger: Kolmogorov-Smirnov test summary:
> degrees of freedom = 0 
> statistic = 0.005495681749849268 
> pValue = 0.9216108887428276 
> No presumption against null hypothesis: Sample follows theoretical distribution.
> In Python (incorrect behavior), the code:
> testResult = Statistics.kolmogorovSmirnovTest(vecRDD, 'norm', numericMean[j], numericSD[j])
> causes this error:
> 17/05/17 21:59:23 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 14)
> net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)
>  
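
The thread does not confirm the cause of the Python failure. This PickleException usually means NumPy scalars (for example numpy.float64) reached the JVM. NumPy pickles such scalars with a reference to numpy.dtype, and Pyrolite (net.razorvine.pickle), the unpickler Spark uses on the JVM side, cannot rebuild that object. The error is raised inside an executor task, which suggests the elements of vecRDD may be affected, not only numericMean[j] and numericSD[j]. A minimal workaround sketch, assuming those values are NumPy scalars, is to coerce everything to built-in Python floats. The helper to_builtin_float is hypothetical and not part of PySpark.

```python
import numbers


def to_builtin_float(value):
    """Return `value` as a plain Python float (hypothetical helper).

    numpy.float64 subclasses float, but it pickles through NumPy-specific
    reduce logic that references numpy.dtype. float() returns an exact
    built-in float, which pickles as a primitive the JVM side can read.
    """
    if not isinstance(value, numbers.Real):
        raise TypeError("expected a real number, got %s" % type(value).__name__)
    return float(value)
```

Under the same assumptions, the call would become Statistics.kolmogorovSmirnovTest(vecRDD.map(to_builtin_float), 'norm', to_builtin_float(numericMean[j]), to_builtin_float(numericSD[j])).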



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
