[ https://issues.apache.org/jira/browse/SPARK-20802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029950#comment-16029950 ]
Bettadapura Srinath Sharma commented on SPARK-20802:
----------------------------------------------------

In Java, (Correct behavior)
code:
KolmogorovSmirnovTestResult testResult = Statistics.kolmogorovSmirnovTest(col1, "norm", mean[1], stdDev[1]);
produces:
Kolmogorov-Smirnov test summary:
degrees of freedom = 0
statistic = 0.005983051038968901
pValue = 0.8643736171652615
No presumption against null hypothesis: Sample follows theoretical distribution.

> kolmogorovSmirnovTest in pyspark.mllib.stat.Statistics throws
> net.razorvine.pickle.PickleException when input data is normally distributed
> (no error when data is not normally distributed)
> ----------------------------------------------------------------------
>
>                 Key: SPARK-20802
>                 URL: https://issues.apache.org/jira/browse/SPARK-20802
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 2.1.1
>         Environment: Linux version 4.4.14-smp
> x86/fpu: Legacy x87 FPU detected.
> using command line:
> bash-4.3$ ./bin/spark-submit ~/work/python/Features.py
> bash-4.3$ pwd
> /home/bsrsharma/spark-2.1.1-bin-hadoop2.7
> export JAVA_HOME=/home/bsrsharma/jdk1.8.0_121
>            Reporter: Bettadapura Srinath Sharma
>
> In Scala,(correct behavior)
> code:
> testResult = Statistics.kolmogorovSmirnovTest(vecRDD, "norm", means(j), stdDev(j))
> produces:
> 17/05/18 10:52:53 INFO FeatureLogger: Kolmogorov-Smirnov test summary:
> degrees of freedom = 0
> statistic = 0.005495681749849268
> pValue = 0.9216108887428276
> No presumption against null hypothesis: Sample follows theoretical distribution.
> in python (incorrect behavior):
> the code:
> testResult = Statistics.kolmogorovSmirnovTest(vecRDD, 'norm', numericMean[j], numericSD[j])
> causes this error:
> 17/05/17 21:59:23 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 14)
> net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)
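A possible workaround on the Python side, offered as a sketch and not as a confirmed diagnosis: the "ClassDict (for numpy.dtype)" message usually appears when NumPy scalars (for example numpy.float64) are pickled and sent to the JVM. If numericMean[j] and numericSD[j] in the report are NumPy values (an assumption; the report does not say), converting them to built-in floats before the call may avoid the exception. The helper names to_builtin_float and ks_norm_params below are hypothetical and only for illustration.

```python
def to_builtin_float(value):
    """Convert a NumPy scalar (or any float-like value) to a plain Python float."""
    return float(value)


def ks_norm_params(means, std_devs, j):
    """Return (mean, std_dev) for column j as built-in Python floats."""
    return to_builtin_float(means[j]), to_builtin_float(std_devs[j])


# Usage with Spark (not executed here). Assumes vecRDD, numericMean and
# numericSD are the objects from the report:
#
#   from pyspark.mllib.stat import Statistics
#   mean, sd = ks_norm_params(numericMean, numericSD, j)
#   testResult = Statistics.kolmogorovSmirnovTest(vecRDD, 'norm', mean, sd)
```

If the elements of vecRDD are themselves NumPy scalars, mapping them with vecRDD.map(float) before the test may be needed as well; that is also an assumption about the reporter's data.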