Memory issues in 3.0.2 but works well on 2.4.4

2021-05-21 Thread Praneeth Shishtla
Hi, I have a simple DecisionForest model and was able to train it on pyspark==2.4.4 without any issues. However, after I upgraded to pyspark==3.0.2, the fit takes a long time and eventually fails with an out-of-memory error. I even tried reducing the number of training samples, but with no luck.

Sleep behavior

2021-04-23 Thread Praneeth Shishtla
Hi, We have a 6-node Spark cluster with some pyspark jobs running on it. The job depends on an external application, and for resiliency we retry it a couple of times. Would it be fine to introduce some wait time between two runs (using time.sleep())? Or could there be any sync issues? Wanted to
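
For illustration, a minimal sketch of the retry-with-wait pattern the question describes, in plain Python. The names run_with_retries, job, max_attempts and wait_seconds are hypothetical and do not come from the thread. The sketch assumes the retry loop runs in the driver program, where time.sleep() only pauses the calling Python thread; whether that is acceptable for a given cluster setup is not something the thread confirms.

```python
import time


def run_with_retries(job, max_attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Call job() until it succeeds or max_attempts is reached.

    job is any zero-argument callable (hypothetical: for example a function
    that submits the pyspark work that depends on the external application).
    sleep is injectable so the wait can be replaced in tests.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < max_attempts:
                # Wait before the next attempt; no wait after the final failure.
                sleep(wait_seconds)
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

A call such as run_with_retries(my_job, max_attempts=3, wait_seconds=30) would try my_job up to three times with a 30-second pause between attempts, where my_job is a placeholder for the actual job function.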

Calibration Methods

2020-12-14 Thread Praneeth Shishtla
Hi, I am looking to calibrate the output of a pyspark model. I looked for possible implementations in Spark but didn't find any. Sklearn has CalibratedClassifierCV https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html Could anyone point out if there
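
As a rough illustration of what calibration does, here is a small pure-Python sketch of isotonic calibration (pool-adjacent-violators), one of the methods CalibratedClassifierCV offers. This is not a Spark API and not a method proposed in the thread; fit_isotonic and calibrate are hypothetical helper names, and the sketch assumes distinct scores and binary 0/1 labels held in local Python lists. Spark ML does ship pyspark.ml.regression.IsotonicRegression, which might serve a similar purpose on DataFrames, though the thread does not confirm that.

```python
def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to label frequencies.

    Returns a list of (upper_score, calibrated_value) pairs sorted by score.
    Assumes distinct scores; tied scores are not pooled in this sketch.
    """
    blocks = []  # each block holds [sum_of_labels, count, max_score]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Pool adjacent blocks while the earlier mean exceeds the later mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, max_score = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = max_score
    return [(max_score, total / count) for total, count, max_score in blocks]


def calibrate(model, score):
    """Map a raw score to the value of the first block whose upper bound covers it."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]  # scores above the last block take the last value
```

In practice the model would be fitted on a held-out set of (score, label) pairs and then applied to new scores, for example through a UDF; that wiring is left out here.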

Calibration Methods

2020-12-14 Thread Praneeth Shishtla
Hi All, I am looking for ways to calibrate the output of a pyspark ML model. Could anyone share whether any such implementations are available in Spark/PySpark? Here is the implementation available in sklearn: