Hello,

I am using PySpark to train a Logistic Regression model with cross-validation 
via Spark ML. For testing purposes my training dataset is very small (no more 
than 50 records), but my "features" column is very wide: each feature vector 
has 1500+ dimensions.
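
In case it helps, my code looks roughly like this (a simplified sketch; the 
path, column names, and parameter grid are illustrative placeholders):

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    spark = SparkSession.builder.appName("lr-cv-test").getOrCreate()

    # Placeholder path; the real DataFrame has about 50 rows and a
    # 1500+-dimensional "features" vector column.
    train = spark.read.parquet("path/to/train").select("features", "label")
    train.cache()

    lr = LogisticRegression(featuresCol="features", labelCol="label")

    # Small illustrative grid
    grid = (ParamGridBuilder()
            .addGrid(lr.regParam, [0.01, 0.1])
            .addGrid(lr.elasticNetParam, [0.0, 0.5])
            .build())

    cv = CrossValidator(estimator=lr,
                        estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(),
                        numFolds=3)

    model = cv.fit(train)  # this is the step that never returns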

I am running on YARN with 3 executors, each with 4 GB of memory and 4 cores. 
I use cache() to persist my DataFrames.
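
I submit the job more or less like this (the script name is a placeholder):

    spark-submit \
      --master yarn \
      --num-executors 3 \
      --executor-memory 4g \
      --executor-cores 4 \
      train_lr_cv.py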

Unfortunately, the job never finishes: it hangs during cross-validation. 

Any clues? 

Thanks guys

Simone
