The following code fails on the collect() call. If I skip the collect and work
with a JavaRDD<Document> instead, it works fine, but I would really like to
collect. At first I was getting an error regarding JDI threads and an index
being 0. Then it just started locking up. I'm running the Spark context
locally on 8 cores.

    long count = documents
        .filter(d -> d.getFeatures().size() > Parameters.MIN_CENTROID_FEATURES)
        .count();
    List<Document> sampledDocuments = documents
        .filter(d -> d.getFeatures().size() > Parameters.MIN_CENTROID_FEATURES)
        .sample(false, samplingFraction(count))
        .collect();


                                          
