Re: k8s orchestrating Spark service

2019-07-03 Thread Pat Ferrel
Thanks for the in-depth explanation. These methods would require us to architect our Server around Spark, but it is actually designed to be independent of the ML implementation. SparkML is an important algo source, to be sure, but so are TensorFlow and non-Spark Python libs, among others. So Spark

Attempting to avoid a shuffle on join

2019-07-03 Thread Mkal
Please keep in mind I'm fairly new to Spark. I have some Spark code where I load two text files as datasets and, after some map and filter operations to bring the columns into a specific shape, I join the datasets. The join takes place on a common column (of type string). Is there any way to avoid