Hi, we have a DataFrame that we want to group, and then apply an ML algorithm or a statistical test (say, a t-test) to each group. Is there an efficient way to do this?
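To make the goal concrete, here is a minimal sketch of the group-then-apply pattern using only the Python standard library. It assumes the data arrives as `(key, value)` pairs and uses a one-sample t statistic as the per-group computation. The names `one_sample_t`, `t_by_group`, and `rows` are made up for illustration and are not part of our actual pipeline.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, popmean=0.0):
    """One-sample t statistic for H0: mean(values) == popmean.

    Needs at least two values with non-zero spread; otherwise
    statistics.stdev raises or the division below fails.
    """
    n = len(values)
    return (mean(values) - popmean) / (stdev(values) / sqrt(n))


def t_by_group(rows, popmean=0.0):
    """Group (key, value) pairs by key, then compute the statistic per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: one_sample_t(vals, popmean) for key, vals in groups.items()}


# Hypothetical toy data: two groups of three observations each.
rows = [("a", 1.0), ("a", 2.0), ("a", 3.0),
        ("b", 2.0), ("b", 4.0), ("b", 6.0)]
result = t_by_group(rows)
```

Each group's values are collected first and the statistic is then computed once per group, which is the same shape of computation we want to run at scale.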
Currently, we move the data to PySpark, use groupByKey, and apply a NumPy function to each group's array. But this isn't an efficient approach, right? Regards, Wenpei.