AjayBoddeda4 commented on issue #364: URL: https://github.com/apache/wayang/issues/364#issuecomment-4140415430
Hi, I am Ajay Boddeda, a GSoC 2026 applicant working on the DataFrames API proposal for Apache Wayang.

This TODO is interesting from a DataFrame API perspective. The SparkKMeansOperator currently calls fit() and transform() one after the other inside its evaluate() method, but it does not expose them as distinct operations. In the DataFrame API I am proposing, ML operations such as KMeans should follow the standard fit/transform pattern: fit() trains a model on a DataFrame, and transform() applies that model to produce a new DataFrame with predictions. This separation would keep the DataFrame API consistent with the scikit-learn and Spark ML conventions that data engineers already know.

I would love to understand the expected design for fit/transform support and to potentially contribute to it as part of my GSoC work.
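To make the proposed separation concrete, here is a minimal sketch of the fit/transform split in plain Python. It is not Wayang's API: the `KMeans` and `KMeansModel` classes, the `features`/`prediction` column names, and the list-of-dicts stand-in for a DataFrame are all hypothetical, and the naive "first k rows" initialization is only there to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Row = Dict[str, object]   # hypothetical stand-in for one DataFrame row
Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: _sq_dist(point, centroids[i]))


@dataclass(frozen=True)
class KMeansModel:
    """Hypothetical fitted model: holds centroids and only knows how to transform."""
    centroids: List[tuple]
    features_col: str = "features"
    prediction_col: str = "prediction"

    def transform(self, rows: List[Row]) -> List[Row]:
        # Return new rows with a prediction column; the input rows are not mutated.
        return [
            {**row, self.prediction_col: _nearest(row[self.features_col], self.centroids)}
            for row in rows
        ]


@dataclass
class KMeans:
    """Hypothetical estimator: holds hyperparameters and only knows how to fit."""
    k: int
    max_iter: int = 20
    features_col: str = "features"

    def fit(self, rows: List[Row]) -> KMeansModel:
        points = [tuple(row[self.features_col]) for row in rows]
        if len(points) < self.k:
            raise ValueError("need at least k rows to fit")
        centroids = points[: self.k]  # assumption: naive deterministic init
        for _ in range(self.max_iter):
            # Assignment step: group points by their nearest centroid.
            clusters = [[] for _ in range(self.k)]
            for p in points:
                clusters[_nearest(p, centroids)].append(p)
            # Update step: move each centroid to the mean of its cluster.
            updated = [
                tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
                for i, c in enumerate(clusters)
            ]
            if updated == centroids:
                break
            centroids = updated
        return KMeansModel(centroids, self.features_col)


train = [
    {"id": 0, "features": (0.0, 0.0)},
    {"id": 1, "features": (0.0, 1.0)},
    {"id": 2, "features": (10.0, 10.0)},
    {"id": 3, "features": (10.0, 11.0)},
]
model = KMeans(k=2).fit(train)      # training happens once
scored = model.transform(train)     # the fitted model can be reused on any rows
unseen = model.transform([{"id": 4, "features": (9.0, 9.0)}])
```

The point of the split is that `fit()` returns a separate model object, so the same trained centroids can score the training data, unseen data, or both, without retraining. An operator that fuses the two steps cannot offer that reuse.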
