Re: [D] Improving the Usability of ML Operators [texera]

via GitHub Fri, 13 Feb 2026 13:10:19 -0800


GitHub user SarahAsad23 closed a discussion: Improving the Usability of ML 
Operators


After teaching ML to a group of students using Texera, we noticed that having 
to split the data twice when creating an ML workflow is not very intuitive. 

The splits produce: 
1. Training data - passed to ML operator 
2. Test data - passed to ML operator 
3. Prediction data - used to do the predictions

For reference, here is the workflow that was used: 
<img width="1972" height="544" alt="IMG_9364" src="https://github.com/user-attachments/assets/473b12d9-9f6f-4e8f-90bc-51083b94c80e" />

I am thinking it would be more intuitive for the train/test split to happen 
internally within the ML operator, which could remove the first split. 
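
To make the idea concrete, here is a minimal sketch of what an internal split could look like. It is plain Python, not Texera's operator API: `split_rows`, `MajorityClassOperator`, `label_key`, and `test_ratio` are hypothetical names chosen for illustration, and the majority-class "model" only stands in for a real one. The operator takes a single labeled input and does the train/test split itself, so the workflow would only need to set aside the prediction data.

```python
import random
from collections import Counter


def split_rows(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row on each side of the split.
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_ratio)))
    return shuffled[n_test:], shuffled[:n_test]


class MajorityClassOperator:
    """Hypothetical ML operator that splits its single input internally."""

    def __init__(self, label_key, test_ratio=0.2, seed=42):
        self.label_key = label_key
        self.test_ratio = test_ratio
        self.seed = seed
        self.prediction = None
        self.test_accuracy = None

    def fit(self, rows):
        # The train/test split happens here, not in the workflow.
        train, test = split_rows(rows, self.test_ratio, self.seed)
        counts = Counter(row[self.label_key] for row in train)
        self.prediction = counts.most_common(1)[0][0]
        correct = sum(1 for row in test if row[self.label_key] == self.prediction)
        self.test_accuracy = correct / len(test)
        return self

    def predict(self, rows):
        # Prediction data still arrives as a separate input.
        return [self.prediction for _ in rows]
```

With this shape, the split ratio becomes an operator property (`test_ratio` here) rather than a separate operator on the canvas, which may be easier for students to follow.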


GitHub link: https://github.com/apache/texera/discussions/4202

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
