[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bago Amirbekian updated SPARK-21926:
------------------------------------
    Description:
We've run into a few cases where ML components don't work with streaming DataFrames (for prediction). This ticket is meant to collect these known cases in one place and provide a place to discuss possible fixes.

Failing cases:
1) VectorAssembler where one of the inputs is a VectorUDT column with no metadata.
Possible fixes:
More details in SPARK-22346.

2) OneHotEncoder where the input is a column with no metadata.
Possible fixes:
a) Make OneHotEncoder an estimator (SPARK-13030).
-b) Allow user to set the cardinality of OneHotEncoder.-

  was:
We've run into a few cases where ML components don't work with streaming DataFrames (for prediction). This ticket is meant to collect these known cases in one place and provide a place to discuss possible fixes.

Failing cases:
1) VectorAssembler where one of the inputs is a VectorUDT column with no metadata.
Possible fixes:
More details in SPARK-22346.

2) OneHotEncoder where the input is a column with no metadata.
Possible fixes:
a) Make OneHotEncoder an estimator (SPARK-13030).
b) Allow user to set the cardinality of OneHotEncoder.


> Compatibility between ML Transformers and Structured Streaming
> --------------------------------------------------------------
>
>                 Key: SPARK-21926
>                 URL: https://issues.apache.org/jira/browse/SPARK-21926
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Bago Amirbekian
>
> We've run into a few cases where ML components don't work with streaming
> DataFrames (for prediction). This ticket is meant to collect these
> known cases in one place and provide a place to discuss possible fixes.
> Failing cases:
> 1) VectorAssembler where one of the inputs is a VectorUDT column with no
> metadata.
> Possible fixes:
> More details in SPARK-22346.
> 2) OneHotEncoder where the input is a column with no metadata.
> Possible fixes:
> a) Make OneHotEncoder an estimator (SPARK-13030).
> -b) Allow user to set the cardinality of OneHotEncoder.-
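A toy model in plain Python (not Spark code) of why the VectorAssembler case fails: an assembler has to know the length of every input vector to describe its output column before it processes any rows. The sketch assumes that, when size metadata is missing, a batch job can fall back to inspecting a materialized row, while a streaming DataFrame has no rows available at that point. `ColumnSpec` and `assembled_size` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for a vector column's schema entry."""
    name: str
    size: Optional[int]  # vector length from column metadata; None = no metadata


def assembled_size(
    columns: Sequence[ColumnSpec],
    is_streaming: bool,
    first_row: Optional[Sequence[Sequence[float]]] = None,
) -> int:
    """Return the length of the assembled output vector.

    Assumption (toy model): a missing size can only be recovered by peeking
    at a row, which is possible for batch input but not for streaming input.
    """
    total = 0
    for i, col in enumerate(columns):
        if col.size is not None:
            total += col.size  # size known up front from metadata
        elif not is_streaming and first_row is not None:
            total += len(first_row[i])  # batch fallback: inspect one row
        else:
            raise ValueError(
                f"vector size of column {col.name!r} is unknown: "
                "no metadata and no row to inspect"
            )
    return total


# Usage: column "b" carries no size metadata.
cols = [ColumnSpec("a", 3), ColumnSpec("b", None)]
print(assembled_size(cols, is_streaming=False, first_row=[[0.0, 0.0, 0.0], [1.0, 2.0]]))  # 5
try:
    assembled_size(cols, is_streaming=True)
except ValueError as err:
    print("streaming:", err)
```

In this model, supplying every vector size before any data arrives (for example through column metadata) is what lets the streaming path succeed.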
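Fix a) for the OneHotEncoder case, making OneHotEncoder an estimator (SPARK-13030), can be illustrated with a plain-Python toy (not Spark's API): `fit` runs once on batch training data and records the number of categories, and the fitted model then encodes streaming rows using that stored count instead of reading column metadata. `OneHotEstimatorSketch` and `OneHotModelSketch` are hypothetical names. The sketch keeps every category and emits dense lists, so it does not model the real encoder's options.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class OneHotModelSketch:
    """Hypothetical fitted model: cardinality is stored state, not metadata."""
    num_categories: int

    def encode(self, index: int) -> List[float]:
        if not 0 <= index < self.num_categories:
            raise ValueError(f"category index {index} was not seen during fit")
        vec = [0.0] * self.num_categories
        vec[index] = 1.0
        return vec

    def transform(self, indices: Iterable[int]) -> Iterator[List[float]]:
        # Encodes row by row, so the input may be an unbounded stream.
        for index in indices:
            yield self.encode(index)


class OneHotEstimatorSketch:
    """Hypothetical estimator: learns the cardinality from training data."""

    def fit(self, training_indices: Iterable[int]) -> OneHotModelSketch:
        # Assumes 0-based category indices, so cardinality = max index + 1.
        return OneHotModelSketch(num_categories=max(training_indices) + 1)


# Usage: fit on a batch, then encode rows that arrive one at a time.
model = OneHotEstimatorSketch().fit([0, 2, 1, 2])
incoming = iter([2, 0])  # stands in for a stream with no column metadata
for vec in model.transform(incoming):
    print(vec)
```

The point of the estimator/model split is that everything the transform step needs is captured during `fit`, so nothing has to be inferred from the streaming input itself.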