[ https://issues.apache.org/jira/browse/SPARK-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832691#comment-16832691 ]
yuhao yang commented on SPARK-22974:
------------------------------------

On a business trip from April 29th to May 3rd. Please expect a delayed email response. Contact +1 669 243 8273 for anything urgent.

Thanks,
Yuhao

> CountVectorModel does not attach attributes to output column
> ------------------------------------------------------------
>
>                 Key: SPARK-22974
>                 URL: https://issues.apache.org/jira/browse/SPARK-22974
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.1
>            Reporter: William Zhang
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 2.4.0
>
>
> When a CountVectorizerModel transforms a column, the output column has no
> attributes attached to it. If that output column is later used in an
> Interaction transformer, an exception is thrown:
> {quote}"org.apache.spark.SparkException: Vector attributes must be defined
> for interaction."
> {quote}
> To reproduce it:
> {quote}import org.apache.spark.ml.feature._
> import org.apache.spark.sql.functions._
>
> val df = spark.createDataFrame(Seq(
>   (0, Array("a", "b", "c"), Array("1", "2")),
>   (1, Array("a", "b", "b", "c", "a", "d"), Array("1", "2", "3"))
> )).toDF("id", "words", "nums")
>
> // Model fitted on the "nums" column
> val cvModel: CountVectorizerModel = new CountVectorizer()
>   .setInputCol("nums")
>   .setOutputCol("features2")
>   .setVocabSize(4)
>   .setMinDF(0)
>   .fit(df)
>
> // Model built directly from a fixed vocabulary for the "words" column
> val cvm = new CountVectorizerModel(Array("a", "b", "c"))
>   .setInputCol("words")
>   .setOutputCol("features1")
>
> val df1 = cvm.transform(df)
> val df2 = cvModel.transform(df1)
>
> val interaction = new Interaction()
>   .setInputCols(Array("features1", "features2"))
>   .setOutputCol("features")
>
> // Throws SparkException: Vector attributes must be defined for interaction.
> val df3 = interaction.transform(df2)
> {quote}