[ https://issues.apache.org/jira/browse/SPARK-48837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-48837.
-----------------------------------
    Resolution: Fixed

Issue resolved by pull request 47258
[https://github.com/apache/spark/pull/47258]

> In CountVectorizer, only read binary parameter once per transform, not once per row
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-48837
>                 URL: https://issues.apache.org/jira/browse/SPARK-48837
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> SPARK-13629 added a binary parameter to CountVectorizer, but due to the way
> the code is structured the configuration parameter is read once-per-row in a
> UDF. Instead, we should read it once-per-transform call (similar to how other
> parameters are read).
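
For readers unfamiliar with the pattern described in the issue, here is a minimal sketch in plain Python. It is not the Spark code and not the change made in the pull request. `CountingParams`, `vectorize`, `transform_per_row`, and `transform_hoisted` are hypothetical names invented for this illustration, and the list comprehension stands in for a UDF applied to each row. The only point is the difference between looking a parameter up inside the per-row function and reading it once before that function is built.

```python
class CountingParams:
    """Hypothetical parameter holder that counts how often `binary` is read."""

    def __init__(self, binary):
        self._binary = binary
        self.reads = 0

    def get_binary(self):
        self.reads += 1
        return self._binary


def vectorize(tokens, vocab, binary):
    """Count vocabulary terms in one row; clamp counts to 0/1 when binary."""
    counts = [0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            counts[idx] += 1
    if binary:
        return [1 if c > 0 else 0 for c in counts]
    return counts


def transform_per_row(rows, vocab, params):
    """Pattern the issue describes: the parameter is looked up for every row."""

    def row_fn(tokens):
        # Lookup happens inside the per-row function (the "UDF" stand-in).
        return vectorize(tokens, vocab, params.get_binary())

    return [row_fn(r) for r in rows]


def transform_hoisted(rows, vocab, params):
    """Pattern the issue asks for: read once per transform call."""
    binary = params.get_binary()  # single read, captured by the closure

    def row_fn(tokens):
        return vectorize(tokens, vocab, binary)

    return [row_fn(r) for r in rows]


if __name__ == "__main__":
    vocab = {"a": 0, "b": 1, "c": 2}
    rows = [["a", "b", "a"], ["b", "c"], ["a"]]

    slow = CountingParams(binary=True)
    fast = CountingParams(binary=True)
    assert transform_per_row(rows, vocab, slow) == transform_hoisted(rows, vocab, fast)
    print("per-row reads:", slow.reads, "hoisted reads:", fast.reads)
```

Both functions return the same vectors. In this sketch only the number of parameter lookups changes, growing with the row count in the first version and staying at one in the second.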