[jira] [Commented] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592174#comment-16592174 ]

Apache Spark commented on SPARK-25124:
--------------------------------------

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/8

> VectorSizeHint.size is buggy, breaking streaming pipeline
> ---------------------------------------------------------
>
>                 Key: SPARK-25124
>                 URL: https://issues.apache.org/jira/browse/SPARK-25124
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.3.1
>            Reporter: Timothy Hunter
>            Assignee: Huaxin Gao
>            Priority: Major
>              Labels: beginner, starter
>             Fix For: 2.4.0
>
>
> Currently, when using {{VectorSizeHint().setSize(3)}} in an ML pipeline,
> transforming a stream raises a nondescript exception about the stream
> not being started. At the core is the following bug: {{setSize}} and
> {{getSize}} do not {{return}} their values, so both return {{None}}:
> https://github.com/apache/spark/blob/master/python/pyspark/ml/feature.py#L3846
> How to reproduce, using the example in the doc:
> {code}
> from pyspark.ml.linalg import Vectors
> from pyspark.ml import Pipeline, PipelineModel
> from pyspark.ml.feature import VectorAssembler, VectorSizeHint
> data = [(Vectors.dense([1., 2., 3.]), 4.)]
> df = spark.createDataFrame(data, ["vector", "float"])
> sizeHint = VectorSizeHint(inputCol="vector", handleInvalid="skip").setSize(3)
> # Will fail
> vecAssembler = VectorAssembler(inputCols=["vector", "float"],
>                                outputCol="assembled")
> pipeline = Pipeline(stages=[sizeHint, vecAssembler])
> pipelineModel = pipeline.fit(df)
> pipelineModel.transform(df).head().assembled
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
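The issue description attributes the failure to a setter and a getter that do not return anything. The following is a minimal pure-Python sketch of that pattern, not the PySpark source: the classes BuggySizeHint and FixedSizeHint and the attribute _size are hypothetical names invented for illustration. It shows why a chained call such as VectorSizeHint(...).setSize(3) would bind None to the variable instead of the transformer.

```python
class BuggySizeHint:
    """Hypothetical stand-in for a transformer whose accessors lack `return`."""

    def __init__(self):
        self._size = None

    def setSize(self, value):
        # Stores the value but falls off the end, so the call evaluates to None.
        self._size = value

    def getSize(self):
        # Evaluates the attribute and discards it; the call evaluates to None.
        self._size


class FixedSizeHint:
    """Hypothetical corrected variant: both accessors return a value."""

    def __init__(self):
        self._size = None

    def setSize(self, value):
        self._size = value
        return self  # returning self keeps method chaining usable

    def getSize(self):
        return self._size


# Chaining on the buggy class binds None, which would then be passed
# on as a pipeline stage in place of the transformer.
buggy = BuggySizeHint().setSize(3)
fixed = FixedSizeHint().setSize(3)

print(buggy)
print(fixed.getSize())
```

With the buggy variant, assigning the transformer to a variable first and calling setSize on a separate line would sidestep the None binding, although getSize would still return nothing until the accessors are fixed.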
[jira] [Commented] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590919#comment-16590919 ]

Joseph K. Bradley commented on SPARK-25124:
-------------------------------------------

I merged https://github.com/apache/spark/pull/22136 into master for target 2.4.0. I'll leave this open until we backport it to branch-2.3.
[jira] [Commented] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584251#comment-16584251 ]

Apache Spark commented on SPARK-25124:
--------------------------------------

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/22136
[jira] [Commented] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584215#comment-16584215 ]

Huaxin Gao commented on SPARK-25124:
------------------------------------

I will submit a PR very soon.