[GitHub] spark pull request #20112: [SPARK-22734][ML][PySpark] Added Python API for V...

2017-12-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20112


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20112: [SPARK-22734][ML][PySpark] Added Python API for V...

2017-12-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20112#discussion_r159097278
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -3466,6 +3466,72 @@ def selectedFeatures(self):
         return self._call_java("selectedFeatures")
 
 
+@inherit_doc
+class VectorSizeHint(JavaTransformer, HasInputCol, HasHandleInvalid, JavaMLReadable,
--- End diff --

You'll need to override handleInvalid, like in the Scala API, since it 
takes different values & has a different docstring.
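
A minimal sketch of the kind of override requested here, written with plain-Python stand-ins so it runs without Spark. The `Param` class below is a hypothetical simplification, not `pyspark.ml.param.Param`, and both doc strings are paraphrased assumptions rather than the text used in Spark. It only illustrates the mechanism: redeclaring `handleInvalid` on the subclass shadows the shared mixin's Param, so the subclass can advertise its own accepted values and docstring.

```python
class Param:
    """Hypothetical stand-in for pyspark.ml.param.Param: a name plus a doc string."""

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc


class HasHandleInvalid:
    """Stand-in for the shared mixin, carrying a generic handleInvalid doc."""

    handleInvalid = Param("handleInvalid", "How to handle invalid entries: skip or error.")


class VectorSizeHint(HasHandleInvalid):
    # Redeclaring the attribute shadows the mixin's Param, so this class
    # carries its own doc string and its own set of accepted values.
    handleInvalid = Param(
        "handleInvalid",
        "How to handle invalid vectors in inputCol: skip, error, or optimistic.",
    )


# The subclass sees its own Param; the mixin's Param is untouched.
print(VectorSizeHint.handleInvalid.doc)
print(HasHandleInvalid.handleInvalid.doc)
```

In the real class the declaration would presumably use `Param(Params._dummy(), ...)` with a type converter, as the `size` Param in this pull request does.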





[GitHub] spark pull request #20112: [SPARK-22734][ML][PySpark] Added Python API for V...

2017-12-29 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/20112#discussion_r159096655
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -3466,6 +3466,72 @@ def selectedFeatures(self):
         return self._call_java("selectedFeatures")
 
 
+@inherit_doc
+class VectorSizeHint(JavaTransformer, HasInputCol, HasHandleInvalid, JavaMLReadable,
+                     JavaMLWritable):
+    """
+    A feature transformer that adds size information to the metadata of a vector column.
+    VectorAssembler needs size information for its input columns and cannot be used on streaming
+    dataframes without this metadata.
+
+    >>> from pyspark.ml.linalg import Vectors
+    >>> from pyspark.ml import Pipeline, PipelineModel
+    >>> data = [(Vectors.dense([1., 2., 3.]), 4.)]
+    >>> df = spark.createDataFrame(data, ["vector", "float"])
+    >>>
+    >>> sizeHint = VectorSizeHint(inputCol="vector", size=3, handleInvalid="skip")
+    >>> vecAssembler = VectorAssembler(inputCols=["vector", "float"], outputCol="assembled")
+    >>> pipeline = Pipeline(stages=[sizeHint, vecAssembler])
+    >>>
+    >>> pipelineModel = pipeline.fit(df)
+    >>> pipelineModel.transform(df).head().assembled
+    DenseVector([1.0, 2.0, 3.0, 4.0])
+    >>> vectorSizeHintPath = temp_path + "/vector-size-hint-pipeline"
+    >>> pipelineModel.save(vectorSizeHintPath)
+    >>> loadedPipeline = PipelineModel.load(vectorSizeHintPath)
+    >>> loaded = loadedPipeline.transform(df).head().assembled
+    >>> expected = pipelineModel.transform(df).head().assembled
+    >>> loaded == expected
+    True
+
+    .. versionadded:: 2.3.0
+    .. note:: Experimental
+    """
+
+    size = Param(Params._dummy(), "size", "Size of vectors in column.",
+                 typeConverter=TypeConverters.toInt)
+
+    @since("2.3.0")
+    def getSize(self):
+        """ Gets size param, the size of vectors in `inputCol`."""
+        self.getOrDefault(self.size)
+
+    @since("2.3.0")
+    def setSize(self, value):
+        """ Sets size param, the size of vectors in `inputCol`."""
+        self._set(size=value)
+
+    @keyword_only
+    def __init__(self, inputCol=None, size=None, handleInvalid="error"):
--- End diff --

Let's stick with the order that all other Python classes follow: dummy 
Params, __init__, then Param setters & getters.
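
A rough sketch of that ordering, using plain-Python stand-ins so it runs without Spark. `Param`, `VectorSizeHintSketch`, and the dictionary-backed `_set` are hypothetical simplifications, not the pyspark implementation. The getter and setter in the quoted diff have no `return` statement; in this sketch they return the value and `self` respectively, which is an assumption about the intended behavior.

```python
class Param:
    """Hypothetical stand-in for pyspark.ml.param.Param."""

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc


class VectorSizeHintSketch:
    # 1. Param declarations come first.
    size = Param("size", "Size of vectors in column.")

    # 2. Then the constructor.
    def __init__(self, inputCol=None, size=None, handleInvalid="error"):
        self._values = {}
        self._set(inputCol=inputCol, size=size, handleInvalid=handleInvalid)

    def _set(self, **kwargs):
        # Store only the values that were actually supplied.
        for name, value in kwargs.items():
            if value is not None:
                self._values[name] = value
        return self

    # 3. Setters and getters come last.
    def setSize(self, value):
        """Sets size param, the size of vectors in `inputCol`."""
        return self._set(size=value)  # returning self allows call chaining

    def getSize(self):
        """Gets size param, the size of vectors in `inputCol`."""
        return self._values["size"]  # the getter has to return the stored value


hint = VectorSizeHintSketch(inputCol="vector").setSize(3)
print(hint.getSize())
```

Returning `self` from the setter is what lets calls such as `VectorSizeHintSketch(...).setSize(3)` be chained in a single expression.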





[GitHub] spark pull request #20112: [SPARK-22734][ML][PySpark] Added Python API for V...

2017-12-28 Thread MrBago
GitHub user MrBago opened a pull request:

https://github.com/apache/spark/pull/20112

[SPARK-22734][ML][PySpark] Added Python API for VectorSizeHint.

What changes were proposed in this pull request?

Python API for the VectorSizeHint transformer.

How was this patch tested?

Doc-tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MrBago/spark vectorSizeHint-PythonAPI

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20112


commit 83bb7ded0d58d4173671904a452039b57bcbea3d
Author: Bago Amirbekian 
Date:   2017-12-29T03:05:53Z

Added Python API for VectorSizeHint.



