spark git commit: [SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer

meng Tue, 21 Jun 2016 00:48:07 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.0 14e5decc5 -> 0499ed961



[SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-16045
2.0 Audit: Update document for StopWordsRemover and Binarizer.

## How was this patch tested?

manual review for doc

Author: Yuhao Yang <hhb...@gmail.com>
Author: Yuhao Yang <yuhao.y...@intel.com>

Closes #13375 from hhbyyh/stopdoc.

(cherry picked from commit a58f40239444d42adbc480ddde02cbb02a79bbe4)
Signed-off-by: Xiangrui Meng <m...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0499ed96
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0499ed96
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0499ed96

Branch: refs/heads/branch-2.0
Commit: 0499ed961838686acccefc08a42efa523f1648dd
Parents: 14e5dec
Author: Yuhao Yang <hhb...@gmail.com>
Authored: Tue Jun 21 00:47:36 2016 -0700
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Tue Jun 21 00:47:44 2016 -0700

----------------------------------------------------------------------
 docs/ml-features.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/0499ed96/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 3db24a3..3cb2644 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -251,11 +251,12 @@ frequently and don't carry as much meaning.
 `StopWordsRemover` takes as input a sequence of strings (e.g. the output
 of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
 words from the input sequences. The list of stopwords is specified by
-the `stopWords` parameter.  We provide [a list of stop
-words](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) by
-default, accessible by calling `getStopWords` on a newly instantiated
-`StopWordsRemover` instance. A boolean parameter `caseSensitive` indicates
-if the matches should be case sensitive (false by default).
+the `stopWords` parameter. Default stop words for some languages are accessible
+by calling `StopWordsRemover.loadDefaultStopWords(language)`, for which available
+options are "danish", "dutch", "english", "finnish", "french", "german", "hungarian",
+"italian", "norwegian", "portuguese", "russian", "spanish", "swedish" and "turkish".
+A boolean parameter `caseSensitive` indicates if the matches should be case sensitive
+(false by default).
 
 **Examples**
 
@@ -346,7 +347,10 @@ for more details on the API.
 
 Binarization is the process of thresholding numerical features to binary (0/1) features.
 
-`Binarizer` takes the common parameters `inputCol` and `outputCol`, as well as the `threshold` for binarization. Feature values greater than the threshold are binarized to 1.0; values equal to or less than the threshold are binarized to 0.0.
+`Binarizer` takes the common parameters `inputCol` and `outputCol`, as well as the `threshold`
+for binarization. Feature values greater than the threshold are binarized to 1.0; values equal
+to or less than the threshold are binarized to 0.0. Both Vector and Double types are supported
+for `inputCol`.
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">

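For readers of this patch, the stop-word filtering described in the updated `StopWordsRemover` paragraph can be illustrated with a minimal plain-Python sketch. This is not Spark's implementation and does not call the Spark API: the function `remove_stop_words` and the tiny list `SAMPLE_STOP_WORDS` are hypothetical names made up for illustration, and the list is not the real default English list returned by `loadDefaultStopWords`. The only behavior taken from the doc text is that tokens matching the stop-word list are dropped and that matching is case-insensitive unless `caseSensitive` is set.

```python
from typing import Iterable, List


def remove_stop_words(
    tokens: Iterable[str],
    stop_words: Iterable[str],
    case_sensitive: bool = False,  # mirrors the documented default of `caseSensitive`
) -> List[str]:
    """Drop every token that appears in stop_words (hypothetical helper)."""
    if case_sensitive:
        blocked = set(stop_words)
        return [t for t in tokens if t not in blocked]
    # Case-insensitive matching: compare lower-cased forms on both sides.
    blocked = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in blocked]


# Hypothetical, deliberately tiny stop-word list (NOT Spark's default list).
SAMPLE_STOP_WORDS = ["i", "the", "a", "had"]

tokens = ["I", "saw", "the", "red", "balloon"]
filtered_default = remove_stop_words(tokens, SAMPLE_STOP_WORDS)
filtered_case_sensitive = remove_stop_words(
    tokens, SAMPLE_STOP_WORDS, case_sensitive=True
)
```

With case-insensitive matching the token "I" is removed because it matches "i"; with `case_sensitive=True` it is kept.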

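The thresholding rule in the updated `Binarizer` paragraph can likewise be sketched in plain Python. Again this is only an illustration under stated assumptions, not Spark code: `binarize` is a hypothetical helper, a Python list stands in for a Vector, and a Python float stands in for a Double. The rule itself (strictly greater than the threshold maps to 1.0, equal or less maps to 0.0) is taken from the doc text above.

```python
from typing import List, Sequence, Union


def binarize(
    value: Union[float, Sequence[float]],
    threshold: float = 0.0,  # default chosen for this sketch
) -> Union[float, List[float]]:
    """Threshold a single number or each element of a sequence (hypothetical helper)."""
    if isinstance(value, (int, float)):
        # Scalar input: strictly greater than the threshold becomes 1.0.
        return 1.0 if value > threshold else 0.0
    # Sequence input: apply the same rule element by element.
    return [1.0 if v > threshold else 0.0 for v in value]


scalar_result = binarize(0.8, threshold=0.5)
boundary_result = binarize(0.5, threshold=0.5)  # equal to the threshold
vector_result = binarize([0.1, 0.8, 0.5], threshold=0.5)
```

Note the boundary case: a value exactly equal to the threshold is binarized to 0.0, not 1.0.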