[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8436 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135579744 LGTM. Merged into master and branch-1.5. Thanks!
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135572729 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41708/ Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135572728 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135572593 [Test build #41708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41708/console) for PR 8436 at commit [`24eba04`](https://github.com/apache/spark/commit/24eba0448fe20efcdbc98ea6ec2bea1820fa0055). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `An [n-gram](https://en.wikipedia.org/wiki/N-gram) is a sequence of $n$ tokens (typically words) for some integer $n$. The `NGram` class can be used to transform input features into $n$-grams.` * `public class JavaStopWordsRemoverSuite `
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135564548 [Test build #41708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41708/consoleFull) for PR 8436 at commit [`24eba04`](https://github.com/apache/spark/commit/24eba0448fe20efcdbc98ea6ec2bea1820fa0055).
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135562698 Merged build triggered.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135562745 Merged build started.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38149122

--- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaStopWordsRemoverSuite.java ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature;
+
+import java.util.Arrays;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+public class JavaStopWordsRemoverSuite {
+
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+
+  @Before
+  public void setUp() {
+    jsc = new JavaSparkContext("local", "JavaStopWordsRemoverSuite");
+    jsql = new SQLContext(jsc);
+  }
+
+  @After
+  public void tearDown() {
+    jsc.stop();
+    jsc = null;
+  }
+
+  @Test
+  public void javaCompatibilityTest() {
+    StopWordsRemover remover = new StopWordsRemover()
+      .setInputCol("raw")
+      .setOutputCol("filtered");
+
+    JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
+      RowFactory.create(Arrays.asList("I", "saw", "the", "red", "baloon")),
+      RowFactory.create(Arrays.asList("Mary", "had", "a", "little", "lamb"))
+    ));
+    StructType schema = new StructType(new StructField[] {
+      new StructField("raw", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty())
+    });
+    DataFrame dataset = jsql.createDataFrame(rdd, schema);
+
+    remover.transform(dataset);
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38148971

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default).
+
+{% highlight scala %}
+import org.apache.spark.ml.feature.StopWordsRemover
+
+val remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered")
+val dataSet = sqlContext.createDataFrame(Seq(
+  (Seq("I", "saw", "the", "red", "baloon")),
+  (Seq("Mary", "had", "a", "little", "lamb"))
+).map(Tuple1.apply)).toDF("raw")
+
+remover.transform(dataSet).show()
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38148403

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default).
+
+{% highlight scala %}
+import org.apache.spark.ml.feature.StopWordsRemover
+
+val remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered")
+val dataSet = sqlContext.createDataFrame(Seq(
+  (Seq("I", "saw", "the", "red", "baloon")),
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38148019

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38147717

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38146617

--- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaStopWordsRemoverSuite.java ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature;
+
+import java.util.Arrays;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+public class JavaStopWordsRemoverSuite {
+
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+
+  @Before
+  public void setUp() {
+    jsc = new JavaSparkContext("local", "JavaStopWordsRemoverSuite");
+    jsql = new SQLContext(jsc);
+  }
+
+  @After
+  public void tearDown() {
+    jsc.stop();
+    jsc = null;
+  }
+
+  @Test
+  public void javaCompatibilityTest() {
+    StopWordsRemover remover = new StopWordsRemover()
+      .setInputCol("raw")
+      .setOutputCol("filtered");
+
+    JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
+      RowFactory.create(Arrays.asList("I", "saw", "the", "red", "baloon")),
+      RowFactory.create(Arrays.asList("Mary", "had", "a", "little", "lamb"))
+    ));
+    StructType schema = new StructType(new StructField[] {
+      new StructField("raw", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty())
+    });
+    DataFrame dataset = jsql.createDataFrame(rdd, schema);
+
+    remover.transform(dataset);
--- End diff --

This doesn't trigger any action. We can call `.collect()`.
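Editorial note on the review comment above: the point is that a lazily built pipeline does no work until something asks for its results, so a test that only calls `transform` may never exercise the code it is meant to cover. The sketch below is an analogy in plain Java, not Spark code: `java.util.stream` stands in for a lazy pipeline, and the class name `LazyPipelineSketch` and its counter are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: java.util.stream stands in for a lazily evaluated pipeline.
// Nothing here is Spark API; all names are hypothetical.
final class LazyPipelineSketch {

  // Counts how many elements the pipeline has actually processed.
  static final AtomicInteger touched = new AtomicInteger();

  // Builds the pipeline. No element is processed at this point.
  static Stream<String> buildPipeline(List<String> tokens) {
    return tokens.stream()
        .peek(t -> touched.incrementAndGet())
        .filter(t -> !t.equalsIgnoreCase("the"));
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("I", "saw", "the", "red", "balloon");

    Stream<String> pipeline = buildPipeline(tokens);
    System.out.println(touched.get()); // 0: building the pipeline did no work

    // The terminal operation plays the role of an action such as collect().
    List<String> result = pipeline.collect(Collectors.toList());
    System.out.println(touched.get()); // 5: every element was processed
    System.out.println(result);        // [I, saw, red, balloon]
  }
}
```

In the same spirit, the reviewer's suggestion is to finish the test with a call that forces evaluation, so that the transformer's logic really runs under the Java API.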
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38146489

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default).
+
+{% highlight scala %}
+import org.apache.spark.ml.feature.StopWordsRemover
+
+val remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered")
+val dataSet = sqlContext.createDataFrame(Seq(
+  (Seq("I", "saw", "the", "red", "baloon")),
--- End diff --

The outer parentheses are not needed. I think using `((0, Seq("I", ...)), (1, Seq(...))).toDF("id", "raw")` might be easier to understand than `Tuple1.apply`.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38146305

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
--- End diff --

`StopWords.EnglishStopWords` is a private API. Users can get this list by calling `sw.getStopWords`. We do need to mention it.
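Editorial note on the documentation text quoted above: the described behaviour (drop every token that appears in a stop-word list, matching case-insensitively unless told otherwise) can be sketched in plain Java. This is only an illustration of the documented semantics, not Spark's implementation. The class `StopWordFilterSketch`, the method `removeStopWords`, and the four-word `SAMPLE_STOP_WORDS` list are hypothetical; the default list referenced in the docs is much longer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Spark: it only mimics the behaviour
// described in the quoted documentation.
final class StopWordFilterSketch {

  // Tiny illustrative list (an assumption, not the default list).
  static final List<String> SAMPLE_STOP_WORDS = Arrays.asList("i", "the", "had", "a");

  // Returns the tokens that are not stop words, preserving order and casing.
  static List<String> removeStopWords(
      List<String> tokens, List<String> stopWords, boolean caseSensitive) {
    Set<String> lookup = new HashSet<>();
    for (String word : stopWords) {
      lookup.add(caseSensitive ? word : word.toLowerCase(Locale.ROOT));
    }
    List<String> kept = new ArrayList<>();
    for (String token : tokens) {
      String key = caseSensitive ? token : token.toLowerCase(Locale.ROOT);
      if (!lookup.contains(key)) {
        kept.add(token);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> raw = Arrays.asList("I", "saw", "the", "red", "balloon");
    // Case-insensitive matching (the documented default): "I" and "the" are dropped.
    System.out.println(removeStopWords(raw, SAMPLE_STOP_WORDS, false)); // [saw, red, balloon]
    // Case-sensitive matching: "I" no longer matches the lowercase entry "i".
    System.out.println(removeStopWords(raw, SAMPLE_STOP_WORDS, true));  // [I, saw, red, balloon]
  }
}
```

The second call shows why the case-sensitivity flag matters for a lowercase stop-word list: capitalised tokens such as "I" survive unless matching ignores case.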
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38146311

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default).
+
+{% highlight scala %}
+import org.apache.spark.ml.feature.StopWordsRemover
+
+val remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered")
+val dataSet = sqlContext.createDataFrame(Seq(
+  (Seq("I", "saw", "the", "red", "baloon")),
+  (Seq("Mary", "had", "a", "little", "lamb"))
+).map(Tuple1.apply)).toDF("raw")
+
+remover.transform(dataSet).show()
--- End diff --

It is useful to show the result, as in the user guide of `StringIndexer`.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8436#discussion_r38146300

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
--- End diff --

Shall we put the link on `a list of stop words`?
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135502190 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135502194 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/ Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135502050 [Test build #41702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/console) for PR 8436 at commit [`074583e`](https://github.com/apache/spark/commit/074583e2fb5b31275f94af5d35f58fa0f2737c50). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaStopWordsRemoverSuite `
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135492754 [Test build #41702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41702/consoleFull) for PR 8436 at commit [`074583e`](https://github.com/apache/spark/commit/074583e2fb5b31275f94af5d35f58fa0f2737c50).
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135491108 Merged build triggered.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-135491197 Merged build started.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r38010333

--- Diff: docs/ml-features.md ---
@@ -306,16 +306,88 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer)) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default).
+
+{% highlight scala %}
+import org.apache.spark.ml.feature.StopWordsRemover
+
+val remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered")
+val dataSet = sqlContext.createDataFrame(Seq(
+  (Seq("I", "saw", "the", "red", "baloon")),
+  (Seq("Mary", "had", "a", "little", "lamb"))
+).map(Tuple1.apply)).toDF("raw")
+
+remover.transform(dataSet).show()
+{% endhighlight %}
+
+
+
+
+[`StopWordsRemover`](api/java/org/apache/spark/ml/feature/StopWordsRemover.html)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.
--- End diff --

close paren after "default"
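The filtering behaviour described in the quoted guide text (drop every token found in the stop-word list, case-insensitively unless told otherwise) can be illustrated without Spark. The following is a minimal plain-Java sketch of those semantics only; it is not Spark's implementation. The class name `StopWordFilterSketch`, the helper `removeStopWords`, and the four-word stop list are hypothetical and chosen for illustration; the real default is the Glasgow list mentioned in the guide.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only: mimics the documented semantics, not Spark's code.
class StopWordFilterSketch {

  // Hypothetical helper: returns the tokens that are not stop words.
  // When caseSensitive is false, both sides are lower-cased before comparing.
  static List<String> removeStopWords(
      List<String> tokens, Set<String> stopWords, boolean caseSensitive) {
    if (caseSensitive) {
      return tokens.stream()
          .filter(t -> !stopWords.contains(t))
          .collect(Collectors.toList());
    }
    Set<String> lowered = stopWords.stream()
        .map(w -> w.toLowerCase(Locale.ROOT))
        .collect(Collectors.toSet());
    return tokens.stream()
        .filter(t -> !lowered.contains(t.toLowerCase(Locale.ROOT)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Tiny assumed stop list, standing in for the much longer default list.
    Set<String> stopWords = new HashSet<>(Arrays.asList("i", "the", "a", "had"));
    List<String> first = Arrays.asList("I", "saw", "the", "red", "baloon");
    List<String> second = Arrays.asList("Mary", "had", "a", "little", "lamb");

    System.out.println(removeStopWords(first, stopWords, false));  // [saw, red, baloon]
    System.out.println(removeStopWords(second, stopWords, false)); // [Mary, little, lamb]
    System.out.println(removeStopWords(first, stopWords, true));   // [I, saw, red, baloon]
  }
}
```

With case-insensitive matching (the documented default) the capitalised "I" is dropped; with case-sensitive matching it survives because the assumed list only contains the lower-case "i".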
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134848565 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134848568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41598/ Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134847882

[Test build #41598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41598/console) for PR 8436 at commit [`5169ce0`](https://github.com/apache/spark/commit/5169ce03d5be84446d6236eaf0413e97006419d6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaStopWordsRemoverSuite`
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134834323 [Test build #41598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41598/consoleFull) for PR 8436 at commit [`5169ce0`](https://github.com/apache/spark/commit/5169ce03d5be84446d6236eaf0413e97006419d6).
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134833845 Merged build triggered.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134833851 Merged build started.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r37946843

--- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaStopWordsRemoverSuite.java ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature;
+
+import java.util.Arrays;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+public class JavaStopWordsRemoverSuite {
+
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+
+  @Before
+  public void setUp() {
+    jsc = new JavaSparkContext("local", "JavaStopWordsRemoverSuite");
+    jsql = new SQLContext(jsc);
+  }
+
+  @After
+  public void tearDown() {
+    jsc.stop();
+    jsc = null;
+  }
+
+  @Test
+  public void javaCompatibilityTest() {
+    StopWordsRemover remover = new StopWordsRemover()
+      .setInputCol("raw")
+      .setOutputCol("filtered");
+
+    JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
+      RowFactory.create(Arrays.asList("I", "saw", "the", "red", "baloon")),
+      RowFactory.create(Arrays.asList("Mary", "had", "a", "little", "lamb"))
+    ));
+    StructType schema = new StructType(new StructField[]{
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r37946831

--- Diff: docs/ml-features.md ---
@@ -306,6 +306,80 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r37946834

--- Diff: docs/ml-features.md ---
@@ -306,6 +306,80 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.
--- End diff --

OK
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134779180 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41564/ Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134779177 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134779098

[Test build #41564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41564/console) for PR 8436 at commit [`28a3deb`](https://github.com/apache/spark/commit/28a3deb11040c825970f6d05ca358f4a69f466d9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaStopWordsRemoverSuite`
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134772031 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134772032 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41570/ Test PASSed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134771950

[Test build #41570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41570/console) for PR 8436 at commit [`28a3deb`](https://github.com/apache/spark/commit/28a3deb11040c825970f6d05ca358f4a69f466d9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) extends BinaryNode`
   * `case class Union(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)`
   * `case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)`
   * `case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)`
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r37932514

--- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaStopWordsRemoverSuite.java ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature;
+
+import java.util.Arrays;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+public class JavaStopWordsRemoverSuite {
+
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+
+  @Before
+  public void setUp() {
+    jsc = new JavaSparkContext("local", "JavaStopWordsRemoverSuite");
+    jsql = new SQLContext(jsc);
+  }
+
+  @After
+  public void tearDown() {
+    jsc.stop();
+    jsc = null;
+  }
+
+  @Test
+  public void javaCompatibilityTest() {
+    StopWordsRemover remover = new StopWordsRemover()
+      .setInputCol("raw")
+      .setOutputCol("filtered");
+
+    JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
+      RowFactory.create(Arrays.asList("I", "saw", "the", "red", "baloon")),
+      RowFactory.create(Arrays.asList("Mary", "had", "a", "little", "lamb"))
+    ));
+    StructType schema = new StructType(new StructField[]{
+      new StructField("raw", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty())
+    });
+    DataFrame dataset = jsql.createDataFrame(rdd, schema);
+  }
--- End diff --

remover.fit?
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r37932511

--- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaStopWordsRemoverSuite.java ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature;
+
+import java.util.Arrays;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.SQLContext;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+
+public class JavaStopWordsRemoverSuite {
+
+  private transient JavaSparkContext jsc;
+  private transient SQLContext jsql;
+
+  @Before
+  public void setUp() {
+    jsc = new JavaSparkContext("local", "JavaStopWordsRemoverSuite");
+    jsql = new SQLContext(jsc);
+  }
+
+  @After
+  public void tearDown() {
+    jsc.stop();
+    jsc = null;
+  }
+
+  @Test
+  public void javaCompatibilityTest() {
+    StopWordsRemover remover = new StopWordsRemover()
+      .setInputCol("raw")
+      .setOutputCol("filtered");
+
+    JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
+      RowFactory.create(Arrays.asList("I", "saw", "the", "red", "baloon")),
+      RowFactory.create(Arrays.asList("Mary", "had", "a", "little", "lamb"))
+    ));
+    StructType schema = new StructType(new StructField[]{
--- End diff --

style: space between bracket and brace: ```[] {``` (Xiangrui tells me to do this.)
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134770419 That's it!
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r37932506

--- Diff: docs/ml-features.md ---
@@ -306,6 +306,80 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
--- End diff --

closing parenthesis needed after Tokenizer link for "e.g." clause
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8436#discussion_r37932507

--- Diff: docs/ml-features.md ---
@@ -306,6 +306,80 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.
--- End diff --

closing parenthesis needed after "default"
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134768721 I'll take a look
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134765887 [Test build #41570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41570/consoleFull) for PR 8436 at commit [`28a3deb`](https://github.com/apache/spark/commit/28a3deb11040c825970f6d05ca358f4a69f466d9).
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134765611 Merged build started.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134765595 Merged build triggered.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
GitHub user feynmanliang reopened a pull request: https://github.com/apache/spark/pull/8436

[SPARK-9680][MLlib][Doc] StopWordsRemovers user guide and Java compatibility test

* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/feynmanliang/spark SPARK-10230

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8436.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8436

commit 8034e2976aac76566ebb7fad4fca972bf8829461
Author: Feynman Liang
Date:   2015-08-25T20:56:53Z

    StopWordsRemover Java Compatibility Test

commit 28a3deb11040c825970f6d05ca358f4a69f466d9
Author: Feynman Liang
Date:   2015-08-25T21:01:54Z

    Adds javadocs
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang closed the pull request at: https://github.com/apache/spark/pull/8436
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8436#discussion_r37929502

--- Diff: docs/ml-features.md ---
@@ -306,6 +306,80 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.
+
+{% highlight scala %}
+import org.apache.spark.ml.feature.StopWordsRemover
+
+val remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered")
+val dataSet = sqlContext.createDataFrame(Seq(
+  (Seq("I", "saw", "the", "red", "baloon")),
+  (Seq("Mary", "had", "a", "little", "lamb"))
+).map(Tuple1.apply)).toDF("raw")
+
+remover.transform(dataSet).show()
+{% endhighlight %}
+
+
+
+
+[`StopWordsRemover`](api/java/org/apache/spark/ml/feature/StopWordsRemover.html)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.
+
+{% highlight java %}
+import java.util.Arrays;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.ml.feature.StopWordsRemover;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+StopWordsRemover remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered");
+
+JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
+  RowFactory.create(Arrays.asList("I", "saw", "the", "red", "baloon")),
+  RowFactory.create(Arrays.asList("Mary", "had", "a", "little", "lamb"))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("raw", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty())
+});
+DataFrame dataset = jsql.createDataFrame(rdd, schema);
+
+remover.transform(dataset).show();
+{% endhighlight %}
+
--- End diff --

Actually no Python example is possible until Python API is added (SPARK-9679, #8118).
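For readers following the quoted diff: the user-guide text says the transformer drops stop words from a sequence of strings, with case-insensitive matching by default. The block below is only a minimal sketch of that filtering behaviour in plain Java, without Spark. It is not the Spark implementation. The class name `StopWordFilterSketch`, the method `removeStopWords`, and the four-word `SAMPLE_STOP_WORDS` list are hypothetical and exist only for illustration, and lowercasing both sides with `Locale.ROOT` is an assumption about how case-insensitive matching could be done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: NOT Spark's StopWordsRemover, just the idea behind it.
final class StopWordFilterSketch {

    // Hypothetical, tiny stop-word list; the real default list is much longer.
    static final List<String> SAMPLE_STOP_WORDS = Arrays.asList("i", "the", "a", "had");

    // Returns the tokens that are not stop words, preserving order and original casing.
    static List<String> removeStopWords(List<String> tokens,
                                        List<String> stopWords,
                                        boolean caseSensitive) {
        // Build the lookup set once; normalize to lower case when matching ignores case.
        Set<String> lookup = new HashSet<>();
        for (String word : stopWords) {
            lookup.add(caseSensitive ? word : word.toLowerCase(Locale.ROOT));
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = caseSensitive ? token : token.toLowerCase(Locale.ROOT);
            if (!lookup.contains(key)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("I", "saw", "the", "red", "balloon");
        // Case-insensitive matching: "I" matches the stop word "i" and is dropped.
        System.out.println(removeStopWords(tokens, SAMPLE_STOP_WORDS, false)); // [saw, red, balloon]
        // Case-sensitive matching: "I" does not equal "i", so it is kept.
        System.out.println(removeStopWords(tokens, SAMPLE_STOP_WORDS, true));  // [I, saw, red, balloon]
    }
}
```

The two calls in `main` differ only in the case-sensitivity flag, which is the parameter the quoted guide text describes as false by default.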
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134759745 [Test build #41564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41564/consoleFull) for PR 8436 at commit [`28a3deb`](https://github.com/apache/spark/commit/28a3deb11040c825970f6d05ca358f4a69f466d9).
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134758057 Merged build triggered.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134758078 Merged build started.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134757388 Jenkins retest this please
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134756425 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134756426 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41563/ Test FAILed.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8436#discussion_r37924348

--- Diff: docs/ml-features.md ---
@@ -306,6 +306,80 @@ regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern=
+## StopWordsRemover
+[Stop words](https://en.wikipedia.org/wiki/Stop_words) are words which
+should be excluded from the input, typically because the words appear
+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter. We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
+`StopWords.EnglishStopWords`, which is used by default.
+
+
+
+
+
+[`StopWordsRemover`](api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.
+
+{% highlight scala %}
+import org.apache.spark.ml.feature.StopWordsRemover
+
+val remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered")
+val dataSet = sqlContext.createDataFrame(Seq(
+  (Seq("I", "saw", "the", "red", "baloon")),
+  (Seq("Mary", "had", "a", "little", "lamb"))
+).map(Tuple1.apply)).toDF("raw")
+
+remover.transform(dataSet).show()
+{% endhighlight %}
+
+
+
+
+[`StopWordsRemover`](api/java/org/apache/spark/ml/feature/StopWordsRemover.html)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.
+
+{% highlight java %}
+import java.util.Arrays;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.ml.feature.StopWordsRemover;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+StopWordsRemover remover = new StopWordsRemover()
+  .setInputCol("raw")
+  .setOutputCol("filtered");
+
+JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
+  RowFactory.create(Arrays.asList("I", "saw", "the", "red", "baloon")),
+  RowFactory.create(Arrays.asList("Mary", "had", "a", "little", "lamb"))
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("raw", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty())
+});
+DataFrame dataset = jsql.createDataFrame(rdd, schema);
+
+remover.transform(dataset).show();
+{% endhighlight %}
+
--- End diff --

TODO: add Python docs
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
GitHub user feynmanliang opened a pull request:

https://github.com/apache/spark/pull/8436

[SPARK-9680][MLlib][Doc] StopWordsRemovers user guide and Java compatibility test

* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/feynmanliang/spark SPARK-10230

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8436.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #8436

commit 8034e2976aac76566ebb7fad4fca972bf8829461
Author: Feynman Liang
Date: 2015-08-25T20:56:53Z
StopWordsRemover Java Compatibility Test

commit 28a3deb11040c825970f6d05ca358f4a69f466d9
Author: Feynman Liang
Date: 2015-08-25T21:01:54Z
Adds javadocs
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134752583 Merged build triggered.
[GitHub] spark pull request: [SPARK-9680][MLlib][Doc] StopWordsRemovers use...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8436#issuecomment-134752644 Merged build started.