[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...
Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/12968#issuecomment-220903177

    I'm not sure if this will be shipped with Spark 2.0. If yes, we should update the user guide accordingly.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r64152062

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,38 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @see [[http://www.localeplanet.com/java/]]
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    +
    +  /** @group getParam */
    +  def getLocale: String = $(locale)
    +
    +  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
    +    caseSensitive -> false, locale -> "en")
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         val outputSchema = transformSchema(dataset.schema)
         val t = if ($(caseSensitive)) {
           val stopWordsSet = $(stopWords).toSet
           udf { terms: Seq[String] =>
    -        terms.filter(s => !stopWordsSet.contains(s))
    +        terms.filterNot(stopWordsSet.contains)
           }
         } else {
    -      // TODO: support user locale (SPARK-15064)
    -      val toLower = (s: String) => if (s != null) s.toLowerCase else s
    +      val loadedLocale = StopWordsRemover.loadLocale($(locale))
    --- End diff --

    Maybe just `new Locale($(locale))`.
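To make the case-insensitive branch in the diff above concrete, here is a minimal sketch of a locale-aware stop word filter. It is not the code from the pull request: the object name, the `removeStopWords` helper, and its parameters are hypothetical, and it uses `Locale.forLanguageTag` as a stand-in for the `new Locale(...)` constructor suggested in the review, since newer JDKs deprecate that constructor.

```scala
import java.util.Locale

object LocaleAwareFilter {
  // Hypothetical helper (not part of the PR): drops stop words from `terms`,
  // comparing in lower case under the locale named by `languageTag`
  // (for example "en" or "tr").
  def removeStopWords(
      terms: Seq[String],
      stopWords: Seq[String],
      languageTag: String): Seq[String] = {
    val locale = Locale.forLanguageTag(languageTag)
    // Null-safe lower-casing, mirroring the `toLower` function the diff replaces.
    val toLower = (s: String) => if (s != null) s.toLowerCase(locale) else s
    val lowerStopWords = stopWords.map(toLower).toSet
    terms.filterNot(t => lowerStopWords.contains(toLower(t)))
  }

  def main(args: Array[String]): Unit = {
    println(removeStopWords(Seq("The", "quick", "fox"), Seq("the"), "en"))
  }
}
```

Lower-casing both the stop words and the terms with the same locale keeps the comparison symmetric, which is the point of passing the locale explicitly rather than relying on the platform default.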
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r64152042

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    --- End diff --

    Looks good.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r64151936

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
    @@ -98,6 +98,7 @@ class StopWordsRemoverSuite
           .setInputCol("raw")
           .setOutputCol("filtered")
           .setStopWords(stopWords)
    +      .setLocale("tr")
    --- End diff --

    Maybe consider using `// scalastyle:off` as necessary.
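As a rough illustration of why a Turkish locale test needs non-ASCII characters, and of the two ways around the style check, the sketch below compares Turkish and English lower-casing. The object and helper names are hypothetical, and the `nonascii` rule id in the `scalastyle:off` comment is an assumption about the project's style configuration, not something stated in this thread.

```scala
import java.util.Locale

object TurkishCaseCheck {
  private val turkish = Locale.forLanguageTag("tr")

  // Hypothetical helper: lower-cases `s` under the Turkish locale.
  def lowerTr(s: String): String = s.toLowerCase(turkish)

  def main(args: Array[String]): Unit = {
    // Option 1: Unicode escapes keep the source ASCII-only.
    // U+0131 is dotless i, U+0130 is capital I with dot above.
    assert(lowerTr("I") == "\u0131")
    assert(lowerTr("\u0130") == "i")
    // Under the English locale, capital I lower-cases to plain i.
    assert("I".toLowerCase(Locale.ENGLISH) == "i")

    // Option 2: switch the check off around the literal (rule id assumed).
    // scalastyle:off nonascii
    assert(lowerTr("IŞIK") == "ışık")
    // scalastyle:on nonascii
  }
}
```

A test that only calls `.setLocale("tr")` on ASCII input would pass under any locale, so an input containing a capital `I` is what actually exercises the setter.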
Github user burakkose commented on the pull request:

    https://github.com/apache/spark/pull/12968#issuecomment-220838621

    Can you specify what this is blocking?
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r64151401

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    +
    +  /** @group getParam */
    +  def getLocale: String = $(locale)
    +
    +  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
    +    caseSensitive -> false, locale -> "en")
    --- End diff --

    Yes, the English locale is better.
Github user burakkose commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r63139168

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    +
    +  /** @group getParam */
    +  def getLocale: String = $(locale)
    +
    +  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
    +    caseSensitive -> false, locale -> "en")
    --- End diff --

    But in any event, if `stopWords` is not set, the English list will be loaded. I think it is better to use the English locale as the default. If users want a different locale, they can simply change it with `setLocale`.
Github user burakkose commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r63138800

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    --- End diff --

    Can we use LocaleUtils from https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/LocaleUtils.html ? It would let us validate the locale.
Github user burakkose commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r63138686

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    --- End diff --

    It's done.
Github user burakkose commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r63137161

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
    @@ -98,6 +98,7 @@ class StopWordsRemoverSuite
           .setInputCol("raw")
           .setOutputCol("filtered")
           .setStopWords(stopWords)
    +      .setLocale("tr")
    --- End diff --

    I couldn't use special characters because of the style check, but I did use them in the Python test.
Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/12968#issuecomment-218934465

    This is blocking the user guide/examples update for 2.0.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62461327

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    +
    +  /** @group getParam */
    +  def getLocale: String = $(locale)
    +
    +  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
    +    caseSensitive -> false, locale -> "en")
    --- End diff --

    If the English set is loaded by default, the locale should match it rather than use the platform default.
Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/12968#issuecomment-217780806

    Made a pass.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62452057

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
    @@ -98,6 +98,7 @@ class StopWordsRemoverSuite
           .setInputCol("raw")
           .setOutputCol("filtered")
           .setStopWords(stopWords)
    +      .setLocale("tr")
    --- End diff --

    Maybe use something more specific to test that the locale setter is working.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62451491

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    +
    +  /** @group getParam */
    +  def getLocale: String = $(locale)
    +
    +  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
    +    caseSensitive -> false, locale -> "en")
    --- End diff --

    Compared with EN, it is perhaps better to use the platform default (`Locale.getDefault`).
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62451393

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "locale for doing a case sensitive comparison")
    +
    +  /** @group setParam */
    +  def setLocale(value: String): this.type = set(locale, value)
    --- End diff --

    Add a parameter check here or in `transformSchema` to help detect errors before the pipeline executes.
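One possible shape for such a parameter check is sketched below. It relies only on the JDK (`Locale.getAvailableLocales`) instead of the commons-lang `LocaleUtils` class mentioned elsewhere in this thread, and the object and method names are hypothetical. An explicit check is needed because constructing a `Locale` from an arbitrary string does not fail by itself.

```scala
import java.util.Locale

object LocaleParamCheck {
  // Locale names known to the running JVM, in Java's string form
  // (for example "en", "tr", "en_US").
  private lazy val knownLocales: Set[String] =
    Locale.getAvailableLocales.map(_.toString).toSet

  // Hypothetical validator: true only for a non-empty, known locale name.
  def isKnownLocale(name: String): Boolean =
    name != null && name.nonEmpty && knownLocales.contains(name)

  // Hypothetical setter-style check: fails fast with
  // IllegalArgumentException before any data is processed.
  def checkedLocale(name: String): String = {
    require(isKnownLocale(name), s"Unsupported locale: $name")
    name
  }

  def main(args: Array[String]): Unit = {
    println(checkedLocale(Locale.US.toString))
    println(isKnownLocale("not-a-locale"))
  }
}
```

Running the check in the setter surfaces a bad value at configuration time, while running it in `transformSchema` would also catch values that arrive through other routes such as a param map.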
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62451347

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    --- End diff --

    Shall we list the available options, or provide a reference here?
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62428404

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    --- End diff --

    For supported languages, we can know the appropriate locale and maintain an internal mapping, so "french" is known to map to `Locale.FRENCH`. For an arbitrary list we can't know the locale, but you could provide an overload that takes a `Locale`.
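The internal mapping described above might look roughly like the following sketch. The object name, the `localeFor` method, and the particular languages listed are illustrative assumptions, not the set the project actually supports.

```scala
import java.util.Locale

object StopWordLocales {
  // Illustrative mapping from a stop word list name to its locale.
  // The entries here are examples only.
  private val languageToLocale: Map[String, Locale] = Map(
    "english" -> Locale.ENGLISH,
    "french" -> Locale.FRENCH,
    "german" -> Locale.GERMAN,
    "italian" -> Locale.ITALIAN,
    "turkish" -> Locale.forLanguageTag("tr"))

  // Returns the locale for a known language name, or None for an
  // arbitrary list whose locale the caller must supply explicitly.
  def localeFor(language: String): Option[Locale] =
    languageToLocale.get(language.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    println(localeFor("french"))
    println(localeFor("klingon"))
  }
}
```

Returning an `Option` leaves room for the second half of the suggestion: when the lookup yields `None`, the caller falls back to an overload that accepts a `Locale` directly.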
Github user burakkose commented on the pull request:

    https://github.com/apache/spark/pull/12968#issuecomment-217643539

    @HyukjinKwon, thank you for letting me know. Yes, you're right.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12968#issuecomment-217643227

    (@burakkose I think the `cc:@mengxr` should go in a comment rather than in the PR description, because a cc asking someone to review is not really part of the PR itself. The merge script removes the `@` anyway, but `cc:` and `mengxr` would remain in the PR summary. I am being a bit careful about this because some committers seem to agree with it and some do not, but strictly speaking I think it is right.)
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62417437

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -109,6 +126,7 @@ class StopWordsRemover(override val uid: String)
     object StopWordsRemover extends DefaultParamsReadable[StopWordsRemover] {
       private[feature]
    +  def loadLocale(value : String) = new Locale(value)
    --- End diff --

    (BTW, I guess it would be nicer if the return type were specified. See https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-ReturnTypes)
Github user burakkose commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62414991

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    --- End diff --

    Yes, but how can we know that the user loaded the French stop words? A user can load stop words with `StopWordsRemover.loadDefaultStopWords("french")` and then set them with `new StopWordsRemover().setStopWords(stopWords)`. Do you have any suggestions for that case?
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12968#discussion_r62413322

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
       /** @group getParam */
       def getCaseSensitive: Boolean = $(caseSensitive)
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale for doing a case sensitive comparison
    +   * Default: English locale ("en")
    +   * @group param
    +   */
    +  val locale: Param[String] = new Param[String](this, "locale",
    --- End diff --

    Hm, shouldn't all this perhaps be linked to the stop words set? If you loaded the French stop words, you'd always want the French locale?
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12968#issuecomment-217583624

    Can one of the admins verify this patch?
GitHub user burakkose opened a pull request:

    https://github.com/apache/spark/pull/12968

    [SPARK-15064][ML] Locale support in StopWordsRemover

    ## What changes were proposed in this pull request?

    - add locale support for

    ## How was this patch tested?

    python's unit tests

    cc:@mengxr

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/burakkose/spark 15064

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12968.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12968

commit 59c5611c0d1897a2446f27fec9e9c7ae0db0f4a4
Author: Burak Köse
Date: 2016-05-06T22:54:22Z

    locale support to StopWords