[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-23 Thread hhbyyh
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/12968#issuecomment-220903177
  
I'm not sure whether this will ship with Spark 2.0. If so, we should
update the user guide accordingly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r64152062
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,38 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @see [[http://www.localeplanet.com/java/]]
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
+
+  /** @group getParam */
+  def getLocale: String = $(locale)
+
+  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
+    caseSensitive -> false, locale -> "en")
 
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
     val outputSchema = transformSchema(dataset.schema)
     val t = if ($(caseSensitive)) {
       val stopWordsSet = $(stopWords).toSet
       udf { terms: Seq[String] =>
-        terms.filter(s => !stopWordsSet.contains(s))
+        terms.filterNot(stopWordsSet.contains)
       }
     } else {
-      // TODO: support user locale (SPARK-15064)
-      val toLower = (s: String) => if (s != null) s.toLowerCase else s
+      val loadedLocale = StopWordsRemover.loadLocale($(locale))
--- End diff --

Maybe just `new Locale($(locale))`
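
A minimal sketch of what that suggestion amounts to outside Spark: build a `java.util.Locale` directly from the parameter string and use it to lowercase both the stop words and the terms before comparing. The object and method names are hypothetical and this is not the PR's code; only `java.util.Locale` and `String.toLowerCase(Locale)` are real JDK APIs.

```scala
import java.util.Locale

// Hypothetical stand-alone helper, not the PR's implementation.
object LocaleFilterSketch {
  def removeStopWords(
      terms: Seq[String],
      stopWords: Seq[String],
      localeTag: String): Seq[String] = {
    // Same construction as suggested in the review comment.
    val locale = new Locale(localeTag)
    // Null-safe lowercasing, mirroring the null check in the removed toLower.
    val toLower = (s: String) => if (s != null) s.toLowerCase(locale) else s
    val lowerStopWords = stopWords.map(toLower).toSet
    terms.filterNot(s => lowerStopWords.contains(toLower(s)))
  }
}
```

With `"tr"`, an upper-case `I` lowercases to the dotless `\u0131`, so a Turkish stop word can match where an English locale would miss it.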





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r64152042
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
--- End diff --

Looks good.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r64151936
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
@@ -98,6 +98,7 @@ class StopWordsRemoverSuite
   .setInputCol("raw")
   .setOutputCol("filtered")
   .setStopWords(stopWords)
+  .setLocale("tr")
--- End diff --

Maybe consider using `// scalastyle:off` where necessary.
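
As a sketch of that suggestion, assuming the failing style check is a non-ASCII rule that can be switched off by id (the `nonascii` id below is an assumption about the project's scalastyle configuration), a test could wrap its Turkish literals like this. The object name and word list are made up for illustration.

```scala
import java.util.Locale

object NonAsciiTestSketch {
  // scalastyle:off nonascii
  // Everything between the off/on comments may contain non-ASCII literals.
  val turkishStopWords: Seq[String] = Seq("ışık", "için")
  val lowered: String = "IŞIK".toLowerCase(new Locale("tr"))
  // scalastyle:on nonascii
}
```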





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-22 Thread burakkose
Github user burakkose commented on the pull request:

https://github.com/apache/spark/pull/12968#issuecomment-220838621
  
Can you clarify what exactly is being blocked?





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-22 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r64151401
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
+
+  /** @group getParam */
+  def getLocale: String = $(locale)
+
+  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
+    caseSensitive -> false, locale -> "en")
--- End diff --

Yes, the English locale is better.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-13 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r63139168
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
+
+  /** @group getParam */
+  def getLocale: String = $(locale)
+
+  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
+    caseSensitive -> false, locale -> "en")
--- End diff --

But in any event, if 'stopwords' is not set, the English list will be loaded.
I think it is better to use the English locale as the default. If users want to
change the locale, they can simply do so with setLocale.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-13 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r63138800
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
--- End diff --

Can we use LocaleUtils from
https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/LocaleUtils.html
? It would let us validate the locale.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-13 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r63138686
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
--- End diff --

It's done





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-13 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r63137161
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
@@ -98,6 +98,7 @@ class StopWordsRemoverSuite
   .setInputCol("raw")
   .setOutputCol("filtered")
   .setStopWords(stopWords)
+  .setLocale("tr")
--- End diff --

I couldn't use special characters because of the style check, but I did use
them in the Python test.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-12 Thread hhbyyh
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/12968#issuecomment-218934465
  
This is blocking the user guide/examples update for 2.0.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62461327
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
+
+  /** @group getParam */
+  def getLocale: String = $(locale)
+
+  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
+    caseSensitive -> false, locale -> "en")
--- End diff --

If the English set is loaded by default, the locale should match, rather
than use the platform default.
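
To illustrate why the platform default is risky here, the following sketch contrasts lowercasing with a pinned locale against lowercasing with whatever `Locale.getDefault` returns on the running JVM. The object and method names are hypothetical; the Turkish case is just one well-known example of a locale where results differ.

```scala
import java.util.Locale

object DefaultLocaleSketch {
  // Lowercase with an explicitly chosen locale, so the result does not
  // depend on how the JVM was started.
  def lowerPinned(s: String, locale: Locale = Locale.ENGLISH): String =
    s.toLowerCase(locale)

  // Lowercase with the platform default; the output can change from
  // machine to machine.
  def lowerPlatform(s: String): String =
    s.toLowerCase(Locale.getDefault)
}
```

On a JVM whose default locale is Turkish, `lowerPlatform("IT")` would not equal `"it"`, so an English stop word list would silently stop matching.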





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-08 Thread hhbyyh
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/12968#issuecomment-217780806
  
Made a pass.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62452057
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
@@ -98,6 +98,7 @@ class StopWordsRemoverSuite
   .setInputCol("raw")
   .setOutputCol("filtered")
   .setStopWords(stopWords)
+  .setLocale("tr")
--- End diff --

Maybe add something more specific to test that the Locale setter is working.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62451491
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
+
+  /** @group getParam */
+  def getLocale: String = $(locale)
+
+  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
+    caseSensitive -> false, locale -> "en")
--- End diff --

Compared with EN, it is perhaps better to use Locale.default.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62451393
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
+    "locale for doing a case sensitive comparison")
+
+  /** @group setParam */
+  def setLocale(value: String): this.type = set(locale, value)
--- End diff --

Add a parameter check here or in transformSchema, to help detect errors before
the pipeline executes.
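
One way to sketch such a check without extra dependencies is to compare the string against what the JVM reports through `Locale.getAvailableLocales`. The object and method names are hypothetical, and the set of available locales depends on the JVM, so treat this as an illustration rather than what the PR does.

```scala
import java.util.Locale

object LocaleCheckSketch {
  // String forms ("en", "tr", "en_US", ...) of the locales this JVM knows.
  private lazy val available: Set[String] =
    Locale.getAvailableLocales.map(_.toString).toSet

  def isKnownLocale(tag: String): Boolean =
    tag != null && tag.nonEmpty && available.contains(tag)

  // Fails fast with IllegalArgumentException, e.g. when called from a
  // setter or from transformSchema, before any data is processed.
  def requireKnownLocale(tag: String): String = {
    require(isKnownLocale(tag), s"Unknown locale: $tag")
    tag
  }
}
```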





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-08 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62451347
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
--- End diff --

Shall we list the available options, or provide a reference here?





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62428404
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
--- End diff --

For supported languages, we can know the appropriate locale and maintain an 
internal mapping. So "french" is known to map to `Locale.FRENCH`. For loading 
an arbitrary list, we don't know, but you could provide an overload where you 
provide a `Locale`.
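
A rough sketch of that idea, with a hypothetical object, method names, and language list (the comment only names "french" and `Locale.FRENCH`; the other entries are assumptions for illustration):

```scala
import java.util.Locale

object LanguageLocaleSketch {
  // Hypothetical internal mapping from stop-word list name to Locale.
  private val known: Map[String, Locale] = Map(
    "english" -> Locale.ENGLISH,
    "french" -> Locale.FRENCH,
    "german" -> Locale.GERMAN,
    "italian" -> Locale.ITALIAN)

  // Supported language: the locale is looked up, not supplied by the user.
  def localeFor(language: String): Option[Locale] =
    known.get(language.toLowerCase(Locale.ROOT))

  // Arbitrary list: the caller supplies the Locale explicitly.
  def localeFor(language: String, fallback: Locale): Locale =
    localeFor(language).getOrElse(fallback)
}
```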


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-07 Thread burakkose
Github user burakkose commented on the pull request:

https://github.com/apache/spark/pull/12968#issuecomment-217643539
  
@HyukjinKwon, thank you for letting me know. Yes, you're right.





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12968#issuecomment-217643227
  
(@burakkose I think the `cc:@mengxr` can go in a comment rather than in the PR
description, because a cc asking someone to review is arguably not part of the
PR itself. The merge script will remove the `@` anyway, but `cc:` and `mengxr`
will remain in the PR summary. I am being a bit careful here because some
committers seem to agree with this and some do not, but strictly speaking I
think it is right.)





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62417437
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -109,6 +126,7 @@ class StopWordsRemover(override val uid: String)
 object StopWordsRemover extends DefaultParamsReadable[StopWordsRemover] {
 
   private[feature]
+  def loadLocale(value : String) = new Locale(value)
--- End diff --

(BTW, I guess it would be nicer if the return type is specified (See 
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-ReturnTypes))
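
For illustration, the helper from the diff with an explicit return type might look like the sketch below. The enclosing object name is hypothetical and the `private[feature]` modifier is left out so the snippet compiles on its own.

```scala
import java.util.Locale

object ReturnTypeSketch {
  // Explicit return type, so the signature no longer relies on inference.
  def loadLocale(value: String): Locale = new Locale(value)
}
```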





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-07 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62414991
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
--- End diff --

Yes, but how can we know that the user loaded the French stopwords? A user can
load stopwords with
`StopWordsRemover.loadDefaultStopWords("french")`
and then set them with
`new StopWordsRemover().setStopWords(stopWords)`.
Do you have any suggestions for that case?






[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-07 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12968#discussion_r62413322
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -73,22 +75,37 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale for doing a case sensitive comparison
+   * Default: English locale ("en")
+   * @group param
+   */
+  val locale: Param[String] = new Param[String](this, "locale",
--- End diff --

Hm, shouldn't all this perhaps be linked to the stopwords set? If you
loaded the French stopwords, you'd always want the French locale?





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12968#issuecomment-217583624
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-15064][ML] Locale support in StopWordsR...

2016-05-06 Thread burakkose
GitHub user burakkose opened a pull request:

https://github.com/apache/spark/pull/12968

[SPARK-15064][ML] Locale support in StopWordsRemover

## What changes were proposed in this pull request?

- add locale support for 

## How was this patch tested?

Python unit tests

cc:@mengxr

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/burakkose/spark 15064

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12968.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12968


commit 59c5611c0d1897a2446f27fec9e9c7ae0db0f4a4
Author: Burak Köse 
Date:   2016-05-06T22:54:22Z

locale support to StopWords



