[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

hhbyyh Fri, 08 Jun 2018 08:54:42 -0700

Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194098947
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    --- End diff --
    
    I understand that Locale.getDefault is the current behavior, but since
we use English as the default stop words, I'm leaning towards using English as
the default locale.
    Also, using Locale.getDefault means that the same code can behave
differently on different nodes, which adds extra complexity for
troubleshooting.
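
To illustrate the second point, here is a minimal sketch of how the locale used for lowercasing changes which tokens match. It assumes case-insensitive matching lowercases both the stop words and the tokens with the same locale. `LocaleMatchingSketch` and `removeStopWords` are hypothetical names for illustration only, not the StopWordsRemover implementation.

```scala
import java.util.Locale

object LocaleMatchingSketch {
  // Hypothetical helper (not part of StopWordsRemover): case-insensitive
  // stop word filtering that lowercases with an explicit locale.
  def removeStopWords(
      tokens: Seq[String],
      stopWords: Set[String],
      locale: Locale): Seq[String] = {
    val lowered = stopWords.map(_.toLowerCase(locale))
    tokens.filterNot(t => lowered.contains(t.toLowerCase(locale)))
  }

  def main(args: Array[String]): Unit = {
    val tokens = Seq("I", "saw", "IT", "twice")
    val stopWords = Set("i", "it")
    val turkish = Locale.forLanguageTag("tr-TR")

    // English rules: "I" lowercases to "i", so "I" and "IT" are removed.
    println(removeStopWords(tokens, stopWords, Locale.ENGLISH)) // List(saw, twice)

    // Turkish rules: "I" lowercases to dotless "ı", so nothing matches.
    println(removeStopWords(tokens, stopWords, turkish)) // List(I, saw, IT, twice)
  }
}
```

If the locale came from Locale.getDefault, a JVM whose default is tr-TR would keep "I" and "IT" while a JVM with an English default would drop them, even though the code and the input are identical.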




