Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194098947
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    --- End diff --
    
    I understand that Locale.getDefault is the current behavior, but since we use English as the default stop words, I'm leaning towards using English as the default locale as well.
    Also, using Locale.getDefault means the same code can behave differently on nodes whose JVM default locales differ, which makes troubleshooting harder.
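
    To illustrate the concern, here is a minimal sketch (not the code in this PR) of how locale-sensitive lowercasing can change a stop word match. The object `LocaleLowercaseDemo`, the helper `normalize`, and the three-word stop list are hypothetical names made up for the example; only `String.toLowerCase(Locale)` and `Locale.forLanguageTag` are real JDK calls.

```scala
import java.util.Locale

object LocaleLowercaseDemo {
  // Hypothetical helper: lowercase a token with an explicit locale,
  // as a case-insensitive matcher might do before a set lookup.
  def normalize(token: String, locale: Locale): String =
    token.toLowerCase(locale)

  def main(args: Array[String]): Unit = {
    // Tiny illustrative stop list, not Spark's default English list.
    val stopWords = Set("i", "in", "it")
    val token = "IN"

    val english = normalize(token, Locale.ENGLISH)
    val turkish = normalize(token, Locale.forLanguageTag("tr-TR"))

    // English rules map 'I' to 'i', so the token matches the stop list.
    println(stopWords.contains(english)) // true
    // Turkish rules map 'I' to dotless 'ı' (U+0131), so the match fails.
    println(stopWords.contains(turkish)) // false
  }
}
```

    If the locale came from Locale.getDefault, which of the two results a job sees would depend on how each JVM is configured.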



