Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194098947

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)

    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    --- End diff --

    I understand that Locale.getDefault is the current behavior, but since we use the English stop words as the default, I'm leaning towards using English as the default locale. Also, using Locale.getDefault means the same code can behave differently on nodes whose JVM default locales differ, which adds extra complexity when troubleshooting.
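    To make the concern concrete, here is a minimal sketch of how case-insensitive matching can change with the locale. The object and helper names (`LocaleLowercaseSketch`, `isStopWord`) are hypothetical and are not part of Spark or of this PR; the only assumption about the matching is that it lowercases with `String.toLowerCase(Locale)`, and the Turkish locale stands in for a node with a non-English JVM default.

```scala
import java.util.Locale

// Hypothetical illustration only: not the StopWordsRemover implementation.
object LocaleLowercaseSketch {
  // Case-insensitive stop word test that lowercases both sides under the given locale.
  def isStopWord(token: String, stopWords: Set[String], locale: Locale): Boolean =
    stopWords.map(_.toLowerCase(locale)).contains(token.toLowerCase(locale))

  def main(args: Array[String]): Unit = {
    val stopWords = Set("i", "in", "is", "it")
    val english = Locale.ENGLISH
    // Assumed stand-in for a node whose JVM default locale is Turkish.
    val turkish = Locale.forLanguageTag("tr-TR")

    // Under English rules, "IN" lowercases to "in" and matches.
    println(isStopWord("IN", stopWords, english)) // prints true
    // Under Turkish rules, 'I' lowercases to dotless 'ı' (U+0131), so "IN" no longer matches "in".
    println(isStopWord("IN", stopWords, turkish)) // prints false
  }
}
```

    With a fixed English default, both nodes in this sketch would give the same answer for the same input.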