[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-513056938

> Specifying the en-US locale directly in StopWordsRemover

This isn't possible because the error is thrown in the constructor of `StopWordsRemover`. This PR actually aims to make it possible to set a different locale (via `StopWordsRemover.setLocale`). Otherwise, the locale has to be changed in the JVM or OS just to use this API.

Here's an example full stack trace:

```
Py4JJavaError: An error occurred while calling None.org.apache.spark.ml.feature.StopWordsRemover.
: java.lang.IllegalArgumentException: StopWordsRemover_daf8924a73f7 parameter locale given invalid value pl_US.
	at org.apache.spark.ml.param.Param.validate(params.scala:77)
	at org.apache.spark.ml.param.ParamPair.<init>(params.scala:656)
	at org.apache.spark.ml.param.Param.$minus$greater(params.scala:87)
	at org.apache.spark.ml.feature.StopWordsRemover.<init>(StopWordsRemover.scala:109)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
```

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511655631

Changing the locale works at the OS default or JVM default level, but as far as I remember, we have tried to use `Locale.US` within Spark. So it might be fine to fall back to `Locale.US` by default. Otherwise, we would have to let users force the locale to another one.
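The fallback idea discussed here can be sketched with the plain JDK API. This is only an illustration of the approach, not the code from the PR; the object name `LocaleFallback` and the method `availableOrUS` are hypothetical.

```scala
import java.util.Locale

// Hypothetical helper, not part of Spark: illustrates "fall back to Locale.US".
object LocaleFallback {
  /** Returns `candidate` if the JVM lists it as an available locale, otherwise Locale.US. */
  def availableOrUS(candidate: Locale): Locale = {
    // Locale.getAvailableLocales returns the locales installed in this JVM.
    val available: Set[Locale] = Locale.getAvailableLocales.toSet
    if (available.contains(candidate)) candidate else Locale.US
  }
}
```

A caller would typically pass the JVM default, for example `LocaleFallback.availableOrUS(Locale.getDefault)`, so that a default such as `pl_US` that the JVM does not list resolves to `en_US` instead of failing validation.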
[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541

Seems like some locales such as `en-TW` or `pl-US` are not available in Java - https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html . Not all locales are supported, and in such cases the locale seems to be an undefined locale:

```scala
scala> val locale = java.util.Locale.forLanguageTag("a")
locale: java.util.Locale =

scala> java.text.NumberFormat.getInstance(locale).format(12345)
res1: String = 12,345
```

If the locale isn't available in the JVM, users have to manually change the system or JVM locale, or access a private property in PySpark (`_jvm`), to use this particular API. For instance, if the locale specifies "an English-speaking, Taiwanese locale", which I believe is a legitimate locale but is not available in the JVM, it is not going to work.

I found one [Stack Overflow question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value) about `pl-US`. In addition, I found a similar fix (`https://github.com/godotengine/godot/pull/6910`) for this case.
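The "manually change the JVM locale" workaround mentioned above could look roughly like the following on the Scala side. This is a sketch under assumptions: the helper `WithDefaultLocale` is hypothetical, and it only affects the JVM in which it runs (for PySpark that would be the gateway JVM, which is not shown here).

```scala
import java.util.Locale

// Hypothetical helper: runs `body` with a temporary JVM default locale.
object WithDefaultLocale {
  def apply[T](locale: Locale)(body: => T): T = {
    val previous = Locale.getDefault
    Locale.setDefault(locale)
    // Restore the previous default even if `body` throws.
    try body finally Locale.setDefault(previous)
  }
}
```

For example, `WithDefaultLocale(Locale.US) { /* construct the transformer here */ }` keeps the change scoped to one block. Note that `Locale.setDefault` is JVM-wide, so other threads would see the temporary default while the block runs.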