[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-18 Thread GitBox
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale 
to en_US in StopWordsRemover if system default locale isn't in available 
locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-513056938
 
 
   > Specifying the en-US locale directly in StopWordsRemover
   
   This isn't possible because the error is thrown in the constructor of 
`StopWordsRemover`. This PR actually aims to allow setting a different locale 
(via `StopWordsRemover.setLocale`). Otherwise, the locale has to be changed at 
the JVM or OS level just to use this API.
   
   Here's an example full stack trace:
   
   ```
   Py4JJavaError: An error occurred while calling None.org.apache.spark.ml.feature.StopWordsRemover.
   : java.lang.IllegalArgumentException: StopWordsRemover_daf8924a73f7 parameter locale given invalid value pl_US.
   at org.apache.spark.ml.param.Param.validate(params.scala:77)
   at org.apache.spark.ml.param.ParamPair.<init>(params.scala:656)
   at org.apache.spark.ml.param.Param.$minus$greater(params.scala:87)
   at org.apache.spark.ml.feature.StopWordsRemover.<init>(StopWordsRemover.scala:109)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   at py4j.Gateway.invoke(Gateway.java:238)
   at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
   at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
   at py4j.GatewayConnection.run(GatewayConnection.java:238)
   at java.lang.Thread.run(Thread.java:748)
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-15 Thread GitBox
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale 
to en_US in StopWordsRemover if system default locale isn't in available 
locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511655631
 
 
   Changing the locale would work at the OS-default or JVM-default level, but 
as far as I remember, we have tried to use `Locale.US` within Spark. So it 
might be fine to fall back to `Locale.US` by default. Otherwise, we would have 
to let users force the locale to another one.
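
The fallback idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code in this PR: the class `LocaleFallback` and its methods `resolve` and `jvmLocales` are hypothetical names, and the only real APIs used are `Locale.getDefault` and `Locale.getAvailableLocales`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only (not Spark's implementation).
class LocaleFallback {
    // Keep the candidate if it is among the available locales,
    // otherwise fall back to Locale.US.
    static Locale resolve(Locale candidate, Set<Locale> available) {
        return available.contains(candidate) ? candidate : Locale.US;
    }

    // Locales the running JVM reports as installed.
    static Set<Locale> jvmLocales() {
        return new HashSet<>(Arrays.asList(Locale.getAvailableLocales()));
    }

    public static void main(String[] args) {
        Locale systemDefault = Locale.getDefault();
        Locale chosen = resolve(systemDefault, jvmLocales());
        System.out.println("default=" + systemDefault + ", chosen=" + chosen);
    }
}
```

Passing the set of available locales as a parameter keeps the check easy to exercise with a fixed set, independent of which locales a given JVM ships.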
   
   





[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-14 Thread GitBox
HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale 
to en_US in StopWordsRemover if system default locale isn't in available 
locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541
 
 
   It seems that some locales, such as `en-TW` or `pl-US`, are not available in 
Java - 
https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html . 
Not all locales are supported, and in these cases the locale seems to be an 
undefined locale:
   
   ```scala
   scala> val locale = java.util.Locale.forLanguageTag("a")
   locale: java.util.Locale =
   
   scala> java.text.NumberFormat.getInstance(locale).format(12345)
   res1: String = 12,345
   ```
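
As a rough way to tell an ill-formed tag (such as `"a"`) apart from a well-formed tag that the JVM simply does not list (possibly `pl-US` or `en-TW`, depending on the JVM), one could probe each tag as below. This is a sketch only: the class `LocaleTagProbe` and its methods are hypothetical names, and the `available` flag will vary between JVMs.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical probe, for illustration only.
class LocaleTagProbe {
    // True if the JVM reports this exact locale among its installed locales.
    static boolean isAvailable(Locale locale) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(locale);
    }

    // Summarizes how the JVM parses the tag and whether it lists the result.
    static String describe(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        return tag + " -> toString=\"" + locale + "\", languageTag=" + locale.toLanguageTag()
            + ", available=" + isAvailable(locale);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"a", "pl-US", "en-TW", "en-US"}) {
            System.out.println(describe(tag));
        }
    }
}
```

For the ill-formed tag `"a"`, the parsed locale has an empty string form and the language tag `und`, whereas `pl-US` parses to `pl_US` regardless of whether the JVM lists it as available.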
   
   If the locale isn't available in the JVM, users have to manually change the 
system or JVM locale, or access a private property in PySpark (`_jvm`), to use 
this particular API. For instance, if the locale specifies "an 
English-speaking, Taiwanese locale", which I believe is a legitimate locale but 
is not available in the JVM, it does not seem to work. I found one 
[Stack Overflow question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value) 
about `pl-US`. In addition, I found a similar fix 
(`https://github.com/godotengine/godot/pull/6910`) for this case.
   




