Thank you! I tried again setting the locale options in different ways, but they don't propagate to the JVM. I tested these strategies (alone and all together):

- bin/spark-submit --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT" test.py

- spark = SparkSession \
      .builder \
      .appName("My app") \
      .config("spark.executor.extraJavaOptions", "-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT") \
      .config("user.country", "US") \
      .config("user.region", "US") \
      .config("user.language", "en") \
      .config("user.timezone", "GMT") \
      .config("-Duser.country", "US") \
      .config("-Duser.region", "US") \
      .config("-Duser.language", "en") \
      .config("-Duser.timezone", "GMT") \
      .getOrCreate()

- export JAVA_OPTS="-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT"

- export LANG="en_US.UTF-8"
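One possible gap in the attempts above (a sketch, not verified on this setup): spark.executor.extraJavaOptions only reaches the executor JVMs, while the System Properties page in the UI reflects the driver JVM. In client mode the driver JVM is already running by the time SparkConf values are read, so per the Spark configuration docs the driver-side flags have to be passed through spark-submit's --driver-java-options (or spark.driver.extraJavaOptions in spark-defaults.conf), e.g.:

```shell
# Sketch: pass the locale flags to BOTH sides.
# --driver-java-options reaches the driver JVM before it starts (client mode);
# spark.executor.extraJavaOptions covers the executor JVMs.
bin/spark-submit \
  --driver-java-options "-Duser.language=en -Duser.country=US -Duser.timezone=GMT" \
  --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.country=US -Duser.timezone=GMT" \
  test.py
```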
After running export LANG="en_US.UTF-8" from the same terminal session I use to launch spark-submit, the locale command reports the correct values:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

While my pyspark script is running, the Spark UI under Environment -> Spark Properties shows the locale as correctly set:

- user.country: US
- user.language: en
- user.region: US
- user.timezone: GMT

but Environment -> System Properties still reports the system locale, not the session locale I set:

- user.country: IT
- user.language: it
- user.timezone: Europe/Rome

Am I wrong, or do the options not propagate to the JVM correctly?

> On 24 Oct 2016, at 16:49, Sean Owen <so...@cloudera.com> wrote:
>
> This is more of an OS-level thing, but I think that if you can manage to set
> -Duser.language=en to the JVM, it might do the trick.
>
> I summarized what I think I know about this at
> https://issues.apache.org/jira/browse/SPARK-18076
> and so we can decide what to do, if anything, there.
>
> Sean
>
> On Mon, Oct 24, 2016 at 3:08 PM Pietro Pugni <pietro.pu...@gmail.com> wrote:
> Thank you, I'll appreciate that. I have no experience with Python, Java and
> Spark, so the question can be translated to: "How can I set the JVM locale
> when using spark-submit and pyspark?". Probably this is possible only by
> changing the system default locale and not within the Spark session, right?
>
> Thank you
> Pietro
>
>> On 24 Oct 2016, at 14:51, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>> I am also interested in this issue. I will try to look into this too within
>> the coming few days.
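As an aside, the locale sensitivity that Sean describes below for Java's SimpleDateFormat can be reproduced in pure Python, where strptime's %b directive (abbreviated month name) is likewise locale-dependent. A minimal sketch — it assumes the C locale is available and treats it_IT.UTF-8 as possibly not installed:

```python
import locale
from datetime import datetime

# Under the C (English-like) locale, "Dec" parses fine with %b.
locale.setlocale(locale.LC_TIME, "C")
print(datetime.strptime("1989Dec31", "%Y%b%d").date())  # 1989-12-31

# Under an Italian locale, %b expects "dic", so the same string fails.
try:
    locale.setlocale(locale.LC_TIME, "it_IT.UTF-8")
    datetime.strptime("1989Dec31", "%Y%b%d")
except locale.Error:
    print("it_IT.UTF-8 locale not installed on this system")
except ValueError:
    print("parse failed under it_IT, as expected")
finally:
    locale.setlocale(locale.LC_TIME, "C")  # restore the original locale
```

From a running pyspark session, whether the -Duser.* flags actually reached the driver JVM can be checked with spark._jvm.java.util.Locale.getDefault() (note: _jvm is an internal py4j handle, not a public API).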
>>
>> 2016-10-24 21:32 GMT+09:00 Sean Owen <so...@cloudera.com>:
>> I actually think this is a general problem with the usage of DateFormat and
>> SimpleDateFormat across the code, in that it relies on the default locale of
>> the JVM. I believe this needs to, at least, default consistently to
>> Locale.US so that behavior is consistent; otherwise it's possible that
>> parsing and formatting of dates could work subtly differently across
>> environments.
>>
>> There's a similar question about some code that formats dates for the UI.
>> It's more reasonable to let that use the platform-default locale, but I'd
>> still favor standardizing it, I think.
>>
>> Anyway, let me test it out a bit and possibly open a JIRA with this change
>> for discussion.
>>
>> On Mon, Oct 24, 2016 at 1:03 PM pietrop <pietro.pu...@gmail.com> wrote:
>> Hi there,
>> I opened a question on StackOverflow at this link:
>> http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972
>>
>> I didn't get any useful answer, so I'm writing here hoping that someone can
>> help me.
>>
>> In short, I'm trying to read a CSV containing date columns stored using the
>> pattern "yyyyMMMdd". What doesn't work for me is "MMM". I've done some
>> testing and discovered that it's a localization issue. As you can read in
>> the StackOverflow question, I ran a simple Java program to parse the date
>> "1989Dec31", and it works only if I specify Locale.US in the
>> SimpleDateFormat() constructor.
>>
>> I would like pyspark to work. I tried setting a different locale from the
>> console (LANG="en_US"), but it doesn't work. I also tried setting it using
>> the locale package from Python.
>>
>> So, is there a way to set the locale in Spark when using pyspark? The issue
>> is Java-related, not Python-related (the function that parses the dates is
>> invoked by spark.read.load(dateFormat="yyyyMMMdd", …)). I don't want to use
>> other solutions to encode the dates because they are slower (from what I've
>> seen so far).
>>
>> Thank you
>> Pietro
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-doesn-t-recognize-MMM-dateFormat-pattern-in-spark-read-load-for-dates-like-1989Dec31-and-31D9-tp27951.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org