Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-26 Thread Pietro Pugni
And what if the month abbreviation is upper-case? Java doesn’t parse the month name if, for example, it’s “JAN” instead of “Jan” or “DEC” instead of “Dec”. Is it possible to solve this issue without using UDFs? Many thanks again, Pietro > On 24 Oct 2016, at 17:33, Pietro Pugni
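One possible workaround for the upper-case question above, sketched in plain Python: rewrite the month abbreviation to title case before the value reaches the date parser. This is only an illustration of the idea. The helper name `normalize_month_case` is hypothetical, it assumes the month is a three-letter run sitting between digits (as in 31DEC1989 or 1989DEC31), and it is not a Spark built-in, so inside Spark it would still have to run as a preprocessing step or a UDF.

```python
import re

# A three-letter alphabetic run with a digit on each side,
# e.g. the "DEC" in "31DEC1989" or in "1989DEC31".
_MONTH_RUN = re.compile(r"(?<=\d)([A-Za-z]{3})(?=\d)")


def normalize_month_case(text):
    """Return text with the embedded month abbreviation in title case.

    Hypothetical helper: "31DEC1989" becomes "31Dec1989", which is the
    casing the MMM pattern is reported to accept in this thread.
    """
    return _MONTH_RUN.sub(lambda m: m.group(1).capitalize(), text)


for raw in ["31DEC1989", "1989dec31", "31Dec1989"]:
    print(raw, "->", normalize_month_case(raw))
```

Values that already use title case pass through unchanged, so the function can be applied to a whole column of mixed input.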

Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Pietro Pugni
This worked without setting other options: spark/bin/spark-submit --conf "spark.driver.extraJavaOptions=-Duser.language=en" test.py Thank you again! Pietro > On 24 Oct 2016, at 17:18, Sean Owen wrote: > > I believe it will be too late to set it there,

Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Sean Owen
I believe it will be too late to set it there, and these are JVM flags, not app or Spark flags. See spark.driver.extraJavaOptions and likewise for the executor. On Mon, Oct 24, 2016 at 4:04 PM Pietro Pugni wrote: > Thank you! > > I tried again setting locale options in
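A minimal sketch of how the JVM flag could be passed to both the driver and the executors when launching from Python. The configuration keys `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` and the `-Duser.language=en` flag come from this thread; the helper name `spark_submit_command` and the default `spark/bin/spark-submit` path are assumptions for illustration. Whether the executor setting is needed in a given deployment is not established here.

```python
import shlex


def spark_submit_command(script, language="en",
                         spark_submit="spark/bin/spark-submit"):
    """Build a spark-submit argument list that sets the JVM language.

    Hypothetical helper: the same -Duser.language flag is handed to the
    driver JVM and to the executor JVMs through --conf.
    """
    jvm_flag = "-Duser.language={}".format(language)
    return [
        spark_submit,
        "--conf", "spark.driver.extraJavaOptions=" + jvm_flag,
        "--conf", "spark.executor.extraJavaOptions=" + jvm_flag,
        script,
    ]


argv = spark_submit_command("test.py")
# Print a shell-safe rendering; the list itself could be passed to subprocess.run.
print(" ".join(shlex.quote(part) for part in argv))
```

Building an argument list rather than one string avoids shell quoting problems when several `-D` flags are combined in a single option value.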

Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Pietro Pugni
Thank you! I tried again setting locale options in different ways, but they don’t propagate to the JVM. I tested these strategies (alone and all together): - bin/spark-submit --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT" test.py -

Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Sean Owen
This is more of an OS-level thing, but I think that if you can manage to pass -Duser.language=en to the JVM, it might do the trick. I summarized what I think I know about this at https://issues.apache.org/jira/browse/SPARK-18076 so we can decide what to do, if anything, there. Sean On Mon,

Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Pietro Pugni
Thank you, I’d appreciate that. I have no experience with Python, Java and Spark, so the question can be translated to: “How can I set the JVM locale when using spark-submit and pyspark?”. Probably this is possible only by changing the system default locale and not within the Spark session,

Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Hyukjin Kwon
I am also interested in this issue. I will try to look into this too within the next few days. 2016-10-24 21:32 GMT+09:00 Sean Owen: > I actually think this is a general problem with usage of DateFormat and > SimpleDateFormat across the code, in that it relies on the default

Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Sean Owen
I actually think this is a general problem with usage of DateFormat and SimpleDateFormat across the code, in that it relies on the default locale of the JVM. I believe this needs to, at least, default consistently to Locale.US so that behavior is consistent; otherwise it's possible that parsing
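The point about pinning a locale can be illustrated with a small Python analogue. This is not Spark's code and not the fix proposed in the thread; it is a sketch under the assumption that dates look like 31Dec1989, with a hypothetical function `parse_dd_mmm_yyyy` that uses a fixed English month table, so the result does not depend on any process-wide default.

```python
import datetime
import re

# Fixed English abbreviations: the lookup never consults the system locale.
_EN_MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

_DD_MMM_YYYY = re.compile(r"(\d{2})([A-Za-z]{3})(\d{4})")


def parse_dd_mmm_yyyy(text):
    """Parse a date such as "31Dec1989" using English month names only."""
    match = _DD_MMM_YYYY.fullmatch(text)
    if match is None:
        raise ValueError("not in ddMMMyyyy form: {!r}".format(text))
    day, month_name, year = match.groups()
    month = _EN_MONTHS.get(month_name.lower())
    if month is None:
        raise ValueError("unknown English month: {!r}".format(month_name))
    return datetime.date(int(year), month, int(day))


print(parse_dd_mmm_yyyy("31Dec1989"))
```

An input written with a non-English abbreviation, such as the Italian "dic", is rejected explicitly instead of being accepted or refused depending on the machine the code happens to run on.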

pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread pietrop
Hi there, I opened a question on StackOverflow at this link: http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972 I didn’t get any useful answer, so I’m writing here hoping that someone can

pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-13 Thread Pietro Pugni
Hi there, I opened a question on StackOverflow at this link: http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972 I didn’t get any useful answer, so I’m writing here hoping that someone can