Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-26 Thread Pietro Pugni
And what if the month abbreviation is upper-case? Java doesn't parse the
month name if it's "JAN" instead of "Jan" or "DEC" instead of "Dec". Is it
possible to solve this issue without using UDFs?

Many thanks again
 Pietro





Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Pietro Pugni
This worked without setting other options:
spark/bin/spark-submit --conf 
"spark.driver.extraJavaOptions=-Duser.language=en" test.py

Thank you again!
 Pietro




Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Sean Owen
I believe it will be too late to set it there, and these are JVM flags, not
app or Spark flags. See spark.driver.extraJavaOptions and likewise for the
executor.
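
In other words, by the time SparkSession.builder executes, the driver JVM is
already running, so a driver flag recorded there can no longer take effect. A
sketch of the failure mode (the printed locale is illustrative and depends on
the OS):

from pyspark.sql import SparkSession

# Too late: the driver JVM executing this code has already started
# with the OS default locale.
spark = (SparkSession.builder
         .config("spark.driver.extraJavaOptions", "-Duser.language=en")
         .getOrCreate())

print(spark.conf.get("spark.driver.extraJavaOptions"))  # recorded in the conf...
print(spark.sparkContext._jvm.java.util.Locale.getDefault().toString())  # ...but still e.g. "it_IT"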



Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Pietro Pugni
Thank you!

I tried again setting locale options in different ways, but they don't
propagate to the JVM. I tested these strategies (alone and all together):

- bin/spark-submit --conf "spark.executor.extraJavaOptions=-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT" test.py

- spark = SparkSession \
      .builder \
      .appName("My app") \
      .config("spark.executor.extraJavaOptions", "-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT") \
      .config("user.country", "US") \
      .config("user.region", "US") \
      .config("user.language", "en") \
      .config("user.timezone", "GMT") \
      .config("-Duser.country", "US") \
      .config("-Duser.region", "US") \
      .config("-Duser.language", "en") \
      .config("-Duser.timezone", "GMT") \
      .getOrCreate()

- export JAVA_OPTS="-Duser.language=en -Duser.region=US -Duser.country=US -Duser.timezone=GMT"

- export LANG="en_US.UTF-8"

After running export LANG="en_US.UTF-8" in the same terminal session I use to
launch spark-submit, running the locale command returns the correct values:
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

While my pyspark script runs, the Spark UI, under Environment -> Spark
Properties, shows the locale apparently set correctly:
- user.country: US
- user.language: en
- user.region: US
- user.timezone: GMT

but Environment -> System Properties still reports the system locale, not the
session locale I set:
- user.country: IT
- user.language: it
- user.timezone: Europe/Rome

Am I wrong, or do the options not propagate to the JVM correctly?
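
For what it's worth, a key like user.language passed to the builder is stored
as an ordinary Spark conf entry and never becomes a JVM system property,
which matches what the two UI tabs show. A sketch of the check (assumes an
Italian OS locale, as reported above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.config("user.language", "en").getOrCreate()

# Stored as a plain Spark conf entry (shows up under Spark Properties)...
print(spark.conf.get("user.language"))  # en
# ...while the JVM system property keeps the OS locale (System Properties):
print(spark.sparkContext._jvm.java.lang.System.getProperty("user.language"))  # it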





Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Sean Owen
This is more of an OS-level thing, but I think that if you can manage to
pass -Duser.language=en to the JVM, it might do the trick.

I summarized what I think I know about this at
https://issues.apache.org/jira/browse/SPARK-18076 and so we can decide what
to do, if anything, there.
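
For a persistent setup, the same flags can also go in conf/spark-defaults.conf
instead of the command line. A sketch, assuming a standard Spark layout
(adjust the values as needed):

spark.driver.extraJavaOptions    -Duser.language=en -Duser.country=US
spark.executor.extraJavaOptions  -Duser.language=en -Duser.country=US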

Sean



Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Pietro Pugni
Thank you, I'll appreciate that. I have no experience with Python, Java and
Spark, so the question can be translated to: "How can I set the JVM locale
when using spark-submit and pyspark?" Probably this is possible only by
changing the system default locale and not within the Spark session, right?

Thank you
 Pietro



Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Hyukjin Kwon
I am also interested in this issue. I will try to look into this too within
the coming few days.



Re: pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread Sean Owen
I actually think this is a general problem with usage of DateFormat and
SimpleDateFormat across the code, in that it relies on the default locale
of the JVM. I believe this needs to, at least, default consistently to
Locale.US so that behavior is consistent; otherwise it's possible that
parsing and formatting of dates could work subtly differently across
environments.

There's a similar question about some code that formats dates for the UI.
It's more reasonable to let that use the platform-default locale, but I'd
still favor standardizing it, I think.

Anyway, let me test it out a bit and possibly open a JIRA with this change
for discussion.
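
The locale dependence is easy to reproduce from PySpark itself by driving
java.text.SimpleDateFormat through the Py4J gateway. A demonstration sketch
(sparkContext._jvm is a private attribute; the failing call surfaces as a
Py4JJavaError wrapping a ParseException):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
jvm = spark.sparkContext._jvm

# Same pattern, two locales: only the English one knows the token "Dec".
fmt_us = jvm.java.text.SimpleDateFormat("yyyyMMMdd", jvm.java.util.Locale.US)
fmt_it = jvm.java.text.SimpleDateFormat("yyyyMMMdd", jvm.java.util.Locale.ITALY)

print(fmt_us.parse("1989Dec31").toString())  # parses to a java.util.Date
fmt_it.parse("1989Dec31")                    # raises: Unparseable date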



pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-24 Thread pietrop
Hi there,
I opened a question on StackOverflow at this link:
http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972

I didn't get any useful answer, so I'm writing here hoping that someone can
help me.

In short, I'm trying to read a CSV containing date columns stored using the
pattern "yyyyMMMdd". What doesn't work for me is "MMM". I've done some
testing and discovered that it's a localization issue. As you can read in
the StackOverflow question, I ran a simple Java program to parse the date
"1989Dec31", and it works only if I specify Locale.US in the
SimpleDateFormat() constructor.

I would like pyspark to work. I tried setting a different locale from the
console (LANG="en_US"), but it doesn't work. I also tried setting it using
the locale package from Python.

So, is there a way to set the locale in Spark when using pyspark? The issue
is Java-related and not Python-related (the function that parses dates is
invoked by spark.read.load(dateFormat="yyyyMMMdd", …)). I don't want to use
other solutions to parse the dates because they are slower (from what I've
seen so far).

Thank you
Pietro






pyspark doesn't recognize MMM dateFormat pattern in spark.read.load() for dates like 1989Dec31 and 31Dec1989

2016-10-13 Thread Pietro Pugni
Hi there,
I opened a question on StackOverflow at this link:
http://stackoverflow.com/questions/40007972/pyspark-doesnt-recognize-mmm-dateformat-pattern-in-spark-read-load-for-dates?noredirect=1#comment67297930_40007972

I didn't get any useful answer, so I'm writing here hoping that someone can
help me.

In short, I'm trying to read a CSV containing date columns stored using the
pattern "yyyyMMMdd". What doesn't work for me is "MMM". I've done some
testing and discovered that it's a localization issue. As you can read in
the StackOverflow question, I ran a simple Java program to parse the date
"1989Dec31", and it works only if I specify Locale.US in the
SimpleDateFormat() constructor.

I would like pyspark to work. I tried setting a different locale from the
console (LANG="en_US"), but it doesn't work. I also tried setting it using
the locale package from Python.

So, is there a way to set the locale in Spark when using pyspark? The issue
is Java-related and not Python-related (the function that parses dates is
invoked by spark.read.load(dateFormat="yyyyMMMdd", …)). I don't want to use
other solutions to parse the dates because they are slower (from what I've
seen so far).
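
For reference, a minimal repro of the reader call, as a sketch (the file
name, column name, and schema are made up; assumes the Spark 2.0 csv source):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, DateType

spark = SparkSession.builder.appName("mmm-repro").getOrCreate()

# One DATE column stored as strings like "1989Dec31".
schema = StructType([StructField("d", DateType())])
df = spark.read.load("dates.csv", format="csv", schema=schema,
                     dateFormat="yyyyMMMdd")
df.show()  # month-name parsing here depends on the JVM default locale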

Thank you
 Pietro