Re: json_tuple fails to parse string with emoji

2017-01-26 Thread Andrew Ehrlich
It looks like I'm hitting this bug in jackson-core 2.2.3, which is bundled
with the version of CDH I'm running:
https://github.com/FasterXML/jackson-core/issues/115

Jackson-core 2.3.0 has the fix.

On Tue, Jan 24, 2017 at 5:14 PM, Andrew Ehrlich wrote:

> On Spark 1.6.0, calling json_tuple() with an emoji character in one of the
> values returns nulls:
>
> Input:
> """
> "myJsonBody": {
>   "field1": ""
> }
> """
>
> Query:
> """
> ...
> LATERAL VIEW JSON_TUPLE(e.myJsonBody,'field1') k AS field1,
> ...
>
> """
>
> This looks like a platform-dependent issue: the parsing works fine on my
> local computer (OS X, Spark 1.6.3) but fails on the remote cluster (CentOS 7,
> Spark 1.6.0).
>
> I noticed that in 1.6.0, json_tuple was implemented this way:
> https://github.com/apache/spark/pull/7946/files
>
> So far I have:
>
> - Checked all Java system properties related to charsets on drivers
>   and executors
> - Turned up logging to debug level and checked for relevant messages
>
> Any more input? Should I try the dev mailing list?
>


json_tuple fails to parse string with emoji

2017-01-24 Thread Andrew Ehrlich
On Spark 1.6.0, calling json_tuple() with an emoji character in one of the
values returns nulls:

Input:
"""
"myJsonBody": {
  "field1": ""
}
"""
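The emoji in the input above did not survive the archive. Most emoji lie outside the Basic Multilingual Plane. Each one takes four bytes in UTF-8 and a surrogate pair (two code units) in UTF-16, which is the representation Java strings use internally, so such characters exercise parser code paths that ASCII text never reaches. The following sketch is a quick sanity check written in Python with only the standard library. It uses U+1F600 as a stand-in emoji (an assumption, since the original character is unknown). It confirms that a payload of this shape is valid JSON in both raw and `\u`-escaped form, independently of Spark or Jackson:

```python
import json

# Stand-in emoji (assumption): U+1F600 GRINNING FACE. The character from the
# original post was lost in the archive.
EMOJI = "\U0001F600"


def describe(ch: str) -> dict:
    """Report how a single character is represented in common encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        # utf-16-le emits no BOM, so byte length // 2 == number of code units
        "utf16_code_units": len(ch.encode("utf-16-le")) // 2,
    }


# The same value written two ways: raw text, and as a \u-escaped surrogate
# pair (what an ASCII-only JSON writer would produce).
raw_doc = '{"field1": "%s"}' % EMOJI
escaped_doc = '{"field1": "\\ud83d\\ude00"}'

if __name__ == "__main__":
    print(describe(EMOJI))
    for doc in (raw_doc, escaped_doc):
        value = json.loads(doc)["field1"]
        # True means the parser decoded the emoji back to one character
        print(repr(doc), "->", value == EMOJI)
```

If both forms decode correctly here but not on the cluster, the input itself is probably fine and the problem lies in the JSON library the cluster actually loads.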

Query:
"""
...
LATERAL VIEW JSON_TUPLE(e.myJsonBody,'field1') k AS field1,
...

"""

This looks like a platform-dependent issue: the parsing works fine on my
local computer (OS X, Spark 1.6.3) but fails on the remote cluster (CentOS 7,
Spark 1.6.0).

I noticed that in 1.6.0, json_tuple was implemented this way:
https://github.com/apache/spark/pull/7946/files
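Getting nulls back instead of an error is consistent with a design in which a parser exception is caught and converted into a row of nulls. The following sketch models that behavior under that assumption. The function `json_tuple_sketch` is hypothetical: it uses Python's `json` module rather than Jackson, and it is not Spark's implementation from the PR linked above. It illustrates how a bug inside the parser can surface as silent nulls:

```python
import json
from typing import Optional, Tuple


def json_tuple_sketch(json_str: Optional[str], *fields: str) -> Tuple[Optional[str], ...]:
    """Hypothetical json_tuple-style extractor (illustration only).

    Returns one string per requested field. Any parse failure yields a row of
    all None values instead of raising, so a parser-level bug looks exactly
    like "the field is null".
    """
    nulls = tuple(None for _ in fields)
    if json_str is None:
        return nulls
    try:
        obj = json.loads(json_str)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return nulls
    if not isinstance(obj, dict):
        return nulls
    out = []
    for name in fields:
        value = obj.get(name)
        if value is None:
            out.append(None)  # missing key or JSON null
        elif isinstance(value, str):
            out.append(value)
        else:
            # Non-string values are returned as their JSON text.
            out.append(json.dumps(value))
    return tuple(out)


if __name__ == "__main__":
    print(json_tuple_sketch('{"field1": "\U0001F600"}', "field1"))
    print(json_tuple_sketch('{"field1": ', "field1"))  # malformed input
```

Under this model, a parser that throws on a surrogate pair would be indistinguishable from malformed input at the query level, because both cases return the same all-null row.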

So far I have:

   - Checked all Java system properties related to charsets on drivers and
   executors
   - Turned up logging to debug level and checked for relevant messages

Any more input? Should I try the dev mailing list?