GitHub user jmchung opened a pull request: https://github.com/apache/spark/pull/19017
    SPARK-21804: json_tuple returns null values within repeated columns except the first one

    ## What changes were proposed in this pull request?

    When `json_tuple` extracts values from JSON, it returns null for every repeated column except the first one, as shown below:

    ``` scala
    scala> spark.sql("""SELECT json_tuple('{"a":1, "b":2}', 'a', 'b', 'a')""").show()
    +---+---+----+
    | c0| c1|  c2|
    +---+---+----+
    |  1|  2|null|
    +---+---+----+
    ```

    I think this should be consistent with Hive's implementation:

    ```
    hive> SELECT json_tuple('{"a": 1, "b": 2}', 'a', 'a');
    ...
    1    1
    ```

    In this PR, we locate all the matched indices in `fieldNames` instead of returning only the first matched index (i.e., `indexOf`).

    ## How was this patch tested?

    Added test in JsonExpressionsSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmchung/spark SPARK-21804

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19017

----
commit f04b896f3a8b3befdc1cbfb60464dfdcb019b684
Author: Jen-Ming Chung <jenmingi...@gmail.com>
Date:   2017-08-22T06:38:40Z

    SPARK-21804: json_tuple returns null values within repeated columns except the first one

----
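
For readers who want to see why `indexOf` loses the repeated columns, here is a minimal, self-contained sketch. It is a simplified model and not Spark's actual code: the object `RepeatedFieldLookup`, the helpers `fillFirstMatch` and `fillAllMatches`, and the use of a `Map` in place of a parsed JSON object are all assumptions made for illustration. Only the name `fieldNames` and the `indexOf` lookup come from the description above.

```scala
// Hypothetical, simplified model of the field lookup; NOT Spark's implementation.
object RepeatedFieldLookup {
  // Requested field names in output-column order; "a" is requested twice.
  val fieldNames: Seq[String] = Seq("a", "b", "a")

  // First-match behaviour: indexOf yields only the first position of a name,
  // so at most one output column is filled per JSON key.
  def fillFirstMatch(json: Map[String, String]): List[Option[String]] = {
    val row = Array.fill[Option[String]](fieldNames.length)(None)
    for ((key, value) <- json) {
      val idx = fieldNames.indexOf(key)
      if (idx >= 0) row(idx) = Some(value)
    }
    row.toList
  }

  // All-matches behaviour: collect every position whose name equals the key
  // and fill each of them with the same value.
  def fillAllMatches(json: Map[String, String]): List[Option[String]] = {
    val row = Array.fill[Option[String]](fieldNames.length)(None)
    for ((key, value) <- json) {
      fieldNames.zipWithIndex
        .collect { case (name, i) if name == key => i }
        .foreach(i => row(i) = Some(value))
    }
    row.toList
  }
}
```

With the input `Map("a" -> "1", "b" -> "2")`, `fillFirstMatch` leaves the third column empty, which corresponds to the `null` reported in the description, while `fillAllMatches` fills both columns that ask for `a`.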