[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
GitHub user Fokko opened a pull request: https://github.com/apache/spark/pull/21596 [SPARK-24601] Bump Jackson version Hi all, Jackson is incompatible with upstream versions, therefore bump the Jackson version to a more recent one. I bumped into some issues with Azure CosmosDB that is using a more recent version of Jackson. This can be fixed by adding exclusions and then it works without any issues. So no breaking changes in the API's. I would also consider bumping the version of Jackson in Spark. I would suggest to keep up to date with the dependencies, since in the future this issue will pop up more frequently. ## What changes were proposed in this pull request? Bump Jackson to 2.9.6 ## How was this patch tested? Compiled and tested it locally to see if anything broke. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Fokko/spark fd-bump-jackson Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21596.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21596 commit ba454b0e945fcb16a41fbb9eccc1ea2a6941ace5 Author: Fokko Driesprong Date: 2018-06-20T08:15:59Z [SPARK-24601] Bump Jackson version Jackson is incompatible with upstream versions, therefore bump the Jackson version to a more recent one. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197604227 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -244,6 +244,13 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with "1234") } + test("some big value") { +val value = "x" * 3000 +checkEvaluation( + GetJsonObject(NonFoldableLiteral((s"""{"big": "$value"}""")) +, NonFoldableLiteral("$.big")), value) --- End diff -- nit: -> ```scala checkEvaluation( GetJsonObject(NonFoldableLiteral((s"""{"big": "$value"}""")), NonFoldableLiteral("$.big")), value) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user Fokko commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197604577 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -244,6 +244,13 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with "1234") } + test("some big value") { +val value = "x" * 3000 +checkEvaluation( + GetJsonObject(NonFoldableLiteral((s"""{"big": "$value"}""")) +, NonFoldableLiteral("$.big")), value) --- End diff -- Fixed, much nicer indeed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197661126 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt --- End diff -- I think it's better to have more specific commands but let's avoid to have "2.4.0-SNAPSHOT" in this. Otherwise, we should change this when we make another release .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user Fokko commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197707147 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt --- End diff -- I agree, but running this benchmark is quite specific and should only happend when something is changed with the json functionality. Not sure if you want to have a specific build step for it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197712172 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt --- End diff -- But wouldn't this command be invalid if we have another release. I mean, I believe we can build by following https://spark.apache.org/docs/latest/building-spark.html and run the benchmark. Let's just say it like `bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ./sql/core/target/spark-sql_*.jar` or like `spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ` Should better to make the steps independent of other possible ways. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197713648 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -112,12 +118,13 @@ object JSONBenchmarks { } /* - Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz + Java HotSpot(TM) 64-Bit Server VM 1.8.0_172-b11 on Mac OS X 10.13.5 + Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz JSON per-line parsing: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - No encoding 25947 / 26188 3.9 259.5 1.0X - UTF-8 is set46319 / 46417 2.2 463.2 0.6X + No encoding 9936 / 9973 10,1 99,4 1,0X + UTF-8 is set16018 / 16377 6,2 160,2 0,6X --- End diff -- BTW, was this `,` instead of `.`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user Fokko commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197714976 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -112,12 +118,13 @@ object JSONBenchmarks { } /* - Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz + Java HotSpot(TM) 64-Bit Server VM 1.8.0_172-b11 on Mac OS X 10.13.5 + Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz JSON per-line parsing: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - No encoding 25947 / 26188 3.9 259.5 1.0X - UTF-8 is set46319 / 46417 2.2 463.2 0.6X + No encoding 9936 / 9973 10,1 99,4 1,0X + UTF-8 is set16018 / 16377 6,2 160,2 0,6X --- End diff -- This is my European computer, I'll change to `.` ;) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user Fokko commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r197715689 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt --- End diff -- Ok, I'm fine with that. I've updated the comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r198102188 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt --- End diff -- Eh, seems missed to update? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user Fokko commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r198112472 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt --- End diff -- Oops, forgot to add: ``` MacBook-Pro-van-Fokko:spark fokkodriesprong$ git diff diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala index 3b273f81d7..0598406d33 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala @@ -29,8 +29,9 @@ import org.apache.spark.util.{Benchmark, Utils} * mvn clean package -pl sql/core -DskipTests * ./dev/make-distribution.sh --name local-dist * cd dist/ - * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ - * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt + * ./bin/spark-submit bin/spark-submit \ + * --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * --jars ./sql/core/target/spark-sql_*-tests.jar ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r198686913 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,13 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * ../sql/core/target/spark-sql_2.11-2.4.0-SNAPSHOT-tests.jar > /tmp/output.txt --- End diff -- Let's take out other comments like `make-distribution.sh` and `cd dist/` too since they can be varied by how to build. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r198762319 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,14 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit bin/spark-submit \ + * --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * --jars ./sql/core/target/spark-sql_*-tests.jar + * open /tmp/output.txt --- End diff -- Do we need to update it like this? I prefer original version... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r198772042 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,14 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit bin/spark-submit \ + * --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * --jars ./sql/core/target/spark-sql_*-tests.jar + * open /tmp/output.txt --- End diff -- yea.. similar comment at https://github.com/apache/spark/pull/21596#discussion_r197661126 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user Fokko commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r199141020 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -25,8 +25,14 @@ import org.apache.spark.util.{Benchmark, Utils} /** * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't. - * To run this: - * spark-submit --class --jars + * To run: + * mvn clean package -pl sql/core -DskipTests + * ./dev/make-distribution.sh --name local-dist + * cd dist/ + * ./bin/spark-submit bin/spark-submit \ + * --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \ + * --jars ./sql/core/target/spark-sql_*-tests.jar + * open /tmp/output.txt --- End diff -- I've reverted this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r199803209 --- Diff: pom.xml --- @@ -158,8 +158,8 @@ 2.11.12 2.11 1.9.13 -2.6.7 - 2.6.7.1 +2.9.6 + 2.9.6 --- End diff -- I suspect we can collapse these two versions; they were broken out to handle the fact that a few 2.6.x Jackson releases didn't publish all artifacts. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user Fokko commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r200102648 --- Diff: pom.xml --- @@ -158,8 +158,8 @@ 2.11.12 2.11 1.9.13 -2.6.7 - 2.6.7.1 +2.9.6 + 2.9.6 --- End diff -- Good point, I've merged them --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org