[GitHub] spark issue #20415: [SPARK-23247][SQL]combines Unsafe operations and statist...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20415 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20415 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86898/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20415 **[Test build #86898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86898/testReport)** for PR 20415 at commit [`5d70fd1`](https://github.com/apache/spark/commit/5d70fd1f939a67707f16c1afdefea6d4342c019e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20415 @mgaido91 thanks.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20415 **[Test build #86898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86898/testReport)** for PR 20415 at commit [`5d70fd1`](https://github.com/apache/spark/commit/5d70fd1f939a67707f16c1afdefea6d4342c019e).
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20415 LGTM, only a minor comment
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20415 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86860/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20415 **[Test build #86860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86860/testReport)** for PR 20415 at commit [`e3e09d9`](https://github.com/apache/spark/commit/e3e09d98072bd39328a4e7d4de1ddd38594c6232).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20415 **[Test build #86860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86860/testReport)** for PR 20415 at commit [`e3e09d9`](https://github.com/apache/spark/commit/e3e09d98072bd39328a4e7d4de1ddd38594c6232).
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20415 Looks like a reasonable change to me. Although I don't think this will bring a significant performance improvement, it makes the code more compact.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20415 ok to test
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20415 @cloud-fan Could you help review this? Thanks.
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20415

@hvanhovell, thank you for reviewing it. I tested the code change in this PR.

**In `FileSourceScanExec.doExecute`:**
```
if (needsUnsafeRowConversion) {
  // Changed path: the row count is updated inside the same map as the projection.
  scan.mapPartitionsWithIndexInternal { (index, iter) =>
    val proj = UnsafeProjection.create(schema)
    proj.initialize(index)
    iter.map( r => {
      numOutputRows += 1
      proj(r)
    })
  }
} else {
  // Previous path: project first, then count in a second map.
  val scanOther = scan.mapPartitionsWithIndexInternal { (index, iter) =>
    val proj = UnsafeProjection.create(schema)
    proj.initialize(index)
    iter.map(proj)
  }
  scanOther.map { r =>
    numOutputRows += 1
    r
  }
}
```

**Start spark-shell:**
> ./spark-shell --executor-memory 15G --total-executor-cores 1 --conf spark.executor.cores=1

**Test code:**
> val df4 = (0 until 50).map(i => (i % 2, i % 3, i % 4, i % 5, i % 6, i % 7, i % 8)).toDF("i2","i3","i4","i5","i6","i7","i8")
> df4.write.format("parquet").partitionBy("i2", "i3", "i4").bucketBy(8, "i5","i6","i7","i8").saveAsTable("table50")
>
> def runBenchmark(name: String, cardinality: Int)(f: => Unit): Unit = {
>   val startTime = System.nanoTime
>   (0 to cardinality).foreach(i => f)
>   val endTime = System.nanoTime
>   println(s"Time taken in $name: " + (endTime - startTime).toDouble / 10 + " seconds")
> }
>
> def benchmark(name: String, card: Int)(f: => Unit){
>   (0 to card).foreach(i => f)
> }
>
> After modifying FileSourceScanExec:
> benchmark("File SourceScan Exec", 2){
>   runBenchmark("After modified File SourceScan Exec ", 200) {
>     spark.conf.set("spark.sql.codegen.maxFields", 2)
>     spark.conf.set("spark.sql.parquet.enableVectorizedReader", true)
>     spark.sql("select * from table50").count()
>   }
> }
>
> Before modifying FileSourceScanExec:
> benchmark("File SourceScan Exec", 2){
>   runBenchmark("Before modified File SourceScan Exec ", 200) {
>     spark.conf.set("spark.sql.codegen.maxFields", 2)
>     spark.conf.set("spark.sql.parquet.enableVectorizedReader", false)
>     spark.sql("select * from table50").count()
>   }
> }

**Test results:**

Test 20 times:

| | First run (s) | Second run (s) | Third run (s) | Avg (s) |
|---|---|---|---|---|
| Before modification | 10.97 | 10.83 | 11.05 | 10.95 |
| After modification | 9.33 | 9.61 | 9.32 | 9.42 |

Test 100 times:

| | First run (s) | Second run (s) | Third run (s) | Avg (s) |
|---|---|---|---|---|
| Before modification | 51.74 | 52.80 | 71.88 | 58.80 |
| After modification | 47.24 | 46.18 | 48.92 | 47.45 |

Test 200 times:

| | First run (s) | Second run (s) | Third run (s) | Avg (s) |
|---|---|---|---|---|
| Before modification | 236.85 | 325.97 | 395.69 | 319.50 |
| After modification | 208.90 | 244.13 | 261.18 | 238.07 |

Thanks.
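A note on the timing helper in the test code above: as posted, it divides a `System.nanoTime` difference by 10, which does not yield seconds, and `0 to cardinality` is an inclusive range, so the body runs `cardinality + 1` times. A minimal sketch of a helper that avoids both, assuming plain wall-clock timing is good enough here, might look like the following. The names `Timing` and `timeSeconds` are hypothetical and are not part of the PR or of Spark.

```scala
// Hypothetical helper, not taken from the PR: times a block with System.nanoTime.
object Timing {
  // Runs `f` exactly `iterations` times and returns the elapsed wall-clock seconds.
  def timeSeconds(iterations: Int)(f: => Unit): Double = {
    val start = System.nanoTime()
    var i = 0
    while (i < iterations) {
      f          // evaluate the by-name block once per iteration
      i += 1
    }
    // nanoTime reports nanoseconds, so divide by 1e9 to get seconds.
    (System.nanoTime() - start) / 1e9
  }

  def main(args: Array[String]): Unit = {
    var calls = 0
    val secs = timeSeconds(200) { calls += 1 }
    println(f"$calls calls in $secs%.6f s")
  }
}
```

In a spark-shell session the block passed to `timeSeconds` would be the `spark.sql("select * from table50").count()` call; a few untimed warm-up runs beforehand would probably make the numbers steadier.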
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/20415 @heary-cao have you benchmarked this? I am asking because Spark SQL chains iterators; these are pipelined and only materialized when needed. Your PR effectively removes two virtual calls (hasNext/next) per tuple, so I don't see much benefit here.
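To make the point about chained iterators concrete, here is a small sketch using plain Scala iterators only, with no Spark dependency. It assumes a string conversion as a stand-in for the unsafe projection and a callback as a stand-in for the `numOutputRows` metric; `IteratorChaining`, `chained`, and `fused` are hypothetical names. Both variants are lazy and produce the same rows and the same count; the fused one simply has one fewer iterator wrapper between source and consumer.

```scala
// Illustration only: plain Scala iterators, all names hypothetical.
object IteratorChaining {
  // Two chained maps: convert first, then count in a second wrapper.
  def chained(rows: Iterator[Int], onRow: () => Unit): Iterator[String] =
    rows.map(r => s"row-$r").map { r => onRow(); r }

  // One fused map: the conversion and the counter update share a single wrapper.
  def fused(rows: Iterator[Int], onRow: () => Unit): Iterator[String] =
    rows.map { r => onRow(); s"row-$r" }

  def main(args: Array[String]): Unit = {
    var count = 0L
    val it = chained(Iterator(1, 2, 3), () => count += 1)
    println(count)      // still 0: nothing has been consumed yet
    println(it.toList)  // consuming the iterator drives both maps row by row
    println(count)      // 3: one callback per row
  }
}
```

Because neither variant materializes an intermediate collection, the difference between them is limited to per-row call overhead, which is consistent with the modest gap discussed in this thread.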
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20415 Can one of the admins verify this patch?