[ https://issues.apache.org/jira/browse/SPARK-16461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-16461:
-------------------------------
    Assignee: Hyukjin Kwon

> Support partition batch pruning with `<=>` (EqualNullSafe) predicate in InMemoryTableScanExec
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16461
>                 URL: https://issues.apache.org/jira/browse/SPARK-16461
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>             Fix For: 2.1.0
>
>
> It seems the `EqualNullSafe` filter was missed when pruning partition batches in cached tables.
> Supporting it improves performance by roughly 75% in the benchmark below (results will vary).
> Running the code below:
> {code}
> test("Null-safe equal comparison") {
>   val N = 20000000
>   val df = spark.range(N).repartition(20)
>   val benchmark = new Benchmark("Null-safe equal comparison", N)
>   df.createOrReplaceTempView("t")
>   spark.catalog.cacheTable("t")
>   // Run the query once before timing so the cache is materialized.
>   sql("select id from t where id <=> 1").collect()
>   // Time 10 iterations of the same null-safe equality query.
>   benchmark.addCase("Null-safe equal comparison", 10) { _ =>
>     sql("select id from t where id <=> 1").collect()
>   }
>   benchmark.run()
> }
> {code}
> produces the results below.
> Before:
> {code}
> Running benchmark: Null-safe equal comparison
>   Running case: Null-safe equal comparison
>   Stopped after 10 iterations, 2098 ms
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
> Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
> Null-safe equal comparison:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Null-safe equal comparison                204 /  210         98.1          10.2       1.0X
> {code}
> After:
> {code}
> Running benchmark: Null-safe equal comparison
>   Running case: Null-safe equal comparison
>   Stopped after 10 iterations, 478 ms
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
> Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
> Null-safe equal comparison:         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Null-safe equal comparison                 42 /   48        474.1           2.1       1.0X
> {code}
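
For readers unfamiliar with batch pruning, here is a minimal, Spark-free sketch of the idea the issue describes. It assumes each cached batch keeps a lower and upper bound for the column, and that the predicate compares the column against a non-null literal. `BatchStats`, `NullSafePruning`, `mightMatch`, and `prune` are hypothetical names chosen for illustration; they are not Spark's internal API, and the actual change in InMemoryTableScanExec may differ in detail.

```scala
// Hypothetical per-batch statistics (illustrative only, not Spark's internal classes).
final case class BatchStats(lower: Long, upper: Long)

object NullSafePruning {
  // For a non-null literal, `col <=> lit` matches exactly the rows where col == lit,
  // so a batch needs scanning only if lit lies within [lower, upper].
  def mightMatch(stats: BatchStats, literal: Long): Boolean =
    stats.lower <= literal && literal <= stats.upper

  // Keep only the batches whose statistics cannot rule out a match.
  def prune(batches: Seq[BatchStats], literal: Long): Seq[BatchStats] =
    batches.filter(mightMatch(_, literal))
}
```

With this check, `id <=> 1` is treated like `id = 1` for pruning purposes, so batches whose value range excludes 1 can be skipped without being scanned. A null literal would need separate handling (for example, via a per-batch null count), which this sketch deliberately leaves out.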