[ https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24706:
------------------------------------

    Assignee: Apache Spark

> Support ByteType and ShortType pushdown to parquet
> --------------------------------------------------
>
>                 Key: SPARK-24706
>                 URL: https://issues.apache.org/jira/browse/SPARK-24706
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> Benchmark result:
> {noformat}
> ###############################[ Pushdown benchmark for tinyint ]################################
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
>
> Select 1 tinyint row (value = CAST(63 AS tinyint)):  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                         4307 / 4575         3.7         273.8       1.0X
> Parquet Vectorized (Pushdown)                                227 / 241        69.4          14.4      19.0X
> Native ORC Vectorized                                      3646 / 3727         4.3         231.8       1.2X
> Native ORC Vectorized (Pushdown)                             736 / 744        21.4          46.8       5.9X
>
> Select 10% tinyint rows (value < 12):                Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                         5209 / 5843         3.0         331.2       1.0X
> Parquet Vectorized (Pushdown)                              1296 / 1759        12.1          82.4       4.0X
> Native ORC Vectorized                                      4455 / 4594         3.5         283.2       1.2X
> Native ORC Vectorized (Pushdown)                           1736 / 1813         9.1         110.4       3.0X
>
> Select 50% tinyint rows (value < 63):                Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                         8362 / 8394         1.9         531.7       1.0X
> Parquet Vectorized (Pushdown)                              6303 / 6530         2.5         400.7       1.3X
> Native ORC Vectorized                                      7962 / 8113         2.0         506.2       1.1X
> Native ORC Vectorized (Pushdown)                           6680 / 7556         2.4         424.7       1.3X
>
> Select 90% tinyint rows (value < 114):               Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                       11572 / 11715         1.4         735.7       1.0X
> Parquet Vectorized (Pushdown)                            11198 / 11326         1.4         712.0       1.0X
> Native ORC Vectorized                                    11041 / 11209         1.4         702.0       1.0X
> Native ORC Vectorized (Pushdown)                         11104 / 11472         1.4         706.0       1.0X
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org