[ https://issues.apache.org/jira/browse/SPARK-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-25635:
------------------------------------

    Assignee: Dongjoon Hyun  (was: Apache Spark)

> Support selective direct encoding in native ORC write
> -----------------------------------------------------
>
>                 Key: SPARK-25635
>                 URL: https://issues.apache.org/jira/browse/SPARK-25635
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>
> Before ORC 1.5.3, `orc.dictionary.key.threshold` and
> `hive.exec.orc.dictionary.key.size.threshold` were applied to all columns.
> This was a big hurdle to enabling dictionary encoding.
> Starting with ORC 1.5.3, `orc.column.encoding.direct` is available to enforce
> direct encoding selectively, on a per-column basis. This issue aims to add
> that feature by upgrading ORC from 1.5.2 to 1.5.3.
> The following patches are in ORC 1.5.3; this feature is the only one
> directly related to Spark.
> {code}
> ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
> ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
> ORC-405. Remove calcite as a dependency from the benchmarks.
> ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
> ORC-383: Parallel builds fails with ConcurrentModificationException
> ORC-382: Apache rat exclusions + add rat check to travis
> ORC-401: Fix incorrect quoting in specification.
> ORC-385. Change RecordReader to extend Closeable.
> ORC-384: [C++] fix memory leak when loading non-ORC files
> ORC-391: [c++] parseType does not accept underscore in the field name
> ORC-397. Allow selective disabling of dictionary encoding. Original patch was by Mithun Radhakrishnan.
> ORC-389: Add ability to not decode Acid metadata columns
> {code}
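
For illustration only, here is a minimal sketch of how the two writer options named in the description might be combined once Spark is built against ORC 1.5.3 or later. The helper name `orc_direct_encoding_options` and the column names `uuid` and `session_id` are hypothetical and not part of the issue. The sketch assumes that `orc.column.encoding.direct` takes a comma-separated list of column names and that the threshold is a fraction between 0 and 1.

```python
def orc_direct_encoding_options(direct_columns, dictionary_threshold=1.0):
    """Build a dict of ORC writer options (hypothetical helper).

    direct_columns: names of columns that should skip dictionary encoding.
    dictionary_threshold: value for orc.dictionary.key.threshold, which
    still applies to every column not listed in direct_columns.
    """
    if not 0.0 <= dictionary_threshold <= 1.0:
        raise ValueError("dictionary_threshold must be between 0.0 and 1.0")

    # Drop empty names so the joined option value has no stray commas.
    names = [c.strip() for c in direct_columns if c.strip()]

    options = {"orc.dictionary.key.threshold": str(dictionary_threshold)}
    if names:
        # Assumed format: comma-separated column names.
        options["orc.column.encoding.direct"] = ",".join(names)
    return options


# Hypothetical high-cardinality columns that gain little from a dictionary.
opts = orc_direct_encoding_options(["uuid", "session_id"])
print(opts)
```

With PySpark, the resulting dict could then be passed to the writer, for example `df.write.format("orc").options(**opts).save(path)`, where `df` and `path` are placeholders. Whether the options take effect depends on the ORC version bundled with the Spark build in use.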