Alexander Petrossian (PAF) created ORC-1583:
-----------------------------------------------
Summary: BloomFilterColumns inside list/map
Key: ORC-1583
URL: https://issues.apache.org/jira/browse/ORC-1583
Project: ORC
Issue Type: Improvement
Affects Versions: 1.9.2
Reporter: Alexander Petrossian (PAF)
Currently, when specifying the names of columns to index, we can use this syntax:
* field.nestedField1.nestedField2
org.apache.orc.OrcUtils#findColumn is used
But when specifying a SearchArgument, we can use an extended syntax:
* field.nestedField1.nestedField2._elem.nestedField3._value.nestedField4
org.apache.orc.TypeDescription#findSubtype is used
This is inconsistent and makes it impossible to index only one column inside a
list/map.
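
To illustrate the difference between the two path syntaxes, here is a minimal,
self-contained sketch. It is a toy model and not ORC's code: PathSketch, Node
and resolve are hypothetical names, and the "_key" child name is assumed by
analogy with the "_elem" and "_value" names used above. With struct-only
descent the resolver stops at the list; with the extended syntax it reaches a
single leaf inside it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy schema model for illustration only; this is NOT ORC's TypeDescription. */
final class PathSketch {
    enum Kind { STRUCT, LIST, MAP, PRIMITIVE }

    static final class Node {
        final Kind kind;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(Kind kind) { this.kind = kind; }

        /** Adds a named child and returns this node so calls can be chained. */
        Node with(String name, Node child) {
            children.put(name, child);
            return this;
        }
    }

    static Node struct() { return new Node(Kind.STRUCT); }
    static Node primitive() { return new Node(Kind.PRIMITIVE); }

    // Lists expose their element type under the pseudo-name "_elem".
    static Node list(Node elem) { return new Node(Kind.LIST).with("_elem", elem); }

    // Maps expose "_key" (assumed name) and "_value".
    static Node map(Node key, Node value) {
        return new Node(Kind.MAP).with("_key", key).with("_value", value);
    }

    /**
     * Resolves a dotted path from root. When allowContainers is false, only
     * struct fields may be descended into (the plain syntax); when true,
     * "_elem", "_key" and "_value" steps are accepted as well (the extended
     * syntax). Returns null if the path cannot be resolved.
     */
    static Node resolve(Node root, String path, boolean allowContainers) {
        Node current = root;
        for (String part : path.split("\\.")) {
            if (current.kind != Kind.STRUCT && !allowContainers) {
                return null; // plain syntax cannot step into a list or map
            }
            current = current.children.get(part);
            if (current == null) {
                return null; // no such field or pseudo-field
            }
        }
        return current;
    }
}
```

For a schema shaped like struct<field:struct<items:array<struct<name:string>>>>,
the path field.items._elem.name resolves only when allowContainers is true; with
the plain syntax the deepest reachable node is field.items, the list itself.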
Currently, when there is a list/map in the expression, the writer activates
bloom filters for all columns contained inside it. In my case that is hundreds
of columns that we never search on, so we do not need them to be indexed.
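
The over-indexing described above can be modelled with a small sketch. It
assumes column ids are assigned in pre-order, so a node and all of its
descendants occupy one contiguous id range; BloomSelectionSketch and its
methods are hypothetical names and this is not the ORC writer's actual code.
Selecting the list marks the whole range, while selecting one leaf marks a
single id.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of subtree selection; illustration only, not ORC code. */
final class BloomSelectionSketch {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int id;     // pre-order column id
        int maxId;  // largest id found in this node's subtree

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Assigns pre-order ids starting at next; returns the next unused id. */
    static int assignIds(Node node, int next) {
        node.id = next++;
        for (Node child : node.children) {
            next = assignIds(child, next);
        }
        node.maxId = next - 1;
        return next;
    }

    /** Marks the selected node and every column nested inside it. */
    static boolean[] select(Node node, int totalColumns) {
        boolean[] flags = new boolean[totalColumns];
        for (int i = node.id; i <= node.maxId; i++) {
            flags[i] = true;
        }
        return flags;
    }

    /** Counts how many columns are marked. */
    static int count(boolean[] flags) {
        int n = 0;
        for (boolean flag : flags) {
            if (flag) {
                n++;
            }
        }
        return n;
    }
}
```

With a root struct holding one list of three-field structs, selecting the list
marks five columns (the list, its element struct, and the three fields), while
selecting one field marks exactly one. With hundreds of nested fields the
difference grows accordingly.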
Maybe the findSubtype approach could be used in both cases (indexing and
searching), so that the code is consistent?
Offhand, nothing seems to break if the findColumn call is simply replaced with
a findSubtype call.
Thanks for your attention!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)