Hello Tim Armstrong, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9134

to look at the new patch set (#8).

Change subject: IMPALA-5717: Support for reading ORC data files
......................................................................

IMPALA-5717: Support for reading ORC data files

This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies the input needed by the orc-reader, tracks the memory
consumption of the reader, and transfers the reader's output
(orc::ColumnVectorBatch) into impala::RowBatch. The ORC version used
is release-1.4.3.

Currently, only reading primitive types is supported. Writing into ORC
tables is not supported yet either.

Tests
 - Most of the end-to-end tests can run on ORC format.
 - Add tpcds and tpch tests for ORC.
 - Add some ORC-specific tests.
 - test_scanner_fuzz is not enabled for ORC yet, since the ORC library
   is not robust against corrupt files (ORC-315).

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-orc-scanner.cc
A be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindOrc.cmake
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/LineItemMultiBlock/README.dox
A testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc
A testdata/LineItemMultiBlock/lineitem_sixblocks.orc
A testdata/LineItemMultiBlock/lineitem_threeblocks.orc
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
A testdata/data/chars-formats.orc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_dimensions.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/common/test_vector.py
M tests/comparison/cli_options.py
M tests/query_test/test_chars.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_tpch_queries.py
55 files changed, 1,637 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/9134/8
--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>