[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-09-02 Thread Attila Jeges (Code Review)
Attila Jeges has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

This patch adds parquet stats to iceberg manifest as per-datafile
metrics.

The following metrics are supported:
- column_sizes :
  Map from column id to the total size on disk of all regions that
  store the column. Does not include bytes necessary to read other
  columns, like footers.

- null_value_counts :
  Map from column id to number of null values in the column.

- lower_bounds :
  Map from column id to lower bound in the column serialized as
  binary. Each value must be less than or equal to all non-null,
  non-NaN values in the column for the file.

- upper_bounds :
  Map from column id to upper bound in the column serialized as
  binary. Each value must be greater than or equal to all non-null,
  non-NaN values in the column for the file.

The corresponding parquet stats are collected by 'ColumnStats'
(in 'min_value_', 'max_value_', 'null_count_' members) and
'HdfsParquetTableWriter::BaseColumnWriter' (in
'total_compressed_byte_size_' member).

Testing:
- New e2e test was added to verify that the metrics are written to the
  Iceberg manifest upon inserting data.
- New e2e test was added to verify that lower_bounds/upper_bounds
  metrics are used to prune data files on querying iceberg tables.
- Existing e2e tests were updated to work with the new behavior.
- BE test for single-value serialization.
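
To illustrate the pruning that the second e2e test exercises, here is a
minimal Python sketch. It is not Impala's or Iceberg's implementation:
DataFileMetrics, may_contain_equal and prune are hypothetical names, and the
sketch assumes an INT column whose bounds are stored as 4-byte little-endian
values, as the Iceberg single-value serialization describes.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class DataFileMetrics:
    """Hypothetical stand-in for the per-datafile metrics in a manifest entry."""
    path: str
    lower_bounds: dict = field(default_factory=dict)  # column id -> bytes
    upper_bounds: dict = field(default_factory=dict)  # column id -> bytes


def decode_int(raw):
    # Assumption: INT bounds are 4-byte little-endian signed integers.
    return struct.unpack("<i", raw)[0]


def may_contain_equal(metrics, column_id, value):
    """Return False only when the bounds prove that no row can match."""
    lo = metrics.lower_bounds.get(column_id)
    hi = metrics.upper_bounds.get(column_id)
    if lo is None or hi is None:
        return True  # no stats for this column: the file must be read
    return decode_int(lo) <= value <= decode_int(hi)


def prune(files, column_id, value):
    """Keep only the files that might contain `column == value`."""
    return [f.path for f in files if may_contain_equal(f, column_id, value)]


if __name__ == "__main__":
    files = [
        DataFileMetrics("a.parq", {1: struct.pack("<i", 1)}, {1: struct.pack("<i", 10)}),
        DataFileMetrics("b.parq", {1: struct.pack("<i", 11)}, {1: struct.pack("<i", 20)}),
    ]
    print(prune(files, column_id=1, value=15))
```

A file without bounds for the column is always kept, because missing
statistics cannot prove that the file has no matching rows.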

Relevant Iceberg documentation:
- Manifest:
  https://iceberg.apache.org/spec/#manifests
- Values in lower_bounds and upper_bounds maps should be Single-value
  serialized to binary:
  https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
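
As a rough illustration of what single-value serialization means for a few
primitive types, here is a Python sketch. It is not the code added by this
patch: serialize_single_value is a hypothetical helper, and it covers only
the types listed in the comments.

```python
import struct
from datetime import date

_EPOCH = date(1970, 1, 1)


def serialize_single_value(type_name, value):
    """Serialize one value to the binary form used in lower/upper bounds.

    Covers only a handful of primitive types; anything else raises.
    """
    if type_name == "boolean":
        return b"\x01" if value else b"\x00"   # 0x00 for false, non-zero for true
    if type_name == "int":
        return struct.pack("<i", value)        # 4-byte little-endian
    if type_name == "long":
        return struct.pack("<q", value)        # 8-byte little-endian
    if type_name == "float":
        return struct.pack("<f", value)        # 4-byte little-endian IEEE 754
    if type_name == "double":
        return struct.pack("<d", value)        # 8-byte little-endian IEEE 754
    if type_name == "date":
        # Days since 1970-01-01, stored as a 4-byte little-endian int.
        return struct.pack("<i", (value - _EPOCH).days)
    if type_name == "string":
        return value.encode("utf-8")           # UTF-8 bytes, no length prefix
    raise ValueError("unsupported type in this sketch: %s" % type_name)


if __name__ == "__main__":
    print(serialize_single_value("int", 34).hex())
    print(serialize_single_value("string", "abc"))
```

The fixed-width little-endian layout lets a reader decode a bound from the
column type alone, with no extra framing.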

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Reviewed-on: http://gerrit.cloudera.org:8080/17806
Tested-by: Impala Public Jenkins 
Reviewed-by: Attila Jeges 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
A be/src/exec/parquet/serialize-single-value-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
19 files changed, 1,025 insertions(+), 33 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Attila Jeges: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-09-02 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Sep 2021 21:34:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-28 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
A be/src/exec/parquet/serialize-single-value-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
19 files changed, 1,025 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17806/8
--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..


Patch Set 6:

patch-set #6 contains the refactored flat buffer generation.


--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 27 Aug 2021 18:18:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
A be/src/exec/parquet/serialize-single-value-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
19 files changed, 1,025 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17806/6
--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
A be/src/exec/parquet/serialize-single-value-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
19 files changed, 1,036 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17806/5
--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
A be/src/exec/parquet/serialize-single-value-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
19 files changed, 1,034 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17806/4
--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
A be/src/exec/parquet/serialize-single-value-test.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
19 files changed, 1,033 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17806/3
--
To view, visit http://gerrit.cloudera.org:8080/17806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables

2021-08-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17765 )

Change subject: IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR 
SYSTEM_VERSION AS OF" for Iceberg tables
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Gerrit-Change-Number: 17765
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 26 Aug 2021 15:09:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables

2021-08-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17765 )

Change subject: IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR 
SYSTEM_VERSION AS OF" for Iceberg tables
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py@197
PS1, Line 197: # Query old snapshot
> We are using the local timezone of the machine that executes the test. I do
What I was thinking of is to set TIMEZONE query option to a specific timezone 
for the queries and get the current timestamp after each query with "select 
now();" (with TIMEZONE set to the same timezone).

This way we wouldn't depend on the local timezone of the machine.

A test could compare the results for the same timestamp in different TIMEZONEs, 
to prove that time travel uses the coordinator's local timezone.

Anyway, it was just a silly idea. Now that I think about it, it doesn't sound 
too useful. Feel free to ignore it.
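
A small Python sketch of why the timezone matters for such a test. The names
to_epoch_millis and snapshot_as_of are hypothetical, fixed UTC offsets stand
in for named timezones, and the sketch assumes that time travel selects the
latest snapshot committed at or before the requested instant.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_millis(wall_clock, utc_offset_hours):
    """Interpret a naive wall-clock timestamp in a zone with a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(wall_clock.replace(tzinfo=tz).timestamp() * 1000)


def snapshot_as_of(snapshots, as_of_millis):
    """Return the id of the latest snapshot committed at or before as_of_millis.

    `snapshots` is a list of (commit_time_millis, snapshot_id) pairs.
    Returns None when no snapshot is old enough.
    """
    candidates = [s for s in snapshots if s[0] <= as_of_millis]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    wall_clock = datetime(2021, 8, 26, 12, 0, 0)
    noon_utc = to_epoch_millis(wall_clock, 0)
    hour = 3600 * 1000
    # Two snapshots, committed 3 hours and 1 hour before 12:00 UTC.
    snapshots = [(noon_utc - 3 * hour, 100), (noon_utc - 1 * hour, 200)]
    for offset in (0, 2):
        as_of = to_epoch_millis(wall_clock, offset)
        print(offset, snapshot_as_of(snapshots, as_of))
```

The same wall-clock string maps to a different instant under each offset, so
the selected snapshot can differ between the two runs.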


http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py@197
PS1, Line 197: # Query old snapshot
> Currently querying the future behaves the same as querying by now(). I'm no
Sure, use your best judgement.



--
To view, visit http://gerrit.cloudera.org:8080/17765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Gerrit-Change-Number: 17765
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 26 Aug 2021 15:08:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables

2021-08-25 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17765 )

Change subject: IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR 
SYSTEM_VERSION AS OF" for Iceberg tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java@222
PS1, Line 222: timeTravelSpec_ = other.timeTravelSpec_;
> Maybe cloning the TimeTravelSpec object would be safer than just copying th
Alternatively, you could set timeTravelSpec_ to null in reset(), instead of 
calling timeTravelSpec_.reset()



--
To view, visit http://gerrit.cloudera.org:8080/17765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Gerrit-Change-Number: 17765
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Aug 2021 17:55:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables

2021-08-25 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17765 )

Change subject: IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR 
SYSTEM_VERSION AS OF" for Iceberg tables
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java@152
PS1, Line 152:   protected TimeTravelSpec timeTravelSpec_;
Please add a comment.


http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java@222
PS1, Line 222: timeTravelSpec_ = other.timeTravelSpec_;
Maybe cloning the TimeTravelSpec object would be safer than just copying the 
reference.

Perhaps it is not an issue, but if 2 TableRef instances share the same 
'timeTravelSpec_' reference and one of them is reset(), that will affect the 
other instance as well, right?
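
The aliasing concern can be shown with a short Python sketch (Python rather
than Java, for brevity). The classes below are simplified, hypothetical
stand-ins that only borrow the names TableRef and TimeTravelSpec; they are
not the Impala frontend classes.

```python
import copy


class TimeTravelSpec:
    """Hypothetical stand-in: holds state that reset() clears."""

    def __init__(self, as_of_version):
        self.as_of_version = as_of_version
        self.analyzed = True

    def reset(self):
        self.analyzed = False


class TableRef:
    """Hypothetical stand-in for a table reference owning a spec."""

    def __init__(self, spec):
        self.spec = spec

    def shallow_copy(self):
        # Copies only the reference: both TableRefs share one spec object.
        return TableRef(self.spec)

    def cloning_copy(self):
        # Clones the spec, so each TableRef owns its own state.
        return TableRef(copy.copy(self.spec))

    def reset(self):
        if self.spec is not None:
            self.spec.reset()


if __name__ == "__main__":
    original = TableRef(TimeTravelSpec(5))
    shared = original.shallow_copy()
    cloned = original.cloning_copy()
    original.reset()
    print("shared copy still analyzed:", shared.spec.analyzed)
    print("cloned copy still analyzed:", cloned.spec.analyzed)
```

Resetting the original clears the state seen through the shared reference,
while the cloned copy is unaffected.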


http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/analysis/TimeTravelSpec.java
File fe/src/main/java/org/apache/impala/analysis/TimeTravelSpec.java:

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/analysis/TimeTravelSpec.java@113
PS1, Line 113: asOfVersion_ = asOfExpr_.evalToInteger(analyzer, "SYSTEM_VERSION AS OF");
You could also check that asOfVersion_ > 0.


http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@132
PS1, Line 132: } catch (IOException ex) {
Does 'ex' contain dataFile.path()? If not, please add dataFile.path() to the 
exception in L133.



--
To view, visit http://gerrit.cloudera.org:8080/17765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Gerrit-Change-Number: 17765
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Aug 2021 16:08:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables

2021-08-25 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17765 )

Change subject: IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR 
SYSTEM_VERSION AS OF" for Iceberg tables
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17765/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17765/1//COMMIT_MSG@9
PS1, Line 9: This patch adds support "FOR SYSTEM_TIME AS OF" and
Please clarify that the timestamp specified with "FOR SYSTEM_TIME AS OF" is 
interpreted in the local timezone, meaning the coordinator node's local 
timezone.


http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py@197
PS1, Line 197: # Query old snapshot
> Maybe add another test to query with a timestamp in the future.
You could also test (if not too much work) that switching Impala to another 
timezone (e.g. using TIMEZONE query option) changes the results of the time 
travel query.



--
To view, visit http://gerrit.cloudera.org:8080/17765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Gerrit-Change-Number: 17765
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Aug 2021 11:58:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables

2021-08-25 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17765 )

Change subject: IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR 
SYSTEM_VERSION AS OF" for Iceberg tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py@197
PS1, Line 197: # Query old snapshot
Maybe add another test to query with a timestamp in the future.
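
A sketch of what such a test would exercise, assuming that "AS OF" resolves to
the most recent snapshot committed at or before the given time. Under that
assumption a timestamp in the future simply resolves to the latest snapshot.
The snapshot log and the function name below are hypothetical:

```python
import bisect


def snapshot_as_of(history, ts_millis):
    """Return the id of the latest snapshot committed at or before ts_millis.

    `history` is a list of (commit_time_millis, snapshot_id) sorted by time.
    """
    times = [t for t, _ in history]
    idx = bisect.bisect_right(times, ts_millis)
    if idx == 0:
        raise ValueError("no snapshot at or before %d" % ts_millis)
    return history[idx - 1][1]


# Hypothetical snapshot log, for illustration only.
history = [(1000, "snap-a"), (2000, "snap-b"), (3000, "snap-c")]

print(snapshot_as_of(history, 2500))    # a time between snap-b and snap-c
print(snapshot_as_of(history, 10**15))  # a time far in the future
```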



--
To view, visit http://gerrit.cloudera.org:8080/17765

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Gerrit-Change-Number: 17765
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 25 Aug 2021 11:03:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10874: Upgrade impyla to the latest version

2021-08-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17795 )

Change subject: IMPALA-10874: Upgrade impyla to the latest version
..


Patch Set 2:

I've added Joe McDonnell as a reviewer to approve the change.


--
To view, visit http://gerrit.cloudera.org:8080/17795

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I990e5cdde4e98d6ab3581fe48f53a5d0590ce492
Gerrit-Change-Number: 17795
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 24 Aug 2021 18:43:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10874: Upgrade impyla to the latest version

2021-08-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17795 )

Change subject: IMPALA-10874: Upgrade impyla to the latest version
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17795

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I990e5cdde4e98d6ab3581fe48f53a5d0590ce492
Gerrit-Change-Number: 17795
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 24 Aug 2021 18:40:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10840: Add support for "FOR SYSTEM TIME AS OF" and "FOR SYSTEM VERSION AS OF" for Iceberg tables

2021-08-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17765 )

Change subject: IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR 
SYSTEM_VERSION AS OF" for Iceberg tables
..


Patch Set 1:

(3 comments)

The patch looks good. At first glance I found only minor issues.

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@515
PS1, Line 515: TableScan scan = createScanAsOf(
nit: Since 'baseTable' is not used anywhere else, you could move L514 inside 
createScanAsOf(). This way createScanAsOf() would need only 2 params: table 
and timeTravelSpec.


http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/17765/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@4884
PS1, Line 4884: iceT, "FOR SYSTEM_VERSION AS OF  must be an 
integer type but is");
Was the end of the error msg left out intentionally?


http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/17765/1/tests/query_test/test_iceberg.py@134
PS1, Line 134: ts
'snapshot_id' ?



--
To view, visit http://gerrit.cloudera.org:8080/17765

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Gerrit-Change-Number: 17765
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 24 Aug 2021 18:01:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-24 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

This patch adds parquet stats to iceberg manifest as per-datafile
metrics.

The following metrics are supported:
- column_sizes :
  Map from column id to the total size on disk of all regions that
  store the column. Does not include bytes necessary to read other
  columns, like footers.

- null_value_counts :
  Map from column id to number of null values in the column.

- lower_bounds :
  Map from column id to lower bound in the column serialized as
  binary. Each value must be less than or equal to all non-null,
  non-NaN values in the column for the file.

- upper_bounds :
  Map from column id to upper bound in the column serialized as
  binary. Each value must be greater than or equal to all non-null,
  non-NaN values in the column for the file.

The corresponding parquet stats are collected by 'ColumnStats'
(in 'min_value_', 'max_value_', 'null_count_' members) and
'HdfsParquetTableWriter::BaseColumnWriter' (in
'total_compressed_byte_size_' member).

Testing:
- New e2e test was added to verify that the metrics are written to the
  Iceberg manifest upon inserting data.
- New e2e test was added to verify that lower_bounds/upper_bounds
  metrics are used to prune data files on querying iceberg tables.
- Existing e2e tests were updated to work with the new behavior.

Relevant Iceberg documentation:
- Manifest:
  https://iceberg.apache.org/spec/#manifests
- Values in lower_bounds and upper_bounds maps should be Single-value
  serialized to binary:
  https://iceberg.apache.org/spec/#appendix-d-single-value-serialization
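
For reference, a minimal sketch of single-value serialization for a few
primitive types, as I read Appendix D of the Iceberg spec (fixed-width numbers
are little-endian, strings are raw UTF-8 bytes). This is an illustration in
Python, not the C++ code in this patch, and the function name is made up:

```python
import struct


def serialize_single_value(iceberg_type, value):
    """Binary form of one bound value for a few primitive Iceberg types."""
    if iceberg_type == "boolean":
        return b"\x01" if value else b"\x00"
    if iceberg_type == "int":
        return struct.pack("<i", value)   # 4-byte little-endian
    if iceberg_type == "long":
        return struct.pack("<q", value)   # 8-byte little-endian
    if iceberg_type == "float":
        return struct.pack("<f", value)   # 4-byte IEEE 754, little-endian
    if iceberg_type == "double":
        return struct.pack("<d", value)   # 8-byte IEEE 754, little-endian
    if iceberg_type == "string":
        return value.encode("utf-8")      # raw UTF-8, no length prefix
    raise ValueError("type not covered by this sketch: " + iceberg_type)


# Hypothetical per-file lower bounds keyed by column id.
lower_bounds = {1: serialize_single_value("int", 7),
                2: serialize_single_value("string", "apple")}
print(lower_bounds[1].hex())
```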

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
17 files changed, 964 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17806/2
--
To view, visit http://gerrit.cloudera.org:8080/17806

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17806 )

Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@24
PS1, Line 24: import avro.schema
> flake8: F401 'avro.schema' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@25
PS1, Line 25: from avro.datafile import DataFileReader, DataFileWriter
> flake8: F401 'avro.datafile.DataFileWriter' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@26
PS1, Line 26: from avro.io import DatumReader, DatumWriter
> flake8: F401 'avro.io.DatumWriter' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@199
PS1, Line 199:
> flake8: E202 whitespace before ']'
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@199
PS1, Line 199:
> flake8: E201 whitespace after '['
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@260
PS1, Line 260: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@268
PS1, Line 268: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@282
PS1, Line 282: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@290
PS1, Line 290: ,
> flake8: E231 missing whitespace after ','
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@292
PS1, Line 292: :
> flake8: E231 missing whitespace after ':'
Done


http://gerrit.cloudera.org:8080/#/c/17806/1/tests/query_test/test_iceberg.py@307
PS1, Line 307: :
> flake8: E231 missing whitespace after ':'
Done



--
To view, visit http://gerrit.cloudera.org:8080/17806

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 24 Aug 2021 12:01:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10879: Add parquet stats to iceberg manifest

2021-08-24 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17806 )


Change subject: IMPALA-10879: Add parquet stats to iceberg manifest
..

IMPALA-10879: Add parquet stats to iceberg manifest

This patch adds parquet stats to iceberg manifest as per-datafile
metrics.

The following metrics are supported:
- column_sizes :
  Map from column id to the total size on disk of all regions that
  store the column. Does not include bytes necessary to read other
  columns, like footers.

- null_value_counts :
  Map from column id to number of null values in the column.

- lower_bounds :
  Map from column id to lower bound in the column serialized as
  binary. Each value must be less than or equal to all non-null,
  non-NaN values in the column for the file.

- upper_bounds :
  Map from column id to upper bound in the column serialized as
  binary. Each value must be greater than or equal to all non-null,
  non-NaN values in the column for the file.

The corresponding parquet stats are collected by 'ColumnStats'
(in 'min_value_', 'max_value_', 'null_count_' members) and
'HdfsParquetTableWriter::BaseColumnWriter' (in
'total_compressed_byte_size_' member).

Testing:
- New e2e test was added to verify that the metrics are written to the
  Iceberg manifest upon inserting data.
- New e2e test was added to verify that lower_bounds/upper_bounds
  metrics are used to prune data files on querying iceberg tables.
- Existing e2e tests were updated to work with the new behavior.
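
The pruning that the second test checks can be sketched as follows: a file can
be skipped for an equality predicate when the value lies outside the file's
[lower_bound, upper_bound] range for that column. The file names and bounds
below are hypothetical, and the bounds are assumed to be 4-byte little-endian
ints:

```python
import struct


def may_contain(lower_bytes, upper_bytes, value):
    """True if a file whose int column spans [lower, upper] could hold value."""
    lower = struct.unpack("<i", lower_bytes)[0]
    upper = struct.unpack("<i", upper_bytes)[0]
    return lower <= value <= upper


# Hypothetical manifest entries: file name -> (lower_bound, upper_bound)
files = {
    "f1.parq": (struct.pack("<i", 0), struct.pack("<i", 99)),
    "f2.parq": (struct.pack("<i", 100), struct.pack("<i", 199)),
}

# Files that survive pruning for the predicate `col = 150`.
survivors = [name for name, (lo, hi) in files.items()
             if may_contain(lo, hi, 150)]
print(survivors)
```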

Relevant Iceberg documentation:
- Manifest:
  https://iceberg.apache.org/spec/#manifests
- Values in lower_bounds and upper_bounds maps should be Single-value
  serialized to binary:
  https://iceberg.apache.org/spec/#appendix-d-single-value-serialization

Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/parquet/parquet-column-stats.inline.h
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/util/bit-util.h
M common/fbs/IcebergObjects.fbs
M common/protobuf/control_service.proto
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M infra/python/deps/requirements.txt
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
17 files changed, 965 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17806/1
--
To view, visit http://gerrit.cloudera.org:8080/17806

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic31f2260bc6f6a7f307ac955ff05eb154917675b
Gerrit-Change-Number: 17806
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-10874: Upgrade impyla to the latest version

2021-08-23 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17795 )

Change subject: IMPALA-10874: Upgrade impyla to the latest version
..


Patch Set 1:

@Bikramjeet If I remember correctly, impyla doesn't rely on 'sasl' anymore, so 
that dependency can be removed from requirements.txt. I think it is safe to 
upgrade 'bitarray'.


--
To view, visit http://gerrit.cloudera.org:8080/17795

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I990e5cdde4e98d6ab3581fe48f53a5d0590ce492
Gerrit-Change-Number: 17795
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 23 Aug 2021 19:27:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10874: Upgrade impyla to the latest version

2021-08-23 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17795 )

Change subject: IMPALA-10874: Upgrade impyla to the latest version
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17795/1/infra/python/deps/requirements.txt
File infra/python/deps/requirements.txt:

http://gerrit.cloudera.org:8080/#/c/17795/1/infra/python/deps/requirements.txt@40
PS1, Line 40:   sasl == 0.3.1
I think impyla doesn't rely on sasl package anymore. Could you please check it?



--
To view, visit http://gerrit.cloudera.org:8080/17795

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I990e5cdde4e98d6ab3581fe48f53a5d0590ce492
Gerrit-Change-Number: 17795
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 23 Aug 2021 19:24:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10784 (part 2): Fix retaining cookies for impala-shell

2021-08-23 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17796 )

Change subject: IMPALA-10784 (part 2): Fix retaining cookies for impala-shell
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17796

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65432b952929c1c96a081bb87fd4a096624d711b
Gerrit-Change-Number: 17796
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 23 Aug 2021 19:17:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10741: Set engine.hive.enabled=true table property for Iceberg tables

2021-08-05 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17750 )

Change subject: IMPALA-10741: Set engine.hive.enabled=true table property for 
Iceberg tables
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17750

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6aa0240829697a27f48d0defcce48920a5d6f49b
Gerrit-Change-Number: 17750
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 05 Aug 2021 08:50:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10739: Support setting new partition spec for Iceberg tables

2021-08-03 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17723 )

Change subject: IMPALA-10739: Support setting new partition spec for Iceberg 
tables
..

IMPALA-10739: Support setting new partition spec for Iceberg tables

With this patch Impala will support partition evolution for
Iceberg tables.

The DDL statement to change the default partition spec is:
ALTER TABLE  SET PARTITION SPEC()

Hive uses the same SQL syntax.

Testing:
- Added FE test to exercise parsing various well-formed and ill-formed
  ALTER TABLE SET PARTITION SPEC statements.

- Added e2e tests for:
  - ALTER TABLE SET PARTITION SPEC works for tables with HadoopTables
    and HadoopCatalog Catalog.
  - When evolving partition spec, the old data written with an earlier
    spec remains unchanged. New data is written using the new spec in
    a new layout. Data written with earlier spec and new spec can be
    fetched in a single query.
  - Invalid ALTER TABLE SET PARTITION SPEC statements yield the
    expected analysis error messages.
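
The behavior described in the second e2e bullet can be pictured with a toy
model in which every data file remembers the spec it was written with. This is
only an illustration of the idea; the dictionary layout, function names and
the year(ts)/month(ts) transforms are assumptions, not Impala or Iceberg code:

```python
# Toy table: a current default spec plus the data files written so far.
table = {"default_spec": ("year(ts)",), "files": []}


def insert(table, rows):
    """Write rows as one data file laid out by the current default spec."""
    table["files"].append({"spec": table["default_spec"], "rows": list(rows)})


def set_partition_spec(table, new_spec):
    """Change the default spec; files already written are left untouched."""
    table["default_spec"] = tuple(new_spec)


def scan(table):
    """Read all rows regardless of which spec each file was written with."""
    return [row for f in table["files"] for row in f["rows"]]


insert(table, [1, 2])
set_partition_spec(table, ["month(ts)"])
insert(table, [3])

print([f["spec"] for f in table["files"]])
print(scan(table))
```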

Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
---
M be/src/exec/hdfs-table-sink.cc
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterTableSetPartitionSpecStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
10 files changed, 306 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/17723/3
--
To view, visit http://gerrit.cloudera.org:8080/17723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
Gerrit-Change-Number: 17723
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10739: Support setting new partition spec for Iceberg tables

2021-08-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17723 )

Change subject: IMPALA-10739: Support setting new partition spec for Iceberg 
tables
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17723/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test:

http://gerrit.cloudera.org:8080/#/c/17723/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@309
PS2, Line 309: .*.0.parq','.*',''
> This matches to all data files. Can we exclude '=' from the first .*?
Good catch! Done.
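
To make the issue concrete: with a leading '.*' the pattern also matches files
under partition directories such as 'i=1/', whereas '[^=]*' cannot cross an
'='. A small sketch, assuming the pattern has to match the whole value; the
paths are hypothetical:

```python
import re

# Hypothetical file paths as SHOW FILES might list them.
unpartitioned = "hdfs://host/tbl/data/abc_data.0.parq"
partitioned = "hdfs://host/tbl/data/i=1/abc_data.0.parq"

loose = re.compile(r".*.0.parq")       # pattern from PS2, line 309
strict = re.compile(r"[^=]*.0.parq")   # first .* replaced to exclude '='

for path in (unpartitioned, partitioned):
    print(path, bool(loose.fullmatch(path)), bool(strict.fullmatch(path)))
```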



--
To view, visit http://gerrit.cloudera.org:8080/17723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
Gerrit-Change-Number: 17723
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 03 Aug 2021 08:55:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10739: Support setting new partition spec for Iceberg tables

2021-08-02 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17723 )

Change subject: IMPALA-10739: Support setting new partition spec for Iceberg 
tables
..

IMPALA-10739: Support setting new partition spec for Iceberg tables

With this patch Impala will support partition evolution for
Iceberg tables.

The DDL statement to change the default partition spec is:
ALTER TABLE  SET PARTITION SPEC()

Hive uses the same SQL syntax.

Testing:
- Added FE test to exercise parsing various well-formed and ill-formed
  ALTER TABLE SET PARTITION SPEC statements.

- Added e2e tests for:
  - ALTER TABLE SET PARTITION SPEC works for tables with HadoopTables
    and HadoopCatalog Catalog.
  - When evolving partition spec, the old data written with an earlier
    spec remains unchanged. New data is written using the new spec in
    a new layout. Data written with earlier spec and new spec can be
    fetched in a single query.
  - Invalid ALTER TABLE SET PARTITION SPEC statements yield the
    expected analysis error messages.

Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
---
M be/src/exec/hdfs-table-sink.cc
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterTableSetPartitionSpecStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
10 files changed, 306 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/17723/2
--
To view, visit http://gerrit.cloudera.org:8080/17723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
Gerrit-Change-Number: 17723
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10739: Support setting new partition spec for Iceberg tables

2021-08-02 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17723 )

Change subject: IMPALA-10739: Support setting new partition spec for Iceberg 
tables
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17723/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17723/1//COMMIT_MSG@15
PS1, Line 15: rhe
> the
Done


http://gerrit.cloudera.org:8080/#/c/17723/1//COMMIT_MSG@16
PS1, Line 16:
> Please add section about testing.
Done


http://gerrit.cloudera.org:8080/#/c/17723/1/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

http://gerrit.cloudera.org:8080/#/c/17723/1/common/thrift/JniCatalog.thrift@469
PS1, Line 469: from
> for
Done


http://gerrit.cloudera.org:8080/#/c/17723/1/fe/src/main/java/org/apache/impala/analysis/AlterTableSetPartitionSpecStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetPartitionSpecStmt.java:

http://gerrit.cloudera.org:8080/#/c/17723/1/fe/src/main/java/org/apache/impala/analysis/AlterTableSetPartitionSpecStmt.java@50
PS1, Line 50: sb.append(getTbl()).append(" SET PARTITION SPEC ")
> Do we need to add parenthesis here, or are they added by icebergPartSpec_.t
The parentheses are added in icebergPartSpec_.toSql()


http://gerrit.cloudera.org:8080/#/c/17723/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test:

http://gerrit.cloudera.org:8080/#/c/17723/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test@303
PS1, Line 303:  TYPES
> Could you please add a SHOW FILES statement at the end so we can see the di
Done



--
To view, visit http://gerrit.cloudera.org:8080/17723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
Gerrit-Change-Number: 17723
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 02 Aug 2021 11:57:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10739: Support setting new partition spec for Iceberg tables

2021-07-23 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17723 )


Change subject: IMPALA-10739: Support setting new partition spec for Iceberg 
tables
..

IMPALA-10739: Support setting new partition spec for Iceberg tables

With this patch Impala will support partition evolution for
Iceberg tables.

The DDL statement to change the default partition spec is:
ALTER TABLE  SET PARTITION SPEC()

Hive uses rhe same SQL syntax.

Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
---
M be/src/exec/hdfs-table-sink.cc
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterTableSetPartitionSpecStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
10 files changed, 295 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/17723/1
--
To view, visit http://gerrit.cloudera.org:8080/17723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9bd935b8a82e977df9ee90d464b5fe2a7acc83f2
Gerrit-Change-Number: 17723
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-10820: Fix calculating default block size for parquet files

2021-07-23 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17719 )

Change subject: IMPALA-10820: Fix calculating default block size for parquet 
files
..

IMPALA-10820: Fix calculating default block size for parquet files

This patch fixes a bug introduced in IMPALA-10627. Because of the bug,
the wrong default block size was used for parquet files, which broke the
TestInsertWideTable.test_insert_wide_table e2e test.

Testing:
- Run test_insert_wide_table with exhaustive strategy.

Change-Id: Iac8c6dd80dfe84cb7b3d2106713eae87ce923934
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
2 files changed, 12 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/17719/2
--
To view, visit http://gerrit.cloudera.org:8080/17719

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iac8c6dd80dfe84cb7b3d2106713eae87ce923934
Gerrit-Change-Number: 17719
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10820: Fix calculating default block size for parquet files

2021-07-23 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17719 )


Change subject: IMPALA-10820: Fix calculating default block size for parquet 
files
..

IMPALA-10820: Fix calculating default block size for parquet files

This patch fixes a bug introduced in IMPALA-10627. Because of the bug,
the wrong default block size was used for parquet files, which broke the
TestInsertWideTable.test_insert_wide_table e2e test.

Testing:
- Run test_insert_wide_table with exhaustive strategy.

Change-Id: Iac8c6dd80dfe84cb7b3d2106713eae87ce923934
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
2 files changed, 8 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/17719/1
--
To view, visit http://gerrit.cloudera.org:8080/17719

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iac8c6dd80dfe84cb7b3d2106713eae87ce923934
Gerrit-Change-Number: 17719
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-20 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..


Patch Set 8: Code-Review+2

Carry +2 after rebase.


--
To view, visit http://gerrit.cloudera.org:8080/17654

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 20 Jul 2021 10:14:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-20 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default value), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). The table property will be ignored if
  PARQUET_FILE_SIZE query option is set.
  If neither the table property nor the PARQUET_FILE_SIZE query option
  is set, the way Impala calculates row group size will remain
  unchanged.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates page size
  will remain unchanged.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates dictionary
  page size will remain unchanged.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.
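
The ranges and precedence rules listed above can be sketched roughly as
follows. This is only an illustration, not the code in the patch: the
helper names (validate_size_property, effective_codec) are hypothetical,
and only the property names, ranges, default codec and the "query option
wins" rule are taken from the commit message.

```python
# Hypothetical helpers; only the property names, ranges and precedence
# rule come from the commit message, not from Impala's source.
SIZE_RANGES = {
    "write.parquet.row-group-size-bytes": (8388608, 2146435072),  # 8MB - 2047MB
    "write.parquet.page-size-bytes": (65536, 1073741824),         # 64KB - 1GB
    "write.parquet.dict-size-bytes": (65536, 1073741824),         # 64KB - 1GB
}
SUPPORTED_CODECS = ("NONE", "GZIP", "SNAPPY", "LZ4", "ZSTD")
DEFAULT_CODEC = "SNAPPY"


def validate_size_property(name, value):
    """Return value as an int if it lies in the supported range, else raise."""
    low, high = SIZE_RANGES[name]
    size = int(value)
    if not low <= size <= high:
        raise ValueError(f"{name} must be in [{low}, {high}], got {size}")
    return size


def effective_codec(query_option, table_props):
    """Pick the codec: a set COMPRESSION_CODEC query option overrides the table."""
    if query_option is not None:
        return query_option
    codec = table_props.get("write.parquet.compression-codec", DEFAULT_CODEC)
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    return codec
```

For example, effective_codec("GZIP", {"write.parquet.compression-codec":
"ZSTD"}) picks GZIP, because the query option is set.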

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
20 files changed, 1,147 insertions(+), 115 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17654/8
--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-16 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17654/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17654/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@1150
PS6, Line 1150:
  :   columns_.resize(num_co
> Now the implementation of these could be moved to Configure()/ConfigureForI
Done


http://gerrit.cloudera.org:8080/#/c/17654/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17654/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@325
PS6, Line 325: able.getParameters();
> For backward compatibility we might also want to search for "iceberg.file_f
Done
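
A rough sketch of the backward-compatible lookup discussed in this
comment, assuming the truncated property name refers to the legacy
'iceberg.file_format' key named in the commit message. The function
name get_file_format is hypothetical and this is not the code in
FeIcebergTable.java; it only illustrates the "new key first, legacy key
as fallback" idea.

```python
NEW_KEY = "write.format.default"    # standard Iceberg property name
LEGACY_KEY = "iceberg.file_format"  # assumed legacy name, per the commit message


def get_file_format(params, default=None):
    """Prefer the standard key; fall back to the legacy key for old tables."""
    for key in (NEW_KEY, LEGACY_KEY):
        value = params.get(key)
        if value is not None:
            return value
    return default
```

With this ordering, a table that carries both keys resolves to the value
of 'write.format.default'.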



--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 16 Jul 2021 14:11:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-16 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default value), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). The table property will be ignored if
  PARQUET_FILE_SIZE query option is set.
  If neither the table property nor the PARQUET_FILE_SIZE query option
  is set, the way Impala calculates row group size will remain
  unchanged.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates page size
  will remain unchanged.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates dictionary
  page size will remain unchanged.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
19 files changed, 1,147 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17654/7
--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-16 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default value), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). The table property will be ignored if
  PARQUET_FILE_SIZE query option is set.
  If neither the table property nor the PARQUET_FILE_SIZE query option
  is set, the way Impala calculates row group size will remain
  unchanged.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates page size
  will remain unchanged.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates dictionary
  page size will remain unchanged.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
19 files changed, 1,134 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17654/6
--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-15 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default value), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). The table property will be ignored if
  PARQUET_FILE_SIZE query option is set.
  If neither the table property nor the PARQUET_FILE_SIZE query option
  is set, the way Impala calculates row group size will remain
  unchanged.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates page size
  will remain unchanged.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates dictionary
  page size will remain unchanged.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
19 files changed, 1,134 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17654/5
--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

2021-07-14 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg 
partitions
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 14 Jul 2021 14:07:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-13 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default value), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). The table property will be ignored if
  PARQUET_FILE_SIZE query option is set.
  If neither the table property nor the PARQUET_FILE_SIZE query option
  is set, the way Impala calculates row group size will remain
  unchanged.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates page size
  will remain unchanged.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates dictionary
  page size will remain unchanged.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
20 files changed, 1,088 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17654/4
--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10732: Use consistent DDL for specifying Iceberg partitions

2021-07-07 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17575 )

Change subject: IMPALA-10732: Use consistent DDL for specifying Iceberg 
partitions
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17575/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17575/4//COMMIT_MSG@33
PS4, Line 33: makes Impala to use
typo: makes Impala use


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@291
PS4, Line 291: transformType.startsWit
Not your change, but why is startsWith() used instead of equals() for the
BUCKET and TRUNCATE transforms?


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@302
PS4, Line 302: "Unsupported iceberg partition type: "
Do we have a test that exercises this error message?


http://gerrit.cloudera.org:8080/#/c/17575/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@296
PS4, Line 296:   switch (transformType) {
             :   case "HOUR":  case "HOURS":  return TIcebergPartitionTransformType.HOUR;
             :   case "DAY":   case "DAYS":   return TIcebergPartitionTransformType.DAY;
             :   case "MONTH": case "MONTHS": return TIcebergPartitionTransformType.MONTH;
             :   case "YEAR":  case "YEARS":  return TIcebergPartitionTransformType.YEAR;
             :   default:
             :     throw new TableLoadingException("Unsupported iceberg partition type: " +
             :         transformType);
             : }
nit: Maybe adding these transform type strings and the ones above to an
immutable String -> TIcebergPartitionTransformType map would make the code
shorter and simpler.



--
To view, visit http://gerrit.cloudera.org:8080/17575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib72ae445fd68fb0ab75d87b34779dbab922bbc62
Gerrit-Change-Number: 17575
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 07 Jul 2021 14:04:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-06 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). Setting it to 0 signals that the table
  property should be ignored. The table property will also be ignored
  if PARQUET_FILE_SIZE query option is set.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB). Setting it to 0 signals that
  the table property should be ignored.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB). Setting it to 0
  signals that the table property should be ignored.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
20 files changed, 1,123 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17654/3
--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10627: Use standard parquet-related Iceberg table properties

2021-07-06 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17654 )


Change subject: IMPALA-10627: Use standard parquet-related Iceberg table 
properties
..

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). Setting it to 0 signals that the table
  property should be ignored. The table property will also be ignored
  if PARQUET_FILE_SIZE query option is set.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB). Setting it to 0 signals that
  the table property should be ignored.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB). Setting it to 0
  signals that the table property should be ignored.

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
20 files changed, 1,123 insertions(+), 82 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17654/2
--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-10750: Impala-shell changes for HS2 compatibility

2021-06-15 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17590 )

Change subject: IMPALA-10750: Impala-shell changes for HS2 compatibility
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17590/2/shell/impala_client.py
File shell/impala_client.py:

http://gerrit.cloudera.org:8080/#/c/17590/2/shell/impala_client.py@846
PS2, Line 846: is_null.frombytes(tcol.nulls)
Would it be possible for tcol.nulls to be None (no NULL values)? If so, this 
line will raise an exception.
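
A possible guard could look roughly like the sketch below. It uses only
the standard library instead of the is_null object from the patch, the
name decode_nulls is hypothetical, and the bit order (least significant
bit first within each byte) is an assumption on my side, not something
confirmed by the change.

```python
def decode_nulls(nulls, num_rows):
    """Return num_rows booleans; True marks a NULL row.

    `nulls` is treated as an optional bitmask: None or b"" means that no
    NULL values were reported for the column.
    """
    if not nulls:
        return [False] * num_rows
    flags = []
    for row in range(num_rows):
        byte_index, bit_index = divmod(row, 8)
        if byte_index < len(nulls):
            # Assumed layout: bit 0 of byte 0 belongs to row 0.
            flags.append(bool((nulls[byte_index] >> bit_index) & 1))
        else:
            # Rows past the end of the mask are treated as non-NULL.
            flags.append(False)
    return flags
```

With this guard, decode_nulls(None, 3) yields three False values instead
of raising.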



--
To view, visit http://gerrit.cloudera.org:8080/17590
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3a4c4ce8a5d60db136df1743f32dba22172ee13
Gerrit-Change-Number: 17590
Gerrit-PatchSet: 2
Gerrit-Owner: Steve Carlin 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 15 Jun 2021 13:26:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5121: Fix AVG() on timestamp col with use local tz for unix timestamp conversions

2021-05-14 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17412 )

Change subject: IMPALA-5121: Fix AVG() on timestamp col with 
use_local_tz_for_unix_timestamp_conversions
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I999099de8e07269b96b75d473f5753be4479cecd
Gerrit-Change-Number: 17412
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 14 May 2021 08:09:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] POC: use puresasl instead of sasl in impala-shell

2021-04-29 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17351 )

Change subject: POC: use puresasl instead of sasl in impala-shell
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17351/4/infra/python/deps/requirements.txt
File infra/python/deps/requirements.txt:

http://gerrit.cloudera.org:8080/#/c/17351/4/infra/python/deps/requirements.txt@39
PS4, Line 39: pure-sasl == 0.6.2
Note that there's also a shell/packaging/requirements.txt that refers to 
sasl==0.2.1



--
To view, visit http://gerrit.cloudera.org:8080/17351
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba5a15e867969938792d120cd8f1ad1ed6370906
Gerrit-Change-Number: 17351
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 29 Apr 2021 16:11:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10662: Change EE tests to return the same results for HS2 as Beeswax

2021-04-20 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17325 )

Change subject: IMPALA-10662: Change EE tests to return the same results for 
HS2 as Beeswax
..


Patch Set 4: Code-Review+2

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/17325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If69ae90c6333ff245c2b951af5689e3071f85cb2
Gerrit-Change-Number: 17325
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Tue, 20 Apr 2021 16:33:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10662: Change EE tests to return the same results for HS2 as Beeswax

2021-04-20 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17325 )

Change subject: IMPALA-10662: Change EE tests to return the same results for 
HS2 as Beeswax
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17325/3/tests/common/impala_connection.py
File tests/common/impala_connection.py:

http://gerrit.cloudera.org:8080/#/c/17325/3/tests/common/impala_connection.py@301
PS3, Line 301: convert_types=False
According to the impyla comments:

convert_types : bool, optional
When `False`, timestamps and decimal values will not be converted
to Python `datetime` and `Decimal` values. (These conversions are
expensive.) Only applies when using HS2 protocol versions > 6.

The comment mentions DECIMAL and TIMESTAMP values but it doesn't mention FLOAT 
& DOUBLE values.
I'm just curious, why were FLOAT/DOUBLE values previously converted to a lower 
precision value?



--
To view, visit http://gerrit.cloudera.org:8080/17325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If69ae90c6333ff245c2b951af5689e3071f85cb2
Gerrit-Change-Number: 17325
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Tue, 20 Apr 2021 16:18:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10536: Fix saml2 callback token ttl's description

2021-02-22 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17107 )

Change subject: IMPALA-10536: Fix saml2_callback_token_ttl's description
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1057f0c5694883d1b1e14075876c780d6c942a8
Gerrit-Change-Number: 17107
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 22 Feb 2021 20:18:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10496: Remove checking port in FLAGS saml2 sp callback url

2021-02-19 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17087 )

Change subject: IMPALA-10496: Remove checking port in 
FLAGS_saml2_sp_callback_url
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2534b7a1a2bf16bf48ba533dc13fd300f690f4e5
Gerrit-Change-Number: 17087
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 19 Feb 2021 13:24:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10234: Add support for cookie authentication to impala-shell

2020-11-17 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16660 )

Change subject: IMPALA-10234: Add support for cookie authentication to 
impala-shell
..


Patch Set 6: Code-Review+2

Fixed test failures. Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/16660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Gerrit-Change-Number: 16660
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 17 Nov 2020 13:53:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10234: Add support for cookie authentication to impala-shell

2020-11-17 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/16660 )

Change subject: IMPALA-10234: Add support for cookie authentication to 
impala-shell
..

IMPALA-10234: Add support for cookie authentication to impala-shell

IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.
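
As a rough illustration of what client-side cookie handling involves (not the code in shell/cookie_util.py), the sketch below parses Set-Cookie header values and builds a Cookie request header. The function names and the cookie name used in the usage note are hypothetical, and it assumes Python 3's standard http.cookies module.

```python
from http.cookies import SimpleCookie


def extract_cookies(set_cookie_headers, wanted_names):
    """Collect the wanted cookies from a list of Set-Cookie header values."""
    found = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            if name in wanted_names:
                found[name] = morsel.value
    return found


def build_cookie_header(cookies):
    """Format name/value pairs as the value of a Cookie request header."""
    return "; ".join("%s=%s" % (n, v) for n, v in sorted(cookies.items()))
```

For example, extract_cookies(["session.auth=abc123; Path=/; HttpOnly"], {"session.auth"}) keeps only the named cookie, and the result can be passed to build_cookie_header() for the next request.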

Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually with nginx HTTP proxy.
TODO:
- Test with Knox HTTP proxy as well.

Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
---
M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
M shell/ImpalaHttpClient.py
A shell/cookie_util.py
M shell/impala_client.py
M shell/impala_shell.py
M shell/make_shell_tarball.sh
M shell/packaging/make_python_package.sh
A tests/shell/test_cookie_util.py
8 files changed, 286 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16660/6
--
To view, visit http://gerrit.cloudera.org:8080/16660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Gerrit-Change-Number: 16660
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10234: Add support for cookie authentication to impala-shell

2020-11-16 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/16660 )

Change subject: IMPALA-10234: Add support for cookie authentication to 
impala-shell
..

IMPALA-10234: Add support for cookie authentication to impala-shell

IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.

Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually with nginx HTTP proxy.
TODO:
- Test with Knox HTTP proxy as well.

Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
---
M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
M shell/ImpalaHttpClient.py
A shell/cookie_util.py
M shell/impala_client.py
M shell/impala_shell.py
A tests/shell/test_cookie_util.py
6 files changed, 284 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16660/5
--
To view, visit http://gerrit.cloudera.org:8080/16660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Gerrit-Change-Number: 16660
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10234: Add support for cookie authentication to impala-shell

2020-11-11 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/16660 )

Change subject: IMPALA-10234: Add support for cookie authentication to 
impala-shell
..

IMPALA-10234: Add support for cookie authentication to impala-shell

IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.

Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually with nginx HTTP proxy.
TODO:
- Test with Knox HTTP proxy as well.

Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
---
M fe/src/test/java/org/apache/impala/customcluster/LdapImpalaShellTest.java
M shell/ImpalaHttpClient.py
A shell/cookie_util.py
M shell/impala_client.py
M shell/impala_shell.py
A tests/shell/test_cookie_util.py
6 files changed, 284 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16660/4
--
To view, visit http://gerrit.cloudera.org:8080/16660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Gerrit-Change-Number: 16660
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10234: Add support for cookie authentication to impala-shell

2020-10-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16660 )

Change subject: IMPALA-10234: Add support for cookie authentication to 
impala-shell
..

IMPALA-10234: Add support for cookie authentication to impala-shell

IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.

Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually.

Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
---
M shell/ImpalaHttpClient.py
A shell/cookie_util.py
M shell/impala_client.py
M shell/impala_shell.py
A tests/shell/test_cookie_util.py
5 files changed, 314 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16660/3
--
To view, visit http://gerrit.cloudera.org:8080/16660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Gerrit-Change-Number: 16660
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10234: Add support for cookie authentication to impala-shell

2020-10-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16660 )

Change subject: IMPALA-10234: Add support for cookie authentication to 
impala-shell
..

IMPALA-10234: Add support for cookie authentication to impala-shell

IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.

Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually.

Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
---
M shell/ImpalaHttpClient.py
A shell/cookie_util.py
M shell/impala_client.py
M shell/impala_shell.py
A tests/shell/test_cookie_util.py
5 files changed, 314 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16660/2
--
To view, visit http://gerrit.cloudera.org:8080/16660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Gerrit-Change-Number: 16660
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10234: Add support for cookie authentication to impala-shell

2020-10-27 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16660 )


Change subject: IMPALA-10234: Add support for cookie authentication to 
impala-shell
..

IMPALA-10234: Add support for cookie authentication to impala-shell

IMPALA-8584 added support for cookie authentication to Impala.
This change adds cookie authentication support to impala-shell
as well when using 'hs2-http' protocol.

Testing:
- Unit tests were added to test cookie handling methods.
- Tested e2e manually.

Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
---
M shell/ImpalaHttpClient.py
A shell/cookie_util.py
M shell/impala_client.py
M shell/impala_shell.py
A tests/shell/test_cookie_util.py
5 files changed, 307 insertions(+), 56 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16660/1
--
To view, visit http://gerrit.cloudera.org:8080/16660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icb0bc6e0f58f236866ca9913a2e63d97d5148f51
Gerrit-Change-Number: 16660
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-10224: Add startup flag not to expose debug web url to clients

2020-10-13 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16573 )

Change subject: IMPALA-10224: Add startup flag not to expose debug web url to 
clients
..

IMPALA-10224: Add startup flag not to expose debug web url to clients

This patch introduces a new startup flag
--ping_expose_webserver_url (true by default) to control whether
PingImpalaService, PingImpalaHS2Service RPC calls should expose
the debug web url to the client or not.

This is necessary as the debug web UI is not something that
end-users will necessarily have access to.

If the flag is set to false, the RPC calls will return an empty
string instead of the real url, signalling that the debug web ui
is not available.

Note that if the webserver is disabled (--enable_webserver flag
is set to false) the RPC calls will behave the same and return an
empty string for the url.
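
The rules described above can be summarized in a small sketch. The real logic lives in the C++ backend; the function name and signature below are hypothetical and only restate the behavior in the commit message.

```python
def webserver_url_for_ping(enable_webserver, ping_expose_webserver_url, real_url):
    """Return the URL a ping RPC would report under the described rules."""
    if not enable_webserver:
        # Webserver disabled: there is no debug web UI to point at.
        return ""
    if not ping_expose_webserver_url:
        # Webserver runs, but its URL must not be exposed to clients.
        return ""
    return real_url
```

Clients therefore only need to handle one signal, the empty string, for both cases.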

Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M shell/impala_client.py
M shell/impala_shell.py
M tests/custom_cluster/test_web_pages.py
7 files changed, 62 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/16573/2
--
To view, visit http://gerrit.cloudera.org:8080/16573
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
Gerrit-Change-Number: 16573
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10224: Add startup flag not to expose debug web url to clients

2020-10-13 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16573 )

Change subject: IMPALA-10224: Add startup flag not to expose debug web url to 
clients
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16573/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16573/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-10224: Add startup flag not to expose debug web url to 
clients
> It looks like one of the calls to print the query link in impala_shell is g
Good catch, thanks!

I've also noticed that there's another get_query_link() call in 
impala_client.py. I've added the guard there too.
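
A guard of this kind could look roughly like the sketch below. The function name, the message text and the URL path are illustrative assumptions, not the actual impala-shell code.

```python
def format_query_link(webserver_address, query_id):
    """Return a progress-link message, or None when no debug web UI is exposed."""
    if not webserver_address:
        # An empty address signals that the debug web UI is not available.
        return None
    return "Query progress can be monitored at: %s/query_plan?query_id=%s" % (
        webserver_address, query_id)
```

The caller prints the message only when the function returns something other than None.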



--
To view, visit http://gerrit.cloudera.org:8080/16573
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
Gerrit-Change-Number: 16573
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 13 Oct 2020 09:54:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10224: Add startup flag not to expose debug web url to clients

2020-10-09 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16573 )


Change subject: IMPALA-10224: Add startup flag not to expose debug web url to 
clients
..

IMPALA-10224: Add startup flag not to expose debug web url to clients

This patch introduces a new startup flag
--ping_expose_webserver_url (true by default) to control whether
PingImpalaService, PingImpalaHS2Service RPC calls should expose
the debug web url to the client or not.

This is necessary as the debug web UI is not something that
end-users will necessarily have access to.

If the flag is set to false, the RPC calls will return an empty
string instead of the real url, signalling that the debug web ui
is not available.

Note that if the webserver is disabled (--enable_webserver flag
is set to false) the RPC calls will behave the same and return an
empty string for the url.

Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M tests/custom_cluster/test_web_pages.py
5 files changed, 48 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/16573/1
--
To view, visit http://gerrit.cloudera.org:8080/16573
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7ec3e92764d712b8fee63c1f45b038c31c184cfc
Gerrit-Change-Number: 16573
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-10225: bump impyla version to 0.17a1

2020-10-09 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16562 )

Change subject: IMPALA-10225: bump impyla version to 0.17a1
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I70a0e883275f3c29e2b01fd5bab7725857c8a1ed
Gerrit-Change-Number: 16562
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 09 Oct 2020 14:09:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10054: Fix flakiness in test multiple sort run bytes limits

2020-08-09 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16301 )

Change subject: IMPALA-10054: Fix flakiness in 
test_multiple_sort_run_bytes_limits
..


Patch Set 3: Code-Review+2

Thanks for the explanation!


--
To view, visit http://gerrit.cloudera.org:8080/16301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I84a8b579c943cddba4432cf183f7f002ef8ec6ad
Gerrit-Change-Number: 16301
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sun, 09 Aug 2020 11:47:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10054: Fix flakiness in test multiple sort run bytes limits

2020-08-07 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16301 )

Change subject: IMPALA-10054: Fix flakiness in 
test_multiple_sort_run_bytes_limits
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16301/1/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/16301/1/tests/query_test/test_sort.py@90
PS1, Line 90: '   - SpilledRuns:.*'
> nit: Perhaps you could use a more complete regex pattern here:
Also, please use raw strings for regex patterns, e.g.:
r'\s+\- SpilledRuns: %s' % spilled_runs
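
A short illustration of why raw strings matter for regex patterns. The value of spilled_runs and the sample profile line are made up for this sketch.

```python
import re

# In a normal string literal '\b' is a single backspace character,
# while r'\b' is the two-character regex token for a word boundary.
backspace_len = len('\b')
word_boundary_len = len(r'\b')

spilled_runs = 3  # illustrative value
pattern = r'\s+\- SpilledRuns: %s' % spilled_runs
line = '   - SpilledRuns: 3 (3)'  # made-up profile line
match = re.search(pattern, line)
```

Escapes such as \s happen to survive in a normal string literal, but using raw strings consistently avoids having to remember which ones do.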



--
To view, visit http://gerrit.cloudera.org:8080/16301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I84a8b579c943cddba4432cf183f7f002ef8ec6ad
Gerrit-Change-Number: 16301
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 07 Aug 2020 11:02:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10054: Fix flakiness in test multiple sort run bytes limits

2020-08-07 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16301 )

Change subject: IMPALA-10054: Fix flakiness in 
test_multiple_sort_run_bytes_limits
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16301/1/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/16301/1/tests/query_test/test_sort.py@90
PS1, Line 90: '   - SpilledRuns:.*'
nit: Perhaps you could use a more complete regex pattern here:

'\s+\- SpilledRuns: %s.*' % spilled_runs

and then you can remove the extra check in L92.

You can also use re.search() instead of re.findall() since you don't need to 
scan the whole runtime profile after the first match.
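
A sketch of the suggestion, using a made-up runtime profile fragment; the helper name has_spilled_runs is hypothetical.

```python
import re


def has_spilled_runs(profile, spilled_runs):
    """Return True if the profile reports the given SpilledRuns value.

    re.search() stops at the first match, so the rest of the profile
    is not scanned once the counter has been found.
    """
    pattern = r'\s+\- SpilledRuns: %s.*' % spilled_runs
    return re.search(pattern, profile) is not None


# Made-up profile fragment for illustration only.
sample_profile = "Sorter:\n   - SpilledRuns: 2 (2)\n   - SortDataSize: 10 MB\n"
```

Because the expected value is part of the pattern, no separate check on the matched text is needed afterwards.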



--
To view, visit http://gerrit.cloudera.org:8080/16301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I84a8b579c943cddba4432cf183f7f002ef8ec6ad
Gerrit-Change-Number: 16301
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 07 Aug 2020 10:02:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10006: handle non-writable /opt/impala/logs

2020-07-30 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16237 )

Change subject: IMPALA-10006: handle non-writable /opt/impala/logs
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/16237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756
Gerrit-Change-Number: 16237
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 16:34:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP IMPALA-9482 Support for BINARY columns

2020-06-18 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16066 )

Change subject: WIP IMPALA-9482 Support for BINARY columns
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16066/2/be/src/runtime/types.cc
File be/src/runtime/types.cc:

http://gerrit.cloudera.org:8080/#/c/16066/2/be/src/runtime/types.cc@122
PS2, Line 122: ToThrift(PrimitiveType ptype)
Should this function work as the inverse of ThriftToType()?
If so, shouldn't it take an AuxColumnType parameter as well?


http://gerrit.cloudera.org:8080/#/c/16066/2/be/src/runtime/types.cc@222
PS2, Line 222: ColumnType::ToThrift(TColumnType* thrift_type)
Same as above: maybe it now needs an additional AuxColumnType parameter?


http://gerrit.cloudera.org:8080/#/c/16066/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/16066/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@126
PS2, Line 126: // No built-in function needed for BINARY <-> STRING conversion.
             : if (fromType.getPrimitiveType() == PrimitiveType.BINARY ||
             :     toType.getPrimitiveType() == PrimitiveType.BINARY) {
             :   continue;
             : }
BINARY<->STRING conversions are no-op conversions, so maybe this block should be 
moved after L191.


http://gerrit.cloudera.org:8080/#/c/16066/2/fe/src/main/java/org/apache/impala/catalog/Type.java
File fe/src/main/java/org/apache/impala/catalog/Type.java:

http://gerrit.cloudera.org:8080/#/c/16066/2/fe/src/main/java/org/apache/impala/catalog/Type.java@820
PS2, Line 820: // STRING <->  BINARY conversion is not lossy, but implicit cast 
is not allowed.
I'm probably misunderstanding something but the commit msg suggests that BINARY 
to STRING implicit conversion is supported.

"UDF/UDAFs that expect STRING argument accept BINARY too, while in
Hive explicit cast is needed in this case."

Please clarify.


http://gerrit.cloudera.org:8080/#/c/16066/2/fe/src/main/java/org/apache/impala/util/AvroSchemaConverter.java
File fe/src/main/java/org/apache/impala/util/AvroSchemaConverter.java:

http://gerrit.cloudera.org:8080/#/c/16066/2/fe/src/main/java/org/apache/impala/util/AvroSchemaConverter.java@154
PS2, Line 154: case BINARY: return Schema.create(Schema.Type.STRING);
I'm not sure about this; maybe BINARY should be converted to Avro bytes. The 
Avro documentation on primitive types states that:
bytes: sequence of 8-bit unsigned bytes
string: unicode character sequence
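
A small sketch of the concern: arbitrary binary data is not necessarily valid text, so a string-typed Avro field may not be able to represent it. The schema below is a hypothetical example, not the one produced by AvroSchemaConverter.

```python
import json

# Arbitrary binary data is not necessarily valid UTF-8 text.
payload = b'\xff\xfe\x00\x01'
try:
    payload.decode('utf-8')
    is_valid_text = True
except UnicodeDecodeError:
    is_valid_text = False

# Hypothetical Avro record schema for a table with one BINARY column,
# mapped to the Avro "bytes" primitive rather than "string".
schema = {
    "type": "record",
    "name": "example_table",
    "fields": [{"name": "bin_col", "type": ["bytes", "null"]}],
}
schema_json = json.dumps(schema)
```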


http://gerrit.cloudera.org:8080/#/c/16066/2/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16066/2/testdata/bin/generate-schema-statements.py@213
PS2, Line 213: string
Again, Avro has a bytes type, which might be a better fit for BINARY.



--
To view, visit http://gerrit.cloudera.org:8080/16066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Gerrit-Change-Number: 16066
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Jun 2020 15:24:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9555 part 2: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-31 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15618 )


Change subject: IMPALA-9555 part 2: [Hive3] Fix test failure introduced by 
HIVE-22589
..

IMPALA-9555 part 2: [Hive3] Fix test failure introduced by HIVE-22589

This patch is a continuation of IMPALA-9555. It makes Avro DATE
tests more resilient by using regex for expected error messages
instead of using concrete error messages.
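
To illustrate the idea in plain Python (the actual change is in a .test file): the two error messages below are made up and only stand in for output whose wording differs between versions.

```python
import re

# Made-up error messages standing in for output that varies between versions.
message_old = "Error converting value '1400-01-01' in file part-0.avro: invalid date"
message_new = "Failed to convert value '1400-01-01' (file part-0.avro): invalid date"

exact_expected = message_old
regex_expected = r".*value '1400-01-01'.*invalid date"


def matches_exact(actual):
    """Concrete expectation: breaks as soon as the wording changes."""
    return actual == exact_expected


def matches_regex(actual):
    """Regex expectation: checks only the parts that should stay stable."""
    return re.match(regex_expected, actual) is not None
```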

Change-Id: I36340be70a37b75997cf49625a173ec2690ed9b8
---
M testdata/workloads/functional-query/queries/QueryTest/avro_date.test
1 file changed, 8 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/15618/1
--
To view, visit http://gerrit.cloudera.org:8080/15618
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I36340be70a37b75997cf49625a173ec2690ed9b8
Gerrit-Change-Number: 15618
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15564 )

Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..


Patch Set 1:

> >. the test is skipped for ORC (not sure if this is on purpose or
 > by accident).
 > My guess is that updating this test was forgotten in the quite
 > recent https://gerrit.cloudera.org/#/c/14982/
 >
 > I think that in the ideal case we should test both: Julian to test
 > that invalid dates are handled properly (this probably has to be
 > file format specific, as error messages are different) and
 > Gregorian to have a more extended suite of tests that can run on
 > more file formats.
 >
 > The change itself looks good to me, but I am worried about the back
 > and forth changes in Hive.

Thanks for the review. Let's merge this in now to unblock the core test suite.

I agree that DATE testing across different file formats and Hive versions is 
pretty messy now and we don't cover all the different scenarios. There's a lot 
of room for improvement, but we should address that in a separate patch set.


--
To view, visit http://gerrit.cloudera.org:8080/15564
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 27 Mar 2020 12:39:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-26 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15564 )

Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..


Patch Set 1:

> I have a basic design question: couldn't we set hive.avro.proleptic.gregorian
 > to true during dataload instead of changing the tests? As other
 > formats use Gregorian as far as I know, this seems better to me,
 > at least to test interop with Impala.

Parquet and ORC file formats have the same issues with the DATE type as Avro.
They may also use the Gregorian or the Julian calendar, depending on which
version of Hive wrote them.

The test fails only for Avro because:
1. the test is skipped for ORC (not sure if this is on purpose or by accident).
2. the Parquet test table was written by Impala (instead of Hive) during
the data load.

We also have tests for ORC and Parquet to demonstrate the issues related to the 
Julian vs Gregorian Calendars, but they use pre-created ORC/Parquet files 
(written by Hive2) and are not affected by HIVE-22589.

I don't see much value in forcing the Gregorian calendar for writing Avro
tables. The rewritten tests show the default behavior users can expect: DATEs
before 1582-10-12 are incorrect, but everything after that works fine.
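
To illustrate the kind of mismatch discussed above, here is a minimal Python
sketch. It is not Impala or Hive code: the function names are made up for this
example, and it assumes a DATE is stored as a day offset from 1970-01-01. It
uses the standard Julian Day Number formulas to show how a value written with
the Julian calendar is decoded differently by a proleptic Gregorian reader.

```python
from datetime import date, timedelta

EPOCH_JDN = 2440588  # Julian Day Number of 1970-01-01 (proleptic Gregorian)


def _shifted(year, month):
    # Shift the year start to March so the leap day falls at the end of the year.
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def gregorian_jdn(year, month, day):
    """Julian Day Number of a proleptic Gregorian calendar date."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def julian_jdn(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def read_as_proleptic_gregorian(days_since_epoch):
    # datetime.date always uses the proleptic Gregorian calendar.
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)


# A writer using the Julian calendar stores 1500-01-01 as a day offset...
stored = julian_jdn(1500, 1, 1) - EPOCH_JDN
# ...and a proleptic Gregorian reader decodes the same offset differently.
decoded = read_as_proleptic_gregorian(stored)
```

In this sketch the decoded value is 1500-01-10, nine days off, while dates on
or after the 1582 switch-over map to the same day number under both formulas
once the calendar change is accounted for.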


--
To view, visit http://gerrit.cloudera.org:8080/15564

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 26 Mar 2020 20:38:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

2020-03-26 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15564 )


Change subject: IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589
..

IMPALA-9555: [Hive3] Fix test failure introduced by HIVE-22589

With HIVE-22589 Hive3 switched back to using Julian Calendar for
historical dates by default which caused an Impala test failure
around Avro DATE values.

Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
---
M testdata/workloads/functional-query/queries/QueryTest/avro_date.test
M tests/query_test/test_date_queries.py
2 files changed, 33 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/15564/1
--
To view, visit http://gerrit.cloudera.org:8080/15564

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I51dd933867ea7877235e7f6e1f2b56711dca107e
Gerrit-Change-Number: 15564
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[native-toolchain-CR] IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain

2020-02-24 Thread Attila Jeges (Code Review)
Attila Jeges has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15265 )

Change subject: IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain
..

IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain

This commit adds a bugfix patch that blocks IMPALA-9226.

Tests:
 - Run query_test for orc/def/block locally.
 - Builds succeeded in all supported platforms.

Change-Id: I0f86d9493d3907e51a8d559adeb4f4b042379457
Reviewed-on: http://gerrit.cloudera.org:8080/15265
Reviewed-by: Quanlong Huang 
Tested-by: Attila Jeges 
---
M buildall.sh
A source/orc/orc-1.6.2-patches/0007-ORC-600-Fix-StringDictionaryColumnReader-to-update-i.patch
2 files changed, 154 insertions(+), 1 deletion(-)

Approvals:
  Quanlong Huang: Looks good to me, approved
  Attila Jeges: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15265

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0f86d9493d3907e51a8d559adeb4f4b042379457
Gerrit-Change-Number: 15265
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Quanlong Huang 


[native-toolchain-CR] IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain

2020-02-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15265 )

Change subject: IMPALA-9226: Add patch to ORC-1.6.2 in the toolchain
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15265

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f86d9493d3907e51a8d559adeb4f4b042379457
Gerrit-Change-Number: 15265
Gerrit-PatchSet: 1
Gerrit-Owner: Norbert Luksa 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 24 Feb 2020 11:33:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9395: fix duplicate broadcast SetFilter() calls

2020-02-21 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15242 )

Change subject: IMPALA-9395: fix duplicate broadcast SetFilter() calls
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15242

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95d0620c4dbb5e4066702db48442cebee7389f5a
Gerrit-Change-Number: 15242
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 21 Feb 2020 13:46:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

2020-02-21 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15222

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 21 Feb 2020 13:45:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9036: Fix CTRL+C a multiline query in impala-shell

2020-02-19 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15233 )

Change subject: IMPALA-9036: Fix CTRL+C a multiline query in impala-shell
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/15233

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8d8bdaee929e2655eb66e886ae92a02d3fbd83f
Gerrit-Change-Number: 15233
Gerrit-PatchSet: 4
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Adam Tamas 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 Feb 2020 15:24:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

2020-02-19 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc@352
PS7, Line 352: // TODO The timezone depends on flag use_local_tz_for_unix_timestamp_conversions.
 : //  Check if this is the intended behaviour.
 : RETURN_IF_ERROR(MaterializeNextRow(
 : state->time_zone_for_unix_time_conversions(), tuple_pool, tuple));
> I was thinking about UTCPTR instead. Using local_time_zone() would mean tha
Yes, you're correct, it should be UTCPTR.



--
To view, visit http://gerrit.cloudera.org:8080/15222

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e
Gerrit-Change-Number: 15222
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 19 Feb 2020 13:05:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9036: Fix CTRL+C a multiline query in impala-shell

2020-02-19 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15233 )

Change subject: IMPALA-9036: Fix CTRL+C a multiline query in impala-shell
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15233/2/tests/shell/test_shell_interactive.py
File tests/shell/test_shell_interactive.py:

http://gerrit.cloudera.org:8080/#/c/15233/2/tests/shell/test_shell_interactive.py@252
PS2, Line 252:"wrong\n1", "[1]: incorrect\n2",
 : "select 3 --comment\n", "[2]: select 4 --comment",
 : "select 5 --comment\n\n\n", "[3]: select 6 --comment",
 : "select /*comment*/\n7", "[4]: select /*comment*/\n8",
 : "select\n/*comm\nent*/\n9", "[5]: select\n/*comm\nent*/\n10"
I'd use input lines like "line 1", "line 2", "one", "two" or something similar,
with newlines scattered throughout. You can throw in some SQL- or C-style
comments too for good measure, but keep them short.

In general the idea is to make these input lines as simple as possible to make 
the intent clear, which is that we expect that these erroneous lines will be 
ignored because of Ctrl-C.


http://gerrit.cloudera.org:8080/#/c/15233/2/tests/shell/test_shell_interactive.py@256
PS2, Line 256: "[5]: select\n/*comm\nent*/\n10"
I think there are two test cases here that we should address:
1. When the last line before Ctrl-C ends with newline.
2. When it doesn't.


http://gerrit.cloudera.org:8080/#/c/15233/2/tests/shell/test_shell_interactive.py@259
PS2, Line 259: child_proc.sendintr()
If the very last line before Ctrl-C ends with a newline, you should add a
child_proc.expect(' >') before line 259 to make it clear that impala-shell is
waiting for more input.
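
The two cases above could be handled by a small helper along these lines. This
is only a sketch: the helper name and the RecordingChild stand-in are
hypothetical, and child_proc is assumed to behave like a pexpect spawn object
(send, expect and sendintr are the only methods used). The ' >' prompt string
is taken from the comment above.

```python
def send_lines_then_interrupt(child_proc, lines, continuation_prompt=" >"):
    """Send partial query lines to an interactive shell, then press Ctrl-C."""
    for line in lines:
        child_proc.send(line)
    # If the last line ended with a newline, the shell should be showing its
    # continuation prompt; wait for it before interrupting.
    if lines and lines[-1].endswith("\n"):
        child_proc.expect(continuation_prompt)
    child_proc.sendintr()


class RecordingChild:
    """Stand-in that records calls, so the sketch runs without a real shell."""

    def __init__(self):
        self.calls = []

    def send(self, text):
        self.calls.append(("send", text))

    def expect(self, pattern):
        self.calls.append(("expect", pattern))

    def sendintr(self):
        self.calls.append(("sendintr",))


# Case 1: the last line before Ctrl-C ends with a newline.
with_newline = RecordingChild()
send_lines_then_interrupt(with_newline, ["line 1\n", "line 2\n"])

# Case 2: the last line before Ctrl-C does not end with a newline.
without_newline = RecordingChild()
send_lines_then_interrupt(without_newline, ["one\n", "two"])
```

Keeping the inputs this plain makes it obvious that the only thing under test
is whether the interrupted lines are discarded.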



--
To view, visit http://gerrit.cloudera.org:8080/15233

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8d8bdaee929e2655eb66e886ae92a02d3fbd83f
Gerrit-Change-Number: 15233
Gerrit-PatchSet: 2
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Adam Tamas 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 19 Feb 2020 10:35:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9385: Unix time conversion cleanup + ORC fix

2020-02-18 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15222 )

Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix
..


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@13
PS6, Line 13: is was
was no ?


http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@26
PS6, Line 26: that
nit: not necessary


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h
File be/src/common/global-types.h:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h@31
PS6, Line 31: #define UTCPTR nullptr
Why not define a const Timezone* instead?


http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc@352
PS7, Line 352: // TODO The timezone depends on flag use_local_tz_for_unix_timestamp_conversions.
 : //  Check if this is the intended behaviour.
 : RETURN_IF_ERROR(MaterializeNextRow(
 : state->time_zone_for_unix_time_conversions(), tuple_pool, tuple));
You're raising a good point in the comment. I think here we should just pass 
state->local_time_zone().


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc@170
PS6, Line 170:
Returning const Timezone* here might simplify things a bit.


http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h@102
PS6, Line 102: 'unix_time' is assumed to be UTC
nit: The comment is a bit confusing. 'unix_time' is always in UTC (by 
definition) not just when 'local_tz' is set to non-UTC.
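
A short Python sketch of this point (illustrative only, unrelated to the
TimestampValue code under review; the fixed +01:00 offset is just a stand-in
for some local time zone): the same Unix time always denotes the same instant,
and the time zone only changes how that instant is rendered.

```python
from datetime import datetime, timedelta, timezone

unix_time = 0  # seconds since 1970-01-01 00:00:00 UTC

utc = timezone.utc
plus_one = timezone(timedelta(hours=1))  # fixed-offset stand-in for a local zone

# The same instant, rendered in two different zones.
as_utc = datetime.fromtimestamp(unix_time, tz=utc)
as_local = datetime.fromtimestamp(unix_time, tz=plus_one)

# Aware datetimes compare by instant, so these are equal even though the
# wall-clock fields differ.
same_instant = as_utc == as_local
```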


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test
File testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test:

http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test@3
PS6, Line 3: This test is also called with convert_legacy_hive_parquet_utc_timestamps=true.
Comment is confusing: I think the test is only called with 
convert_legacy_hive_parquet_utc_timestamps=true.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test
File testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test:

http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@18
PS6, Line 18: 
:  QUERY
Move the new sections to a separate .test file or rename this file to something 
more appropriate.
Originally this test file was meant for testing UTC timestamp functions only.


http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@45
PS6, Line 45:  QUERY
: SET timezone=CET;
: select min(timestamp_col) from functional_avro.alltypestiny;
:  TYPES
: STRING
:  RESULTS
: '2009-01-01 00:00:00'
Since functional_avro.alltypestiny.timestamp_col is a string, you can probably
remove this section.


http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py:

http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py@74
PS6, Line 74: self.check_sanity(True)
: # Test with UTC too to check the optimizations added in IMPALA-9385.
: for tz_name in ["PST8PDT", "UTC"]:
:   # The value read from the Hive table should be the same as reading a UTC converted
:   # value from the Impala table.
:   data = self.execute_query_expect_success(self.client, """
:   SELECT h.id, h.day, h.timestamp_col, i.timestamp_col
:   FROM functional_parquet.alltypesagg_hive_13_1 h
:   JOIN functional_parquet.alltypesagg
: i ON i.id = h.id AND i.day = h.day  -- serves as a unique key
:   WHERE
: (h.timestamp_col IS NULL AND i.timestamp_col IS NOT NULL)
: OR (h.timestamp_col IS NOT NULL AND i.timestamp_col IS NULL)
: OR h.timestamp_col != 

[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-12 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..


Patch Set 9: Code-Review+2

Carry +2


--
To view, visit http://gerrit.cloudera.org:8080/15134

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 12 Feb 2020 08:33:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-11 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..


Patch Set 8:

> Uploaded patch set 7.

Bumped Kudu version and got rid of installing libcurl3 dependency.


--
To view, visit http://gerrit.cloudera.org:8080/15134

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Tue, 11 Feb 2020 13:30:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-11 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..

IMPALA-9279: Update the Kudu version to include VARCHAR support

Before this change the preferred way of getting Kudu was to pull
it in from the specified CDH build (even if USE_CDP_HIVE was set
to true). Optionally by setting USE_CDH_KUDU to false, one could
force Impala to use the native toolchain Kudu. But even then, the
Kudu Java artifacts would be downloaded from CDH.

Since Kudu VARCHAR support won't be backported to CDH, this
behavior blocks the Impala side of the Kudu/Impala VARCHAR
integration.

With this change:
1. Using the native toolchain Kudu (including the Java artifacts)
   is the default behavior. From now on USE_CDH_KUDU will be set
   to false by default. Impala can be forced to fall back on
   using the CDH Kudu by explicitly setting USE_CDH_KUDU to true.
2. Kudu version is updated to include the VARCHAR support.
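
As a rough illustration of the default described in item 1, here is a Python
sketch. It is not the actual logic, which lives in bin/impala-config.sh and is
not reproduced here; resolve_kudu_source and its return values are
hypothetical names used only for this example.

```python
import os


def resolve_kudu_source(env=None):
    """Return which Kudu build would be used for a given environment."""
    env = os.environ if env is None else env
    # USE_CDH_KUDU is treated as false when it is not set.
    use_cdh_kudu = env.get("USE_CDH_KUDU", "false").lower() == "true"
    return "cdh" if use_cdh_kudu else "toolchain"


default_source = resolve_kudu_source({})
forced_cdh_source = resolve_kudu_source({"USE_CDH_KUDU": "true"})
```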

Testing:
Ran exhaustive tests with USE_CDH_KUDU=true and
USE_CDH_KUDU=false.

Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M impala-parent/pom.xml
3 files changed, 42 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15134/8
--
To view, visit http://gerrit.cloudera.org:8080/15134

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-11 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..


Patch Set 8:

> Uploaded patch set 8.

Rebased patch-set.


--
To view, visit http://gerrit.cloudera.org:8080/15134

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Tue, 11 Feb 2020 13:29:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-11 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..

IMPALA-9279: Update the Kudu version to include VARCHAR support

Before this change the preferred way of getting Kudu was to pull
it in from the specified CDH build (even if USE_CDP_HIVE was set
to true). Optionally by setting USE_CDH_KUDU to false, one could
force Impala to use the native toolchain Kudu. But even then, the
Kudu Java artifacts would be downloaded from CDH.

Since Kudu VARCHAR support won't be backported to CDH, this
behavior blocks the Impala side of the Kudu/Impala VARCHAR
integration.

With this change:
1. Using the native toolchain Kudu (including the Java artifacts)
   is the default behavior. From now on USE_CDH_KUDU will be set
   to false by default. Impala can be forced to fall back on
   using the CDH Kudu by explicitly setting USE_CDH_KUDU to true.
2. Kudu version is updated to include the VARCHAR support.

Testing:
Ran exhaustive tests with USE_CDH_KUDU=true and
USE_CDH_KUDU=false.

Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M impala-parent/pom.xml
3 files changed, 42 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15134/7
--
To view, visit http://gerrit.cloudera.org:8080/15134

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[native-toolchain-CR] IMPALA-9279: part 2: Bump Kudu version to 5c610bf40

2020-02-10 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15192 )

Change subject: IMPALA-9279: part 2: Bump Kudu version to 5c610bf40
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15192

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ff92cc5e1de220a4c140cf6c0117b5fa1e89226
Gerrit-Change-Number: 15192
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Tue, 11 Feb 2020 06:40:20 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9279: part 2: Bump Kudu version to 5c610bf40

2020-02-10 Thread Attila Jeges (Code Review)
Attila Jeges has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15192 )

Change subject: IMPALA-9279: part 2: Bump Kudu version to 5c610bf40
..

IMPALA-9279: part 2: Bump Kudu version to 5c610bf40

This pulls in a Kudu change that links Kudu executables
statically to libcurl in Kudu's thirdparty directory instead of
relying on the dynamic linker to find libcurl at runtime.

Testing:
- Ran the C6 toolchain build job with the Kudu version bump for
  native toolchain to make sure that it builds on all supported
  platforms.

Change-Id: I3ff92cc5e1de220a4c140cf6c0117b5fa1e89226
Reviewed-on: http://gerrit.cloudera.org:8080/15192
Reviewed-by: Joe McDonnell 
Tested-by: Attila Jeges 
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Attila Jeges: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15192

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3ff92cc5e1de220a4c140cf6c0117b5fa1e89226
Gerrit-Change-Number: 15192
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-10 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..


Patch Set 6:

> I just saw this commit in Kudu, and I'm wondering if it helps with
 > the libcurl situation: https://gerrit.cloudera.org/#/c/15180/
 >
 > If they link libcurl, then we might not need to install it.

Correct, this Kudu change eliminates the runtime dependency on the libcurl 
shared library.

Here's a native-toolchain CR to bump kudu version once again to include the fix:
https://gerrit.cloudera.org/#/c/15192/


--
To view, visit http://gerrit.cloudera.org:8080/15134

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Mon, 10 Feb 2020 15:43:46 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9279: part 2: Bump Kudu version to 5c610bf40

2020-02-10 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15192 )


Change subject: IMPALA-9279: part 2: Bump Kudu version to 5c610bf40
..

IMPALA-9279: part 2: Bump Kudu version to 5c610bf40

This pulls in a Kudu change that links Kudu executables
statically to libcurl in Kudu's thirdparty directory instead of
relying on the dynamic linker to find libcurl at runtime.

Testing:
- Ran the C6 toolchain build job with the Kudu version bump for
  native toolchain to make sure that it builds on all supported
  platforms.

Change-Id: I3ff92cc5e1de220a4c140cf6c0117b5fa1e89226
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/92/15192/1
--
To view, visit http://gerrit.cloudera.org:8080/15192

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3ff92cc5e1de220a4c140cf6c0117b5fa1e89226
Gerrit-Change-Number: 15192
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-03 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..


Patch Set 2:

(2 comments)

> > > Patch Set 2:
 > > > The verify job failed because kudu-3ba5ec5d0 (kudu-1.12.0-SNAPSHOT)
 > > > has a new run-time dependency: libcurl.so.4 which is not available
 > > > in the ubuntu-16.04-configured jenkins worker label. I'm discussing
 > > > with laszlog the possibility of adding libcurl.so.4 to the worker
 > > > label.
 > > >
 > >
 > > If we decide to take this new Kudu version as a dependency, then
 > > the correct way to handle libcurl.so.4 as a new runtime dependency
 > > is to add it to the list of packages we install in
 > > bin/bootstrap_system.sh.
 > > The worker image referenced above is only minimally preconfigured
 > > to allow fast startup times; Impala runtime/development time
 > > dependencies should be managed in the bootstrap scripts.
 > >
 > > Additionally, the dependency on libcurl.so.4 should be evaluated
 > > for all OS platforms we claim to have support for: e.g. a brief
 > > scan of this article[1] claims that running both libcurl.so.3 and
 > > libcurl.so.4 on Ubuntu 18.04 is at least non-trivial to set up.
 > >
 > > [1]: https://dev.to/jake/using-libcurl3-and-libcurl4-on-ubuntu-1804-bionic-184g,
 > > "Using libcurl3 and libcurl4 on Ubuntu 18.04 (Bionic)"
 >
 > In bin/bootstrap_system.sh, I don't see us installing curl for
 > ubuntu, but I see us installing it for centos. I would try adding
 > it and see if that helps. (We have curl installed in all the docker
 > images we use to build kudu for the native toolchain.)
 >
 > We can run a ubuntu-18.04-from-scratch job to see if it works.

Installing curl on Ubuntu 16.04 installs libcurl-gnutls.so.4 but it doesn't 
install the required libcurl.so.4.

"apt install libcurl3" on the other hand works for all supported Ubuntu 
releases, so I've added that to bin/bootstrap_system.sh.
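
For anyone who wants to check a worker quickly, a small Python sketch like the
following can tell whether the dynamic loader is able to find a given shared
library. This is only an illustration and not part of the patch; the helper
name is made up, and it relies solely on ctypes.CDLL raising OSError when the
library cannot be loaded.

```python
import ctypes


def can_load_shared_library(soname):
    """Return True if the dynamic loader can find and load 'soname'."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


# The result depends on the machine this runs on.
has_libcurl4 = can_load_shared_library("libcurl.so.4")
```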

http://gerrit.cloudera.org:8080/#/c/15134/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/15134/2/bin/impala-config.sh@719
PS2, Line 719:   export IMPALA_TOOLCHAIN_KUDU_MAVEN_REPOSITORY="file://${IMPALA_TOOLCHAIN}"
> Since this is disabled, I think we can set it to an empty string. If that w
Setting the URL to an empty string results in an error, but I can set it to
something like "file:///non/existing/repo"

What do you think?


http://gerrit.cloudera.org:8080/#/c/15134/2/bin/impala-config.sh@722
PS2, Line 722:   export IMPALA_KUDU_VERSION="3ba5ec5d0"
 :   export IMPALA_KUDU_JAVA_VERSION="1.12.0-SNAPSHOT"
> One use case that we want to support is for someone to be able to override
Done



--
To view, visit http://gerrit.cloudera.org:8080/15134

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Mon, 03 Feb 2020 14:49:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-02-03 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..

IMPALA-9279: Update the Kudu version to include VARCHAR support

Before this change the preferred way of getting Kudu was to pull
it in from the specified CDH build (even if USE_CDP_HIVE was set
to true). Optionally by setting USE_CDH_KUDU to false, one could
force Impala to use the native toolchain Kudu. But even then, the
Kudu Java artifacts would be downloaded from CDH.

Since Kudu VARCHAR support won't be backported to CDH, this
behavior blocks the Impala side of the Kudu/Impala VARCHAR
integration.

With this change:
1. Using the native toolchain Kudu (including the Java artifacts)
   is the default behavior. From now on USE_CDH_KUDU will be set
   to false by default. Impala can be forced to fall back on
   using the CDH Kudu by explicitly setting USE_CDH_KUDU to true.
2. Kudu version is updated to include the VARCHAR support.
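
The new default described in point 1 can be sketched roughly as follows. This is only an illustration, not the code from bin/impala-config.sh: the helper name kudu_source is hypothetical, and the sketch assumes that only the literal value "true" selects the CDH Kudu.

```shell
# Hypothetical helper (not part of the patch): decide where Kudu comes from.
# USE_CDH_KUDU is treated as "false" when it is unset or empty, which mirrors
# the new default; only an explicit "true" falls back to the CDH Kudu.
kudu_source() {
  local use_cdh_kudu="${USE_CDH_KUDU:-false}"
  if [ "${use_cdh_kudu}" = "true" ]; then
    echo "cdh"
  else
    echo "toolchain"
  fi
}
```

With USE_CDH_KUDU unset, kudu_source prints "toolchain"; after "export USE_CDH_KUDU=true" it prints "cdh".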

Testing:
Ran exhaustive tests with USE_CDH_KUDU=true and
USE_CDH_KUDU=false.

Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
---
M bin/bootstrap_system.sh
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M impala-parent/pom.xml
4 files changed, 43 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15134/3
--
To view, visit http://gerrit.cloudera.org:8080/15134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-01-31 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..


Patch Set 2:

> (2 comments)
 >
 > Thanks for working on this. This is looking pretty good. I'm
 > thinking through the edge cases where we want to override some
 > versions, so I may have a couple more comments.
 >
 > In the meantime, I'm going to run an upstream verify job on this
 > review.

The verify job failed because kudu-3ba5ec5d0 (kudu-1.12.0-SNAPSHOT) has a new 
run-time dependency, libcurl.so.4, which is not available on the 
ubuntu-16.04-configured Jenkins worker label. I'm discussing with laszlog the 
possibility of adding libcurl.so.4 to the worker label.

As far as I know, both CDH GBN Kudu and CDP GBN Kudu are based on kudu-1.11.0 
which doesn't depend on libcurl.so.4. I considered updating toolchain Kudu to 
1.11.0 or 1.11.1 (which is the latest upstream Kudu release) instead of 
3ba5ec5d0, but kudu-1.11.x doesn't have support for VARCHAR yet.
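
One way to check whether a worker already provides the library is to search the dynamic linker cache. The sketch below is an assumption about how such a check could look, not something taken from the Jenkins setup; the helper name has_shared_lib is hypothetical. It reads a listing in the format printed by "ldconfig -p" from stdin.

```shell
# Hypothetical helper: succeed if the named shared library appears as the
# first field of any line of an "ldconfig -p"-style listing read from stdin.
# Comparing the whole first field avoids matching e.g. libcurl.so.40.
has_shared_lib() {
  local lib_name="$1"
  awk -v lib="${lib_name}" '$1 == lib { found = 1 } END { exit !found }'
}
```

Typical use would be "ldconfig -p | has_shared_lib libcurl.so.4", which returns 0 if the library is in the linker cache and 1 otherwise.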


--
To view, visit http://gerrit.cloudera.org:8080/15134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Fri, 31 Jan 2020 14:11:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-01-30 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/15134 )

Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..

IMPALA-9279: Update the Kudu version to include VARCHAR support

Before this change the preferred way of getting Kudu was to pull
it in from the specified CDH build (even if USE_CDP_HIVE was set
to true). Optionally by setting USE_CDH_KUDU to false, one could
force Impala to use the native toolchain Kudu. But even then, the
Kudu Java artifacts would be downloaded from CDH.

Since Kudu VARCHAR support won't be backported to CDH, this
behavior blocks the Impala side of the Kudu/Impala VARCHAR
integration.

With this change:
1. Using the native toolchain Kudu (including the Java artifacts)
   is the default behavior. From now on USE_CDH_KUDU will be set
   to false by default. Impala can be forced to fall back on
   using the CDH Kudu by explicitly setting USE_CDH_KUDU to true.
2. Kudu version is updated to include the VARCHAR support.

Testing:
Ran exhaustive tests with USE_CDH_KUDU=true and
USE_CDH_KUDU=false.

Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M impala-parent/pom.xml
3 files changed, 43 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15134/2
--
To view, visit http://gerrit.cloudera.org:8080/15134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-9279: Update the Kudu version to include VARCHAR support

2020-01-30 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15134 )


Change subject: IMPALA-9279: Update the Kudu version to include VARCHAR support
..

IMPALA-9279: Update the Kudu version to include VARCHAR support

Before this change the preferred way of getting Kudu was to pull
it in from the specified CDH build (even if USE_CDP_HIVE was set
to true). Optionally by setting USE_CDH_KUDU to false, one could
force Impala to use the native toolchain Kudu. But even then, the
Kudu Java artifacts would be downloaded from CDH.

Since Kudu VARCHAR support won't be backported to CDH, this
behavior blocks the Impala side of the Kudu/Impala VARCHAR
integration.

With this change:
1. Using the native toolchain Kudu (including the Java artifacts)
   is the default behavior. From now on USE_CDH_KUDU will be set
   to false by default. Impala can be forced to fall back on
   using the CDH Kudu by explicitly setting USE_CDH_KUDU to true.
2. Kudu version is updated to include the VARCHAR support.

Testing:
Ran exhaustive tests with USE_CDH_KUDU=true and
USE_CDH_KUDU=false.

Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M impala-parent/pom.xml
3 files changed, 42 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/15134/1
--
To view, visit http://gerrit.cloudera.org:8080/15134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iafe56342d43cb63e35c0bbb1b4a99327dda0a44a
Gerrit-Change-Number: 15134
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[native-toolchain-CR] IMPALA-9279: Bump Kudu version to 3ba5ec5d0

2020-01-29 Thread Attila Jeges (Code Review)
Attila Jeges has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15119 )

Change subject: IMPALA-9279: Bump Kudu version to 3ba5ec5d0
..

IMPALA-9279: Bump Kudu version to 3ba5ec5d0

This pulls in Kudu VARCHAR support which is needed for the Impala
side of the Kudu/Impala VARCHAR integration.

Testing:
- Ran the C6 toolchain build job with the kudu version bump for
  native toolchain to make sure that it builds on all supported
  platforms.
- Built Impala locally with kudu-3ba5ec5d0 and ran test_kudu.py
  E2E and AnalyzeKuduDDLTest FE tests.

Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
Reviewed-on: http://gerrit.cloudera.org:8080/15119
Reviewed-by: Joe McDonnell 
Tested-by: Attila Jeges 
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Attila Jeges: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
Gerrit-Change-Number: 15119
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 


[native-toolchain-CR] IMPALA-9279: Bump Kudu version to 3ba5ec5d0

2020-01-29 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15119 )

Change subject: IMPALA-9279: Bump Kudu version to 3ba5ec5d0
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
Gerrit-Change-Number: 15119
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 29 Jan 2020 08:13:30 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9279: Bump Kudu version to 3ba5ec5d0

2020-01-28 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15119 )


Change subject: IMPALA-9279: Bump Kudu version to 3ba5ec5d0
..

IMPALA-9279: Bump Kudu version to 3ba5ec5d0

This pulls in Kudu VARCHAR support which is needed for the Impala
side of the Kudu/Impala VARCHAR integration.

Testing:
- Ran the C6 toolchain build job with the kudu version bump for
  native toolchain to make sure that it builds on all supported
  platforms.
- Built Impala locally with kudu-3ba5ec5d0 and ran test_kudu.py
  E2E and AnalyzeKuduDDLTest FE tests.

Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/19/15119/1
--
To view, visit http://gerrit.cloudera.org:8080/15119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc3fd6f0c7d31f1f80753402adc0ca5b3c5759a0
Gerrit-Change-Number: 15119
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Jeges 


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-28 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Tue, 28 Jan 2020 14:51:30 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-28 Thread Attila Jeges (Code Review)
Attila Jeges has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
..

IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

The build script was modified to generate Kudu JARs and add them
to the Kudu tarball.

redhat6 and redhat7 docker images were modified to update Java 8
to a newer version that is suitable for building the Java
artifacts.

ubuntu1404 docker image was modified to include CA certificate
file for Java.

Testing:
Ran the C6 toolchain build job to verify that native-toolchain
builds on all supported platforms.

Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Reviewed-on: http://gerrit.cloudera.org:8080/15072
Reviewed-by: Joe McDonnell 
Tested-by: Attila Jeges 
---
M docker/all/postinstall.sh
A docker/redhat/Centos7-Vault.repo
M docker/redhat6.df
M docker/redhat7.df
M docker/ubuntu1404.df
M source/kudu/build.sh
6 files changed, 45 insertions(+), 6 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Attila Jeges: Verified

--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-27 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15072/3/source/kudu/build.sh
File source/kudu/build.sh:

http://gerrit.cloudera.org:8080/#/c/15072/3/source/kudu/build.sh@137
PS3, Line 137:   local JAVA_INSTALL_DIR="$LOCAL_INSTALL/java"
 :   mkdir -p "$JAVA_INSTALL_DIR"
 :   pushd java
 :   export GRADLE_USER_HOME="$(pwd)"
 :   wrap ./gradlew :kudu-hive:assemble :kudu-client:assemble
 :   # Copy kudu-hive jars to JAVA_INSTALL_DIR.
 :   local F
 :   for F in kudu-hive/build/libs/kudu-hive-*.jar; do
 :     cp "$F" "$JAVA_INSTALL_DIR"
 :   done
 :   # Install kudu-client artifacts to the Local Maven Repository:
 :   wrap ./gradlew -Dmaven.repo.local="${JAVA_INSTALL_DIR}/repository" :kudu-client:install
 :   popd
I've also quickly tested Impala locally on my dev machine with kudu-hive and 
kudu-client built like this.

It looks good: Impala builds without issues, and the Kudu-related E2E tests 
that I tried passed.
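
A quick sanity check of the resulting layout could look like the sketch below. It is only an illustration based on the snippet quoted above: the function name check_kudu_java_layout is hypothetical, and the sketch assumes the kudu-hive jars sit directly in the Java install directory with the local Maven repository in its "repository" subdirectory.

```shell
# Hypothetical check: verify that a directory looks like the output of the
# quoted build steps, i.e. it contains at least one kudu-hive-*.jar and a
# "repository" subdirectory holding the kudu-client Maven artifacts.
check_kudu_java_layout() {
  local java_dir="$1"
  local jar
  for jar in "${java_dir}"/kudu-hive-*.jar; do
    # If the glob matched nothing, the literal pattern is kept and -f fails.
    [ -f "${jar}" ] || return 1
  done
  [ -d "${java_dir}/repository" ]
}
```

For example, check_kudu_java_layout "$LOCAL_INSTALL/java" would return 0 after a successful build and non-zero if either piece is missing.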



--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Mon, 27 Jan 2020 14:38:17 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-24 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
..

IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

The build script was modified to generate Kudu JARs and add them
to the Kudu tarball.

redhat6 and redhat7 docker images were modified to update Java 8
to a newer version that is suitable for building the Java
artifacts.

ubuntu1404 docker image was modified to include CA certificate
file for Java.

Testing:
Ran the C6 toolchain build job to verify that native-toolchain
builds on all supported platforms.

Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
---
M docker/all/postinstall.sh
A docker/redhat/Centos7-Vault.repo
M docker/redhat6.df
M docker/redhat7.df
M docker/ubuntu1404.df
M source/kudu/build.sh
6 files changed, 45 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/72/15072/3
--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-24 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
..


Patch Set 2:

(3 comments)

> (3 comments)
 >
 > I think the changes to the docker side make sense. I have a few
 > small comments, but it sounds good.
 >
 > For the actual kudu build steps, I think we'll need to see what
 > Impala needs from the Kudu java side to get this right. I left a
 > comment about what I think we would need, but I don't think our
 > progress needs to be blocked on everything being perfect. One way
 > forward is that we put the Kudu java artifacts in a subdirectory
 > (which we know will not conflict with the existing implementation)
 > and then it becomes fairly harmless to check in something that is
 > not perfect. Then, as we do the Impala change and find what we
 > need, we do an additional change to get anything we missed. Another
 > way forward is to merge the docker stuff (which lets us update the
 > docker images) and then do the Kudu part in concert with the Impala
 > change.

Thanks for the help. I decided to generate the Java artifacts and put them into 
a subdirectory in this patch set.

I'll do a Kudu version bump in a separate patch and finally change Impala to 
consume the generated Java artifacts in a third patch.

http://gerrit.cloudera.org:8080/#/c/15072/2/docker/redhat6.df
File docker/redhat6.df:

http://gerrit.cloudera.org:8080/#/c/15072/2/docker/redhat6.df@15
PS2, Line 15: # Install a newer java-1.8.0-openjdk-devel from centos:6.8.
: # The java-1.8.0-openjdk-devel version shipped with centos:6.6 is unable to handle ECDHE
: # ciphers.
: RUN yum-install --disablerepo='*' --enablerepo=C6.8-base java-1.8.0-openjdk-devel
> The way I think about this is that there are some libraries we use newer ve
Done


http://gerrit.cloudera.org:8080/#/c/15072/2/docker/redhat7.df
File docker/redhat7.df:

http://gerrit.cloudera.org:8080/#/c/15072/2/docker/redhat7.df@9
PS2, Line 9: # We get a newer java-1.8.0-openjdk-devel from centos:7.4.
   : # The java-1.8.0-openjdk version shipped with centos:7.2 is unable to handle ECDHE
   : # ciphers.
   : RUN yum-install --disablerepo='*' --enablerepo=C7.4-base java-1.8.0-openjdk-devel
> Same as the redhat6.df comment, move this below the big install command.
Done


http://gerrit.cloudera.org:8080/#/c/15072/2/source/kudu/build.sh
File source/kudu/build.sh:

http://gerrit.cloudera.org:8080/#/c/15072/2/source/kudu/build.sh@140
PS2, Line 140:   wrap ./gradlew :kudu-hive:assemble
 :   for F in kudu-hive/build/libs/kudu-*.jar; do
 :     cp "$F" "$JAVA_INSTALL_DIR"
 :   done
> My thought on this part is that we are going to want to test it with the co
I've changed the code to put the kudu-hive jars and the kudu-client Maven repo 
into $LOCAL_INSTALL/java. Done.



--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Sat, 25 Jan 2020 07:21:59 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-9265: Support for toolchain Kudu to provide Java artifacts

2020-01-21 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15072 )

Change subject: IMPALA-9265: Support for toolchain Kudu to provide Java artifacts
..


Patch Set 2:

> Took a first pass through it, it looks pretty good.
 > There is one issue that makes me wonder:
 > 1. We explicitly include the C/C++ compiler version in the resulting
 > tarballs' name
 > 2. We now start including Java binaries in the same tarballs. Java
 > binaries can (in theory at least, if not in current practice) be
 > produced by different JDK versions,
 > so should we start including the JDK version (or distro+version)
 > string in the artifact names?
 > Currently we build only with JDK 8, but as JDK 8 is nearing its End
 > of Support Date, this may change one day.
 > We can also defer the decision and establish the convention that no
 > explicit Java version means JDK 8, and everything else is marked;
 > this can be left to our future selves.

Distro names are already included in the tarball names uploaded to the S3 
bucket.

I agree that the JDK version should eventually be added to the tarball names. 
I'm not sure whether we should do it now or in a separate change. Let's see 
what Joe thinks about it.
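
The convention proposed in the quoted comment (no explicit Java tag means JDK 8, everything else is marked) could be expressed along these lines. This is a sketch under stated assumptions: the function tarball_name, the ".tar.gz" suffix, the field order and the "jdk8"/"jdk11" tags are all hypothetical and do not reflect the actual native-toolchain naming scheme.

```shell
# Hypothetical naming helper; the real toolchain naming scheme may differ.
# Args: package, version, compiler tag, distro tag, optional JDK tag.
tarball_name() {
  local name="$1-$2-$3-$4"
  local jdk_tag="${5:-}"
  # Proposed convention: a missing JDK tag (or jdk8) leaves the name unchanged;
  # any other JDK is appended explicitly.
  if [ -n "${jdk_tag}" ] && [ "${jdk_tag}" != "jdk8" ]; then
    name="${name}-${jdk_tag}"
  fi
  echo "${name}.tar.gz"
}
```

Under this convention, existing tarball names stay valid, and only artifacts built with a different JDK would get a new name.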


--
To view, visit http://gerrit.cloudera.org:8080/15072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba03dfe9c302513b825cbed7146c582e7d97c3af
Gerrit-Change-Number: 15072
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Tue, 21 Jan 2020 15:10:06 +
Gerrit-HasComments: No

