[Impala-ASF-CR] IMPALA-4033,IMPALA-4105: Improvements of partition DDL.

2016-11-17 Thread Amos Bird (Code Review)
Amos Bird has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5137

Change subject: IMPALA-4033,IMPALA-4105: Improvements of partition DDL.
..

IMPALA-4033,IMPALA-4105: Improvements of partition DDL.

This commit makes add partition op treat string partition-key values as
case sensitive in consistency with other related partition DDL
operations. REFRESH statements now support general partition exprs.

Change-Id: I0f4e46ec0a63b46e485141290268d019c3dd15c7
---
M bin/start-catalogd.sh
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
11 files changed, 80 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/5137/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5137
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0f4e46ec0a63b46e485141290268d019c3dd15c7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Amos Bird 


[Impala-ASF-CR] IMPALA-2890: Support ALTER TABLE statements for Kudu tables

2016-11-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5136

Change subject: IMPALA-2890: Support ALTER TABLE statements for Kudu tables
..

IMPALA-2890: Support ALTER TABLE statements for Kudu tables

With this commit, we add support for additional ALTER TABLE statements
against Kudu tables. The new supported ALTER TABLE operations for Kudu are:
- ADD/DROP range partitions. Syntax:
ALTER TABLE  ADD [IF NOT EXISTS] RANGE 
ALTER TABLE  DROP [IF EXISTS] RANGE 
- ADD/DROP/RENAME column. Syntax:
ALTER TABLE  ADD COLUMNS (col_spec, [col_spec, ...])
ALTER TABLE  DROP COLUMN 
ALTER TABLE  CHANGE COLUMN   
- Rename Kudu table using the 'kudu.table_name' table property. Example:
  ALTER TABLE  SET TBLPROPERTY ('kudu.tbl_name'=''),
  will change the underlying Kudu table name to .
- Renaming the HMS/Catalog table entry of a Kudu table is supported using the
  existing ALTER TABLE  RENAME TO  syntax.

Not supported:
- ALTER TABLE  REPLACE COLUMNS

Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddReplaceColsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
M tests/query_test/test_kudu.py
17 files changed, 815 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/5136/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I04bc87e04e05da5cc03edec79d13cedfd2012896
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..


Patch Set 3:

s/binary/dictionary in last comment.

-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

2016-11-17 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#3).

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet 
scanner
..

IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

 * Extend metadata checks to allow more than one possible physical type
   for a given logical type.
 * Change decimal decoding to handle non-fixed-length format in same path
   as fixed-length encoding.

Testing:

 * Query test that decodes both plain and dictionary-encoded decimals
   using binary encoding.

Perf:

 * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
   decimal(12,2) values using FIXED_BYTE_ARRAY physical type encoding.
 * The overhead of decoding with the extra branch was measured at 1s;
   i.e. the per-decode overhead is 1ns.

Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
A testdata/data/binary_decimal_dictionary.parquet
A testdata/data/binary_decimal_no_dictionary.parquet
A testdata/workloads/functional-query/queries/QueryTest/decimal-encodings.test
M tests/query_test/test_scanners.py
7 files changed, 135 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5115/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options

2016-11-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3726: Add support for Kudu-specific column options
..


Patch Set 8: Code-Review+2

Minor test fix and rebase. Carry Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options

2016-11-17 Thread Dimitris Tsirogiannis (Code Review)
Hello Internal Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5026

to look at the new patch set (#8).

Change subject: IMPALA-3726: Add support for Kudu-specific column options
..

IMPALA-3726: Add support for Kudu-specific column options

This commit adds support for Kudu-specific column options in CREATE
TABLE statements. The syntax is:
CREATE TABLE tbl_name ([col_name type [PRIMARY KEY] [option [...]]] [, ])
where option is:
| NULL
| NOT NULL
| ENCODING encoding_val
| COMPRESSION compression_algorithm
| DEFAULT expr
| BLOCK_SIZE num

The output of the SHOW CREATE TABLE statement was altered to include all the 
specified
column options for Kudu tables.

Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
---
M be/src/exec/kudu-table-sink.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddReplaceColsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaParser.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
M tests/shell/test_shell_commandline.py
31 files changed, 973 insertions(+), 254 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5026/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins


[Impala-ASF-CR] IMPALA-4510: Selectively filter args for metric verification tests

2016-11-17 Thread David Knupp (Code Review)
David Knupp has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5135

Change subject: IMPALA-4510: Selectively filter args for metric verification 
tests
..

IMPALA-4510: Selectively filter args for metric verification tests

run-tests.py is a wrapper around impala-pytest. It abstracts away
the need to invoke separate individual runs for serial tests,
parallel tests, and metric verification tests.

Because it's possible for a user to specify certain test suites,
or even specific tests, on the command line when calling
run-tests.py, it had been necessary to override the command line
args when it came time to run the metric verification tests --
otherwise those other tests/suites would be rerun. Before this
patch, we had simply been stripping away all command line args.

However, that blanket approach causes problems when running tests
against a remote cluster, because we need to retain those command
line args that pertain to the remote cluster.

This patch selectively prunes unwanted command line args for the
last metric verification test stage, keeping the ones that we
need, and also adds extensive documentation for explaining why we
have to go through this fairly odd and elaborate step.

Change-Id: I069172f44c1307d55f85779cdb01fecc0ba1799e
---
M tests/run-tests.py
1 file changed, 55 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/5135/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5135
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I069172f44c1307d55f85779cdb01fecc0ba1799e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 


[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Don't overwrite user's .ssh/config file when bootstrapping
..


Patch Set 3:

> Build failed: 
> http://ec2-35-161-220-160.us-west-2.compute.amazonaws.com:8080/job/gerrit-verify-dryrun/30/

This was just me testing the verification system

-- 
To view, visit http://gerrit.cloudera.org:8080/4967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..


IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

This change introduces a new catalogd startup option
(init_first_metastore_client_timeout_seconds) that specifies the
time in seconds catalogd should spend on retrying to establish a
connection to HMS the first time on startup before giving up and
exiting fatally.

Setting this startup option to a value that is greater than the HMS
startup time will allow CM to start Impala at the same time or even
before HMS.

The default value of init_first_metastore_client_timeout_seconds is
120 seconds.

Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Reviewed-on: http://gerrit.cloudera.org:8080/5095
Reviewed-by: Henry Robinson 
Tested-by: Internal Jenkins
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
A tests/experiments/test_catalog_hms_failures.py
10 files changed, 285 insertions(+), 49 deletions(-)

Approvals:
  Henry Robinson: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins


[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping

2016-11-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Don't overwrite user's .ssh/config file when bootstrapping
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] Don't overwrite user's .ssh/config file when bootstrapping

2016-11-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Don't overwrite user's .ssh/config file when bootstrapping
..


Don't overwrite user's .ssh/config file when bootstrapping

>From bash's manual page on redirecting with '>'

Redirection of output causes the file whose name results from the
expansion of word to be opened for writing on file descriptor n,
or the standard output (file descriptor 1) if n is not specified.
If the file does not exist it is created; if it does exist it is
truncated to zero size.

Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a
Reviewed-on: http://gerrit.cloudera.org:8080/4967
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M bin/bootstrap_development.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0d1a56441fcb5a2a2aed043fc1ece866c5d8287a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#5).

Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and 
fixes filter stats updates
..

IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats 
updates

This change codegens HdfsParquetScanner::EvalRuntimeFilters()
by unrolling its loop, codegen'ing the expression evaluation
of the runtime filter and replacing some type information with
constants in the hashing function of runtime filter to avoid
branching at runtime.

This change also fixes IMPALA-4495 by not counting a row as
'considered' in the filter stats before the filter arrives.
This avoids unnecessarily marking a runtime filter as
ineffective before it's even used.

With this change, TPCDS-Q88 improves by 13-14%.
primitive_broadcast_join_1 improves by 24%.

Change-Id: I27114869840e268d17e91d6e587ef811628e3837
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.h
A be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/types.h
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M tests/query_test/test_tpch_queries.py
27 files changed, 507 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4833/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4432: Handle internal codegen disabling properly
..


Patch Set 4: Code-Review+1

Carry Tim's +1 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4432: Handle internal codegen disabling properly
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5105/3/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 273:   if (runtime_state_->CodegenDisabled() && 
!runtime_state_->HasScalarFnToCodegen()) return;
> Long line
Fixed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly

2016-11-17 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5105

to look at the new patch set (#4).

Change subject: IMPALA-4432: Handle internal codegen disabling properly
..

IMPALA-4432: Handle internal codegen disabling properly

There are some conditions in which codegen is disabled internally
even if it's enabled in the query option. For instance, the single
node optimization or the expression evaluation requests sent from
the FE to the BE. These internal disabling of codegen are advisory
as their purposes are to reduce the latency for tables with no or
very few rows. The internal disabling of codegen doesn't interact
well with UDFs which cannot be interpreted (e.g. IR UDF) as it
conflates the 'disable_codegen' query option set by the user.
As a result, it's hard to differentiate between when codegen is
disabled explicitly by users and when it is disabled internally.

This change fixes the problem above by adding an explicit flag in
TQueryCtx to indicate that codegen is disabled internally. This flag
is only advisory. For cases in which codegen is needed to function,
this internal flag is ignored and if codegen is disabled via query
option, an error is thrown. For this new flag to work with ScalarFnCall,
codegen needs to happen after ScalarFnCall::Prepare() because it's
hard to tell if a fragment contains any UDF that cannot be interpreted
until after ScalarFnCall::Prepare() is called. However, Prepare() needs
the codegen object to codegen so it needs to be created before Prepare().
We can either always create the codegen module or defer codegen to a point
after ScalarFnCall::Prepare(). The former has the downside of introducing
unnecessary latency for say single-node optimization so the latter is
implemented. It is needed as part of IMPALA-4192 any way.

After this change, ScalarFnCall expressions which need to be codegen'd
are inserted into a vector in RuntimeState in ScalarFnCall::Prepare().
Later in the codegen phase, these expressions' GetCodegendComputeFn()
will be called after codegen for operators is done. If any of these
expressions are already codegen'd indirectly by the operators,
GetCodegendComputeFn() will be a no-op. This preserves the behavior
that ScalarFnCall will always be codegen'd even if the fragment
doesn't contain any codegen enabled operators.

Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
---
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/service/fe-support.cc
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/planner/Planner.java
M tests/query_test/test_udfs.py
24 files changed, 169 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/5105/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and 
fixes filter stats updates
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4833/4/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 716:   ctx.stats->IncrCounters(stats_name, 1, 1, !passed_filter);
It seems inconsistent that we're not counting partitions processed before the 
filter arrived but we are counting rows. Would be good to clarify in 
filter-context.h what the counters mean more exactly. I feel like maybe we 
don't need 'total_possible' - 'considered' seems more informative.


-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-3200: Implement suballocator for splitting buffers
..

IMPALA-3200: Implement suballocator for splitting buffers

This is useful for situations like hash tables, where we want to
make multiple non-spillable allocations of variable size from buffer
pool memory and not incur the overhead of interacting with the global
buffer pool. The allocator subdivides buffers to service allocations
and uses a buddy allocation algorithm to merge freed allocations into
larger chunks. This helps avoid external fragmentation and is quite
effective at reusing memory given the typical doubling allocation
patterns of hash tables in partitioned aggs and joins.

Testing:
The allocator has fairly robust internal consistency checks via
assertions and unique_ptrs. Includes a unit test that exercises
various allocation patterns.

I tested porting hash tables over to allocate memory using an earlier
version of the suballocator, which worked well (was able to run a
wide range of queries successfully with good performance).

Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788
---
M be/src/bufferpool/CMakeLists.txt
A be/src/bufferpool/suballocator-test.cc
A be/src/bufferpool/suballocator.cc
A be/src/bufferpool/suballocator.h
M be/src/common/names.h
5 files changed, 830 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/4715/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and 
fixes filter stats updates
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4833/3//COMMIT_MSG
Commit Message:

PS3, Line 7: IMPALA-3838, 
> Need to update commit message to mention IMPALA-4495.
Done


http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

PS3, Line 137: LlvmCodeGen::FnPrototype prototype(codegen, "FilterContextEval",
> long line.
Done


http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS3, Line 286: total_poss
> total_possible
Done


PS3, Line 700: noinline
> noinline
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3200: Implement suballocator for splitting buffers
..


Patch Set 4:

(34 comments)

http://gerrit.cloudera.org:8080/#/c/4715/4//COMMIT_MSG
Commit Message:

PS4, Line 13:  buddy allocation
> Buddy allocators tend to have high internal fragmentation.Did you consider 
The main use case is to allocate power-of-two buffers, so currently this isn't 
a problem. Maybe there are some other use cases down the line but for now it 
seems better to go with a simple algorithm.


http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator-test.cc
File be/src/bufferpool/suballocator-test.cc:

Line 28: 
> why the gap?
Done


Line 44:   virtual void SetUp() {
> override, here and below
Done


PS4, Line 63:  != NULL
> elide
I think this is a matter of taste, but we've generally standardised on explicit 
null checks.


PS4, Line 68: Random
> 'Randomness' or 'PRNG'; the seed itself is not random
Wikipedia knows what I mean :) https://en.wikipedia.org/wiki/Random_seed


Line 75:   void VerifyMemoryValid(const vector& 
allocs);
> static
Done


Line 78:   void FreeAllocations(
> static
Done


PS4, Line 80: unique_ptr
> auto
Done


PS4, Line 88: default_random_engine
> It might be good to pick one for repeatability reasons
Done


http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator.cc
File be/src/bufferpool/suballocator.cc:

Line 40: free_lists_[i] = NULL;
> This is the default. If you elide this loop, this should happen anyway.
Done


Line 59:   const int target_list_idx = Bits::Log2CeilingNonZero64(bytes);
> How do you know bytes is non-zero?
This is now handled with the minimum allocation size logic.


Line 60:   for (int i = target_list_idx; i < ALLOCATION_SIZES; ++i) {
> This can be O(1) with ctz and a bitmap for the freelist emptiness
It's O(1) now just with a higher constant factor assuming an upper bound on 
allocation sizes. I'd say we should defer this unless we actually see it become 
a bottleneck.


Line 88:   int64_t buffer_len = max(min_buffer_len_, 
BitUtil::RoundUpToPowerOfTwo(bytes));
> const
Done


Line 127: if (!status.ok()) {
> if (!Suballocation::Create([i]).ok()) {...
Done


Line 144: int64_t child_len = free_node->len_ / 2;
> const
Done


PS4, Line 233: != NULL
> elide
We wouldn't normally elide it.


http://gerrit.cloudera.org:8080/#/c/4715/4/be/src/bufferpool/suballocator.h
File be/src/bufferpool/suballocator.h:

Line 31: enum class ExpandReservations {
> ExpansionStrategy, maybe?
Done


Line 37: };
> This can live inside Suballocator, right?
It can but we end up with Suballocator::ExpansionStrategy everywhere, which is 
pretty verbose.


PS4, Line 39: buddy
> There are a few different variants of buddy systems. It might be good to sp
Elaborated slightly in this comment.


PS4, Line 43: relatively large allocations
> Yet free_lists_ goes all the way down to one byte? Why not enforce this at 
I think my main concern there would be that the memory overhead of the 
allocations isn't tracked, so could overwhelm the actual tracked memory.

I added logic to enforce a minimum allocation size and round up allocations to 
that amount (DCHECKing or similar seems a little awkward).


Line 50:   ExpandReservations expand_reservations);
> Can you annotate these params in a comment
Done


Line 54:   /// Allocate bytes from BufferPool. The allocation is NULL if 
unsuccessful because
> nullptr, here and elsewhere
Done


Line 71:   /// Allocate a buffer of size 'bytes' from the buffer pool and 
initialize 'result'
> bytes must be less than MAX_ALLOCATION_BYTES
Done


PS4, Line 79: Can fail if malloc() fails
> if new returns nullptr
Reworded to somethign more generic about memory allocation failing.


PS4, Line 109: by
> 'but'
Done


PS4, Line 127: must
> or else leaked memory or some kind of segfault or nasal demons?
Done


PS4, Line 132: Destructor
> Elide
Done


Line 138:  private:
> also disallow_copy_and_assign?
Done


Line 152:   /// The buffer backing the suballocation, if the suballocation is 
backed by an entire
> capitalization, here and elsewhere: Suballocation and Allocation
Not sure why we'd capitalise allocation since it's not a class.


Line 153:   /// buffer. Otherwise uninitialized. 'buffer_' is open iff buddy is 
NULL.
> buddy_
Done


Line 156:   /// If this is a left child, the parent of this and its buddy. The 
parent's allocation
> It would be helpful to explain the tree structure first.
Added some explanation to the class comment.


Line 159:   std::unique_ptr parent_;
> Can the parent be a buddy, thus being pointed to by its buddy and one of it
Yes. I think the explanation in the class comment should cover it.


Line 171:   Suballocation* prev_free_;
> so next_free_ is not really unique? I'm troubled by this design pattern. Di
unique_ptr to me means unique ownership, which doesn't preclude a raw 

[Impala-ASF-CR] IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats updates

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#4).

Change subject: IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and 
fixes filter stats updates
..

IMPALA-3838, IMPALA-4495: Codegen EvalRuntimeFilters() and fixes filter stats 
updates

This change codegens HdfsParquetScanner::EvalRuntimeFilters()
by unrolling its loop, codegen'ing the expression evaluation
of the runtime filter and replacing some type information with
constants in the hashing function of runtime filter to avoid
branching at runtime.

This change also fixes IMPALA-4495 by not counting a row as
'considered' in the filter stats before the filter arrives.
This avoids unnecessarily marking a runtime filter as
ineffective before it's even used.

With this change, TPCDS-Q88 improves by 13-14%.
primitive_broadcast_join_1 improves by 24%.

Change-Id: I27114869840e268d17e91d6e587ef811628e3837
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.h
A be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/types.h
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M tests/query_test/test_tpch_queries.py
27 files changed, 509 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4833/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3838: Codegen EvalRuntimeFilters().
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

PS3, Line 137: LlvmCodeGen::FnPrototype prototype(codegen, "FilterContextEval", 
codegen->boolean_type());
long line.


http://gerrit.cloudera.org:8080/#/c/4833/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS3, Line 286: considered
total_possible


PS3, Line 700: alwaysinline
noinline


-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3838: Codegen EvalRuntimeFilters().
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4833/3//COMMIT_MSG
Commit Message:

PS3, Line 7: IMPALA-3838: 
Need to update commit message to mention IMPALA-4495.


-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-3838: Codegen EvalRuntimeFilters().
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4833/1/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 73:   if (stats->enabled && ctx->filter->HasBloomFilter()) {
> I had it reversed - as far as I can see, before filter arrival, it will inc
Thanks for filing the bug. It's fixed in the new patch.


http://gerrit.cloudera.org:8080/#/c/4833/2/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

PS2, Line 61: ow against
> Is it a tuple row? I though it was a raw value pointer?
It means to say the value is computed by evaluating a tuple row against the 
expression. Comment rephrased.


-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3838: Codegen EvalRuntimeFilters().

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-3838: Codegen EvalRuntimeFilters().
..

IMPALA-3838: Codegen EvalRuntimeFilters().

This change codegens HdfsParquetScanner::EvalRuntimeFilters()
by unrolling its loop, codegen'ing the expression evaluation
of the runtime filter and replacing some type information with
constants.

With this change, TPCDS-Q88 improves by 13-14%.
primitive_broadcast_join_1 improves by 24%.

Change-Id: I27114869840e268d17e91d6e587ef811628e3837
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.h
A be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/types.h
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M tests/query_test/test_tpch_queries.py
27 files changed, 509 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/4833/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4833
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I27114869840e268d17e91d6e587ef811628e3837
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3726: Add support for Kudu-specific column options
..


Patch Set 7: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/499/

-- 
To view, visit http://gerrit.cloudera.org:8080/5026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4410: Safer tear-down of RuntimeState

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4410: Safer tear-down of RuntimeState
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.cc
File be/src/runtime/test-env.cc:

Line 109:   for (shared_ptr& runtime_state : runtime_states_) {
> Any thoughts?
The second alternative makes most sense to me since it preserves the original 
intent.


-- 
To view, visit http://gerrit.cloudera.org:8080/4893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie416e4d57240142bf685385299b749c3a6792c45
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..


IMPALA-4490: Only generate runtime filters for hash join nodes.

Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Reviewed-on: http://gerrit.cloudera.org:8080/5117
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
2 files changed, 34 insertions(+), 5 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4410: Safer tear-down of RuntimeState

2016-11-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4410: Safer tear-down of RuntimeState
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.cc
File be/src/runtime/test-env.cc:

Line 109:   for (shared_ptr& runtime_state : runtime_states_) {
> What do you propose - summing over the distinct query IDs? Or checking in C
Any thoughts?


-- 
To view, visit http://gerrit.cloudera.org:8080/4893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie416e4d57240142bf685385299b749c3a6792c45
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> For context: do you also think in most cases it is a bug if we allow unsign
Mostly yes, although we don't use unsigned ints in many places in the codebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> I'm assuming we'll disable the UBSAN checking temporarily, but later come b
For context: do you also think in most cases it is a bug if we allow unsigned 
integers to overflow?


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> -fwrapv and the ubsan check are independent, so we can enable one and disab
I'm assuming we'll disable the UBSAN checking temporarily, but later come back 
and fix it. I think in most cases it's a bug if we allow signed integers to 
overflow. It would be good to identify the subset of places where it is 
expected and make it explicit that overflow is expected.


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> Not sure if that's sufficient, may require some runtime configuration too. 
-fwrapv and the ubsan check are independent, so we can enable one and disable 
the other.

What kind of bugs do you think -fwrapv will mask? Will those bugs still be 
masked even if there is no ubsan checking?


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4863/15/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

PS15, Line 329: key
Could do std:move() to avoid copy of string


http://gerrit.cloudera.org:8080/#/c/4863/15/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

PS15, Line 168: form
from


Line 214:   /// Appends all rows in batch to the temporary Hdfs files their 
respective partitions.
You're missing a verb in here I think


Line 269:   PartitionPair* current_clustered_partition_;
"Only set if 'input_is_clustered_' is true"


Line 272:   /// batches.
"Only set if 'input_is_clustered_' is true"


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-17 Thread Tim Armstrong (Code Review)
Hello Michael Ho,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4956

to look at the new patch set (#8).

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..

IMPALA-4397,IMPALA-3259: reduce codegen time and memory

A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
  up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
  tracker. This is very crude but much better than not tracking it.

A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
  conjuncts, etc by adding "NoInline" attributes.
* Bail of out text scanner codegen for wide tables.
* Don't codegen non-grouping merge aggregations. They will only process
  one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by specialising the hash function
  based on decimal width.

Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
  approach will be used there in a follow-on patch.

Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to < 1s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).

Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:

  drop stats tpch_20_parquet.lineitem
  compute stats tpch_20_parquet.lineitem;

There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.

Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/codegen/mcjit-mem-mgr.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.h
M be/src/runtime/lib-cache.cc
M be/src/runtime/runtime-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M tests/query_test/test_aggregation.py
29 files changed, 1,467 insertions(+), 165 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS7, Line 602: codegen
> It still doesn't track the memory consumed by the LLVM module (e.g. Instruc
The optimisation memory estimate does include the IR, but only after it's 
constructed, since we don't really know how large it will be until it's 
generated.


http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

PS7, Line 809: ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD
> nit: May be worth refactoring it into a singe variable with meaningful name
Done


http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS7, Line 1706:  if (aggregate_evaluators_.size() > 
Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) {
> As discussed in person, it may be worth double checking whether min / max /
I experimented with a code change that disabled inlining always, and looked at 
the agg node time. I saw: 25% regression for a single sum(decimal), 3% 
regression for a single ndv(decimal) and a 2% speedup for compute stats on 
lineitem. For sum() of only numeric columns in lineitem (8 columns) I saw a 23% 
slowdown.

It could be worth trying to special-case those aggregate functions, but it 
seems like affected queries would be pretty rare. It's also a little tricky 
since the input expressions could be arbitrarily complex.


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> Thanks. Similar comments about the need to spell out the difference between
There wasn't a particularly natural place to document this general diea. I 
chose to clarify in the thrift structure that disabling codegen means that the 
backend should do it if it supports it. That seemed the best place since it's 
the interface between the FE and BE.


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has submitted this change and it was merged.

Change subject: IMPALA-3398: Add docs to main Impala branch.
..


IMPALA-3398: Add docs to main Impala branch.

These are refugees from doc_prototype. They can be rendered with the
DITA Open Toolkit version 2.3.3 by:

/tmp/dita-ot-2.3.3/bin/dita \
  -i impala.ditamap \
  -f html5 \
  -o $(mktemp -d) \
  -filter impala_html.ditaval

Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Reviewed-on: http://gerrit.cloudera.org:8080/5014
Reviewed-by: John Russell 
Tested-by: John Russell 
---
A docs/Cloudera-Impala-Release-Notes.ditamap
A docs/generatingImpalaDoc.md
A docs/images/howto_access_control.png
A docs/images/howto_per_node_peak_memory_usage.png
A docs/images/howto_show_histogram.png
A docs/images/howto_static_server_pools_config.png
A docs/images/impala_arch.jpeg
A docs/images/support_send_diagnostic_data.png
A docs/impala.ditamap
A docs/impala_html.ditaval
A docs/impala_pdf.ditaval
A docs/impala_sqlref.ditamap
A docs/shared/ImpalaVariables.xml
A docs/shared/impala_common.xml
A docs/topics/impala.xml
A docs/topics/impala_abort_on_default_limit_exceeded.xml
A docs/topics/impala_abort_on_error.xml
A docs/topics/impala_admin.xml
A docs/topics/impala_admission.xml
A docs/topics/impala_aggregate_functions.xml
A docs/topics/impala_aliases.xml
A docs/topics/impala_allow_unsupported_formats.xml
A docs/topics/impala_alter_function.xml
A docs/topics/impala_alter_table.xml
A docs/topics/impala_alter_view.xml
A docs/topics/impala_analytic_functions.xml
A docs/topics/impala_appx_count_distinct.xml
A docs/topics/impala_appx_median.xml
A docs/topics/impala_array.xml
A docs/topics/impala_auditing.xml
A docs/topics/impala_authentication.xml
A docs/topics/impala_authorization.xml
A docs/topics/impala_avg.xml
A docs/topics/impala_avro.xml
A docs/topics/impala_batch_size.xml
A docs/topics/impala_bigint.xml
A docs/topics/impala_bit_functions.xml
A docs/topics/impala_boolean.xml
A docs/topics/impala_breakpad.xml
A docs/topics/impala_cdh.xml
A docs/topics/impala_char.xml
A docs/topics/impala_cluster_sizing.xml
A docs/topics/impala_cm_installation.xml
A docs/topics/impala_comments.xml
A docs/topics/impala_complex_types.xml
A docs/topics/impala_components.xml
A docs/topics/impala_compression_codec.xml
A docs/topics/impala_compute_stats.xml
A docs/topics/impala_concepts.xml
A docs/topics/impala_conditional_functions.xml
A docs/topics/impala_config.xml
A docs/topics/impala_config_options.xml
A docs/topics/impala_config_performance.xml
A docs/topics/impala_connecting.xml
A docs/topics/impala_conversion_functions.xml
A docs/topics/impala_count.xml
A docs/topics/impala_create_data_source.xml
A docs/topics/impala_create_database.xml
A docs/topics/impala_create_function.xml
A docs/topics/impala_create_role.xml
A docs/topics/impala_create_table.xml
A docs/topics/impala_create_view.xml
A docs/topics/impala_data_sources.xml
A docs/topics/impala_databases.xml
A docs/topics/impala_datatypes.xml
A docs/topics/impala_date.xml
A docs/topics/impala_datetime_functions.xml
A docs/topics/impala_ddl.xml
A docs/topics/impala_debug_action.xml
A docs/topics/impala_decimal.xml
A docs/topics/impala_default_order_by_limit.xml
A docs/topics/impala_delegation.xml
A docs/topics/impala_delete.xml
A docs/topics/impala_describe.xml
A docs/topics/impala_development.xml
A docs/topics/impala_disable_cached_reads.xml
A docs/topics/impala_disable_codegen.xml
A docs/topics/impala_disable_outermost_topn.xml
A docs/topics/impala_disable_row_runtime_filtering.xml
A docs/topics/impala_disable_streaming_preaggregations.xml
A docs/topics/impala_disable_unsafe_spills.xml
A docs/topics/impala_disk_space.xml
A docs/topics/impala_distinct.xml
A docs/topics/impala_dml.xml
A docs/topics/impala_double.xml
A docs/topics/impala_drop_data_source.xml
A docs/topics/impala_drop_database.xml
A docs/topics/impala_drop_function.xml
A docs/topics/impala_drop_role.xml
A docs/topics/impala_drop_stats.xml
A docs/topics/impala_drop_table.xml
A docs/topics/impala_drop_view.xml
A docs/topics/impala_errata.xml
A docs/topics/impala_exec_single_node_rows_threshold.xml
A docs/topics/impala_explain.xml
A docs/topics/impala_explain_level.xml
A docs/topics/impala_explain_plan.xml
A docs/topics/impala_faq.xml
A docs/topics/impala_faq_base.xml
A docs/topics/impala_features.xml
A docs/topics/impala_file_formats.xml
A docs/topics/impala_fixed_issues.xml
A docs/topics/impala_float.xml
A docs/topics/impala_functions.xml
A docs/topics/impala_functions_overview.xml
A docs/topics/impala_glossary.xml
A docs/topics/impala_grant.xml
A docs/topics/impala_group_by.xml
A docs/topics/impala_group_concat.xml
A docs/topics/impala_hadoop.xml
A docs/topics/impala_having.xml
A docs/topics/impala_hbase.xml
A docs/topics/impala_hbase_cache_blocks.xml
A docs/topics/impala_hbase_caching.xml
A docs/topics/impala_hints.xml
A docs/topics/impala_howto_rm.xml

[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-17 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3398: Add docs to main Impala branch.
..


Patch Set 2: Code-Review+2 Verified+1

Got clean build of both SQL Ref and entire Impala doc bundle.

-- 
To view, visit http://gerrit.cloudera.org:8080/5014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Anonymous Coward #250
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3398: Add docs to main Impala branch.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5014/1/docs/impala_sqlref.ditamap
File docs/impala_sqlref.ditamap:

PS1, Line 6: 
> conref="shared/ImpalaVariables.xml#impala_vars/prodinfo_for_html"
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Anonymous Coward #250
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new patch set (#2).

Change subject: IMPALA-3398: Add docs to main Impala branch.
..

IMPALA-3398: Add docs to main Impala branch.

These are refugees from doc_prototype. They can be rendered with the
DITA Open Toolkit version 2.3.3 by:

/tmp/dita-ot-2.3.3/bin/dita \
  -i impala.ditamap \
  -f html5 \
  -o $(mktemp -d) \
  -filter impala_html.ditaval

Change-Id: I8861e99adc446f659a04463ca78c79200669484f
---
A docs/Cloudera-Impala-Release-Notes.ditamap
A docs/generatingImpalaDoc.md
A docs/images/howto_access_control.png
A docs/images/howto_per_node_peak_memory_usage.png
A docs/images/howto_show_histogram.png
A docs/images/howto_static_server_pools_config.png
A docs/images/impala_arch.jpeg
A docs/images/support_send_diagnostic_data.png
A docs/impala.ditamap
A docs/impala_html.ditaval
A docs/impala_pdf.ditaval
A docs/impala_sqlref.ditamap
A docs/shared/ImpalaVariables.xml
A docs/shared/impala_common.xml
A docs/topics/impala.xml
A docs/topics/impala_abort_on_default_limit_exceeded.xml
A docs/topics/impala_abort_on_error.xml
A docs/topics/impala_admin.xml
A docs/topics/impala_admission.xml
A docs/topics/impala_aggregate_functions.xml
A docs/topics/impala_aliases.xml
A docs/topics/impala_allow_unsupported_formats.xml
A docs/topics/impala_alter_function.xml
A docs/topics/impala_alter_table.xml
A docs/topics/impala_alter_view.xml
A docs/topics/impala_analytic_functions.xml
A docs/topics/impala_appx_count_distinct.xml
A docs/topics/impala_appx_median.xml
A docs/topics/impala_array.xml
A docs/topics/impala_auditing.xml
A docs/topics/impala_authentication.xml
A docs/topics/impala_authorization.xml
A docs/topics/impala_avg.xml
A docs/topics/impala_avro.xml
A docs/topics/impala_batch_size.xml
A docs/topics/impala_bigint.xml
A docs/topics/impala_bit_functions.xml
A docs/topics/impala_boolean.xml
A docs/topics/impala_breakpad.xml
A docs/topics/impala_cdh.xml
A docs/topics/impala_char.xml
A docs/topics/impala_cluster_sizing.xml
A docs/topics/impala_cm_installation.xml
A docs/topics/impala_comments.xml
A docs/topics/impala_complex_types.xml
A docs/topics/impala_components.xml
A docs/topics/impala_compression_codec.xml
A docs/topics/impala_compute_stats.xml
A docs/topics/impala_concepts.xml
A docs/topics/impala_conditional_functions.xml
A docs/topics/impala_config.xml
A docs/topics/impala_config_options.xml
A docs/topics/impala_config_performance.xml
A docs/topics/impala_connecting.xml
A docs/topics/impala_conversion_functions.xml
A docs/topics/impala_count.xml
A docs/topics/impala_create_data_source.xml
A docs/topics/impala_create_database.xml
A docs/topics/impala_create_function.xml
A docs/topics/impala_create_role.xml
A docs/topics/impala_create_table.xml
A docs/topics/impala_create_view.xml
A docs/topics/impala_data_sources.xml
A docs/topics/impala_databases.xml
A docs/topics/impala_datatypes.xml
A docs/topics/impala_date.xml
A docs/topics/impala_datetime_functions.xml
A docs/topics/impala_ddl.xml
A docs/topics/impala_debug_action.xml
A docs/topics/impala_decimal.xml
A docs/topics/impala_default_order_by_limit.xml
A docs/topics/impala_delegation.xml
A docs/topics/impala_delete.xml
A docs/topics/impala_describe.xml
A docs/topics/impala_development.xml
A docs/topics/impala_disable_cached_reads.xml
A docs/topics/impala_disable_codegen.xml
A docs/topics/impala_disable_outermost_topn.xml
A docs/topics/impala_disable_row_runtime_filtering.xml
A docs/topics/impala_disable_streaming_preaggregations.xml
A docs/topics/impala_disable_unsafe_spills.xml
A docs/topics/impala_disk_space.xml
A docs/topics/impala_distinct.xml
A docs/topics/impala_dml.xml
A docs/topics/impala_double.xml
A docs/topics/impala_drop_data_source.xml
A docs/topics/impala_drop_database.xml
A docs/topics/impala_drop_function.xml
A docs/topics/impala_drop_role.xml
A docs/topics/impala_drop_stats.xml
A docs/topics/impala_drop_table.xml
A docs/topics/impala_drop_view.xml
A docs/topics/impala_errata.xml
A docs/topics/impala_exec_single_node_rows_threshold.xml
A docs/topics/impala_explain.xml
A docs/topics/impala_explain_level.xml
A docs/topics/impala_explain_plan.xml
A docs/topics/impala_faq.xml
A docs/topics/impala_faq_base.xml
A docs/topics/impala_features.xml
A docs/topics/impala_file_formats.xml
A docs/topics/impala_fixed_issues.xml
A docs/topics/impala_float.xml
A docs/topics/impala_functions.xml
A docs/topics/impala_functions_overview.xml
A docs/topics/impala_glossary.xml
A docs/topics/impala_grant.xml
A docs/topics/impala_group_by.xml
A docs/topics/impala_group_concat.xml
A docs/topics/impala_hadoop.xml
A docs/topics/impala_having.xml
A docs/topics/impala_hbase.xml
A docs/topics/impala_hbase_cache_blocks.xml
A docs/topics/impala_hbase_caching.xml
A docs/topics/impala_hints.xml
A docs/topics/impala_howto_rm.xml
A docs/topics/impala_identifiers.xml
A docs/topics/impala_impala_shell.xml
A docs/topics/impala_incompatible_changes.xml
A docs/topics/impala_insert.xml
A 

[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-17 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3398: Add docs to main Impala branch.
..


Patch Set 1:

(1 comment)

Appended the updated attribute key/value pair.

http://gerrit.cloudera.org:8080/#/c/5014/1/docs/impala_sqlref.ditamap
File docs/impala_sqlref.ditamap:

PS1, Line 6: 
> What is the most recent URI?
conref="shared/ImpalaVariables.xml#impala_vars/prodinfo_for_html"


-- 
To view, visit http://gerrit.cloudera.org:8080/5014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Anonymous Coward #250
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3398: Add docs to main Impala branch.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5014/1/docs/impala_sqlref.ditamap
File docs/impala_sqlref.ditamap:

PS1, Line 6: 
> Stale URI. Causes build error.
What is the most recent URI?


-- 
To view, visit http://gerrit.cloudera.org:8080/5014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Anonymous Coward #250
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..


IMPALA-3202: refactor scratch file management into TmpFileMgr

This is a pure refactoring patch that moves all of the logic
for allocating scratch file ranges into TmpFileMgr in anticipation of
this logic being used by the new BufferPool.

There should be no behavioural changes.

Also remove a bunch of TODOs that we're not going to fix.

Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Reviewed-on: http://gerrit.cloudera.org:8080/4898
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/runtime/buffered-block-mgr-test.cc
M be/src/runtime/buffered-block-mgr.cc
M be/src/runtime/buffered-block-mgr.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
6 files changed, 441 insertions(+), 432 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py

2016-11-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> There are 22 places alone in the backend tests where this happens, and they
Not sure if that's sufficient, may require some runtime configuration too. 

My feeling is that -fwrapv is the wrong fix anyway and will mask genuine bugs. 
Instead we should fix all instances where we expect it to overflow (if we want 
2s-complement overflow in a particular place, we can explicitly cast to 
unsigned then back).

Maybe the best course forward it to disable this UBSAN check for now and deal 
with it in a follow-up patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-17 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG
Commit Message:

PS6, Line 14: date
year*


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation

2016-11-17 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/6//COMMIT_MSG
Commit Message:

PS6, Line 12: did't
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 7: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS7, Line 602: codegen
It still doesn't track the memory consumed by the LLVM module (e.g. 
Instruction, BasicBlock objects), right ?


http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

PS7, Line 809: ctxs.size() > Expr::CODEGEN_INLINE_EXPRS_THRESHOLD
nit: May be worth refactoring it into a singe variable with meaningful name for 
easier understanding.


http://gerrit.cloudera.org:8080/#/c/4956/7/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

PS7, Line 1706:  if (aggregate_evaluators_.size() > 
Expr::CODEGEN_INLINE_EXPRS_THRESHOLD) {
As discussed in person, it may be worth double checking whether min / max / sum 
on wide table doesn't regress as UpdateSlot() is rather small for these 
aggregate evaluators. It probably doesn't but it'd be good to avoid surprises.

For instance, select 
min(col1),max(col1),sum(col1),,,min(col26),max(col26),sum(col26) from 
wide_table makes a difference without inlining for UpdateSlot() into 
UpdateTuple().


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> What are some examples of where we overflow signed ints?
/home/jbapple/Impala/be/src/gutil/strings/numbers.cc:954:17: runtime error: 
signed integer overflow: 42 * 1 cannot be represented in type 'int'
#0 0x2b30a5f3ecfd in FastUInt32ToBufferLeft(unsigned int, char*) 
/home/jbapple/Impala/be/src/gutil/strings/numbers.cc:954:17
#1 0x2b30a5f3fb7e in FastUInt64ToBufferLeft(unsigned long long, char*) 
/home/jbapple/Impala/be/src/gutil/strings/numbers.cc:1037:24
#2 0x2b30a5f3e3ac in FastInt64ToBufferLeft(long long, char*) 
/home/jbapple/Impala/be/src/gutil/strings/numbers.cc:1080:10
#3 0x9cf0f8 in strings::internal::SubstituteArg::SubstituteArg(long) 
/home/jbapple/Impala/be/src/gutil/strings/substitute.h:106:35
#4 0x2b309d7988a3 in 
impala::HdfsAvroScanner::SetStatusValueOverflow(impala::TErrorCode::type, long, 
long) /home/jbapple/Impala/be/src/exec/hdfs-avro-scanner.cc:677:53
#5 0x2b309d7be0bc in 
impala::HdfsAvroScanner::ReadAvroString(impala::PrimitiveType, unsigned char**, 
unsigned char*, bool, void*, impala::MemPool*) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:241:7
#6 0x7e2e47 in void 
impala::HdfsAvroScannerTest::TestReadAvroType(bool 
(impala::HdfsAvroScanner::*)(impala::PrimitiveType, unsigned char**, unsigned 
char*, bool, void*, impala::MemPool*), impala::PrimitiveType, unsigned char*, 
long, impala::StringValue, int, impala::TErrorCode::type) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:85:20
#7 0x7c9a66 in impala::HdfsAvroScannerTest::TestReadAvroString(unsigned 
char*, long, impala::StringValue, int, impala::TErrorCode::type) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:179:5
#8 0x7bab3d in 
impala::HdfsAvroScannerTest_StringLengthOverflowTest_Test::TestBody() 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:348:5





/home/jbapple/Impala/be/src/exprs/expr-test.cc:819:36: runtime error: signed 
integer overflow: -9223372036854775807 + -9223372036854775807 cannot be 
represented in type 'long'
#0 0xb901cd in void impala::ExprTest::TestFixedResultTypeOps(long, long, impala::ColumnType const&) 
/home/jbapple/Impala/be/src/exprs/expr-test.cc:819:36
#1 0x80be58 in impala::ExprTest_ArithmeticExprs_Test::TestBody() 
/home/jbapple/Impala/be/src/exprs/expr-test.cc:1180:3







/home/jbapple/Impala/be/src/exprs/expr-test.cc:823:36: runtime error: signed 
integer overflow: -9223372036854775807 * -9223372036854775807 cannot be 
represented in type 'long'
#0 0xb90783 in void impala::ExprTest::TestFixedResultTypeOps(long, long, impala::ColumnType const&) 
/home/jbapple/Impala/be/src/exprs/expr-test.cc:823:36
#1 0x80be58 in impala::ExprTest_ArithmeticExprs_Test::TestBody() 
/home/jbapple/Impala/be/src/exprs/expr-test.cc:1180:3







/home/jbapple/Impala/be/src/exprs/expr-test.cc:821:36: runtime error: signed 
integer overflow: -9223372036854775807 - 9223372036854775807 cannot be 
represented in type 'long'
#0 0xb9049d in void impala::ExprTest::TestFixedResultTypeOps(long, long, impala::ColumnType const&) 
/home/jbapple/Impala/be/src/exprs/expr-test.cc:821:36
#1 0x80c1a5 in impala::ExprTest_ArithmeticExprs_Test::TestBody() 
/home/jbapple/Impala/be/src/exprs/expr-test.cc:1184:3


[ RUN  ] ExprTest.MathConversionFunctions
/home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:370:24: runtime error: 
signed integer overflow: 2 * 4738381338321616896 cannot be repr
esented in type 'long'





/home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:371:13: runtime error: 
signed integer overflow: 4738381338321616896 * 36 cannot be represented in type 
'long'
#0 0x2b1855eb2f88 in impala::MathFunctions::DecimalInBaseToDecimal(long, 
signed char, long*) 
/home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:371:13
#1 0x2b1855eb2bb3 in 
impala::MathFunctions::ConvInt(impala_udf::FunctionContext*, 
impala_udf::BigIntVal const&, impala_udf::TinyIntVal const&, 
impala_udf::TinyIntVal const&) 
/home/jbapple/Impala/be/src/exprs/math-functions-ir.cc:293:10
#2 0x2b18560f4152 in impala_udf::StringVal 
impala::ScalarFnCall::InterpretEval(impala::ExprContext*,
 impala::TupleRow const*) 
/home/jbapple/Impala/be/src/exprs/scalar-fn-call.cc:579:580
#3 0x2b185602e0d5 in 
impala::ScalarFnCall::GetStringVal(impala::ExprContext*, impala::TupleRow 
const*) 

[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options

2016-11-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3726: Add support for Kudu-specific column options
..


Patch Set 7: Code-Review+2

(2 comments)

Rebase and fixing tests, carry Alex's +2

http://gerrit.cloudera.org:8080/#/c/5026/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 129:   public ColumnDef(String colName, TypeDef typeDef) {
> Seems like we'd often need a C'tor that also takes a comment string. You ca
Not as simple as it sounds, we need to make sure we don't insert nulls and then 
populate the map. But because the call to the delegating constructor needs to 
be the first statement that needs to happen outside the constructor.


http://gerrit.cloudera.org:8080/#/c/5026/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 250:   def test_primary_key_and_distribution(self, cursor):
> AUTO and DEFAULT still seem uninformative. Let's revisit later like you sug
I agree for encoding, not so much for compression since it can have 
no_compression. Anyways, as you said, let's revisit this later.


-- 
To view, visit http://gerrit.cloudera.org:8080/5026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3726: Add support for Kudu-specific column options

2016-11-17 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5026

to look at the new patch set (#7).

Change subject: IMPALA-3726: Add support for Kudu-specific column options
..

IMPALA-3726: Add support for Kudu-specific column options

This commit adds support for Kudu-specific column options in CREATE
TABLE statements. The syntax is:
CREATE TABLE tbl_name ([col_name type [PRIMARY KEY] [option [...]]] [, ])
where option is:
| NULL
| NOT NULL
| ENCODING encoding_val
| COMPRESSION compression_algorithm
| DEFAULT expr
| BLOCK_SIZE num

The output of the SHOW CREATE TABLE statement was altered to include all the 
specified
column options for Kudu tables.

Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
---
M be/src/exec/kudu-table-sink.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddReplaceColsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableChangeColStmt.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaParser.java
M fe/src/main/java/org/apache/impala/util/AvroSchemaUtils.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
30 files changed, 968 insertions(+), 249 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5026/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I727b9ae1b7b2387db752b58081398dd3f3449c02
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 1:

(1 comment)

It might be best to special-case the local backend. At a high level this is the 
most intuitive thing to do. It seems like we already special-case the local 
backend to some extent in UpdateMembership(). Otherwise waiting until the 
membership update is received may make sense.

http://gerrit.cloudera.org:8080/#/c/5127/1/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

Line 300:   BackendConfigPtr backend_config = GetBackendConfig();
I think it would make sense to detect at this point if the backend config 
doesn't have the coordinator in it. Then we could wait for the config to be 
updated (maybe with a timeout) before failing the query.

Even if we don't crash, it seems very strange for a coordinator to not schedule 
a query on itself in the !exec_on_coord case.


-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> There are 22 places alone in the backend tests where this happens, and they
What are some examples of where we overflow signed ints?


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> My preference is not to add this option and to fix any cases where values a
There are 22 places alone in the backend tests where this happens, and they all 
look intentional to me. I'd suggest that this is revealed intent.

I'm fine with adding -fwrapv to the codegen flags. That's CLANG_IR_CXX_FLAGS?


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4490: Only generate runtime filters for hash join nodes.

2016-11-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4490: Only generate runtime filters for hash join nodes.
..


Patch Set 3:

Any more comments?

-- 
To view, visit http://gerrit.cloudera.org:8080/5117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I167725e260bd0f91c2bfc164eb044321192d5b95
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 1:

Thanks for your feedback. I agree that this is not optimal, and if there was a 
way to send the equivalent of EAGAIN to the client and make it try again either 
with this or a different coordinator that could be a workaround. I'd also be 
happy to add a small timeout of a few seconds before returning an error. I'm 
rather reluctant to change the way we register the local backend with itself, 
since that way we will have two mechanisms of updating the membership 
information and need to handle conflicts as well.

That being said, between a) Relying on the client to try again, b) waiting for 
5 seconds (or whatever value), and c) reworking the cluster membership 
information management code in the scheduler, which one do you prefer?

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions

2016-11-17 Thread Dan Hecht (Code Review)
Dan Hecht has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5129

Change subject: IMPALA-4440: lineage timestamps can go backwards across 
daylight savings transitions
..

IMPALA-4440: lineage timestamps can go backwards across daylight savings 
transitions

Using TimestampValue (or equivalent string representation) for
timestamps that require a point in time doesn't work because the same
time can represent multiple point in times.  For example, the timestamp:
'2016-11-13 01:01 AM' occurred twice last weekend.

Instead, we should use unix time directly rather than trying to derive
unix time from a (timezone-less) timestamp.

Note that there are other questionable uses of TimestampValue for
internal Impala service stuff, but I want to fix them separately as they
are not as important and fixing does add some risk.

While I'm here, remove a template TimestampValue constructor that was
unused and is confusing.

We don't have any end-to-end tests that exercise column lineage, so add
a simple custom cluster test that enables lineage and verifes the start
and end unix times are within appropriate bounds.  The other column
lineage graph fields are at least tested via planner tests.

Automated regression testing for the specifc daylight savings issue is
difficult as we'd have to cross the daylight savings boundary at just
the right time during query execution in order to reproduce
reliably. But open to ideas.

Testing:
- loop the new test overnight without any failures.
- exhaustive run.

Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9
---
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A tests/custom_cluster/test_lineage.py
6 files changed, 87 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/5129/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

Line 40: #  -fwrapv: force signed integer arithmetic operations to wrap
> This was already assumed, and a number of our tests depend on it holding.
My preference is not to add this option and to fix any cases where values are 
actually wrapping at runtime. 

This could have performance implications, since a lot of "obvious" loop 
optimisations can depend on array indices, etc not wrapping.

Also we're not enabling this for codegen, which potentially add a gap to our 
test coverage.


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..


Patch Set 1:

This is better than crashing, but it also seems like we shouldn't fail queries 
for this reason. Probably we should wait until we've gotten the statestore 
update message before trying to schedule queries (either that or make sure the 
local backend is always present).

-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4494: Fix crash in SimpleScheduler

2016-11-17 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5127

Change subject: IMPALA-4494: Fix crash in SimpleScheduler
..

IMPALA-4494: Fix crash in SimpleScheduler

The scheduler maintains a local list of active backends, which is
updated through messages from the statestore. Even the local backend
enters this list by registering with the statestore and being included
in a statestore update message. Thus, during restarts it can happen that
a query gets scheduled with exec_at_coord set to true, while the local
backend has not been registered with the scheduler. In this case the IP
address lookup in the internal BackendConfig fails and an empty IP
address is returned, leading to a nullptr dereference down the line.

This change adds an additional check to ensure that the local backend
has been registered with the scheduler before issuing scan ranges when
exec_at_coord is set.

Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
---
M be/src/scheduling/simple-scheduler-test.cc
M be/src/scheduling/simple-scheduler.cc
M common/thrift/generate_error_codes.py
3 files changed, 32 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/5127/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6e1196a2fa47e5954c4a190aa326c135d039a77f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4956/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

PS4, Line 190: CODEGEN_WRITE_TUPLES_MAX_COLUMNS = 250;
> I agree that it's more actionable if it's based on number of tuples.
This is more about avoiding codegen blow-ups instead of making a cost-based 
decision (that's tricky because the codegen cost is non-linear and 
unpredictable and we'd need to build a cost model for the scanners factoring in 
# rows,# columns, predicates, etc).

I don't think there's much point in modelling the cost unless a) there is a 
substantial benefit to justify the complexity and b) the cost model is going to 
be fairly accurate. Otherwise we end up in an uncomfortable middle ground with 
a half-baked code model where the code is more complex but we don't get 
measurable benefit or have confidence in the cost model's accuracy.

I'd rather not do this from the planner because it's tied up in the 
implementation details of codegen for the scanners. The only reason we have a 
hard threshold is because the backend codegen logic builds everything in a 
single function. If that changed we could switch to the approach we use 
elsewhere of disabling inlining. It makes more sense to me for the planner to 
make a high-level decision about enabling codegen for the node, then the 
backend implements that decision (or bails out if it can't do the codegen 
without blowing up).


http://gerrit.cloudera.org:8080/#/c/4956/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS4, Line 823: if (!hasGrouping) mergeAggNode.setDisableCodegen(true);
> This isn't exactly what I had in mind. If you look at ExchangeNode::Codegen
I think it's different from the agg node. For the agg node we know based on the 
plan shape that the number of input rows is small, so even though we could 
codegen the agg, it's not worth it. But as far as I can see the only reason we 
don't codegen non-merging exchanges in general is that there's no backend 
support for it.

There's no reason I see that the planner would always disable codegen for 
non-merging exchanges if backend support existed.


-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py
..


Patch Set 2: Code-Review+2

(1 comment)

Carry +2

http://gerrit.cloudera.org:8080/#/c/5124/1/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

Line 94: self.client.execute(create_fn_stmt)
> can get rid of try/catch since unique_database will take care of that
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py

2016-11-17 Thread Jim Apple (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5124

to look at the new patch set (#2).

Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py
..

IMPALA-4476: Use unique_database to stop races in test_udfs.py

These tests have been failing nondeterministically in larger machines
with 16 cores. This change should stop races in haddop fs -put and
drop/create function.

Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84
---
M tests/query_test/test_udfs.py
1 file changed, 30 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/5124/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..


Patch Set 1:

(26 comments)

> (22 comments)
 > 
 > what's the plan for keeping this compiling?

I have a Jenkins job in progress to build in a variety of different ways (-so, 
-asan, -ninja, -releast). I will modify to add this. I intend eventually to add 
the job to a continually-monitored dashboard and perhaps to have it send email 
alerts on failure.

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

PS1, Line 95: synbol
> typo
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

PS1, Line 150: )
> add "!= 0" as we try to avoid implicit comparisons against zero conversion 
switched to std::fill


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 276:   *decimal >>= bytes_to_fill * 8;
> since len comes from the data file, we could also have bytes_to_fill < 0, n
Yes, that is also undefined, but it is precluded by ReadFieldLen being .ok, 
above.


Line 278:   *decimal = 0;
> can this happen with well-formed avro? if not, this be a return false case.
Here is a stack trace:
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:276:20: runtime 
error: shift exponent 32 is too large for 32-bit type 'int32_t' (aka 'int')
#0 0x2af0805477df in impala::HdfsAvroScanner::ReadAvroDecimal(int, unsigned 
char**, unsigned char*, bool, void*, impala::MemPool*) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-ir.cc:276:20
#1 0x7ec68c in void 
impala::HdfsAvroScannerTest::TestReadAvroType(bool (impala::HdfsAvroScanner::*)(int, 
unsigned char**, unsigned char*, bool, void*, impala::MemPool*), unsigned long, 
unsigned char*, long, impala::DecimalValue, int, impala::TErrorCode::type) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:85:20
#2 0x7c8e9a in void 
impala::HdfsAvroScannerTest::TestReadAvroDecimal(unsigned char*, long, 
impala::DecimalValue, int, impala::TErrorCode::type) 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:187:5
#3 0x7bbb3f in impala::HdfsAvroScannerTest_DecimalTest_Test::TestBody() 
/home/jbapple/Impala/be/src/exec/hdfs-avro-scanner-test.cc:393:3


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1379: double percent = total_rows * 100 / std::max(1L, num_input_rows);
> if that were the case, we would have seen crashes.
1. "if that were the case, we would have seen crashes." - I think that because 
total_rows is a double, the denominator gets converted to a double, making this 
not a crash.

UBSAN is a dynamic analysis tool; when it triggers, the behavior actually 
occurred.

2. Done.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

PS1, Line 97: *
> nit: we put the * next to type.
Done.


Line 100:   memcpy(, , sizeof(integer));
> using memcpy doesn't seem necessary. can't we just write:
It has different semantics: for 8-bit ints and integer = -2, my expression has 
value 3, yours has value 253, and yours with the | replaced by ^ has value 251.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 157: if (LIKELY(dst->ptr && src.ptr)) memcpy(dst->ptr, src.ptr, 
src.len);
> Prefer explicit NULL checks.
memcpy is UB if either ptr is null, even if len is 0.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/bit-byte-functions-ir.cc
File be/src/exprs/bit-byte-functions-ir.cc:

PS1, Line 151: v * 
> why is this necessary?  and does it generate the same code (let's make sure
UBSAN caught a case where v << shift was not representable as a T. That's UB, 
but since this patch sets the -fwrapv, the multiply behavior is defined.

It generates the same code, but it if we don't do it this way and set fwrapv, 
the compiler can break the code and do bizarre things. Historically, this is a 
real risk with GCC.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 820:   if (!haystack) return NULL;
> okay but this check is still unnecessary - no loop iterations will be taken
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

PS1, Line 804: t
> nit: "t != nullptr" per our style.
Done


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS1, Line 1684: 

[Impala-ASF-CR] Add a build flag for the undefined behavior sanitizer, aka "ubsan".

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new patch set (#2).

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
..

Add a build flag for the undefined behavior sanitizer, aka "ubsan".

Ubsan checks for undefined behavior according to the C++
standard. Some of this behavior has been known to be exploited by
optimizing compilers to produce bizarre results, like taking both
branches of a conditional.

Although a number of cases of UB were not possible to fix, as they
were caused by libraries like Thrift or libstdc++, this patch fixes
the UB that was feasible to fix.

Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/exec/aggregation-node.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/delimited-text-parser-test.cc
M be/src/exec/hash-join-node-ir.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/read-write-util.cc
M be/src/exec/select-node.cc
M be/src/exec/unnest-node.cc
M be/src/exec/zigzag-test.cc
M be/src/experiments/tuple-splitter-test.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions-test.cc
M be/src/exprs/bit-byte-functions-ir.cc
M be/src/exprs/expr-context.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/string-functions-ir.cc
M be/src/rpc/authentication.cc
M be/src/rpc/thrift-util-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/parallel-executor-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/sorter.cc
M be/src/runtime/string-value.inline.h
M be/src/runtime/tuple.h
M be/src/thirdparty/mustache/mustache.cc
M be/src/udf/udf-test-harness.h
M be/src/udf/udf.cc
M be/src/util/bit-packing-test.cc
M be/src/util/bitmap.h
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M be/src/util/decompress-test.cc
M be/src/util/dict-encoding.h
M be/src/util/internal-queue-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/streaming-sampler.h
M be/src/util/tuple-row-compare.h
M be/src/util/uid-util.h
M be/src/util/webserver.cc
M bin/make_impala.sh
M bin/run-backend-tests.sh
M bin/start-impalad.sh
M buildall.sh
62 files changed, 218 insertions(+), 163 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/5082/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4432: Handle internal codegen disabling properly

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4432: Handle internal codegen disabling properly
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5105/3/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 273:   if (runtime_state_->CodegenDisabled() && 
!runtime_state_->HasScalarFnToCodegen()) return;
Long line


-- 
To view, visit http://gerrit.cloudera.org:8080/5105
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b6a9ed723c64ba21b861608583cc9b6607d3397
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3202: refactor scratch file management into TmpFileMgr

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3202: refactor scratch file management into TmpFileMgr
..


Patch Set 6: Code-Review+2

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0c56c195f3f28d520034f8c384494e566635fc62
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR](hadoop-next) Merge remote-tracking branch 'origin/master' into hadoop-next

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Merge remote-tracking branch 'origin/master' into hadoop-next
..


Patch Set 1: Code-Review+2 Verified+1

Merge from master. Confirmed that it compiled locally on my machine.

-- 
To view, visit http://gerrit.cloudera.org:8080/5126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: hadoop-next
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR](hadoop-next) Merge remote-tracking branch 'origin/master' into hadoop-next

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged.

Change subject: Merge remote-tracking branch 'origin/master' into hadoop-next
..


Merge remote-tracking branch 'origin/master' into hadoop-next

Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5
---
D common/thrift/cli_service.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
2 files changed, 3 insertions(+), 1,166 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved; Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: hadoop-next
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR](hadoop-next) Merge remote-tracking branch 'origin/master' into hadoop-next

2016-11-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5126

Change subject: Merge remote-tracking branch 'origin/master' into hadoop-next
..

Merge remote-tracking branch 'origin/master' into hadoop-next

Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5
---
D common/thrift/cli_service.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
2 files changed, 3 insertions(+), 1,166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5126/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I493d811bb72949f315fd131ae7cbf886a3d0efd5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: hadoop-next
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py

2016-11-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5124/1/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

Line 94: try:
can get rid of try/catch since unique_database will take care of that


-- 
To view, visit http://gerrit.cloudera.org:8080/5124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE.

2016-11-17 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5125

Change subject: IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE.
..

IMPALA-3809: Show Kudu-specific column metadata in DESCRIBE.

TODO:
- Show column encoding and compression once available
- Corresponding changes to DESCRIBE EXTENDED/FORMATTED will
  follow in a separate patch.

Testing:
A private core/hdfs run passed.

Change-Id: I83c91b540bc6d27cb4f21535fe12f3f8658c233e
---
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_stats.test
M tests/query_test/test_kudu.py
9 files changed, 180 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/5125/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5125
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I83c91b540bc6d27cb4f21535fe12f3f8658c233e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 273: RETURN_IF_ERROR(output_partition->writer->AppendRowBatch(
> yes.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-17 Thread Lars Volker (Code Review)
Hello Alex Behm, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4863

to look at the new patch set (#15).

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..

IMPALA-2523: Make HdfsTableSink aware of clustered input

IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.

This change also adds/modifies tests in several ways:

- clustered insert tests switch from selecting all rows from
  alltypessmall to alltypes. Together with varying settings for
  batch_size, this results in a larger number of row batches being
  written.
- clustered insert tests select from alltypes instead of
  functional.alltypes to make sure we also select from various input
  formats.
- clustered insert tests have been added to select from alltypestiny to
  create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
  16, 0 (meaning default, 1024). This is limited to uncompressed parquet
  files, to maintain a reasonable runtime. On my machine execution of
  test.insert took 1778 seconds, compared to 1002 seconds with the just
  default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
  that insertion over several row batches only creates one file per
  partition.
- It renames the test_insert method to make it unique in the file and
  allow for effective filtering with -k.

Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-writer.cc
M be/src/exec/hbase-table-writer.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M bin/impala-config.sh
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_insert.py
M tests/query_test/test_insert_behaviour.py
26 files changed, 424 insertions(+), 138 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/15
-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-17 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change.

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..


Patch Set 3:

> > Thanks!Do you think I should add some code to catalog to validate
 > the
 > > value of 'initial_hms_cnxn_timeout_s'? What would be the
 > acceptable
 > > range?
 > 
 > You could validate that it was > 0, but otherwise I don't think you
 > should put a bound on it. Users may want to wait effectively
 > forever, and I can see them setting 99 as a value to
 > simulate that.
 > 
 > > Adding this config parameter to the CM UI is another issue. Do
 > you think it should  be added for the 5.10 release or it is not
 > that urgent? I guess, the default value is reasonable for most
 > users and they can always set it as a safety valve if they have to.
 > 
 > Sounds like you're talking about a Cloudera release. Reminder that
 > this is the Apache Impala project - vendor specific concerns should
 > be discussed elsewhere.

Thanks for the review, I've added the > 0 check. Could you add a +2 once more?

-- 
To view, visit http://gerrit.cloudera.org:8080/5095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-17 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#3).

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..

IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

This change introduces a new catalogd startup option
(init_first_metastore_client_timeout_seconds) that specifies the
time in seconds catalogd should spend on retrying to establish a
connection to HMS the first time on startup before giving up and
exiting fatally.

Setting this startup option to a value that is greater than the HMS
startup time will allow CM to start Impala at the same time or even
before HMS.

The default value of init_first_metastore_client_timeout_seconds is
120 seconds.

Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
A tests/experiments/test_catalog_hms_failures.py
10 files changed, 285 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/5095/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

2016-11-17 Thread Attila Jeges (Code Review)
Hello Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5095

to look at the new patch set (#3).

Change subject: IMPALA-4278: Don't abort Catalog startup quickly if HMS is not 
present
..

IMPALA-4278: Don't abort Catalog startup quickly if HMS is not present

This change introduces a new catalogd startup option
(init_first_metastore_client_timeout_seconds) that specifies the
time in seconds catalogd should spend on retrying to establish a
connection to HMS the first time on startup before giving up and
exiting fatally.

Setting this startup option to a value that is greater than the HMS
startup time will allow CM to start Impala at the same time or even
before HMS.

The default value of init_first_metastore_client_timeout_seconds is
120 seconds.

Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java
A tests/experiments/test_catalog_hms_failures.py
10 files changed, 285 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/5095/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I546d8fe9836004832ae40110c9fe22b3e704e11b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-17 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 273: RETURN_IF_ERROR(output_partition->writer->AppendRowBatch(
> This is part of the HdfsTableWriter virtual interface. Should I change it h
yes.


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 13:

(10 comments)

Thanks for the review. Please see PS14.

http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 66: tsink.table_sink.hdfs_table_sink.skip_header_line_count :
> if you break this up, then
This has been formatted by clang-format since I touched these lines. While I 
agree that the formatting could be nicer, I'd like to keep the automatic format 
here for simplicity.


Line 212: GetHashTblKey(NULL, dynamic_partition_key_value_ctxs, );
> we're standardizing on nullptr now
I changed all occurrences in lines I touched. Once the change has a +2 I will 
rebase the change and then replace all other occurrences in the file to keep it 
consistent (and keep the risk for merge conflicts low until then).


Line 266:   // The rows of this batch may span across multiple files. We 
repeatedly pass the row
> remove "across"
Done


Line 273: RETURN_IF_ERROR(output_partition->writer->AppendRowBatch(
> rename this function, it's not appending the entire row batch (AppendRows?)
This is part of the HdfsTableWriter virtual interface. Should I change it here 
and in all table writer classes?


Line 283: Status HdfsTableSink::WriteClusteredRowBatch(RuntimeState* state, 
RowBatch* batch) {
> to speed this up, you should:
Done. AppendRowBatch() already appends the whole batch if the row index vector 
is empty, so there was no change necessary.


http://gerrit.cloudera.org:8080/#/c/4863/13/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 170:   const HdfsPartitionDescriptor& partition_descriptor, const 
TupleRow* current_row,
> mention that current_row provides the partition key values.
Done


Line 181:   /// GetHashTblKey(). Maps to an OutputPartition, which are owned by 
the object pool and
> "pool,"
Done


Line 189:   void GetHashTblKey(const TupleRow* current_row, const 
std::vector& ctxs,
> current_row -> row (it doesn't matter to this function that the row is 'cur
Done


Line 213:   /// Appends all rows in batch to the temporary Hdfs files 
corresponding to partitions.
> what "corresponding to partitions" refers to is unclear.
Rephrased it.


Line 214:   /// The input must be ordered by the partition key expressions.
> the rows are expected to be ordered...?
Done. Though I think "must be" makes it more clear that it's a  prerequisite to 
be guaranteed by the caller.


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-17 Thread Lars Volker (Code Review)
Hello Alex Behm, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4863

to look at the new patch set (#14).

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..

IMPALA-2523: Make HdfsTableSink aware of clustered input

IMPALA-2521 introduced clustering for insert statements. This change
makes the HdfsTableSink aware of clustered inputs, so that partitions
are opened, written, and closed one by one.

This change also adds/modifies tests in several ways:

- clustered insert tests switch from selecting all rows from
  alltypessmall to alltypes. Together with varying settings for
  batch_size, this results in a larger number of row batches being
  written.
- clustered insert tests select from alltypes instead of
  functional.alltypes to make sure we also select from various input
  formats.
- clustered insert tests have been added to select from alltypestiny to
  create inserts with 1 and 2 rows per partition respectively.
- exhaustive insert tests now use different values for batch_size: 1,
  16, 0 (meaning default, 1024). This is limited to uncompressed parquet
  files, to maintain a reasonable runtime. On my machine execution of
  test.insert took 1778 seconds, compared to 1002 seconds with the just
  default row batch size.
- There is additional testing in test_insert_behaviour.py to make sure
  that insertion over several row batches only creates one file per
  partition.
- It renames the test_insert method to make it unique in the file and
  allow for effective filtering with -k.

Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M bin/impala-config.sh
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M tests/query_test/test_insert.py
M tests/query_test/test_insert_behaviour.py
14 files changed, 391 insertions(+), 98 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/4863/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-4476: Use unique database to stop races in test udfs.py

2016-11-17 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5124

Change subject: IMPALA-4476: Use unique_database to stop races in test_udfs.py
..

IMPALA-4476: Use unique_database to stop races in test_udfs.py

These tests have been failing nondeterministically in larger machines
with 16 cores. This change should stop races in haddop fs -put and
drop/create function.

Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84
---
M tests/query_test/test_udfs.py
1 file changed, 25 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/5124/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5124
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I520a8b817ad7e32dba299c2535033f55f1bd1c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] Improve message output from run-step.sh

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: Improve message output from run-step.sh
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] Improve message output from run-step.sh

2016-11-17 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: Improve message output from run-step.sh
..


Improve message output from run-step.sh

run-step prints a message to tell the reader what it's doing. However,
that message wasn't flushed so that run-step could print OK or FAILED on
the same line. The result was that long-running steps wouldn't print
anything to the log until they were done, at least in Jenkins contexts.

This patch changes it so that the message is flushed, and then the
result is printed on a separate line (including the time it took to run
the step).

  $ run-step "Hello world!" helloworld.out sleep 5
  Hello world! (logging to /tmp/helloworld.out)...
  OK (Took: 0 min 5 sec)

Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
Reviewed-on: http://gerrit.cloudera.org:8080/5116
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M testdata/bin/run-step.sh
1 file changed, 7 insertions(+), 4 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iaced729f0ef6aa93174cd90b1516d3c34fe41a22
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins