[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-05-25 Thread Vincent Tran (Code Review)
Vincent Tran has uploaded a new patch set (#3).

Change subject: IMPALA-5316: Adds last_day() function
..

IMPALA-5316: Adds last_day() function

This change adds last_day() function.
The function takes exactly one DATE/TIMESTAMP argument
and returns a TIMESTAMP that is the last date of the
input date's calendar month.

Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
---
M be/src/exprs/expr-test.cc
M be/src/exprs/timestamp-functions-ir.cc
M be/src/exprs/timestamp-functions.h
M common/function-registry/impala_functions.py
4 files changed, 70 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/6991/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-05-25 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5316: Adds last_day() function
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 279:   void TestLastDayFunction() {
> can you check the results of all of these tests are the same as a reference
MariaDB returns only the DATE portion of the timestamp. That would be neat to 
do here as well. But I don't see a way to do it unless we convert the output to 
StringVal.


Line 281: TestTimestampValue("last_day('2003-01-02')",
> can you add times to some of these?
Done


Line 282:   TimestampValue::Parse("2003-01-31 00:00:00", 19));
> for all of the below, use the Parse() overload that takes std::string, i.e.
Done


Line 316:   TimestampValue::Parse("2100-02-28 00:00:00", 19));
> I pointed out a few special cases to consider in the .cc file
Done


http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS2, Line 562: end_of_month()
> 1) I wonder if you can get a TimestampValue without a Date or with a "speci
Done


PS2, Line 562:   ptime pdate(timestamp.date().end_of_month());
 :   TimestampValue tsv(pdate);
> I think you can avoid converting to ptime and just use the TimestampValue c
Done


http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions.h
File be/src/exprs/timestamp-functions.h:

Line 198:   static TimestampVal LastDay(FunctionContext* context, const 
TimestampVal& ts);
> please add a comment. look to other fns in the file for examples.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/650/

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 8: Code-Review+2

The problem with the last run was that testMtDopValidation() resets the 
RuntimeEnv.isTestEnv_ flag temporary causing the table loading to hit the 
preconditions check. I adjusted the check to not rely on that. Simple change, 
carrying +2. Ran a pvt build to make sure the tests pass.

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Bharath Vissapragada (Code Review)
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6970

to look at the new patch set (#8).

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..

IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd

We need not account these hdfs table metrics on the Catalog
server as they are eventually calculated again on the Impalads
while unpacking the corresponding thrift table.

This fix can potentially affect the frontend tests that directly load
the Catalog's version of HdfsTable without the loadFromThrift() call
paths that do the accounting. That is fixed by adding a separate call
that computes these values and is called from
ImpaladTestCatalog.getTable().

Testing: Enough tests already cover these code paths like show stats/
table or partition refresh tests etc. No new tests are added.

Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
2 files changed, 27 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6970/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 19: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/648/

-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/649/

-- 
To view, visit http://gerrit.cloudera.org:8080/6998
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6998
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Loads all TPC-DS tables

2017-05-25 Thread Michael Ho (Code Review)
Hello Mostafa Mokhtar,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6877

to look at the new patch set (#8).

Change subject: IMPALA-5376: Loads all TPC-DS tables
..

IMPALA-5376: Loads all TPC-DS tables

This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.

Having all tables in TPC-DS available paves the way for us to include
all supported TPCDS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of E2E tests were compared against the run done with
Netezza and Vertica. The divergence were all due to the truncation behavior
of decimal types in DECIMAL_V1.

Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
---
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M testdata/bin/compute-table-stats.sh
M testdata/bin/generate-schema-statements.py
M testdata/datasets/tpcds/tpcds_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds.test
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M testdata/workloads/tpcds/queries/count.test
A testdata/workloads/tpcds/queries/tpcds-q1.test
M testdata/workloads/tpcds/queries/tpcds-q19.test
A testdata/workloads/tpcds/queries/tpcds-q2.test
A testdata/workloads/tpcds/queries/tpcds-q23-1.test
A testdata/workloads/tpcds/queries/tpcds-q23-2.test
M testdata/workloads/tpcds/queries/tpcds-q27.test
A testdata/workloads/tpcds/queries/tpcds-q27a.test
A testdata/workloads/tpcds/queries/tpcds-q28.test
M testdata/workloads/tpcds/queries/tpcds-q3.test
M testdata/workloads/tpcds/queries/tpcds-q34.test
A testdata/workloads/tpcds/queries/tpcds-q4.test
M testdata/workloads/tpcds/queries/tpcds-q42.test
M testdata/workloads/tpcds/queries/tpcds-q43.test
M testdata/workloads/tpcds/queries/tpcds-q46.test
M testdata/workloads/tpcds/queries/tpcds-q47.test
M testdata/workloads/tpcds/queries/tpcds-q52.test
M testdata/workloads/tpcds/queries/tpcds-q53.test
M testdata/workloads/tpcds/queries/tpcds-q55.test
M testdata/workloads/tpcds/queries/tpcds-q59.test
M testdata/workloads/tpcds/queries/tpcds-q6.test
M testdata/workloads/tpcds/queries/tpcds-q61.test
M testdata/workloads/tpcds/queries/tpcds-q63.test
M testdata/workloads/tpcds/queries/tpcds-q65.test
M testdata/workloads/tpcds/queries/tpcds-q68.test
M testdata/workloads/tpcds/queries/tpcds-q7.test
M testdata/workloads/tpcds/queries/tpcds-q73.test
M testdata/workloads/tpcds/queries/tpcds-q79.test
M testdata/workloads/tpcds/queries/tpcds-q88.test
M testdata/workloads/tpcds/queries/tpcds-q89.test
M testdata/workloads/tpcds/queries/tpcds-q96.test
M testdata/workloads/tpcds/queries/tpcds-q98.test
M tests/query_test/test_mem_usage_scaling.py
M tests/query_test/test_tpcds_queries.py
41 files changed, 10,581 insertions(+), 3,960 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/6877/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 


[Impala-ASF-CR] IMPALA-4164: Use inline hints instead of always-inline

2017-05-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4164: Use inline hints instead of always-inline
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6941/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 163:   target_cpu_attr_ = 
fn->getFnAttribute("target-cpu").getValueAsString();
> Will see if I can get a Jenkins machine without avx2. My suspicion is that 
The Jenkins VM apparently doesn't have avx2. I updated PhjBuilder to inline 
some cross-compiled bloom-filter functions which use AVX2 instructions and it 
failed during compilation as expected.

Hard-coding "-avx2" into "cpu_attrs_" passed to the execution engine didn't 
cause the same failure on a machine with avx2 instructions support.


-- 
To view, visit http://gerrit.cloudera.org:8080/6941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Loads all TPC-DS tables

2017-05-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: Loads all TPC-DS tables
..


Patch Set 7:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6877/7//COMMIT_MSG
Commit Message:

PS7, Line 7: Loads all TPC-DS tables
> Jira?
There is only a downstream JIRA tracking this. I will file an upstream JIRA for 
it.


http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

PS7, Line 365:impala_home = 
os.getenv("IMPALA_HOME"),
 :hint = insert_hint)
> Please remove spaces around = as above.
Done


http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/datasets/tpcds/tpcds_schema_template.sql
File testdata/datasets/tpcds/tpcds_schema_template.sql:

PS7, Line 27: cc_rec_start_date string
: cc_rec_end_date   string
> Here and likely elsewhere, there are column type differences between the sp
The schema mostly matches what we use for nightly perf testing. I agree that we 
should probably change some of these to timestamp but that it may require 
changing some existing table such as date_dim so I didn't want to do it in this 
change.

Some other notable difference from the spec is that we use bigint for certain 
columns such as ss_ticket_number as it may overflow a 32-bit integer when the 
scale gets large enough.


-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

2017-05-25 Thread Alex Behm (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6840

to look at the new patch set (#10).

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
..

IMPALA-2373: Extrapolate row counts for HDFS tables.

The main idea of this patch is to use table stats to
extrapolate the row counts for new/modified partitions.

Existing behavior:
- Partitions that lack the row count stat are ignored
  when estimating the cardinality of HDFS scans. Such
  partitions effectively have an estimated row count
  of zero.
- We always use the row count stats for partitions that
  have one. The row count may be innaccurate if data in
  such partitions has changed significantly.

Summary of changes:
- Enhance COMPUTE STATS to also store the total number
  of file bytes in the table.
- Use the table-level row count and file bytes stats
  to estimate the number of rows in a scan.
- A new impalad startup flag is added to enable/disable
  the extrapolation behavior. The feature is disabled by
  default. Note that even with the feature disabled,
  COMPUTE STATS stores the file bytes so you can enable
  the feature without having to run COMPUTE STATS again.

Testing:
- Added new FE unit test
- Added new EE test

Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
---
M be/src/common/global-flags.cc
M be/src/exec/catalog-op-executor.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
A fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
A testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
A tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_explain.py
30 files changed, 817 insertions(+), 206 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6840/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.

2017-05-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables.
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6840/9/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 141: | stats-rows=7300 extrapolated-rows=disabled
> why disabled everywhere?
There are several remaining concerns around stats extrapolation (e.g. 
IMPALA-5360) and compute stats with sampling. I don't think it is reasonably 
possible to address the remaining concerns in this patch, so I propose to 
commit this foundational change with extrapolation disabled by default and 
continue working on discussing/addressing the remaining concerns which are 
mostly add-ons to this change.
We can still decide to enable extrapolation by default later on.


Line 268: |  tuple-ids=3,2 row-size=61B cardinality=7300
> ?
Test infra issue. The 61B is definitely correct. See IMPALA-5341.


http://gerrit.cloudera.org:8080/#/c/6840/9/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
File 
testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test:

Line 99: 'WARNING: The following tables are missing relevant table and/or 
column statistics.'
> i think we need to do something about this warning.
Nice catch. Agree. Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/6840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has abandoned this change.

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..


Abandoned

The updated patch somehow got a new change ID. The new patch is here:
https://gerrit.cloudera.org/#/c/6998/

-- 
To view, visit http://gerrit.cloudera.org:8080/6995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6998

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..

IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

Builds on CentOS 6.4 fail due to dependencies not met for the new
'cryptography' python package.

The ADLS commit states that the new packages are only required for ADLS
and that ADLS on a dev environment is only supported from CentOS 6.7.

This patch moves the compiled requirements for ADLS from
compiled-requirements.txt to adls-requirements.txt and passing a
compiler to the Pip environment while installing the ADLS
requirements.

Testing: Tested it on a machine that with TARGET_FILESYSTEM='adls'
and also tested it on a CentOS 6.4 machine with the default
configuration.

Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78
---
M infra/python/bootstrap_virtualenv.py
M infra/python/deps/adls-requirements.txt
M infra/python/deps/compiled-requirements.txt
3 files changed, 13 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/6998/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6998
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..


Patch Set 1:

(3 comments)

Tim mentioned a good point. Instead of splitting the requirements into a 
seperate adls-compiled-requirements.txt, we can just move these requirements to 
adls-requirements.txt and set the compiler environment 'CC' while installing 
those. So, I've made that change instead as it keeps the patch much simpler.

http://gerrit.cloudera.org:8080/#/c/6995/1/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

PS1, Line 228: ret = install_compiled_deps_if_possible(COMPILED_REQS_PATH)
 :   if ret == False: return False
 :   if os.environ.get('TARGET_FILESYSTEM') == "adls":
 : ret = 
install_compiled_deps_if_possible(ADLS_COMPILED_REQS_PATH)
 : if ret == False: return False
> this might be more elegantly expressed as:
Not necessary anymore after the refactor.


Line 235: def install_adls_deps():
> why not add the compiled deps logic here?
Not necessary anymore after the refactor.


PS1, Line 236: 6
> elsewhere the min reqs for the compiled_deps is said to be 6.7. Can you mak
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4164: Use inline hints instead of always-inline

2017-05-25 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#7).

Change subject: IMPALA-4164: Use inline hints instead of always-inline
..

IMPALA-4164: Use inline hints instead of always-inline

When generating IR functions during codegen, we used to
always tag the functions with the "AlwaysInline" attribute.
That potentially leads to excessive inlining, leading to
very long optimization / compilation time with marginal
performance benefit at runtime. One of the reasons for doing
it was that the "target-cpu" and "target-features" attributes
are missing in the generated IR functions so the LLVM inliner
considers them incompatible with the cross-compiled functions.
As a result, the inliner will not inline the generated IR
functions into cross-compiled functions unless we put the
"AlwaysInline" attributes with the generated IR functions.

This change fixes the problem above by setting the "target-cpu"
and "target-features" attributes of all IR functions to match
that of of the host's CPUs so both generated IR functions and
cross-compiled functions will have the same values for those
attributes. With these attributes set, we now switch to using
"InlineHint" in place of "AlwaysInline" and let LLVM make the
decision of whether it's sensible to inline a function. With
this change, the codegen time of a query with a long predicate
went from 15s to 4s and the overall runtime went from 19s to 8s.

Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
A testdata/workloads/targeted-perf/queries/primitive_long_predicate.test
3 files changed, 211 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6941/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Loads all TPC-DS tables

2017-05-25 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: Loads all TPC-DS tables
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6877/7//COMMIT_MSG
Commit Message:

PS7, Line 7: Loads all TPC-DS tables
Jira?


PS7, Line 21: before. The results of E2E tests were compared against the run 
done with
: Netezza and Vertica.
Thanks for saying so!


http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

PS7, Line 365:impala_home = 
os.getenv("IMPALA_HOME"),
 :hint = insert_hint)
Please remove spaces around = as above.


http://gerrit.cloudera.org:8080/#/c/6877/7/testdata/datasets/tpcds/tpcds_schema_template.sql
File testdata/datasets/tpcds/tpcds_schema_template.sql:

PS7, Line 27: cc_rec_start_date string
: cc_rec_end_date   string
Here and likely elsewhere, there are column type differences between the spec 
and here, and here is one of those places.. Is there a general motivation for 
having different types here differ with those in the spec? For example, why not 
timestamp?


-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function

2017-05-25 Thread Greg Rahn (Code Review)
Greg Rahn has posted comments on this change.

Change subject: IMPALA-5326: [DOCS] Document REPLACE() function
..


Patch Set 2: Code-Review+1

LGTM

-- 
To view, visit http://gerrit.cloudera.org:8080/6979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4164: Use inline hints instead of always-inline

2017-05-25 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#6).

Change subject: IMPALA-4164: Use inline hints instead of always-inline
..

IMPALA-4164: Use inline hints instead of always-inline

When generating IR functions during codegen, we used to
always tag the functions with the "AlwaysInline" attribute.
That potentially leads to excessive inlining, leading to
very long optimization / compilation time with marginal
performance benefit at runtime. One of the reasons for doing
it was that the "target-cpu" and "target-features" attributes
are missing in the generated IR functions so the LLVM inliner
considers them incompatible with the cross-compiled functions.
As a result, the inliner will not inline the generated IR
functions into cross-compiled functions unless we put the
"AlwaysInline" attributes with the generated IR functions.

This change fixes the problem above by setting the "target-cpu"
and "target-features" attributes of all IR functions to match
that of of the host's CPUs so both generated IR functions and
cross-compiled functions will have the same values for those
attributes. With these attributes set, we now switch to using
"InlineHint" in place of "AlwaysInline" and let LLVM make the
decision of whether it's sensible to inline a function. With
this change, the codegen time of a query with a long predicate
went from 15s to 4s and the overall runtime went from 19s to 8s.

Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
A testdata/workloads/targeted-perf/queries/primitive_long_predicate.test
3 files changed, 211 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6941/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6941
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2d87ae8d222b415587e7320cb9072e4a8d6615ce
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 19:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/648/

-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..


Patch Set 1:

(3 comments)

Can you mention the testing you do (when it's complete) in the commit msg?

http://gerrit.cloudera.org:8080/#/c/6995/1/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

PS1, Line 228: ret = install_compiled_deps_if_possible(COMPILED_REQS_PATH)
 :   if ret == False: return False
 :   if os.environ.get('TARGET_FILESYSTEM') == "adls":
 : ret = 
install_compiled_deps_if_possible(ADLS_COMPILED_REQS_PATH)
 : if ret == False: return False
this might be more elegantly expressed as:

  REQS = [COMPILED_REQS_PATH]
  if os.environ.get("TARGET_FILESYSTEM") == "adls":
REQS += ADLS_COMPILED_REQS_PATH
  for req in REQS:
if not install_compiled_deps_if_possible(req): return False
  return True


Line 235: def install_adls_deps():
why not add the compiled deps logic here?


PS1, Line 236: 6
elsewhere the min reqs for the compiled_deps is said to be 6.7. Can you make 
these consistent?


-- 
To view, visit http://gerrit.cloudera.org:8080/6995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 19: Code-Review+2

Carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Joe McDonnell (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6478

to look at the new patch set (#19).

Change subject: IMPALA-4623: Enable file handle cache
..

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.

There are two parts to this transaction:
1. Enable file handle caching by default for local files
2. Share the file handle between scan ranges from the same
file

Local scan ranges no longer maintain their own Hdfs file
handles. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.

To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.

If max_cached_file_handles is set to 0 or the the
scan range is accessing data cached by Hdfs or the
scan range is remote, the scan range will get a
file handle from the cache and hold it until the
scan range is closed. This mimics the existing behavior,
except the file handle stays in the cache and is owned
by the cache. Since it is in use, it will not be evicted.

If a file handle in the cache becomes invalid,
it may result in Read() calls failing. Consequently,
if Read() encounters an error using a file handle
from the cache, it will destroy the handle and
retry once with a new file handle. Any subsequent
error is unrelated to the file handle cache and
will be returned.

Tests:
query_test/test_hdfs_fd_caching.py copies the files from
an existing table into a new directory and uses that to
create an external table. It queries the external table,
then uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". It
checks the row counts before the modification and after
"invalidate metadata", but it does not check the results
in between.
custom_cluster/test_hdfs_fd_caching.py starts up a cluster
with a small file handle cache size. It verifies that a
file handle can be reused (i.e. rerunning a query does
not result in more file handles cached). It also verifies
that the cache capacity is enforced.

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
A be/src/runtime/disk-io-mgr-handle-cache.h
A be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/metadata/test_refresh_partition.py
M tests/query_test/test_hdfs_fd_caching.py
13 files changed, 861 insertions(+), 418 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/19
-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..


Patch Set 1:

> Uploaded patch set 1.

Testing: Ran it on a machine with TARGET_FILESYSTEM='adls' and confirmed it 
worked. Currently running it on a CentOS 6.4 machine. Will post here once it 
passes the stage where it breaks on the 'master' branch.

-- 
To view, visit http://gerrit.cloudera.org:8080/6995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6995

Change subject: IMPALA-5375: Builds on CentOS 6.4 failing with broken python 
dependencies
..

IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies

Builds on CentOS 6.4 fail due to dependencies not met for the new
'cryptography' python package.

The commit states that the new packages are only required for ADLS
and that ADLS on a dev environment is only supported from CentOS 6.7.

This patch only installs the ADLS requirements if the dev cluster is
configured to use ADLS due to the above limitation. This is done by
taking the ADLS compiled requirements and adding them to a new file
called 'adls-compiled-requirements.txt' and invkoing that only when
the TARGET_FILESYSTEM is 'adls'.

Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04
---
M infra/python/bootstrap_virtualenv.py
A infra/python/deps/adls-compiled-requirements.txt
M infra/python/deps/compiled-requirements.txt
3 files changed, 48 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6995/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8aff00aeaab92c2d885a6e457ea351431483fa04
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 2: Code-Review+1

(1 comment)

I think there would be some benefit in having ReleaseResources() called in the 
same place for all queries, but agree that this solves the bug.

http://gerrit.cloudera.org:8080/#/c/6897/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 918:   DCHECK(!query_status_.ok());
Add a TODO that this is weird, since CancelInternal() is called on the 
successful query path as well (per UnregisterQuery()).


-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4848: Add WIDTH BUCKET() function

2017-05-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4848: Add WIDTH_BUCKET() function
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6023/4/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

Line 476:   int256_t y = 
DecimalUtil::MultiplyByScale(ConvertToInt256(range_size.value()),
> Agree.
I think we'll need to be more stingy with the types. For example, the input 
args are converted to Decimal16 here, but I think if we used the smallest 
types, then we can avoid going to int256_t for Decimal4 and Decimal8.


-- 
To view, visit http://gerrit.cloudera.org:8080/6023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I081bc916b1bef7b929ca161a9aade3b54c6b858f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: anujphadke 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 2: Code-Review+1

(2 comments)

LGTM but please see if Henry has any more comments.

http://gerrit.cloudera.org:8080/#/c/6897/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS2, Line 1094:   if (filter_mem_tracker_.get() != nullptr) {
  : filter_mem_tracker_->UnregisterFromParent();
  :   }
does that belong here? if not, maybe add TODO to make it clear in the meantime.


http://gerrit.cloudera.org:8080/#/c/6897/2/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

PS2, Line 450: schedule_->Release()
What's that? Maybe it's referring to AdmissionController::ReleaseQuery()?


-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 7: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/647/

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Joe McDonnell (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6478

to look at the new patch set (#18).

Change subject: IMPALA-4623: Enable file handle cache
..

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.

There are two parts to this transaction:
1. Enable file handle caching by default for local files
2. Share the file handle between scan ranges from the same
file

Local scan ranges no longer maintain their own Hdfs file
handles. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.

To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.

If max_cached_file_handles is set to 0 or the the
scan range is accessing data cached by Hdfs or the
scan range is remote, the scan range will get a
file handle from the cache and hold it until the
scan range is closed. This mimics the existing behavior,
except the file handle stays in the cache and is owned
by the cache. Since it is in use, it will not be evicted.

If a file handle in the cache becomes invalid,
it may result in Read() calls failing. Consequently,
if Read() encounters an error using a file handle
from the cache, it will destroy the handle and
retry once with a new file handle. Any subsequent
error is unrelated to the file handle cache and
will be returned.

Tests:
query_test/test_hdfs_fd_caching.py copies the files from
an existing table into a new directory and uses that to
create an external table. It queries the external table,
then uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". It
checks the row counts before the modification and after
"invalidate metadata", but it does not check the results
in between.
custom_cluster/test_hdfs_fd_caching.py starts up a cluster
with a small file handle cache size. It verifies that a
file handle can be reused (i.e. rerunning a query does
not result in more file handles cached). It also verifies
that the cache capacity is enforced.

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
A be/src/runtime/disk-io-mgr-handle-cache.h
A be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/metadata/test_refresh_partition.py
M tests/query_test/test_hdfs_fd_caching.py
13 files changed, 861 insertions(+), 418 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/18
-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 288: if (use_file_handle_cache && expected_local_) return Status::OK();
> you need to document that behavior in the scanrange declaration
Updated ScanRange::Open documentation and the comment for exclusive_hdfs_fh_.


http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 453: /// file handle from the file handle cache to hold for its 
exclusive use.
> also document the behavior for non-local scan ranges
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 2:

> Will wait for final patch before signing off.
 > 
 > What was the reason behind moving ReleaseResources() out of Done()?
 > IIUC, the race is fixed by not releasing the QS reference until the
 > coordinator is destroyed. Leaving ReleaseResources() in Done()
 > seems natural, so I wondered what the thinking was.

The thinking was that the coordinator already knows when it can release 
resources, and it seemed like a nice simplification of the public interface.

-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 453: /// file handle from the file handle cache to hold for its 
exclusive use.
also document the behavior for non-local scan ranges


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 17: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 288: if (use_file_handle_cache && expected_local_) return Status::OK();
you need to document that behavior in the scanrange declaration


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 2:

Will wait for final patch before signing off.

What was the reason behind moving ReleaseResources() out of Done()? IIUC, the 
race is fixed by not releasing the QS reference until the coordinator is 
destroyed. Leaving ReleaseResources() in Done() seems natural, so I wondered 
what the thinking was.

-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 16:

This upload disables the caching for remote files. This also rebases to the 
latest (which involved minor merges on the tests due to the ADLS change).

-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#17).

Change subject: IMPALA-4623: Enable file handle cache
..

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.

There are two parts to this transaction:
1. Enable file handle caching by default for local files
2. Share the file handle between scan ranges from the same
file

Local scan ranges no longer maintain their own Hdfs file
handles. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.

To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.

If max_cached_file_handles is set to 0 or the the
scan range is accessing data cached by Hdfs or the
scan range is remote, the scan range will get a
file handle from the cache and hold it until the
scan range is closed. This mimics the existing behavior,
except the file handle stays in the cache and is owned
by the cache. Since it is in use, it will not be evicted.

If a file handle in the cache becomes invalid,
it may result in Read() calls failing. Consequently,
if Read() encounters an error using a file handle
from the cache, it will destroy the handle and
retry once with a new file handle. Any subsequent
error is unrelated to the file handle cache and
will be returned.

Tests:
query_test/test_hdfs_fd_caching.py copies the files from
an existing table into a new directory and uses that to
create an external table. It queries the external table,
then uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". It
checks the row counts before the modification and after
"invalidate metadata", but it does not check the results
in between.
custom_cluster/test_hdfs_fd_caching.py starts up a cluster
with a small file handle cache size. It verifies that a
file handle can be reused (i.e. rerunning a query does
not result in more file handles cached). It also verifies
that the cache capacity is enforced.

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
A be/src/runtime/disk-io-mgr-handle-cache.h
A be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/metadata/test_refresh_partition.py
M tests/query_test/test_hdfs_fd_caching.py
13 files changed, 857 insertions(+), 418 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


IMPALA-5354: INSERT hints for Kudu tables

A previous change, IMPALA-3742, added an exchange node and
sort node to plans for inserts into Kudu tables to partition
and sort the input to match the target table.

This patch enables INSERT hints for Kudu tables - 'noshuffle'
which removes the exchange node from the plan and
'noclustered' which removes the sort node.

Insert hints have no effect for inserts that are small enough
to result in a single node execution.

Testing:
- Updated FE planner and analysis tests.
- Ran Kudu EE tests.

Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Reviewed-on: http://gerrit.cloudera.org:8080/6980
Reviewed-by: Matthew Jacobs 
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
8 files changed, 105 insertions(+), 34 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, but someone else must approve
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..


Patch Set 4:

Posting for discussion. The approach used is quite simple.

The main risk of this approach is if the cardinality estimate is too low, we 
may reserve less memory for the join/agg than needed. If there is memory 
pressure overall in the query, this could potentially force a join/agg to spill 
and do small I/Os, which could be slow.

A minimum spillable buffer size query option will provide a safety valve if the 
planner sets the buffer size too low.

-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Bumped Kudu version to 795c435

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bumped Kudu version to 795c435
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bumped Kudu version to 795c435

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: Bumped Kudu version to 795c435
..


Bumped Kudu version to 795c435

This is needed for IMPALA-5167.

Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
Reviewed-on: http://gerrit.cloudera.org:8080/6975
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
..

IMPALA-5160: adjust spill buffer size based on planner estimates

Scale down the buffer size in hash joins and hash aggregations if
estimates indicate that the build side of the join is small.
This greatly reduces minimum memory requirements for joins in some
common cases, e.g. small dimension tables.

Currently this is not plumbed through to the backend and only takes
effect in planner tests.

Testing:
Added targeted planner tests for small/mid/large/unknown memory
requirements for aggregations and joins.

Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
A fe/src/main/java/org/apache/impala/util/BitUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
8 files changed, 919 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6963/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5158: report buffer pool free memory in MemTracker
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6993/2/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

PS2, Line 259: an
a


-- 
To view, visit http://gerrit.cloudera.org:8080/6993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5158: report buffer pool free memory in MemTracker
..

IMPALA-5158: report buffer pool free memory in MemTracker

Clean pages and free buffers appear as untracked memory in the
MemTracker hierarchy. This was misleading since the memory is tracked
and present in the BufferPool. This change adds two MemTrackers below
the process level that accounts for this memory.

Updating global counters would be very inefficient and negate most of
the effort put into making the buffer allocator scalable. Instead the
values of the metrics are computed on demand by summing values across
all of the arena in BufferAlloctor.

The numbers reported are approximate because we do not lock any of the
BufferAllocator state and therefore don't get a consistent view of the
entire BufferAllocator at any moment in time. However they are accurate
enough to understand the general state of the system.

Also switches over ASAN to use a metric, similar to the regular TCMalloc
build so that the behaviour under ASAN diverges less.

Testing:
Add some checks to unit tests to sanity-check that the numbers computed
are valid.

Manually tested by rebasing my buffer pool dev branch onto this change
and running some spilling queries. The /memz page reported:

  Process: Limit=8.35 GB Total=1005.49 MB Peak=1.01 GB
Buffer Pool: Free Buffers: Total=391.50 MB
Buffer Pool: Clean Pages: Total=112.00 MB
Free Disk IO Buffers: Total=247.00 KB Peak=30.23 MB
RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
RequestPool=default-pool: Total=374.30 MB Peak=416.55 MB
  Query(b9421063d13af70b:ddb99739): Reservation=0 
ReservationLimit=6.68 GB OtherMemory=801.09 KB Total=801.09 KB Peak=1.05 MB
  << snip >>
  Untracked Memory: Total=127.45 MB

Manually tested the ASAN change by building under ASAN, running some
queries, and inspecting the memz/ page. It reported a value of 100-200MB
untracked memory, similar to the non-ASAN build.

Change-Id: I007eb258377b33fff9f3246580d80fa551837078
---
M be/src/runtime/bufferpool/buffer-allocator-test.cc
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/scheduling/admission-controller.cc
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/metrics.json
14 files changed, 357 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/6993/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 


[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-5158: report buffer pool free memory in MemTracker
..

IMPALA-5158: report buffer pool free memory in MemTracker

Clean pages and free buffers appear as untracked memory in the
MemTracker hierarchy. This was misleading since the memory is tracked
and present in the BufferPool. This change adds two MemTrackers below
the process level that accounts for this memory.

Updating global counters would be very inefficient and negate most of
the effort put into making the buffer allocator scalable. Instead the
values of the metrics are computed on demand by summing values across
all of the arena in BufferAlloctor.

The numbers reported are approximate because we do not lock any of the
BufferAllocator state and therefore don't get a consistent view of the
entire BufferAllocator at any moment in time. However they are accurate
enough to understand the general state of the system.

Also switches over ASAN to use a metric, similar to the regular TCMalloc
build so that the behaviour under ASAN diverges less.

Testing:
Add some checks to unit tests to sanity-check that the numbers computed
are valid.

Manually tested by rebasing my buffer pool dev branch onto this change
and running some spilling queries. The /memz page reported:

  Process: Limit=8.35 GB Total=1005.49 MB Peak=1.01 GB
Buffer pool: free buffers: Total=391.50 MB
Buffer pool: clean pages: Total=112.00 MB
Free Disk IO Buffers: Total=247.00 KB Peak=30.23 MB
RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
RequestPool=default-pool: Total=374.30 MB Peak=416.55 MB
  Query(b9421063d13af70b:ddb99739): Reservation=0 
ReservationLimit=6.68 GB OtherMemory=801.09 KB Total=801.09 KB Peak=1.05 MB
  << snip >>
  Untracked Memory: Total=127.45 MB

Manually tested the ASAN change by building under ASAN, running some
queries, and inspecting the memz/ page. It reported a value of 100-200MB
untracked memory, similar to the non-ASAN build.

Change-Id: I007eb258377b33fff9f3246580d80fa551837078
---
M be/src/runtime/bufferpool/buffer-allocator-test.cc
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/scheduling/admission-controller.cc
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/metrics.json
14 files changed, 357 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/6993/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5158: report buffer pool free memory in MemTracker

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6993

Change subject: IMPALA-5158: report buffer pool free memory in MemTracker
..

IMPALA-5158: report buffer pool free memory in MemTracker

Clean pages and free buffers appear as untracked memory in the
MemTracker hierarchy. This was misleading since the memory is tracked
and present in the BufferPool. This change adds two MemTrackers below
the process level that accounts for this memory.

Updating global counters would be very inefficient and negate most of
the effort put into making the buffer allocator scalable. Instead the
values of the metrics are computed on demand by summing values across
all of the arena in BufferAlloctor.

The numbers reported are approximate because we do not lock any of the
BufferAllocator state and therefore don't get a consistent view of the
entire BufferAllocator at any moment in time. However they are accurate
enough to understand the general state of the system.

Also switches over ASAN to use a metric, similar to the regular TCMalloc
build so that the behaviour under ASAN diverges less.

Testing:
Add some checks to unit tests to sanity-check that the numbers computed
are valid.

Manually tested by rebasing my buffer pool dev branch onto this change
and running some spilling queries. The /memz page reported:

  Process: Limit=8.35 GB Total=1005.49 MB Peak=1.01 GB
Buffer pool: free buffers: Total=391.50 MB
Buffer pool: clean pages: Total=112.00 MB
Free Disk IO Buffers: Total=247.00 KB Peak=30.23 MB
RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
RequestPool=default-pool: Total=374.30 MB Peak=416.55 MB
  Query(b9421063d13af70b:ddb99739): Reservation=0 
ReservationLimit=6.68 GB OtherMemory=801.09 KB Total=801.09 KB Peak=1.05 MB
  << snip >>
  Untracked Memory: Total=127.45 MB

Manually tested the ASAN change by building under ASAN, running some
queries, and inspecting the memz/ page. It reported a value of 100-200MB
untracked memory, similar to the non-ASAN build.

Change-Id: I007eb258377b33fff9f3246580d80fa551837078
---
M be/src/runtime/bufferpool/buffer-allocator-test.cc
M be/src/runtime/bufferpool/buffer-allocator.cc
M be/src/runtime/bufferpool/buffer-allocator.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/scheduling/admission-controller.cc
M be/src/util/memory-metrics.cc
M be/src/util/memory-metrics.h
M common/thrift/metrics.json
14 files changed, 357 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/6993/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5369: Remove old pom parent in testdata module
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5369: Remove old pom parent in testdata module
..


IMPALA-5369: Remove old pom parent in testdata module

Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321
Reviewed-on: http://gerrit.cloudera.org:8080/6992
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M testdata/pom.xml
1 file changed, 0 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/647/

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 7: Code-Review+2

(2 comments)

Carrying +2.

http://gerrit.cloudera.org:8080/#/c/6970/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 241:   public void computeHdfsStatsForTesting() {
> computeFileStatsForTesting()
Done


Line 246:   numHdfsFiles_ += partition.getNumFileDescriptors();
> cast not needed
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Bharath Vissapragada (Code Review)
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6970

to look at the new patch set (#7).

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..

IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd

We need not account these hdfs table metrics on the Catalog
server as they are eventually calculated again on the Impalads
while unpacking the corresponding thrift table.

This fix can potentially affect the frontend tests that directly load
the Catalog's version of HdfsTable without the loadFromThrift() call
paths that do the accounting. That is fixed by adding a separate call
that computes these values and is called from
ImpaladTestCatalog.getTable().

Testing: Enough tests already cover these code paths like show stats/
table or partition refresh tests etc. No new tests are added.

Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
2 files changed, 28 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6970/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6970/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 241:   public void updateHdfsStatsForTesting() {
computeFileStatsForTesting()


Line 246:   numHdfsFiles_ += (long) partition.getNumFileDescriptors();
cast not needed


-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Bharath Vissapragada (Code Review)
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6970

to look at the new patch set (#6).

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..

IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd

We need not account these hdfs table metrics on the Catalog
server as they are eventually calculated again on the Impalads
while unpacking the corresponding thrift table.

This fix can potentially affect the frontend tests that directly load
the Catalog's version of HdfsTable without the loadFromThrift() call
paths that do the accounting. That is fixed by adding a separate call
that computes these values and is called from
ImpaladTestCatalog.getTable().

Testing: Enough tests already cover these code paths like show stats/
table or partition refresh tests etc. No new tests are added.

Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
2 files changed, 28 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/6970/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins


[Impala-ASF-CR] IMPALA-5016: Simplify coalesce() in SimplifyConditionalsRule and do static partition pruning with coalesce() function.

2017-05-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5016: Simplify coalesce() in SimplifyConditionalsRule 
and do static partition pruning with coalesce() function.
..


Patch Set 1:

(8 comments)

High-level comments. I'll take a closer look when we converge on the correct 
high-level approach first.

http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

Line 88:   private final ExprRewriter partitionMetadataRewriter;
I don't think any changes should be made to HdfsPartitionPruner for this patch. 
See my other comment on Expr.optimizeConjuncts()


http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/rewrite/CoalescePartitionMetadataRule.java
File 
fe/src/main/java/org/apache/impala/rewrite/CoalescePartitionMetadataRule.java:

Line 35:  * Rewrites coalesce function using partition metadata, it can be 
applied before partition pruning.
Let's please integrate this into the SimplifyConditionalsRule. I don't see this 
rule as being any different, and there is code/work duplication among this rule 
and your changes in SimplifyConditionalsRule.

This rule is just another case of simplification.

This rule should be generally applicable to any expr, not just those used for 
partition pruning. However, this rewrite rule will cause some queries to break 
when applied to exprs not used for partition pruning, e.g.:

select coalesce(1, count(*) from t

might be incorrectly rewritten to:

select 1 from t

The existing SimplifyConditionalsRule already handles these cases correctly 
(and this new rewrite will also just work when integrated into 
SimplifyConditionalsRule).


Line 44:   public CoalescePartitionMetadataRule(HdfsTable table, List 
partitionSlots) {
These do not need to be passed. You can get this information  from a SlotRef 
expr by looking at its SlotDescriptor. The SlotDescriptor's parent is a 
TupleDescriptor which has a reference to the Table. That should help with 
integrating this code into SimplifyConditionalsRule.


Line 67:   private boolean canRewriteWithExpr(Expr child) {
This check does not look quite right because it will return 'true' for 
something like 'year + NULL'. The fact that the expr is bound by partition 
slots and that those partition slots cannot return NULL, does not mean that the 
overall expr must be non-NULL. The overall expr can still be NULL. You can 
imagine other Exprs that may return NULL for a non-NULL input.

I think it would be better to keep this simple for now, and only rewrite if the 
child is a (possibly cast) SlotRef.


http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/rewrite/PartitionMetadataRule.java
File fe/src/main/java/org/apache/impala/rewrite/PartitionMetadataRule.java:

Line 29:  * Base class for all Expr rewrite rules that need using partition 
metadata.
I don't think this is needed. We apply all rewrite rules before doing partition 
pruning for scans already.

See Expr.optimizeConjuncts() which gets called in 
SingleNodePlanner.createScanNode().


http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

Line 80:   private Expr simplifyIfFunctionExpr(FunctionCallExpr expr) {
simplifyIfFunction()


Line 101:   private Expr simplifyCoalesceFunctionExpr(FunctionCallExpr expr) {
simplifyCoalesceFunction()


http://gerrit.cloudera.org:8080/#/c/6990/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 339: RewritesOk("coalesce(null, a, b)", rule, "coalesce(a, b)");
Needs tests for the partition SlotRef case.


-- 
To view, visit http://gerrit.cloudera.org:8080/6990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0b57267ce68ef882d73120b5054603b72867c7f5
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: yu feng 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has submitted this change and it was merged.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


IMPALA-5333: Add support for Impala to work with ADLS

This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.

We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.

For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS, however, listing that directory through
the python client immediately after the drop, will still show
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.

The azure-data-lake-store-python client also only works on CentOS 6.6
and over, so the python dependencies for Azure will not be downloaded
when the TARGET_FILESYSTEM is not "adls". While running ADLS tests,
the expectation will be that it runs on a machine that is at least
running CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.

Added another dependency to bootstrap_build.sh for the ADLS Python
client.

Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.

Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil 
---
M bin/bootstrap_build.sh
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M infra/python/bootstrap_virtualenv.py
A infra/python/deps/adls-requirements.txt
M infra/python/deps/compiled-requirements.txt
M infra/python/deps/pip_download.py
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_compressed_formats.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_hdfs_fd_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_scanners.py
M tests/stress/test_ddl_stress.py
A tests/util/adls_util.py
M tests/util/filesystem_utils.py
43 files changed, 359 insertions(+), 49 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Sailesh Mukil: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 10: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] Loads all TPC-DS tables

2017-05-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: Loads all TPC-DS tables
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6877/6/testdata/workloads/tpcds/queries/tpcds-q34.test
File testdata/workloads/tpcds/queries/tpcds-q34.test:

Line 34:'Williamson County','Williamson 
County','Williamson County','Williamson County')
> What's the deal with the repeated value?
They are the suggested substitution parameters for the official qualification 
run defined in the TPC-DS specification.


-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Loads all TPC-DS tables

2017-05-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Loads all TPC-DS tables
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6877/6/testdata/workloads/tpcds/queries/tpcds-q34.test
File testdata/workloads/tpcds/queries/tpcds-q34.test:

Line 34:'Williamson County','Williamson 
County','Williamson County','Williamson County')
What's the deal with the repeated value?


-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Loads all TPC-DS tables

2017-05-25 Thread Michael Ho (Code Review)
Hello Mostafa Mokhtar,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6877

to look at the new patch set (#7).

Change subject: Loads all TPC-DS tables
..

Loads all TPC-DS tables

This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.

Having all tables in TPC-DS available paves the way for us to include
all supported TPCDS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of E2E tests were compared against the run done with
Netezza and Vertica. The divergence were all due to the truncation behavior
of decimal types in DECIMAL_V1.

Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
---
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M testdata/bin/compute-table-stats.sh
M testdata/bin/generate-schema-statements.py
M testdata/datasets/tpcds/tpcds_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds.test
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M testdata/workloads/tpcds/queries/count.test
A testdata/workloads/tpcds/queries/tpcds-q1.test
M testdata/workloads/tpcds/queries/tpcds-q19.test
A testdata/workloads/tpcds/queries/tpcds-q2.test
A testdata/workloads/tpcds/queries/tpcds-q23-1.test
A testdata/workloads/tpcds/queries/tpcds-q23-2.test
M testdata/workloads/tpcds/queries/tpcds-q27.test
A testdata/workloads/tpcds/queries/tpcds-q27a.test
A testdata/workloads/tpcds/queries/tpcds-q28.test
M testdata/workloads/tpcds/queries/tpcds-q3.test
M testdata/workloads/tpcds/queries/tpcds-q34.test
A testdata/workloads/tpcds/queries/tpcds-q4.test
M testdata/workloads/tpcds/queries/tpcds-q42.test
M testdata/workloads/tpcds/queries/tpcds-q43.test
M testdata/workloads/tpcds/queries/tpcds-q46.test
M testdata/workloads/tpcds/queries/tpcds-q47.test
M testdata/workloads/tpcds/queries/tpcds-q52.test
M testdata/workloads/tpcds/queries/tpcds-q53.test
M testdata/workloads/tpcds/queries/tpcds-q55.test
M testdata/workloads/tpcds/queries/tpcds-q59.test
M testdata/workloads/tpcds/queries/tpcds-q6.test
M testdata/workloads/tpcds/queries/tpcds-q61.test
M testdata/workloads/tpcds/queries/tpcds-q63.test
M testdata/workloads/tpcds/queries/tpcds-q65.test
M testdata/workloads/tpcds/queries/tpcds-q68.test
M testdata/workloads/tpcds/queries/tpcds-q7.test
M testdata/workloads/tpcds/queries/tpcds-q73.test
M testdata/workloads/tpcds/queries/tpcds-q79.test
M testdata/workloads/tpcds/queries/tpcds-q88.test
M testdata/workloads/tpcds/queries/tpcds-q89.test
M testdata/workloads/tpcds/queries/tpcds-q96.test
M testdata/workloads/tpcds/queries/tpcds-q98.test
M tests/query_test/test_mem_usage_scaling.py
M tests/query_test/test_tpcds_queries.py
41 files changed, 10,477 insertions(+), 3,827 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/6877/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5232: Parquet reader error message prints memory address 
instead of value
..


IMPALA-5232: Parquet reader error message prints memory address instead of value

Changed the argument of the message to print the size instead of
the memory location.

Testing: Automating the test would be non-trivial as it would require 
generating a corrupt parquet file to trigger the error condition. Also, 
the test would only have to check a change in the error message and not 
the control flow. Manually tested the error message by modifying the 
value of num_bytes and performing a select on a table in parquet format.
Tested both condition with the following values:
- num_bytes = -1
- num_bytes = *data_size + 1

Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Reviewed-on: http://gerrit.cloudera.org:8080/6982
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Sailesh Mukil: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Add several items to "known issues" page

2017-05-25 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: [DOCS] Add several items to "known issues" page
..


Patch Set 4:

> I added IMPALA-3558 as suggested by Sailesh. But I couldn't find
 > Sailesh's comment in the gerrit UI to mark it as 'done'.

You couldn't mark at as Done because it wasn't an inline comment but instead 
top-level. Best you can do is reply like you've done.

-- 
To view, visit http://gerrit.cloudera.org:8080/5809
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Add several items to "known issues" page

2017-05-25 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: [DOCS] Add several items to "known issues" page
..


Patch Set 4:

I added IMPALA-3558 as suggested by Sailesh. But I couldn't find Sailesh's 
comment in the gerrit UI to mark it as 'done'.

-- 
To view, visit http://gerrit.cloudera.org:8080/5809
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Add several items to "known issues" page

2017-05-25 Thread John Russell (Code Review)
Hello Michael Brown, Michael Ho, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5809

to look at the new patch set (#4).

Change subject: [DOCS] Add several items to "known issues" page
..

[DOCS] Add several items to "known issues" page

IMPALA-4432 is actually an incompatible change, so it gets
listed on the separate "Incompatible Changes" page.
Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf
---
M docs/topics/impala_incompatible_changes.xml
M docs/topics/impala_known_issues.xml
2 files changed, 106 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/5809/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5809
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibeed92e8f6603b858622b1d397d1f5cad810f4cf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 283: if (exclusive_hdfs_fh_ != nullptr) return Status::OK();
> when will this be true?
The IO thread is calling ReadRange on each ScanRange. ReadRange calls Open 
whether or not that ScanRange was already open. This check short-circuits in 
the case that the ScanRange was already open.


Line 351:   VLOG_FILE << "Cache HDFS file handle file=" << file();
> this is not a meaningful log message, let's remove it
Removed.


Line 420: HdfsFileHandle* hdfs_file_from_cache = nullptr;
> the name is somewhat misleading, because exclusive_hdfs_fh_ is still from t
Changed to "borrowed_hdfs_fh"


Line 484: // - or if the file handle is not from the cache
> but exclusive_hdfs_fh_ is also from the cache, why wouldn't we retry in tha
Changed to do retry for exclusive file handles.


http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 1271:   bool cache_hit;
> rename to dummy or something like that
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE

2017-05-25 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE 
and ALTER TABLE
..


Patch Set 1:

(5 comments)

Thank you for documenting this. Please see my inline comments. Let me know if 
you'd like to discuss them in person.

http://gerrit.cloudera.org:8080/#/c/6981/1/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

Line 388:   CREATE TABLE AS SELECT operation. Creating 
data files that are
I think it's important to understand that the source table property does not 
affect the target table. CREATE TABLE TARGET AS SELECT * FROM SOURCE; with 
SOURCE having SORT BY() columns will not copy them over to TARGET.


Line 389:   sorted is most useful for Parquet tables, where the metadata 
includes the minimum and
Here it should be clear that the information is stored in the file metadata 
inside each Parquet file, and not in our own metadata store.


Line 390:   maximum values for each column in each data file. Grouping data 
values together
Technically, statistics are stored per RowGroup. Impala only writes 1 rowgroup 
per file, but that's a self imposed limitation.


Line 400:   evident with Parquet tables.
We could mention here, that other file formats don't have statistics inside the 
file metadata.


Line 412:   tools that creat HDFS files, Impala does not guarantee or rely 
on the data being
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/6981
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icd571cd8840368edb327d16d27192458838ef234
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alan Choi 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#16).

Change subject: IMPALA-4623: Enable file handle cache
..

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains a file handle, even
when multiple scan ranges are accessing the same file.
Opening the file handles causes load on the NameNode, which
can lead to scaling issues.

There are two parts to this transaction:
1. Enable file handle caching by default
2. Share the file handle between scan ranges from the same
file

The scan range no longer maintains its own Hdfs file
handle. On each read, the io thread will get the Hdfs file
handle from the cache (opening it if necessary) and use
that for the read. This allows multiple scan ranges on the
same file to use the same file handle. Since the file
offsets are no longer consistent for an individual scan
range, all Hdfs reads need to either use hdfsPread or do
a seek before reading. Additionally, since Hdfs read
statistics are maintained on the file handle, the read
statistics must be retrieved and cleared after each read.

To manage contention, the file handle cache is now
partitioned by a hash of the key into independent
caches with independent locks. The allowed capacity
of the file handle cache is split evenly among the
partitions. File handles are evicted independently
for each partition. The file handle cache maintains
ownership of the file handles at all times, but it
will not evict a file handle that is in use.

If max_cached_file_handles is set to 0 or the the
scan range is accessing data cached by Hdfs, the
scan range will get a file handle from the cache
and hold it until the scan range is closed. This
mimics the existing behavior, except the file
handle stays in the cache and is owned by the
cache. Since it is in use, it will not be evicted.

If a file handle in the cache becomes invalid,
it may result in Read() calls failing. Consequently,
if Read() encounters an error using a file handle
from the cache, it will destroy the handle and
retry once with a new file handle. Any subsequent
error is unrelated to the file handle cache and
will be returned.

Tests:
query_test/test_hdfs_fd_caching.py copies the files from
an existing table into a new directory and uses that to
create an external table. It queries the external table,
then uses the hdfs commandline to manipulate the hdfs file
(delete, move, etc). It queries again to make sure we
don't crash. Then, it runs "invalidate metadata". It
checks the row counts before the modification and after
"invalidate metadata", but it does not check the results
in between.
custom_cluster/test_hdfs_fd_caching.py starts up a cluster
with a small file handle cache size. It verifies that a
file handle can be reused (i.e. rerunning a query does
not result in more file handles cached). It also verifies
that the cache capacity is enforced.

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
A be/src/runtime/disk-io-mgr-handle-cache.h
A be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/metadata/test_refresh_partition.py
M tests/query_test/test_hdfs_fd_caching.py
13 files changed, 849 insertions(+), 418 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function

2017-05-25 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-5326: [DOCS] Document REPLACE() function
..

IMPALA-5326: [DOCS] Document REPLACE() function

Included syntax, mention of performance benefit, and examples.
Also added an item in the "new features" list with a link to the
detailed writeup.

Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
---
M docs/topics/impala_new_features.xml
M docs/topics/impala_string_functions.xml
2 files changed, 67 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/6979/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function

2017-05-25 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-5326: [DOCS] Document REPLACE() function
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_new_features.xml
File docs/topics/impala_new_features.xml:

PS1, Line 70: A new function, replace(), which is faster than
: regexp_replace() for simple string 
substitutions.
: See  Proposal:
Done, with minor wordsmithing. (Editorial conventions to avoid passive voice & 
future tense.)


http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml
File docs/topics/impala_string_functions.xml:

PS1, Line 890: -
> Maybe best to alias the column/expression or use a different example.  This
Done


PS1, Line 890: -
> How about replace('hello world', '0', 'z') and just illustrate the original
Done


PS1, Line 890: -
> The lowercase folding is just something for the column heading/alias.
Done


PS1, Line 890: -
> This looks like a bug. It might be worth calling out the oddness here that 
I'll call out case-sensitivity in text rather than an example, as Zach 
suggested.


-- 
To view, visit http://gerrit.cloudera.org:8080/6979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd

2017-05-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on 
Catalogd
..


Patch Set 5: Code-Review-1

I see how this can break the tableSample patch tests.  We do the following in 
HdfsTable#getFileSample()

public Map> getFilesSample(List 
inputParts,
  long percentBytes, long randomSeed) {
Preconditions.checkState(percentBytes >= 0 && percentBytes <= 100);

// Conservative max size for Java arrays. The actual maximum varies
// from JVM version and sometimes between configurations.
final long JVM_MAX_ARRAY_SIZE = Integer.MAX_VALUE - 10;
if (numHdfsFiles_ > JVM_MAX_ARRAY_SIZE) {
  throw new IllegalStateException(String.format(
  "Too many files to generate a table sample. " +
  "Table '%s' has %s files, but a maximum of %s files are supported.",
  getTableName().toString(), numHdfsFiles_, JVM_MAX_ARRAY_SIZE));
}
int totalNumFiles = (int) numHdfsFiles_;  <-

The last line breaks only in tests because ImpaladTestCatalog#getTable() 
directly returns Catalog's version of HdfsTable during tests and doesn't 
rebuild it from Thrift. So basically the above line throws an exception.

Alex, should we change the approach and fix the accounting instead?

-- 
To view, visit http://gerrit.cloudera.org:8080/6970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries

2017-05-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
..


Patch Set 9:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

PS9, Line 567: skippable_map
skippable_map != nullptr


PS9, Line 586: (*skippable_map)[i] == true
(*skippable_map)[i]


PS9, Line 586: 0 
huh ?


PS9, Line 590: bool
skip_val


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

PS9, Line 153: Again
No need for "Again".


PS9, Line 159:   /// This function is capable of doing codegen for either of 
the above functions; simply
 :   /// pass a null value instead of a bool array pointer.
If 'skippable_map' is not NULL, this will generate the equivalent of 
EvalConjunctsPreEval(). Otherwise, this generates EvalConjuncts(). Also 
describe the content of skippable_map.

Alternately, we can keep this function private and expose two interfaces 
CodegenEvalConjuncts() and CodegenEvalConjunctsPreEval(). The former always 
passes nullptr for 'skippable_map' argument.


PS9, Line 163: skippable_map
Will 'skippable_conjuncts' be a more appropriate name ?


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

PS9, Line 88: LOG(INFO) << "tuple idx: " << tuple_index;
debugging output ?


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS9, Line 1050: std::map>&
DictionaryFilterPositionMap


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

PS9, Line 167: typedef std::map> 
DictFilterConjunctsPositionMap;
Comment what this map stands for ?

Slot Id -> index of the dictionary filter conjuncts in conjunct_ctxs_ ?


PS9, Line 366: Mapping to original position in conjuncts
Mapping from SlotId to ...


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS9, Line 206: IS_FILTERED
Comment what IS_FILTERED stands for.


PS9, Line 240: (filters == nullptr) ^ IS_FILTERED
Is there a reason to not use DCHECK((filters == nullptr) != IS_FILTERED) ?


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS9, Line 390: return false;
Is this hard coded for testing ? If not, please document why it's always false.


http://gerrit.cloudera.org:8080/#/c/6726/9/be/src/util/bitmap.h
File be/src/util/bitmap.h:

PS9, Line 86: (mask & buffer_[word_index])
(mask & buffer[word_index]) == 0


PS9, Line 99: buffer_.size()
num_words_


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5364: Correct title of query locations table

2017-05-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5364: Correct title of query locations table
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6988/1//COMMIT_MSG
Commit Message:

Line 12: We should have per-backend reporting on the number of fragment 
instances
It would be nice to have a JIRA for this and mention it here.


http://gerrit.cloudera.org:8080/#/c/6988/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS1, Line 858: host
nit: Mention this change in the commit message too.


-- 
To view, visit http://gerrit.cloudera.org:8080/6988
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2fddd41500ada10eb1c12d92afcb96a82009857b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5016 rewrite coalesce() in SimplifyConditionalsRule.java, and try to do static partition pruning with coalesce() function.

2017-05-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5016 rewrite coalesce() in 
SimplifyConditionalsRule.java, and try to do static partition pruning with 
coalesce() function.
..


Patch Set 1:

No worries, the first patch is always the hardest :). Thanks for your fast 
response and continued work.

-- 
To view, visit http://gerrit.cloudera.org:8080/6974
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I00aac2497ba51432037a6f9312805a63933ace87
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: yu feng 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[native-toolchain-CR] Ported native-toolchain to work on ppc64le

2017-05-25 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Ported native-toolchain to work on ppc64le
..


Patch Set 5:

> (1 comment)

Thanks Jim, I do realize that so like I said, I wouldn't hold this up if it 
doesn't match up. That said, if it does match then we know it's not affected.

-- 
To view, visit http://gerrit.cloudera.org:8080/6468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a
Gerrit-PatchSet: 5
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Valencia Edna Serrao 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Valencia Edna Serrao 
Gerrit-HasComments: No


[native-toolchain-CR] Ported native-toolchain to work on ppc64le

2017-05-25 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Ported native-toolchain to work on ppc64le
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6468/5/source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch
File 
source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch:

Line 1: Only in breakpad-88e-p3/: my_breakpad
> I was hoping we could compare the sha1sum of the binary before/after the ch
SOme projects don't have reproducible builds:

https://reproducible-builds.org/

https://wiki.debian.org/ReproducibleBuilds

For projects like that, the sha1sum can change even without changing the source 
code, but by just building at a different time.


-- 
To view, visit http://gerrit.cloudera.org:8080/6468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a
Gerrit-PatchSet: 5
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Valencia Edna Serrao 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Valencia Edna Serrao 
Gerrit-HasComments: Yes


[native-toolchain-CR] Ported native-toolchain to work on ppc64le

2017-05-25 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Ported native-toolchain to work on ppc64le
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6468/5/source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch
File 
source/breakpad/breakpad-ffe3e478657dc7126fca6329dfcedc49f4c726d9-patches/0002-Build-breakpad-ffe3e47-on-ppc64le.patch:

Line 1: Only in breakpad-88e-p3/: my_breakpad
> I checked if any interference, of ppc64le changes in x86 code, existed by b
I was hoping we could compare the sha1sum of the binary before/after the 
change? I don't want to hold this up if that doesn't line up, but it would give 
us confidence that we're not affecting the x86 bits :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6468
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a
Gerrit-PatchSet: 5
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Valencia Edna Serrao 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Valencia Edna Serrao 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/646/

-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bumped Kudu version to 795c435

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: Bumped Kudu version to 795c435
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/645/

-- 
To view, visit http://gerrit.cloudera.org:8080/6975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bumped Kudu version to 795c435

2017-05-25 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Bumped Kudu version to 795c435
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-25 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bumped Kudu version to 795c435

2017-05-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Bumped Kudu version to 795c435
..


Patch Set 2:

http://unittest.jenkins.cloudera.com/job/verify-impala-toolchain-package-build/394/

-- 
To view, visit http://gerrit.cloudera.org:8080/6975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] Bumped Kudu version to 795c435

2017-05-25 Thread Thomas Tauber-Marshall (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6975

to look at the new patch set (#2).

Change subject: Bumped Kudu version to 795c435
..

Bumped Kudu version to 795c435

This is needed for IMPALA-5167.

Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6975/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I925f5d746dcb497e17ab6f85217da9fcee1d4c00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5316: Adds last day() function

2017-05-25 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5316: Adds last_day() function
..


Patch Set 2:

(7 comments)

Thanks for the contribution, Vincent.

http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 279:   void TestLastDayFunction() {
can you check the results of all of these tests are the same as a reference 
database?


Line 281: TestTimestampValue("last_day('2003-01-02')",
can you add times to some of these?


Line 282:   TimestampValue::Parse("2003-01-31 00:00:00", 19));
for all of the below, use the Parse() overload that takes std::string, i.e. 
remove the str len.


Line 316:   TimestampValue::Parse("2100-02-28 00:00:00", 19));
I pointed out a few special cases to consider in the .cc file

In addition:
'1400-01-01 00:00:00' (first supported date)
'-12-31 23:59:59' (last supported date)


http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions-ir.cc
File be/src/exprs/timestamp-functions-ir.cc:

PS2, Line 562: end_of_month()
1) I wonder if you can get a TimestampValue without a Date or with a "special" 
date, eg. maybe if you create a TIMESTAMP with just a time. Please test 
last_day(cast("00:00:00" as timestamp));

2) I think we have to be sure to check the case when timestamp.date() is at the 
end of the supported date range (year 12-31- 23:59:59), I suspect it should 
return a valid date but we should check.

I can't think of any other corner cases.


PS2, Line 562:   ptime pdate(timestamp.date().end_of_month());
 :   TimestampValue tsv(pdate);
I think you can avoid converting to ptime and just use the TimestampValue 
constructor that takes a gregorian date and a time_duration, e.g.

tsv(timestamp.date().end_of_month(), time_duration(0,0,0,0))


http://gerrit.cloudera.org:8080/#/c/6991/2/be/src/exprs/timestamp-functions.h
File be/src/exprs/timestamp-functions.h:

Line 198:   static TimestampVal LastDay(FunctionContext* context, const 
TimestampVal& ts);
please add a comment. look to other fns in the file for examples.


-- 
To view, visit http://gerrit.cloudera.org:8080/6991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I429c8734bddca3c37a2eedc211a16a4ffcb04370
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6897/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

PS1, Line 79: eos
> this doesn't include the DML case, where query completion is signalled by t
i'll clarify.


-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new patch set (#2).

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..

IMPALA-4890/5143: Coordinator race involving TearDown()

TearDown() releases resources and destroys control
structures (the QueryState reference), and it can be called
while a concurrent thread executes Exec() or might call
GetNext() in the future. The solution is not to destroy
the control structures.

This also releases resources automatically at the end
of query execution.

Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
3 files changed, 34 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/6897/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6897/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 895
> I believe that PlanRootSink() may set eos in GetNext() at the same time as 
good point, QueryResultSet is self-contained.

i'll leave these two calls in place and add a todo, because getting rid of the 
blocking here isn't necessary for the bug fix.


-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module

2017-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5369: Remove old pom parent in testdata module
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/644/

-- 
To view, visit http://gerrit.cloudera.org:8080/6992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module

2017-05-25 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5369: Remove old pom parent in testdata module
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] Loads all TPC-DS tables

2017-05-25 Thread Michael Ho (Code Review)
Hello Mostafa Mokhtar,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6877

to look at the new patch set (#6).

Change subject: Loads all TPC-DS tables
..

Loads all TPC-DS tables

This change loads the missing tables in TPC-DS. In addition,
it also fixes up the loading of the partitioned table store_sales
so all partitions will be loaded. The existing TPC-DS queries are
also updated to use the parameters for qualification runs as noted
in the TPC-DS specification. Some hard-coded partition filters were
also removed. They were there due to the lack of dynamic partitioning
in the past. Some missing TPC-DS queries are also added to this change,
including query28 which discovered the infamous IMPALA-5251.

Having all tables in TPC-DS available paves the way for us to include
all supported TPCDS queries in our functional testing. Due to the change
in the data, planner tests and the E2E tests have different results than
before. The results of E2E tests were compared against the run done with
Netezza and Vertica. The divergence were all due to the truncation behavior
of decimal types in DECIMAL_V1.

Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
---
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M testdata/bin/compute-table-stats.sh
M testdata/bin/generate-schema-statements.py
M testdata/datasets/tpcds/tpcds_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds.test
M testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
M testdata/workloads/tpcds/queries/count.test
A testdata/workloads/tpcds/queries/tpcds-q1.test
M testdata/workloads/tpcds/queries/tpcds-q19.test
A testdata/workloads/tpcds/queries/tpcds-q2.test
A testdata/workloads/tpcds/queries/tpcds-q23-1.test
A testdata/workloads/tpcds/queries/tpcds-q23-2.test
M testdata/workloads/tpcds/queries/tpcds-q27.test
A testdata/workloads/tpcds/queries/tpcds-q27a.test
A testdata/workloads/tpcds/queries/tpcds-q28.test
M testdata/workloads/tpcds/queries/tpcds-q3.test
M testdata/workloads/tpcds/queries/tpcds-q34.test
A testdata/workloads/tpcds/queries/tpcds-q4.test
M testdata/workloads/tpcds/queries/tpcds-q42.test
M testdata/workloads/tpcds/queries/tpcds-q43.test
M testdata/workloads/tpcds/queries/tpcds-q46.test
M testdata/workloads/tpcds/queries/tpcds-q47.test
M testdata/workloads/tpcds/queries/tpcds-q52.test
M testdata/workloads/tpcds/queries/tpcds-q53.test
M testdata/workloads/tpcds/queries/tpcds-q55.test
M testdata/workloads/tpcds/queries/tpcds-q59.test
M testdata/workloads/tpcds/queries/tpcds-q6.test
M testdata/workloads/tpcds/queries/tpcds-q61.test
M testdata/workloads/tpcds/queries/tpcds-q63.test
M testdata/workloads/tpcds/queries/tpcds-q65.test
M testdata/workloads/tpcds/queries/tpcds-q68.test
M testdata/workloads/tpcds/queries/tpcds-q7.test
M testdata/workloads/tpcds/queries/tpcds-q73.test
M testdata/workloads/tpcds/queries/tpcds-q79.test
M testdata/workloads/tpcds/queries/tpcds-q88.test
M testdata/workloads/tpcds/queries/tpcds-q89.test
M testdata/workloads/tpcds/queries/tpcds-q96.test
M testdata/workloads/tpcds/queries/tpcds-q98.test
M tests/query_test/test_mem_usage_scaling.py
M tests/query_test/test_tpcds_queries.py
41 files changed, 10,498 insertions(+), 3,848 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/6877/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 


[Impala-ASF-CR] IMPALA-5369: Remove old pom parent in testdata module

2017-05-25 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6992

Change subject: IMPALA-5369: Remove old pom parent in testdata module
..

IMPALA-5369: Remove old pom parent in testdata module

Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321
---
M testdata/pom.xml
1 file changed, 0 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/6992/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6992
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie9013aeb5afd631546bda9201d0345dc9321
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-25 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-25 Thread Vincent Tran (Code Review)
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5030: Adds support for NVL2() function
..


Patch Set 5:

I can make that happen. Can I just push the patch again as normal?

-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04

2017-05-25 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
..


Patch Set 1:

David and I poked at this a little and tried to figure out what was going on - 
why, for instance, would installing the DateTime module make the datetime 
module work? We didn't figure it out.

To compound the strangeness, while this line was required to get this working 
on my machine, now if I remove this line and `python 
./infra/python/bootstrap_virtualenv.py -r`, starting the minicluster works.

Alice laughed: "There's no use trying," she said; "one can't believe
impossible things." "I daresay you haven't had much practice," said the
Queen. "When I was younger, I always did it for half an hour a day. Why,
sometimes I've believed as many as six impossible things before breakfast."

-- 
To view, visit http://gerrit.cloudera.org:8080/6989
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6980/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS1, Line 502: deringExprs.add(
 : KuduUtil.createPartitionEx
> 1 line
Done


PS1, Line 504: orderingExprs.addAll(insertStmt.getPrimaryKeyExprs());
 :   }
 : } else if (insertStmt.hasClusteredHint() || 
!insertStmt.getSortExprs().isEmpty()) {
 :   // NOTE: If the table has a 'sort.colum
> how about refactoring this to KuduUtil to avoid duplicate code w/ Distribut
Done


http://gerrit.cloudera.org:8080/#/c/6980/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test:

Line 213: 
> add test with both hints
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables

2017-05-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-5354: INSERT hints for Kudu tables
..

IMPALA-5354: INSERT hints for Kudu tables

A previous change, IMPALA-3742, added an exchange node and
sort node to plans for inserts into Kudu tables to partition
and sort the input to match the target table.

This patch enables INSERT hints for Kudu tables - 'noshuffle'
which removes the exchange node from the plan and
'noclustered' which removes the sort node.

Insert hints have no effect for inserts that are small enough
to result in a single node execution.

Testing:
- Updated FE planner and analysis tests.
- Ran Kudu EE tests.

Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
---
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
8 files changed, 105 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/6980/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function

2017-05-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5030: Adds support for NVL2() function
..


Patch Set 5:

I'm late to the party but we need a test that confirms the expr gets written 
correctly - e.g. in expr-test.

-- 
To view, visit http://gerrit.cloudera.org:8080/6962
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vincent Tran 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()

2017-05-25 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4890/5143: Coordinator race involving TearDown()
..


Patch Set 1:

(1 comment)

> (2 comments)
 > 
 > Where does ReleaseResources() get called for DML queries? It used
 > to be in ClientRequestState::Done(), I think, but I don't see where
 > it gets called now (it must be, because tests are passing).

This is now done implicitly as part of Cancel(). UnregisterQuery() calls 
CRS::Cancel (which in turn calls Coordinator::Cancel) before it calls Done().

http://gerrit.cloudera.org:8080/#/c/6897/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS1, Line 113: if (query_state_ != nullptr) {
 : 
ExecEnv::GetInstance()->query_exec_mgr()->ReleaseQueryState(query_state_);
 :   }
> I thought that in general we wanted to move away from work done in d'tors?
we want to move away from release resources in d'tors, and from tearing down 
control structures prematurely (as was done in teardown()). the coordinator now 
has access to the query state until the coordinator itself goes away.


-- 
To view, visit http://gerrit.cloudera.org:8080/6897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


  1   2   >