[jira] [Resolved] (IMPALA-6835) Improve Kudu scanner error messages to include the table name and the plan node id

2018-06-18 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6835.
--
Resolution: Fixed

> Improve Kudu scanner error messages to include the table name and the plan 
> node id
> --
>
> Key: IMPALA-6835
> URL: https://issues.apache.org/jira/browse/IMPALA-6835
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: yyzzjj
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: kudu, observability, supportability
> Fix For: Impala 3.1.0
>
>
> E.g: 
> [https://github.com/apache/impala/blob/830e3346f186aebc879e4ef2927e08db97143100/be/src/exec/kudu-scanner.cc#L338]
> A big SQL statement usually references dozens of tables. When a query fails,
> the error returned by the Kudu tserver or master does not contain the table
> name, which makes it inconvenient to determine which table caused the problem.
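>
> For illustration, a minimal sketch of the kind of message this asks for; the
> helper and its wording below are hypothetical, not the actual kudu-scanner.cc
> change:
> {code}
> #include <string>
>
> // Hypothetical helper: prefix the raw Kudu error with the table name and
> // plan node id so a multi-table query points at the failing scan.
> std::string BuildKuduScanError(const std::string& table_name, int node_id,
>                                const std::string& kudu_msg) {
>   return "Unable to advance iterator for node with id '" +
>          std::to_string(node_id) + "' for Kudu table '" + table_name +
>          "': " + kudu_msg;
> }
> {code}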



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-4908) NULL floats don't compare equal to other NULL floats

2018-06-18 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-4908.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

IMPALA-4908: NULL floats don't compare equal to other NULL floats

This change ensures that comparing two NULL floats with different
val fields returns true. Although this is undefined behavior, it
is now consistent with other types.

Along with the change, a unit test was added to ensure that
equality checking of floats returns results as expected.

Change-Id: Ie7310645e5752d8203be5abc22a6562a59b6e975
Reviewed-on: http://gerrit.cloudera.org:8080/10707
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 

> NULL floats don't compare equal to other NULL floats
> 
>
> Key: IMPALA-4908
> URL: https://issues.apache.org/jira/browse/IMPALA-4908
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Zachary
>Assignee: Pooja Nilangekar
>Priority: Trivial
> Fix For: Impala 3.1.0
>
>
> FloatVals which are NULL only compare equal to other FloatVals if the float 
> val also matches?
> It's already undefined behavior, but it would be nice to be consistent with 
> other types.  From the code:
>  
>bool operator==(const FloatVal& other) const {
>  return is_null == other.is_null && val == other.val;
>}
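>
> A minimal sketch of the consistent behavior described in the fix above (a
> stand-in type, not the actual udf.h code): two NULL values compare equal
> regardless of val.
> {code}
> struct FloatVal {
>   bool is_null = false;
>   float val = 0.0f;
>   bool operator==(const FloatVal& other) const {
>     // Two NULLs are equal no matter what their val fields hold.
>     if (is_null || other.is_null) return is_null == other.is_null;
>     return val == other.val;
>   }
> };
> {code}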



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7209) Disallow self-referencing ALTER VIEW statements

2018-06-25 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7209:


 Summary: Disallow self-referencing ALTER VIEW statements
 Key: IMPALA-7209
 URL: https://issues.apache.org/jira/browse/IMPALA-7209
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Pooja Nilangekar
Assignee: Pooja Nilangekar


Currently, an ALTER VIEW statement accepts self-referencing definitions. 
However, upon querying the altered view, the analyzer is unable to resolve the 
reference and hence fails with a "StackOverflowError: null".

The expected behavior would be to throw an AnalysisException while executing 
the ALTER VIEW statement.
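
A minimal repro sketch (hypothetical view name; the full session transcript
appears in the resolution of this issue further down this thread):
{code:java}
create view foo as select * from functional.alltypes;
alter view foo as select * from foo;  -- accepted today, should fail analysis
select * from foo;                    -- currently: StackOverflowError: null
{code}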



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7216) Invalid SQL generated by toSql functions in CreateViewStmt & AlterViewStmt

2018-06-26 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7216:


 Summary: Invalid SQL generated by toSql functions in 
CreateViewStmt & AlterViewStmt
 Key: IMPALA-7216
 URL: https://issues.apache.org/jira/browse/IMPALA-7216
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Pooja Nilangekar


The toSql functions in CreateViewStmt and AlterViewStmt generate SQL by 
appending types to the column definitions. This is invalid because view 
definitions should not specify the type of a column. The column type should be 
inherited from the source table. 
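
For illustration, the reproduction from this issue's resolution further down
this thread: given
{code:java}
create view foo (a, b) as select int_col, bigint_col from functional.alltypes;
{code}
toSql() emits the column types, which Impala itself then rejects:
{code:java}
CREATE VIEW foo(a INT, b BIGINT) AS SELECT int_col, bigint_col FROM functional.alltypes
{code}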



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6305) Allow column definitions in ALTER VIEW

2018-06-27 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6305.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Allow column definitions in ALTER VIEW
> --
>
> Key: IMPALA-6305
> URL: https://issues.apache.org/jira/browse/IMPALA-6305
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Alexander Behm
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: sql-language
> Fix For: Impala 3.1.0
>
>
> When working with views we currently only allow separate column definitions 
> in CREATE VIEW but not in ALTER VIEW.
> Example:
> {code}
> create table t1 (c1 int, c2 int);
> create view v (x comment 'hello world', y) as select * from t1;
> describe v;
> +--+--+-+
> | name | type | comment |
> +--+--+-+
> | x| int  | hello world |
> | y| int  | |
> +--+--+-+
> {code}
> Currently we cannot use ALTER VIEW to change the column definitions after the 
> fact, i.e. the following should be supported:
> {code}
> alter view v (z1, z2 comment 'foo bar') as select * from t1;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7234) Non-deterministic majority format for a table with equal partition instances

2018-07-02 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7234:


 Summary: Non-deterministic majority format for a table with equal 
partition instances 
 Key: IMPALA-7234
 URL: https://issues.apache.org/jira/browse/IMPALA-7234
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Pooja Nilangekar


The getMajorityFormat method of FeCatalogUtils currently returns 
non-deterministic results when its argument is a list of partitions with no 
numerical majority in terms of the number of instances: the result is 
determined by the order in which the partitions were added to the HashMap. We 
need a deterministic result that also considers the memory requirements of the 
different partition formats. Ideally, this function should return the format 
with the higher memory requirement in case of a tie. 
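
A minimal sketch of such a deterministic tie-break (hypothetical names and a
stand-in per-format memory cost map, not the actual FeCatalogUtils code):
{code:java}
import java.util.Comparator;
import java.util.Map;

public class MajorityFormat {
  // Picks the format with the most partition instances; on a tie, prefers the
  // format with the higher assumed memory requirement, so the result no
  // longer depends on HashMap iteration order.
  static String majorityFormat(Map<String, Integer> instancesPerFormat,
                               Map<String, Long> memReqPerFormat) {
    return instancesPerFormat.entrySet().stream()
        .max(Comparator.<Map.Entry<String, Integer>>comparingInt(Map.Entry::getValue)
            .thenComparing(e -> memReqPerFormat.getOrDefault(e.getKey(), 0L)))
        .orElseThrow(IllegalStateException::new)
        .getKey();
  }
}
{code}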



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7246) TestMtDopParquet.test_parquet_filtering is flaky

2018-07-03 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7246:


 Summary: TestMtDopParquet.test_parquet_filtering is flaky
 Key: IMPALA-7246
 URL: https://issues.apache.org/jira/browse/IMPALA-7246
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
 Attachments: TEST-impala-parallel.log, TEST-impala-parallel.xml

The failure observed in IMPALA-6960 occurred again while testing 
[https://gerrit.cloudera.org/c/10704/]

The query is different; however, the cause of failure is the same. In this case, 
the profile for Instance e7462de57ef6fd00:25f810c30005 
(host=ip-172-31-28-156:22000) is empty. The query returned the expected results, 
but the runtime profile wasn't updated. 

I have attached the failure logs. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6625) Skip dictionary and collection conjunct assignment for non-Parquet scans.

2018-07-06 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6625.
--
Resolution: Fixed

> Skip dictionary and collection conjunct assignment for non-Parquet scans.
> -
>
> Key: IMPALA-6625
> URL: https://issues.apache.org/jira/browse/IMPALA-6625
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>Reporter: Alexander Behm
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: perf, planner
>
> In HdfsScanNode.init() we try to assign dictionary and collection conjuncts 
> even for non-Parquet scans. Such predicates only make sense for Parquet 
> scans, so there is no point in collecting them for other scans.
> The current behavior is undesirable because:
> * init() can be substantially slower because assigning dictionary filters may 
> involve evaluating exprs in the BE which can be expensive
> * the explain plan of non-Parquet scans may have a section "parquet 
> dictionary predicates" which is confusing/misleading
> Relevant code snippet from HdfsScanNode:
> {code}
> @Override
>   public void init(Analyzer analyzer) throws ImpalaException {
> conjuncts_ = orderConjunctsByCost(conjuncts_);
> checkForSupportedFileFormats();
> assignCollectionConjuncts(analyzer);
> computeDictionaryFilterConjuncts(analyzer);
> // compute scan range locations with optional sampling
> Set fileFormats = computeScanRangeLocations(analyzer);
> ...
> if (fileFormats.contains(HdfsFileFormat.PARQUET)) { <--- assignment 
> should go in here
>   computeMinMaxTupleAndConjuncts(analyzer);
> }
> ...
> }
> {code}
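>
> A sketch of the suggested restructuring (abbreviated, not the final patch),
> moving the Parquet-only assignments behind the file-format check:
> {code}
> Set<HdfsFileFormat> fileFormats = computeScanRangeLocations(analyzer);
> if (fileFormats.contains(HdfsFileFormat.PARQUET)) {
>   // Parquet-specific conjuncts: pointless (and potentially expensive, since
>   // dictionary filters evaluate exprs in the BE) for other file formats.
>   assignCollectionConjuncts(analyzer);
>   computeDictionaryFilterConjuncts(analyzer);
>   computeMinMaxTupleAndConjuncts(analyzer);
> }
> {code}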



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6031) Distributed plan describes coordinator-only nodes as scanning

2018-07-09 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6031.
--
Resolution: Fixed

> Distributed plan describes coordinator-only nodes as scanning
> -
>
> Key: IMPALA-6031
> URL: https://issues.apache.org/jira/browse/IMPALA-6031
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Jim Apple
>Assignee: Pooja Nilangekar
>Priority: Major
>
> In a cluster with one coordinator-only node and three executor-only nodes:
> {noformat}
> Query: explain select count(*) from web_sales a, web_sales b where
> a.ws_order_number = b.ws_order_number and a.ws_item_sk = b.ws_item_sk
> +-------------------------------------------------------------------------------------------+
> | Explain String |
> +-------------------------------------------------------------------------------------------+
> | Per-Host Resource Reservation: Memory=136.00MB |
> | Per-Host Resource Estimates: Memory=3.04GB |
> | |
> | F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
> |   PLAN-ROOT SINK |
> |   |  mem-estimate=0B mem-reservation=0B |
> |   | |
> |   07:AGGREGATE [FINALIZE] |
> |   |  output: count:merge(*) |
> |   |  mem-estimate=10.00MB mem-reservation=0B |
> |   |  tuple-ids=2 row-size=8B cardinality=1 |
> |   | |
> |   06:EXCHANGE [UNPARTITIONED] |
> |  mem-estimate=0B mem-reservation=0B |
> |  tuple-ids=2 row-size=8B cardinality=1 |
> | |
> | F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4 |
> |   DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, UNPARTITIONED] |
> |   |  mem-estimate=0B mem-reservation=0B |
> |   03:AGGREGATE |
> |   |  output: count(*) |
> |   |  mem-estimate=10.00MB mem-reservation=0B |
> |   |  tuple-ids=2 row-size=8B cardinality=1 |
> |   | |
> |   02:HASH JOIN [INNER JOIN, PARTITIONED] |
> |   |  hash predicates: a.ws_item_sk = b.ws_item_sk, a.ws_order_number = b.ws_order_number |
> |   |  runtime filters: RF000 <- b.ws_item_sk, RF001 <- b.ws_order_number |
> |   |  mem-estimate=2.95GB mem-reservation=136.00MB |
> |   |  tuple-ids=0,1 row-size=32B cardinality=72376 |
> |   | |
> |   |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)] |
> |   | mem-estimate=0B mem-reservation=0B |
> |   | tuple-ids=1 row-size=16B cardinality=72376 |
> |   | |
> |   04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)] |
> |  mem-estimate=0B mem-reservation=0B |
> |  tuple-ids=0 row-size=16B cardinality=72376 |
> | |
> | F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4
[jira] [Resolved] (IMPALA-6223) Gracefully handle malformed 'with' queries in impala-shell

2018-07-09 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6223.
--
Resolution: Fixed

> Gracefully handle malformed 'with' queries in impala-shell
> --
>
> Key: IMPALA-6223
> URL: https://issues.apache.org/jira/browse/IMPALA-6223
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.10.0
>Reporter: bharath v
>Assignee: Pooja Nilangekar
>Priority: Minor
>  Labels: newbie
>
> Impala shell can throw a lexer error if it encounters a malformed "with" 
> query.
> {noformat}
> impala-shell.sh -q "with foo as (select bar from temp where temp.a='"
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 2.11.0-SNAPSHOT DEBUG (build 
> 0ee1765f38082bc5c10aa37b23cb8e57caa57d4e)
> Traceback (most recent call last):
>   File "/home/bharath/Impala/shell/impala_shell.py", line 1463, in 
> execute_queries_non_interactive_mode(options, query_options)
>   File "/home/bharath/Impala/shell/impala_shell.py", line 1338, in 
> execute_queries_non_interactive_mode
> shell.execute_query_list(queries)):
>   File "/home/bharath/Impala/shell/impala_shell.py", line 1218, in 
> execute_query_list
> if self.onecmd(q) is CmdStatus.ERROR:
>   File "/home/bharath/Impala/shell/impala_shell.py", line 505, in onecmd
> return cmd.Cmd.onecmd(self, line)
>   File "/usr/lib/python2.7/cmd.py", line 221, in onecmd
> return func(arg)
>   File "/home/bharath/Impala/shell/impala_shell.py", line 1024, in do_with
> tokens = list(lexer)
>   File "/usr/lib/python2.7/shlex.py", line 269, in next
> token = self.get_token()
>   File "/usr/lib/python2.7/shlex.py", line 96, in get_token
> raw = self.read_token()
>   File "/usr/lib/python2.7/shlex.py", line 172, in read_token
> raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> {noformat}
> This happens because we use shlex to parse the input query to determine if 
> it's a DML, and shlex can throw if the input doesn't have balanced quotes.
> {noformat}
> def do_with(self, args):
> """Executes a query with a WITH clause, fetching all rows"""
> query = self.imp_client.create_beeswax_query("with %s" % args,
>  self.set_query_options)
> # Set posix=True and add "'" to escaped quotes
> # to deal with escaped quotes in string literals
> lexer = shlex.shlex(query.query.lstrip(), posix=True)
> lexer.escapedquotes += "'"
> # Because the WITH clause may precede DML or SELECT queries,
> # just checking the first token is insufficient.
> is_dml = False
> tokens = list(lexer)  <
> {noformat}
> A simple shlex repro of that is as follows,
> {noformat}
> >>> lexer = shlex.shlex("with foo as (select bar from temp where temp.a='", 
> >>> posix=True);
> >>> list(lexer)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/lib/python2.7/shlex.py", line 269, in next
> token = self.get_token()
>   File "/usr/lib/python2.7/shlex.py", line 96, in get_token
> raw = self.read_token()
>   File "/usr/lib/python2.7/shlex.py", line 172, in read_token
> raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> {noformat}
> Fix: Either catch the exception and handle it gracefully or have a better way 
> to figure out the query type, using a SQL parser (more involved).
> This query also repros it:
> {code}
> with v as (select 1)
> select foo('\\'), ('bar
> ;
> {code}
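>
> A minimal sketch of the first option (catching the exception; illustrative,
> not the merged patch):
> {code}
> # In do_with(): tokenizing may raise on unbalanced quotes.
> try:
>     tokens = list(lexer)
> except ValueError:
>     # e.g. "No closing quotation": give up on client-side classification
>     # and let the server report the actual syntax error.
>     tokens = []
> {code}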



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7216) Invalid SQL generated by toSql functions in CreateViewStmt & AlterViewStmt

2018-07-11 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7216.
--
Resolution: Fixed

> Invalid SQL generated by toSql functions in CreateViewStmt & AlterViewStmt
> --
>
> Key: IMPALA-7216
> URL: https://issues.apache.org/jira/browse/IMPALA-7216
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Major
>
> The toSql functions in CreateViewStmt and AlterViewStmt generate SQL by 
> appending types to the column definitions. This is invalid because view 
> definitions should not specify the type of a column. The column type should 
> be inherited from the source table. 
>  
> Example query to reproduce:
> {code:java}
> create view foo (a, b) as select int_col, bigint_col from functional.alltypes;
> {code}
> The SQL generated by the toSql() function:
> {code:java}
> CREATE VIEW foo(a INT, b BIGINT) AS SELECT int_col, bigint_col FROM 
> functional.alltypes
> {code}
> Executing the query generated by toSql():
> {code:java}
> [localhost:21000] default> CREATE VIEW foo(a INT, b BIGINT) AS SELECT 
> int_col, bigint_col FROM functional.alltypes;
> Query: CREATE VIEW foo(a INT, b BIGINT) AS SELECT int_col, bigint_col FROM 
> functional.alltypes
> ERROR: AnalysisException: Syntax error in line 1:
> CREATE VIEW foo(a INT, b BIGINT) AS SELECT int...
>   ^
> Encountered: INTEGER
> Expected: COMMENT, COMMA
> CAUSED BY: Exception: Syntax error
> {code}
> In other databases like MySQL and PostgreSQL, the view definition statements 
> can't explicitly set the column types. The type of a column in a view should 
> always be inherited from the source table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7209) Disallow self-referencing ALTER VIEW statements

2018-07-31 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7209.
--
Resolution: Fixed

> Disallow self-referencing ALTER VIEW statements
> --
>
> Key: IMPALA-7209
> URL: https://issues.apache.org/jira/browse/IMPALA-7209
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Major
>
> Currently, an ALTER VIEW statement accepts self-referencing definitions. 
> However, upon querying the altered view, the analyzer is unable to resolve the 
> reference and hence fails with a "StackOverflowError: null".
> The expected behavior would be to throw an AnalysisException while executing 
> the ALTER VIEW statement.
>  
> Example:
>  
> {code:java}
> [localhost:21000] default> create view foo as select * from 
> functional.alltypes;
> Query: create view foo as select * from functional.alltypes
> Query submitted at: 2018-07-03 11:36:48 (Coordinator: 
> http://pooja-OptiPlex-7040:25000)
> Query progress can be monitored at: 
> http://pooja-OptiPlex-7040:25000/query_plan?query_id=614e03efbcb4d8b1:586a4bad
> ++
> | summary                |
> ++
> | View has been created. |
> ++
> Fetched 1 row(s) in 0.26s
> [localhost:21000] default> alter view foo as select * from foo;
> Query: alter view foo as select * from foo
> ++
> | summary                |
> ++
> | View has been altered. |
> ++
> Fetched 1 row(s) in 5.65s
> [localhost:21000] default> select * from foo;
> Query: select * from foo
> Query submitted at: 2018-07-03 11:37:12 (Coordinator: 
> http://pooja-OptiPlex-7040:25000)
> ERROR: StackOverflowError: null 
> {code}
>  
> The select statement on the view fails because the analyzer can't resolve its 
> reference. Other databases instead return the failure while executing the 
> ALTER VIEW statement itself, which is the behavior proposed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7234) Non-deterministic majority format for a table with equal partition instances

2018-08-01 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7234.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Non-deterministic majority format for a table with equal partition instances 
> -
>
> Key: IMPALA-7234
> URL: https://issues.apache.org/jira/browse/IMPALA-7234
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> The getMajorityFormat method of FeCatalogUtils currently returns 
> non-deterministic results when its argument is a list of partitions with no 
> numerical majority in terms of the number of instances: the result is 
> determined by the order in which the partitions were added to the HashMap. We 
> need a deterministic result that also considers the memory requirements of 
> the different partition formats. Ideally, this function should return the 
> format with the higher memory requirement in case of a tie. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (IMPALA-6153) Prevent Coordinator::UpdateFilter() running after query exec resources are released

2018-08-09 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar closed IMPALA-6153.

   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Prevent Coordinator::UpdateFilter() running after query exec resources are 
> released
> ---
>
> Key: IMPALA-6153
> URL: https://issues.apache.org/jira/browse/IMPALA-6153
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sailesh Mukil
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: query-lifecycle, runtime-filters
> Fix For: Impala 3.1.0
>
>
> Coordinator::UpdateFilter() and CoordinatorBackendState::PublishFilter() run 
> independently of the lifecycle of any fragment instance. This is problematic 
> during query teardown.
> Specifically, we should not release resources for a query while either of the 
> above functions is still running for that query, and we should not start 
> running them after resources are released for the query. 
> Also, the 'rpc_params' in UpdateFilter() could potentially hold large amounts 
> of untracked memory, so we should track it.
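>
> A minimal sketch of the kind of mutual exclusion this calls for (a stand-in
> class, not the actual coordinator code):
> {code}
> #include <mutex>
>
> class Coordinator {
>  public:
>   void UpdateFilter() {
>     std::lock_guard<std::mutex> l(lock_);
>     if (released_) return;  // resources already released: drop the update
>     // ... apply the filter update; release cannot proceed while we hold
>     // the lock ...
>   }
>
>   void ReleaseExecResources() {
>     std::lock_guard<std::mutex> l(lock_);
>     released_ = true;  // no UpdateFilter() may start past this point
>   }
>
>  private:
>   std::mutex lock_;
>   bool released_ = false;
> };
> {code}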



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7430) Remove the log added to HdfsScanNode::ScannerThread

2018-08-13 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7430:


 Summary: Remove the log added to HdfsScanNode::ScannerThread
 Key: IMPALA-7430
 URL: https://issues.apache.org/jira/browse/IMPALA-7430
 Project: IMPALA
  Issue Type: Task
Reporter: Pooja Nilangekar
Assignee: Pooja Nilangekar


Logs were added to the HdfsScanNode in order to debug IMPALA-7335 and 
IMPALA-7418. These need to be removed once the cause of these bugs is 
established and we find a way to reproduce them locally. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7418) test_udf_errors - returns Cancelled instead of actual error

2018-08-23 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7418.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> test_udf_errors - returns Cancelled instead of actual error
> ---
>
> Key: IMPALA-7418
> URL: https://issues.apache.org/jira/browse/IMPALA-7418
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> {noformat}
> query_test.test_udfs.TestUdfExecution.test_udf_errors[exec_option: 
> {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'exec_single_node_rows_threshold': 0, 'enable_expr_rewrites': True} | 
> table_format: text/none] (from pytest)
> Failing for the past 1 build (Since Failed#2925 )
> Took 19 sec.
> add description
> Error Message
> query_test/test_udfs.py:415: in test_udf_errors 
> self.run_test_case('QueryTest/udf-errors', vector, use_db=unique_database) 
> common/impala_test_suite.py:412: in run_test_case 
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 
> common/impala_test_suite.py:290: in __verify_exceptions (expected_str, 
> actual_str) E   AssertionError: Unexpected exception string. Expected: 
> BadExpr2 prepare error E   Not found in actual: ImpalaBeeswaxException: Query 
> aborted:Cancelled
> Stacktrace
> query_test/test_udfs.py:415: in test_udf_errors
> self.run_test_case('QueryTest/udf-errors', vector, use_db=unique_database)
> common/impala_test_suite.py:412: in run_test_case
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:290: in __verify_exceptions
> (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: BadExpr2 prepare 
> error
> E   Not found in actual: ImpalaBeeswaxException: Query aborted:Cancelled
> Standard Error
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_udf_errors_be4e0293` CASCADE;
> MainThread: Started query bd4790b45c20640d:9c62ffba
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_udf_errors_be4e0293`;
> MainThread: Started query 474595a3ecba67bd:7a14c84
> MainThread: Created database "test_udf_errors_be4e0293" for test ID 
> "query_test/test_udfs.py::TestUdfExecution::()::test_udf_errors[exec_option: 
> {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'exec_single_node_rows_threshold': 0, 'enable_expr_rewrites': True} | 
> table_format: text/none]"
> -- executing against localhost:21000
> use test_udf_errors_be4e0293;
> MainThread: Started query 264b0cd09d289c09:cc5dafed
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=True;
> SET exec_single_node_rows_threshold=0;
> SET enable_expr_rewrites=True;
> -- executing against localhost:21000
> create function if not exists hive_pi() returns double
> location '/test-warehouse/hive-exec.jar'
> symbol='org.apache.hadoop.hive.ql.udf.UDFPI';
> MainThread: Started query ba41ccb6f020becd:db23209f
> -- executing against localhost:21000
> create function if not exists foo() returns double
> location '/test-warehouse/not-a-real-file.so'
> symbol='FnDoesNotExist';
> -- executing against localhost:21000
> create function if not exists foo() returns double
> location '/test-warehouse/not-a-real-file.so'
> symbol='FnDoesNotExist';
> -- executing against localhost:21000
> create function if not exists foo (string, string) returns string location
> '/test-warehouse/test_udf_errors_be4e0293_bad_udf.ll' symbol='MyAwesomeUdf';
> -- executing against localhost:21000
> create function if not exists twenty_args(int, int, int, int, int, int,
> int, int, int, int, int, int, int, int, int, int, int, int, int, int) 
> returns int
> location '/test-warehouse/libTestUdfs.so'
> symbol='TwentyArgs';
> MainThread: Started query 6b4dc82f22e2f0f6:9d28ab03
> -- executing against localhost:21000
> select twenty_args(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
> MainThread: Started query d40ea0f1effacd1:22e5c31f
> -- executing against localhost:21000
> create function if not exists twenty_one_args(int, int, int, int, int, int,
> int, int, int, int, int, int, int, int, int, int, int, int, int, int, 
> int) returns int
> location '/test-warehouse/libTestUdfs.so'
> symbol='TwentyOneArgs';
> MainThread: Started query 12453a7e4b13fa4d:d163be33
> -- executing against localhost:21000
> select twenty_one_args(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21);
> MainThread: Started query 26461e2ce5ce3adf:3544166a
> -- executing against 

[jira] [Resolved] (IMPALA-6644) Add last heartbeat timestamp into Statestore metric

2018-08-30 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6644.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Add last heartbeat timestamp into Statestore metric
> ---
>
> Key: IMPALA-6644
> URL: https://issues.apache.org/jira/browse/IMPALA-6644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Mala Chikka Kempanna
>Assignee: Pooja Nilangekar
>Priority: Minor
>  Labels: ramp-up, supportability
> Fix For: Impala 3.1.0
>
>
> In the latest and previous versions, the statestore's default logging reports 
> only when it fails to send a heartbeat to a host.
> There is no way to confirm that the statestore is indeed continuing to 
> heartbeat under all passing conditions, short of turning on debug logs, which 
> become too noisy. But at the same time it's important to know that the 
> statestore is indeed heartbeating.
> The suggestion here is to add a metric to the statestore metrics page and also 
> print the same in the log once a minute (or at any configurable frequency), 
> reporting the last heartbeat timestamp and, optionally, the last heartbeat 
> host.
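>
> A minimal sketch of the once-a-minute logging half of the suggestion
> (hypothetical names, not the actual statestore code):
> {code}
> #include <chrono>
> #include <iostream>
> #include <string>
>
> using Clock = std::chrono::steady_clock;
>
> struct HeartbeatTracker {
>   Clock::time_point last_heartbeat{};
>   Clock::time_point last_log{};
>   std::string last_host;
>
>   // Called on every successful heartbeat; logs at most once a minute.
>   void OnHeartbeatSent(const std::string& host) {
>     last_heartbeat = Clock::now();
>     last_host = host;
>     if (last_heartbeat - last_log >= std::chrono::minutes(1)) {
>       std::cout << "Last heartbeat sent to " << last_host << std::endl;
>       last_log = last_heartbeat;
>     }
>   }
> };
> {code}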
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7551) Inaccurate timeline for "Row Available"

2018-09-10 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7551:


 Summary: Inaccurate timeline for "Row Available" 
 Key: IMPALA-7551
 URL: https://issues.apache.org/jira/browse/IMPALA-7551
 Project: IMPALA
  Issue Type: Improvement
Reporter: Pooja Nilangekar


While debugging IMPALA-6932, it was noticed that the "Rows Available" event in 
the query profile showed a short duration (~1 second) for a long-running limit 
1 query (~1 hour).

Currently, it tracks when Open() of the top-most node in the plan returns, not 
when the first row is actually produced. This can be misleading. A better 
timeline event would be recorded when the first non-empty batch is added to 
the PlanRootSink. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7430) Remove the log added to HdfsScanNode::ScannerThread

2018-09-21 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7430.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Remove the log added to HdfsScanNode::ScannerThread
> ---
>
> Key: IMPALA-7430
> URL: https://issues.apache.org/jira/browse/IMPALA-7430
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
> Fix For: Impala 3.1.0
>
>
> Logs were added to the HdfsScanNode in order to debug IMPALA-7335 and 
> IMPALA-7418. These need to be removed once the cause of these bugs is 
> established and we find a way to reproduce them locally. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7335) Assertion Failure - test_corrupt_files

2018-09-21 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7335.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Assertion Failure - test_corrupt_files
> --
>
> Key: IMPALA-7335
> URL: https://issues.apache.org/jira/browse/IMPALA-7335
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0
>Reporter: nithya
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> test_corrupt_files fails 
>  
> query_test.test_scanners.TestParquet.test_corrupt_files[exec_option: 
> \\{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from 
> pytest)
>  
> {code:java}
> Error Message
> query_test/test_scanners.py:300: in test_corrupt_files     
> self.run_test_case('QueryTest/parquet-abort-on-error', vector) 
> common/impala_test_suite.py:420: in run_test_case     assert False, "Expected 
> exception: %s" % expected_str E   AssertionError: Expected exception: Column 
> metadata states there are 11 values, but read 10 values from column id.
> STACKTRACE
> query_test/test_scanners.py:300: in test_corrupt_files
>     self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case
>     assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Column metadata states there are 11 
> values, but read 10 values from column id.
> Standard Error
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=0;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from 
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_negative_len;
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_out_of_bounds;
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from 
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7352) HdfsTableSink doesn't take into account insert clustering

2018-09-25 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7352.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> HdfsTableSink doesn't take into account insert clustering
> -
>
> Key: IMPALA-7352
> URL: https://issues.apache.org/jira/browse/IMPALA-7352
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.1.0
>
>
> I noticed that the code doesn't check whether the insert is clustered, which 
> would mean it only produces a single partition at a time.
> {code}
>   @Override
>   public void computeResourceProfile(TQueryOptions queryOptions) {
> HdfsTable table = (HdfsTable) targetTable_;
> // TODO: Estimate the memory requirements more accurately by partition 
> type.
> HdfsFileFormat format = table.getMajorityFormat();
> PlanNode inputNode = fragment_.getPlanRoot();
> int numInstances = fragment_.getNumInstances(queryOptions.getMt_dop());
> // Compute the per-instance number of partitions, taking the number of 
> nodes
> // and the data partition of the fragment executing this sink into 
> account.
> long numPartitionsPerInstance =
> fragment_.getPerInstanceNdv(queryOptions.getMt_dop(), 
> partitionKeyExprs_);
> if (numPartitionsPerInstance == -1) {
>   numPartitionsPerInstance = DEFAULT_NUM_PARTITIONS;
> }
> long perPartitionMemReq = getPerPartitionMemReq(format);
> long perInstanceMemEstimate;
> // The estimate is based purely on the per-partition mem req if the input 
> cardinality_
> // or the avg row size is unknown.
> if (inputNode.getCardinality() == -1 || inputNode.getAvgRowSize() == -1) {
>   perInstanceMemEstimate = numPartitionsPerInstance * perPartitionMemReq;
> } else {
>   // The per-partition estimate may be higher than the memory required to 
> buffer
>   // the entire input data.
>   long perInstanceInputCardinality =
>   Math.max(1L, inputNode.getCardinality() / numInstances);
>   long perInstanceInputBytes =
>   (long) Math.ceil(perInstanceInputCardinality * 
> inputNode.getAvgRowSize());
>   long perInstanceMemReq =
>   PlanNode.checkedMultiply(numPartitionsPerInstance, 
> perPartitionMemReq);
>   perInstanceMemEstimate = Math.min(perInstanceInputBytes, 
> perInstanceMemReq);
> }
> resourceProfile_ = ResourceProfile.noReservation(perInstanceMemEstimate);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7678) Revert IMPALA-7660

2018-10-08 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7678:


 Summary: Revert IMPALA-7660
 Key: IMPALA-7678
 URL: https://issues.apache.org/jira/browse/IMPALA-7678
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar


After merging IMPALA-7660, impala server starts up but start-impala-cluster.py 
can't contact the debug webpage on RHEL builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7690) TestAdmissionController.test_pool_config_change_while_queued fails on centos6

2018-10-10 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7690:


 Summary: 
TestAdmissionController.test_pool_config_change_while_queued fails on centos6
 Key: IMPALA-7690
 URL: https://issues.apache.org/jira/browse/IMPALA-7690
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 3.1.0
Reporter: Pooja Nilangekar
Assignee: Bikramjeet Vig


TestAdmissionController.test_pool_config_change_while_queued fails on Centos6 
because python 2.6 does not support iter() on {{xml.etree.ElementTree}} 
elements.

 

Here are the logs from the test failure:

 
{code:java}
custom_cluster/test_admission_controller.py:767: in 
test_pool_config_change_while_queued
config.set_config_value(pool_name, config_str, 1)
common/resource_pool_config.py:43: in set_config_value
node = self.__find_xml_node(self.root, pool_name, config_str)
common/resource_pool_config.py:86: in __find_xml_node
for property in xml_root.iter('property'):
E   AttributeError: _ElementInterface instance has no attribute 'iter'
{code}
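
A minimal compatibility sketch (hypothetical helper; Python 2.6's ElementTree
predates Element.iter(), but getiterator() performs the same traversal):
{code}
def iter_properties(xml_root):
    # Element.iter() exists on python >= 2.7; fall back for 2.6.
    if hasattr(xml_root, 'iter'):
        return xml_root.iter('property')
    return xml_root.getiterator('property')
{code}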
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7697) Query gets erased before reporting ExecSummary

2018-10-11 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7697:


 Summary: Query gets erased before reporting ExecSummary
 Key: IMPALA-7697
 URL: https://issues.apache.org/jira/browse/IMPALA-7697
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar


In a recent build, certain queries went missing from the 
ImpalaServer::query_log_index_ before the ImpalaServer::GetExecSummary function 
was invoked. Hence the test case failed. An easy (intermediate) fix would be to 
increase FLAGS_query_log_size. However, ideally the query shouldn't get erased 
before the ExecSummary has been reported to the client via the beeswax/hs2 
servers. 

Here are the test logs:

{code:java}
Error Message

query_test/test_resource_limits.py:45: in test_resource_limits 
self.run_test_case('QueryTest/query-resource-limits', vector) 
common/impala_test_suite.py:478: in run_test_case assert False, "Expected 
exception: %s" % expected_str E   AssertionError: Expected exception: 
row_regex:.*expired due to execution time limit of 2s000ms.*

Stacktrace

query_test/test_resource_limits.py:45: in test_resource_limits
self.run_test_case('QueryTest/query-resource-limits', vector)
common/impala_test_suite.py:478: in run_test_case
assert False, "Expected exception: %s" % expected_str
E   AssertionError: Expected exception: row_regex:.*expired due to execution 
time limit of 2s000ms.*

Standard Error
-- executing against localhost:21000
SET SCAN_BYTES_LIMIT="0";

-- 2018-10-10 22:38:29,826 INFO MainThread: Started query 
8e45a13bc999749e:58175e16
{code}


Here are the impalad logs: 

{code:java}
impalad.INFO.20181010-191824.5460:I1010 22:38:29.825745 31580 
impala-server.cc:1060] Registered query 
query_id=8e45a13bc999749e:58175e16 
session_id=43434de5f83010f9:7e0750ad7ad86b80
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826026 31580 
impala-server.cc:1115] Query 8e45a13bc999749e:58175e16 has scan bytes 
limit of 100.00 GB
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826328 31580 
impala-beeswax-server.cc:197] get_results_metadata(): 
query_id=8e45a13bc999749e:58175e16
impalad..INFO.20181010-191824.5460:I1010 22:38:29.826584 31580 
impala-server.cc:776] Query id 8e45a13bc999749e:58175e16 not found.
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826858 31580 
impala-beeswax-server.cc:239] close(): 
query_id=8e45a13bc999749e:58175e16
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826861 31580 
impala-server.cc:1127] UnregisterQuery(): 
query_id=8e45a13bc999749e:58175e16
impalad.INFO.20181010-191824.5460:I1010 22:38:29.826864 31580 
impala-server.cc:1238] Cancel(): query_id=8e45a13bc999749e:58175e16

{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7699) TestSpillingNoDebugActionDimensions fails earlier than expected

2018-10-11 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7699:


 Summary: TestSpillingNoDebugActionDimensions fails earlier than 
expected 
 Key: IMPALA-7699
 URL: https://issues.apache.org/jira/browse/IMPALA-7699
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Tim Armstrong


In some of the recent runs, the query fails due to insufficient memory, but in 
the HDFS scan node rather than the expected hash join node. Here are the 
corresponding logs: 

Stacktrace:
{code:java}
query_test/test_spilling.py:113: in test_spilling_no_debug_action 
self.run_test_case('QueryTest/spilling-no-debug-action', vector) 
common/impala_test_suite.py:466: in run_test_case 
self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 
common/impala_test_suite.py:319: in __verify_exceptions (expected_str, 
actual_str) E AssertionError: Unexpected exception string. Expected: 
row_regex:.*Cannot perform hash join at node with id .*. Repartitioning did not 
reduce the size of a spilled partition.* E Not found in actual: 
ImpalaBeeswaxException: Query aborted:Memory limit exceeded: Failed to allocate 
tuple bufferHDFS_SCAN_NODE (id=1) could not allocate 190.00 KB without 
exceeding limit.Error occurred on backend localhost:22001 by fragment 
2e4f0f944d373848:9ae1d7e20002
{code}
 

Here are the impalad logs: 
{code:java}
I1010 18:31:30.721693  7270 coordinator.cc:498] ExecState: query 
id=2e4f0f944d373848:9ae1d7e2 
finstance=2e4f0f944d373848:9ae1d7e20002 on host=localhost:22001 (EXECUTING 
-> ERROR) status=Memory limit exceeded: Failed to allocate tuple buffer
HDFS_SCAN_NODE (id=1) could not allocate 190.00 KB without exceeding limit.
Error occurred on backend localhost:22001 by fragment 
2e4f0f944d373848:9ae1d7e20002
Memory left in process limit: 9.19 GB
Memory left in query limit: 157.62 KB
Query(2e4f0f944d373848:9ae1d7e2): Limit=150.00 MB Reservation=117.25 MB 
ReservationLimit=118.00 MB OtherMemory=32.60 MB Total=149.85 MB Peak=149.85 MB
  Unclaimed reservations: Reservation=5.75 MB OtherMemory=0 Total=5.75 MB 
Peak=55.75 MB
  Fragment 2e4f0f944d373848:9ae1d7e20003: Reservation=2.00 MB 
OtherMemory=22.20 MB Total=24.20 MB Peak=24.20 MB
Runtime Filter Bank: Reservation=2.00 MB ReservationLimit=2.00 MB 
OtherMemory=0 Total=2.00 MB Peak=2.00 MB
SORT_NODE (id=3): Total=0 Peak=0
HASH_JOIN_NODE (id=2): Total=42.25 KB Peak=42.25 KB
  Exprs: Total=13.12 KB Peak=13.12 KB
  Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=13.12 KB
Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB
HDFS_SCAN_NODE (id=0): Total=0 Peak=0
EXCHANGE_NODE (id=4): Reservation=18.79 MB OtherMemory=235.89 KB 
Total=19.02 MB Peak=19.02 MB
  KrpcDeferredRpcs: Total=235.89 KB Peak=235.89 KB
KrpcDataStreamSender (dst_id=5): Total=480.00 B Peak=480.00 B
CodeGen: Total=3.13 MB Peak=3.13 MB
  Fragment 2e4f0f944d373848:9ae1d7e20002: Reservation=109.50 MB 
OtherMemory=10.39 MB Total=119.89 MB Peak=119.89 MB
HDFS_SCAN_NODE (id=1): Reservation=109.50 MB OtherMemory=10.20 MB 
Total=119.70 MB Peak=119.70 MB
  Queued Batches: Total=6.12 MB Peak=6.12 MB
KrpcDataStreamSender (dst_id=4): Total=688.00 B Peak=688.00 B
CodeGen: Total=488.00 B Peak=51.00 KB
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7700) test_shell_commandline.TestImpalaShell.test_cancellation failure

2018-10-11 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7700:


 Summary: test_shell_commandline.TestImpalaShell.test_cancellation  
 failure
 Key: IMPALA-7700
 URL: https://issues.apache.org/jira/browse/IMPALA-7700
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Thomas Tauber-Marshall


The query is not getting cancelled as expected. Here are the logs: 


{code:java}
Error Message

/data/jenkins/workspace/impala-cdh6.0.x-core/repos/Impala/tests/shell/test_shell_commandline.py:328:
 in test_cancellation result = p.get_result() shell/util.py:154: in 
get_result result.stdout, result.stderr = 
self.shell_process.communicate(input=stdin_input) 
/usr/lib64/python2.7/subprocess.py:800: in communicate return 
self._communicate(input) /usr/lib64/python2.7/subprocess.py:1401: in 
_communicate stdout, stderr = self._communicate_with_poll(input) 
/usr/lib64/python2.7/subprocess.py:1455: in _communicate_with_poll ready = 
poller.poll() E   Failed: Timeout >7200s
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7545) Add admission control status to query log

2018-10-17 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7545.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Add admission control status to query log
> -
>
> Key: IMPALA-7545
> URL: https://issues.apache.org/jira/browse/IMPALA-7545
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: admission-control, observability
> Fix For: Impala 3.1.0
>
>
> We already include the query progress in the HS2 GetLog() response (although 
> for some reason we don't do the same for beeswax), so we should include 
> admission control progress as well. We should definitely include it while the 
> query is queued; it's probably too noisy to include once the query has been 
> admitted.
> We should also do the same for beeswax/impala-shell so that 
> live_progress/live_summary is useful if the query is queued. We should look 
> at the live_progress/live_summary mechanisms and extend those to include the 
> required information to report admission control state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7749) Merge aggregation node memory estimate is incorrectly influenced by limit

2018-10-29 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7749.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Merge aggregation node memory estimate is incorrectly influenced by limit
> -
>
> Key: IMPALA-7749
> URL: https://issues.apache.org/jira/browse/IMPALA-7749
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
> Fix For: Impala 3.1.0
>
>
> In the below query the estimate for node ID 3 is too low. If you remove the 
> limit it is correct. 
> {noformat}
> [localhost:21000] default> set explain_level=2; explain select l_orderkey,
> l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
> EXPLAIN_LEVEL set to 2
> Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from
> tpch.lineitem group by 1, 2, 3 limit 5
> +------------------------------------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=43.94MB Threads=4 |
> | Per-Host Resource Estimates: Memory=450MB |
> | |
> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
> | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1 |
> | PLAN-ROOT SINK |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0 |
> | | |
> | 04:EXCHANGE [UNPARTITIONED] |
> | |  limit: 5 |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0 |
> | |  tuple-ids=1 row-size=28B cardinality=5 |
> | |  in pipelines: 03(GETNEXT) |
> | | |
> | F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 instances=3 |
> | Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB thread-reservation=1 |
> | 03:AGGREGATE [FINALIZE] |
> | |  output: count:merge(*) |
> | |  group by: l_orderkey, l_partkey, l_linenumber |
> | |  limit: 5 |
> | |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 |
> | |  tuple-ids=1 row-size=28B cardinality=5 |
> | |  in pipelines: 03(GETNEXT), 00(OPEN) |
> | | |
> | 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)] |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0 |
> | |  tuple-ids=1 row-size=28B cardinality=6001215 |
> | |  in pipelines: 00(GETNEXT) |
> | | |
> | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 |
> | Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB thread-reservation=2 |
> | 01:AGGREGATE [STREAMING] |
> | |  output: count(*) |
> | |  group by: l_orderkey, l_partkey, l_linenumber |
> | |  mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB 

[jira] [Created] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-10-30 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7791:


 Summary: Aggregation Node memory estimates don't account for 
number of fragment instances
 Key: IMPALA-7791
 URL: https://issues.apache.org/jira/browse/IMPALA-7791
 Project: IMPALA
  Issue Type: Sub-task
Affects Versions: Impala 3.1.0
Reporter: Pooja Nilangekar


AggregationNode's memory estimates are calculated based on the input 
cardinality of the node, without accounting for the division of input data 
across fragment instances. This results in very high memory estimates. In 
reality, the nodes often use only a part of this memory.   
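
A minimal sketch of the missing step (hypothetical helper, mirroring the
per-instance division that HdfsTableSink already does, as quoted earlier in
this thread for IMPALA-7352):
{code:java}
// Divide input cardinality across fragment instances before sizing the agg.
static long perInstanceInputCardinality(long inputCardinality, int numInstances) {
  if (inputCardinality < 0) return -1;  // unknown stays unknown
  return Math.max(1L, inputCardinality / Math.max(1, numInstances));
}
{code}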

Example query:

{code:java}
[localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
{code}

Summary: 

{code:java}
+--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
| 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
| 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE      |
| 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
| 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING     |
| 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.00M      | 65.51 MB  | 264.00 MB     | tpch.lineitem |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
{code}

[jira] [Resolved] (IMPALA-7363) Spurious error generated by sequence file scanner with weird scan range length

2018-11-01 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7363.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Spurious error generated by sequence file scanner with weird scan range length
> --
>
> Key: IMPALA-7363
> URL: https://issues.apache.org/jira/browse/IMPALA-7363
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: avro
> Fix For: Impala 3.1.0
>
>
> Repro on master
> {noformat}
> tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh
> Starting Impala Shell without Kerberos authentication
> Connected to localhost:21000
> Server version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> cec33fa0ae75392668273d40b5a1bc4bbd7e9e2e)
> ***
> Welcome to the Impala shell.
> (Impala Shell v3.1.0-SNAPSHOT (cec33fa) built on Thu Jul 26 09:50:10 PDT 2018)
> To see a summary of a query's progress that updates in real-time, run 'set
> LIVE_PROGRESS=1;'.
> ***
> [localhost:21000] default> use tpch_seq_snap;
> Query: use tpch_seq_snap
> [localhost:21000] tpch_seq_snap> SET max_scan_range_length=5377;
> MAX_SCAN_RANGE_LENGTH set to 5377
> [localhost:21000] tpch_seq_snap> select count(*)
>> from lineitem;
> Query: select count(*)
> from lineitem
> Query submitted at: 2018-07-26 14:10:18 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=e9428efe173ad2f4:84b66bdb
> +--+
> | count(*) |
> +--+
> | 5993651  |
> +--+
> WARNINGS: SkipText: length is negative
> Problem parsing file 
> hdfs://localhost:20500/test-warehouse/tpch.lineitem_seq_snap/00_0 at 
> 36472193
> {noformat}
> Found while adding a test for IMPALA-7360



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7814) Aggregation Node

2018-11-05 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7814:


 Summary: Aggregation Node
 Key: IMPALA-7814
 URL: https://issues.apache.org/jira/browse/IMPALA-7814
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Pooja Nilangekar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-11-08 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7791.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Aggregation Node memory estimates don't account for number of fragment 
> instances
> 
>
> Key: IMPALA-7791
> URL: https://issues.apache.org/jira/browse/IMPALA-7791
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.1.0
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
> Fix For: Impala 3.1.0
>
>
> AggregationNode's memory estimates are calculated based on the input 
> cardinality of the node, without accounting for the division of input data 
> across fragment instances. This results in very high memory estimates. In 
> reality, the nodes often use only a part of this memory.   
> Example query:
> {code:java}
> [localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
> {code}
> Summary: 
> {code:java}
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+--------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail |
> +--------------+--------+----------+----------+-------+------------+-----------+---------------+--------+
> | 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
> | 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE |
> | 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
> | 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING |
> | 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.

[jira] [Resolved] (IMPALA-7367) Pack StringValue, CollectionValue and TimestampValue slots

2018-11-19 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7367.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Pack StringValue, CollectionValue and TimestampValue slots
> --
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: performance
> Fix For: Impala 3.2.0
>
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789 
> didn't actually fully pack the memory layout because StringValue, 
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12 
> bytes of actual data. This results in a higher memory footprint, which leads 
> to higher memory requirements and worse performance. We don't get any benefit 
> from the padding since the majority of tuples are not actually aligned in 
> memory anyway.
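> The following is a minimal standalone C++ sketch (not Impala's actual definitions) of why these slots occupy 16 bytes: an 8-byte pointer plus a 4-byte length is padded out to 16 by default alignment, while a packed layout keeps only the 12 bytes of real data:
> {code}
> #include <cstdint>
> #include <cstdio>
>
> // Default layout: the struct is padded to a multiple of alignof(char*) == 8.
> struct UnpackedStringValue {
>   char* ptr;    // 8 bytes
>   int32_t len;  // 4 bytes + 4 bytes of tail padding
> };
>
> // Packed layout: no tail padding, so only the 12 bytes of actual data remain.
> struct __attribute__((packed)) PackedStringValue {
>   char* ptr;
>   int32_t len;
> };
>
> int main() {
>   printf("unpacked: %zu bytes\n", sizeof(UnpackedStringValue));  // prints 16
>   printf("packed:   %zu bytes\n", sizeof(PackedStringValue));    // prints 12
>   return 0;
> }
> {code}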
> I did a quick version of the change for StringValue only which improves TPC-H 
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version:  impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +----------+-----------------------+---------+------------+------------+----------------+
> | Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
> +----------+-----------------------+---------+------------+------------+----------------+
> | TPCH(10) | parquet / none / none | 2.69    | -4.78%     | 2.09       | -3.11%         |
> +----------+-----------------------+---------+------------+------------+----------------+
> +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
> | Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
> +----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93        | +0.75%     | 3.37%     | 2.84%          | 1           | 30    |
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32   | 3.32        | +0.13%     | 1.74%     | 2.09%          | 1           | 30    |
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99   | 0.99        | -0.02%     | 3.74%     | 3.16%          | 1           | 30    |
> | TPCH(10) | TPCH-Q5  | parquet / none / none | 2.30   | 2.33        | -0.96%     | 2.15%     | 2.45%          | 1           | 30    |
> | TPCH(10) | TPCH-Q2  | parquet / none / none | 1.55   | 1.57        | -1.45%     | 1.65%     | 1.49%          | 1           | 30    |
> | TPCH(10) | TPCH-Q8  | parquet / none / none | 2.89   | 2.93        | -1.51%     | 2.69%     | 1.34%          | 1           | 30    |
> | TPCH(10) | TPCH-Q9  | parquet / none / none | 5.96   | 6.06        | -1.63%     | 1.34%     | 1.82%          | 1           | 30    |
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58   | 1.61        | -1.85%     | 2.28%     | 2.16%          | 1           | 30    |
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18   | 1.21        | -2.11%     | 3.68%     | 4.72%          | 1           | 30    |
> | TPCH(10) | TPCH-Q3  | parquet / none / none | 2.13   | 2.18        | -2.31%     | 2.09%     | 1.92%          | 1           | 30    |
> | TPCH(10) | TPCH-Q15 | parquet / none / none | 1.86   | 1.90        | -2.52%     | 2.06%     | 2.22%          | 1           | 30    |
> | TPCH(10) | TPCH-Q17 | parquet / none / none | 1.85   | 1.90        | -2.86%     | 10.00%    | 8.02%          | 1           | 30    |
> | TPCH(10) | TPCH-Q10 | parquet / none / none | 2.58   | 2.66        | -2.93%     | 1.68%     | 6.49%          | 1           | 30    |
> | TPCH(10) | TPCH-Q14 | parquet / none / none | 1.37   | 1.42        | -3.22%     | 3.35%     | 6.24%          | 1           | 30    |
> | TPCH(10) | TPCH-Q18 | parquet / none / none | 4.99   | 5.17        | -3.38%     | 1.75%     | 3.82%          | 1           | 30    |
> | TPCH(10) | TPCH-Q6  | parquet / none / none | 0.66   | 0.69        | -3.73%     | 5.04%     | 4.12%          | 1           | 30    |
> | TPCH(10) | TPCH-Q4  | parquet / none / none | 1.07   | 1.12        | -3.97%     | 1.79%     | 2.85%          | 1

[jira] [Resolved] (IMPALA-7873) TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit

2018-11-20 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7873.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory 
> limit
> -
>
> Key: IMPALA-7873
> URL: https://issues.apache.org/jira/browse/IMPALA-7873
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0
>
>
> This is failing to hit a "Memory limit exceeded" error on the last two core runs:
> {noformat}
> query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
> self.run_test_case('QueryTest/exchange-mem-scaling', vector)
> common/impala_test_suite.py:482: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Memory limit exceeded{noformat}
> It might be that the limit needs to be adjusted. 
> There were two changes since the last successful run: IMPALA-7367 
> (2a4835cfba7597362cc1e72e21315868c5c75d0a) and IMPALA-5031 
> (53ce6bb571cd9ae07ba5255197d35aa852a6f97c)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-7878) Bad SQL generated by compute incremental stats

2018-11-21 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-7878:


 Summary: Bad SQL generated by compute incremental stats 
 Key: IMPALA-7878
 URL: https://issues.apache.org/jira/browse/IMPALA-7878
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Pooja Nilangekar
Assignee: Paul Rogers


Computing incremental stats on partitions generates bad SQL. For instance, for 
a table foo partitioned by column bar, the compute stats statement:

{code:java}
compute incremental stats foo partition (bar = 1); 
{code}

would generate the following query: 

{code:java}
SELECT COUNT(*), bar FROM foo WHERE (bar=1) GROUP BY bar;
{code}

If this were to be rewritten as follows, it would produce fewer fragments and 
hence also reduce query memory by avoiding a hash aggregation node. 

{code:java}
SELECT COUNT(*), 1 FROM foo WHERE bar=1; 
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-7882) ASAN failure in llvm-codegen-test

2018-11-27 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7882.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> ASAN failure in llvm-codegen-test
> -
>
> Key: IMPALA-7882
> URL: https://issues.apache.org/jira/browse/IMPALA-7882
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0
>
>
> The llvm-codegen-test backend test is failing under ASAN with the following 
> output:
> {noformat}
> 18:12:34 [ RUN  ] LlvmCodeGenTest.StringValue
> 18:12:34 =
> 18:12:34 ==124917==ERROR: AddressSanitizer: stack-buffer-overflow on address 
> 0x7ffc0f39e86c at pc 0x017ea479 bp 0x7ffc0f39e550 sp 0x7ffc0f39e548
> 18:12:34 READ of size 4 at 0x7ffc0f39e86c thread T0
> 18:12:34 #0 0x17ea478 in testing::AssertionResult 
> testing::internal::CmpHelperEQ(char const*, char const*, int 
> const&, int const&) 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/gtest-1.6.0/include/gtest/gtest.h:1316:19
> 18:12:34 #1 0x17d3a8d in 
> _ZN7testing8internal8EqHelperILb1EE7CompareIiiEENS_15AssertionResultEPKcS6_RKT_RKT0_PNS0_8EnableIfIXntsr10is_pointerISA_EE5valueEE4typeE
>  
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/gtest-1.6.0/include/gtest/gtest.h:1392:12
> 18:12:34 #2 0x17c656b in 
> impala::LlvmCodeGenTest_StringValue_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen-test.cc:379:3
> 18:12:34 #3 0x4d55af2 in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d55af2)
> 18:12:34 #4 0x4d4c669 in testing::Test::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4c669)
> 18:12:34 #5 0x4d4c7b7 in testing::TestInfo::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4c7b7)
> 18:12:34 #6 0x4d4c894 in testing::TestCase::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4c894)
> 18:12:34 #7 0x4d4db17 in testing::internal::UnitTestImpl::RunAllTests() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4db17)
> 18:12:34 #8 0x4d4ddf2 in testing::UnitTest::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x4d4ddf2)
> 18:12:34 #9 0x17ce16e in main 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen-test.cc:569:10
> 18:12:34 #10 0x7fc221bd5c04 in __libc_start_main 
> (/lib64/libc.so.6+0x21c04)
> 18:12:34 #11 0x16b63c6 in _start 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/codegen/llvm-codegen-test+0x16b63c6)
> 18:12:34 
> 18:12:34 Address 0x7ffc0f39e86c is located in stack of thread T0 at offset 
> 492 in frame
> 18:12:34 #0 0x17c567f in 
> impala::LlvmCodeGenTest_StringValue_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen-test.cc:343
> 18:12:34 
> 18:12:34   This frame has 57 object(s):
> 18:12:34 [32, 40) 'codegen' (line 344)
> 18:12:34 [64, 72) 'ref.tmp' (line 345)
> 18:12:34 [96, 104) 'ref.tmp2' (line 345)
> 18:12:34 [128, 129) 'ref.tmp3' (line 345)
> 18:12:34 [144, 160) 'gtest_ar_' (line 345)
> 18:12:34 [176, 184) 'temp.lvalue'
> 18:12:34 [208, 216) 'ref.tmp6' (line 345)
> 18:12:34 [240, 248) 'temp.lvalue8'
> 18:12:34 [272, 288) 'ref.tmp9' (line 345)
> 18:12:34 [304, 320) 'gtest_ar_12' (line 346)
> 18:12:34 [336, 344) 'ref.tmp15' (line 346)
> 18:12:34 [368, 376) 'temp.lvalue16'
> 18:12:34 [400, 416) 'ref.tmp17' (line 346)
> 18:12:34 [432, 440) 'str' (line 348)
> 18:12:34 [464, 465) 'ref.tmp19' (line 348)
> 18:12:34 [480, 492) 'str_val' (line 350) <== Memory access at offset 492 
> overflows this variable
> 18:12:34 [512, 528) 'gtest_ar_24' (line 357)
> 18:12:34 [544, 552) 'ref.tmp27' (line 357)
> 18:12:34 [576, 584) 'temp.lvalue28'
> 18:12:34 [608, 624) 'ref.tmp29' (line 357)
> 18:12:34 [640, 648) 'jitted_fn' (line 360)
> 18:12:34 [672, 680) 'ref.tmp33' (line 362)
> 18:12:34 [704, 720) 'gtest_ar_35' 

[jira] [Created] (IMPALA-8059) TestWebPage::test_backend_states is flaky

2019-01-08 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-8059:


 Summary: TestWebPage::test_backend_states is flaky
 Key: IMPALA-8059
 URL: https://issues.apache.org/jira/browse/IMPALA-8059
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar


test_backend_states is flaky. The query reaches the _"FINISHED"_ state before 
its state is verified by the Python test. Here is the relevant log: 

{code:java}
07:33:45 - Captured stderr call 
-
07:33:45 -- executing async: localhost:21000
07:33:45 select sleep(1) from functional.alltypes limit 1;
07:33:45 
07:33:45 -- 2019-01-08 07:31:57,952 INFO MainThread: Started query 
7f46f15ed4d6d0f6:4d58cdbc
07:33:45 -- getting state for operation: 

{code}


This bug was introduced by IMPALA-7625.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-8064) test_min_max_filters is flaky

2019-01-10 Thread Pooja Nilangekar (JIRA)
Pooja Nilangekar created IMPALA-8064:


 Summary: test_min_max_filters is flaky 
 Key: IMPALA-8064
 URL: https://issues.apache.org/jira/browse/IMPALA-8064
 Project: IMPALA
  Issue Type: Bug
Reporter: Pooja Nilangekar
Assignee: Janaki Lahorani


The following configuration of the test_min_max_filters:
{code:java}
query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
0} | table_format: kudu/none]{code}
It produces a higher SUM aggregation over ProbeRows than expected:
{code:java}
query_test/test_runtime_filters.py:113: in test_min_max_filters
    self.run_test_case('QueryTest/min_max_filters', vector)
common/impala_test_suite.py:518: in run_test_case
    update_section=pytest.config.option.update_results)
common/test_result_verifier.py:612: in verify_runtime_profile
    % (function, field, expected_value, actual_value, actual))
E   AssertionError: Aggregation of SUM over ProbeRows did not match expected results.
E   EXPECTED VALUE:
E   619
E   ACTUAL VALUE:
E   652
{code}

This test was introduced in the patch for IMPALA-6533



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8007) test_slow_subscriber is flaky

2019-01-12 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8007.
--
Resolution: Fixed

> test_slow_subscriber is flaky
> -
>
> Key: IMPALA-8007
> URL: https://issues.apache.org/jira/browse/IMPALA-8007
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: broken-build, flaky
> Fix For: Impala 3.2.0
>
>
> We have hit both the asserts in the test.
> *Exhaustive:*
> {noformat}
> statestore/test_statestore.py:574: in test_slow_subscriber
>     assert (secs_since_heartbeat < float(sleep_time + 1.0))
> E   assert 8.8043 < 6.0
> E   +  where 6.0 = float((5 + 1.0))
> Stacktrace
> statestore/test_statestore.py:574: in test_slow_subscriber
> assert (secs_since_heartbeat < float(sleep_time + 1.0))
> E   assert 8.8043 < 6.0
> E+  where 6.0 = float((5 + 1.0))
> {noformat}
> *ASAN*
> {noformat}
> Error Message
> statestore/test_statestore.py:573: in test_slow_subscriber
>     assert (secs_since_heartbeat > float(sleep_time - 1.0))
> E   assert 4.995 > 5.0
> E   +  where 5.0 = float((6 - 1.0))
> Stacktrace
> statestore/test_statestore.py:573: in test_slow_subscriber
> assert (secs_since_heartbeat > float(sleep_time - 1.0))
> E   assert 4.995 > 5.0
> E+  where 5.0 = float((6 - 1.0))
> {noformat}
> I have only noticed this happen twice (the above two instances) since the 
> patch was committed, so it looks like a racy bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8064) test_min_max_filters is flaky

2019-01-25 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8064.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
> Attachments: profile.txt
>
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher SUM aggregation over ProbeRows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters
>     self.run_test_case('QueryTest/min_max_filters', vector)
> common/impala_test_suite.py:518: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:612: in verify_runtime_profile
>     % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected results.
> E   EXPECTED VALUE:
> E   619
> E   ACTUAL VALUE:
> E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6932) Simple LIMIT 1 query can be really slow on many-file sequence datasets

2019-01-31 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-6932.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Simple LIMIT 1 query can be really slow on many-file sequence datasets
> ---
>
> Key: IMPALA-6932
> URL: https://issues.apache.org/jira/browse/IMPALA-6932
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Philip Zeyliger
>Assignee: Pooja Nilangekar
>Priority: Critical
> Fix For: Impala 3.2.0
>
>
> I recently ran across really slow behavior with the trivial {{SELECT * FROM 
> table LIMIT 1}} query. The table used Avro as a file format and had about 
> 45,000 files across about 250 partitions. An optimization kicked in to set 
> NUM_NODES to 1.
> The query ran for about an hour, and the profile indicated that it was 
> opening files:
>   - TotalRawHdfsOpenFileTime(*): 1.0h (3622833666032)
> I took a single minidump while this query was running, and I suspect the 
> query was here:
> {code:java}
> 1 impalad!impala::ScannerContext::Stream::GetNextBuffer(long) 
> [scanner-context.cc : 115 + 0x13]
> 2 impalad!impala::ScannerContext::Stream::GetBytesInternal(long, unsigned 
> char**, bool, long*) [scanner-context.cc : 241 + 0x5]
> 3 impalad!impala::HdfsAvroScanner::ReadFileHeader() [scanner-context.inline.h 
> : 54 + 0x1f]
> 4 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) 
> [base-sequence-scanner.cc : 157 + 0x13]
> 5 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 129 + 0xc]
> 6 
> impalad!impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, 
> impala::io::ScanRange*) [hdfs-scan-node.cc : 527 + 0x17]
> 7 impalad!impala::HdfsScanNode::ScannerThread() [hdfs-scan-node.cc : 437 + 
> 0x1c]
> 8 impalad!impala::Thread::SuperviseThread(std::string const&, std::string 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) [function_template.hpp : 767 + 0x7]{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8151) HiveUdfCall assumes StringValue is 16 bytes

2019-02-06 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8151.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> HiveUdfCall assumes StringValue is 16 bytes
> ---
>
> Key: IMPALA-8151
> URL: https://issues.apache.org/jira/browse/IMPALA-8151
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: crash
> Fix For: Impala 3.2.0
>
>
> HiveUdfCall has the sizes of internal types hardcoded as magic numbers:
> {code}
>   switch (GetChild(i)->type().type) {
> case TYPE_BOOLEAN:
> case TYPE_TINYINT:
>   // Using explicit sizes helps the compiler unroll memcpy
>   memcpy(input_ptr, v, 1);
>   break;
> case TYPE_SMALLINT:
>   memcpy(input_ptr, v, 2);
>   break;
> case TYPE_INT:
> case TYPE_FLOAT:
>   memcpy(input_ptr, v, 4);
>   break;
> case TYPE_BIGINT:
> case TYPE_DOUBLE:
>   memcpy(input_ptr, v, 8);
>   break;
> case TYPE_TIMESTAMP:
> case TYPE_STRING:
> case TYPE_VARCHAR:
>   memcpy(input_ptr, v, 16);
>   break;
> default:
>   DCHECK(false) << "NYI";
>   }
> {code}
> STRING and VARCHAR were only 16 bytes because of padding. This padding is 
> removed by IMPALA-7367, so this will read past the end of the actual value. 
> This could in theory lead to a crash.
> We need to change the value, but we should probably also switch to 
> sizeof(StringValue) so that it doesn't get broken by similar changes in the 
> future.
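> A minimal sketch of the suggested direction (with a stand-in StringValue, not the committed patch): sizeof() keeps the copy size in sync with the struct layout, where a hardcoded 16 would read 4 bytes past the end of the packed 12-byte value:
> {code}
> #include <cstdint>
> #include <cstdio>
> #include <cstring>
>
> // Stand-in for Impala's packed StringValue (8-byte pointer + 4-byte length).
> struct __attribute__((packed)) StringValue { char* ptr; int32_t len; };
>
> int main() {
>   StringValue v{nullptr, 0};
>   char buf[sizeof(StringValue)];
>   memcpy(buf, &v, sizeof(StringValue));  // copies 12 bytes, tracks the layout
>   printf("copied %zu bytes\n", sizeof(StringValue));
>   return 0;
> }
> {code}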



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8096) Limit on #rows returned from query

2019-02-07 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8096.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Limit on #rows returned from query
> --
>
> Key: IMPALA-8096
> URL: https://issues.apache.org/jira/browse/IMPALA-8096
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.2.0
>
>
> Sometimes users accidentally run queries that return a large number of rows, 
> e.g.
> {code}
> SELECT * FROM table
> {code}
> Often, they really only need to look at a subset of the rows. It would be 
> useful to have a guardrail that fails queries returning more rows than a 
> configured limit. Maybe it would make sense to integrate with IMPALA-4268 so 
> that the query is failed when the buffer fills up, but it may also be useful 
> to have an easier-to-understand option based on #rows.
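> A hedged sketch of the #rows-based guardrail (names are hypothetical, not the implemented option):
> {code}
> #include <cstdint>
> #include <stdexcept>
> #include <string>
>
> // Hypothetical guardrail: fail the query once it has returned more rows
> // than a configured cap (a cap of 0 means unlimited).
> void CheckRowsReturnedLimit(int64_t rows_returned, int64_t limit) {
>   if (limit > 0 && rows_returned > limit) {
>     throw std::runtime_error(
>         "Query returned " + std::to_string(rows_returned) +
>         " rows, exceeding the configured limit of " + std::to_string(limit));
>   }
> }
> {code}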



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (IMPALA-8201) How to generate server-key.pem and server-cert.pem when executing the Impala SSL test cases?

2019-02-14 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar closed IMPALA-8201.

Resolution: Information Provided

> How to generate server-key.pem and server-cert.pem when executing the Impala 
> SSL test cases?
> ---
>
> Key: IMPALA-8201
> URL: https://issues.apache.org/jira/browse/IMPALA-8201
> Project: IMPALA
>  Issue Type: Test
>  Components: Security
>Affects Versions: Impala 2.10.0
>Reporter: Donghui Xu
>Priority: Minor
>
> When executing the test case in webserver-test.cc, it was found to use 
> be/src/testutil/server-cert.pem and be/src/testutil/server-key.pem.
> How are the above files generated?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8064) test_min_max_filters is flaky

2019-02-25 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8064.
--
Resolution: Fixed

> test_min_max_filters is flaky 
> --
>
> Key: IMPALA-8064
> URL: https://issues.apache.org/jira/browse/IMPALA-8064
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Pooja Nilangekar
>Assignee: Pooja Nilangekar
>Priority: Blocker
>  Labels: broken-build, flaky-test
> Fix For: Impala 3.2.0
>
> Attachments: profile.txt
>
>
> The following configuration of the test_min_max_filters:
> {code:java}
> query_test.test_runtime_filters.TestMinMaxFilters.test_min_max_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]{code}
> It produces a higher SUM aggregation over ProbeRows than expected:
> {code:java}
> query_test/test_runtime_filters.py:113: in test_min_max_filters
>     self.run_test_case('QueryTest/min_max_filters', vector)
> common/impala_test_suite.py:518: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:612: in verify_runtime_profile
>     % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected results.
> E   EXPECTED VALUE:
> E   619
> E   ACTUAL VALUE:
> E   652
> {code}
> This test was introduced in the patch for IMPALA-6533. The failure occurred 
> during an ASAN build. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8245) Add hostname to timeout error message in HdfsMonitoredOps

2019-02-26 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8245.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Add hostname to timeout error message in HdfsMonitoredOps
> -
>
> Key: IMPALA-8245
> URL: https://issues.apache.org/jira/browse/IMPALA-8245
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Assignee: Pooja Nilangekar
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> If a DiskIo operation times out, it generates a 
> TErrorCode::THREAD_POOL_TASK_TIMED_OUT or 
> TErrorCode::THREAD_POOL_SUBMIT_FAILED error code. These paths call 
> GetDescription() to get DiskIo-related context. That information should 
> include the hostname where the error occurred, to allow tracking down a 
> problematic host that is seeing DiskIo timeouts.
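> A hedged sketch of the idea (hypothetical helper, not the actual GetDescription() change): append the local hostname to the timeout description so the problematic host is identifiable from the error message:
> {code}
> #include <string>
> #include <unistd.h>
>
> // Hypothetical helper: decorate a DiskIo timeout description with the
> // hostname it occurred on.
> std::string DescribeTimeoutWithHost(const std::string& op_description) {
>   char hostname[256] = {0};
>   if (gethostname(hostname, sizeof(hostname) - 1) != 0) return op_description;
>   return op_description + " on host " + hostname;
> }
> {code}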



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-5397) Set "End Time" earlier rather than on unregistration.

2019-03-01 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-5397.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Set "End Time" earlier rather than on unregistration.
> -
>
> Key: IMPALA-5397
> URL: https://issues.apache.org/jira/browse/IMPALA-5397
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Mostafa Mokhtar
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: admission-control, query-lifecycle
> Fix For: Impala 3.2.0
>
>
> When queries are executed from Hue and hit the idle query timeout, the query 
> duration keeps going up even though the query was cancelled and is not 
> actually doing any more work. The end time is only set when the query is 
> actually unregistered.
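> A hedged sketch of the fix direction (hypothetical names): record the end time when the query first reaches a terminal state instead of waiting for unregistration:
> {code}
> #include <chrono>
> #include <mutex>
>
> // Hypothetical query-state holder: the end time is set exactly once, at the
> // first transition to a terminal state (finished, cancelled, or failed),
> // not when the client finally unregisters the query.
> class QueryEndTime {
>  public:
>   void MarkTerminal() {
>     std::lock_guard<std::mutex> l(lock_);
>     if (end_time_ == std::chrono::system_clock::time_point{}) {
>       end_time_ = std::chrono::system_clock::now();
>     }
>   }
>  private:
>   std::mutex lock_;
>   std::chrono::system_clock::time_point end_time_{};
> };
> {code}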
> The queries below finished in 1s640ms, while the reported time is much longer. 
> |User||Default Db||Statement||Query Type||Start Time||Waiting Time||Duration||Scan Progress||State||Last Event||# rows fetched||Resource Pool||Details||Action|
> |hue/va1026.halxg.cloudera@halxg.cloudera.com|tpcds_1000_parquet|select count(*) from tpcds_1000_parquet.inventory|QUERY|2017-05-31 09:38:20.472804000|4m27s|4m32s|261 / 261 ( 100%)|FINISHED|First row fetched|1|root.default|Details|Close|
> |hue/va1026.halxg.cloudera@halxg.cloudera.com|tpcds_1000_parquet|select count(*) from tpcds_1000_parquet.inventory|QUERY|2017-05-31 08:38:52.780237000|2017-05-31 09:38:20.289582000|59m27s|261 / 261 ( 100%)|FINISHED|1|root.default|Details|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-8189) TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp' fails

2019-03-04 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8189.
--
Resolution: Fixed

> TestParquet.test_resolution_by_name fails on S3 because 'hadoop fs -cp'  fails
> --
>
> Key: IMPALA-8189
> URL: https://issues.apache.org/jira/browse/IMPALA-8189
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Andrew Sherman
>Assignee: Pooja Nilangekar
>Priority: Critical
>  Labels: broken-build, flaky-test
>
> In parquet-resolution-by-name.test, two Parquet files are copied: 
> {quote}
>  SHELL
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> hadoop fs -cp 
> $FILESYSTEM_PREFIX/test-warehouse/complextypestbl_parquet/nonnullable.parq \
> $FILESYSTEM_PREFIX/test-warehouse/$DATABASE.db/nested_resolution_by_name_test/
> {quote}
> The first copy succeeds, but the second fails. In the DEBUG output (below) 
> you can see the copy writing data to an intermediate file 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  and then after the stream is closed, the copy cannot find the file.
> {quote}
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  7
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  8
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  3
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_create += 1  ->  1
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  6
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  9
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  10
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  4
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: Initialized 
> S3ABlockOutputStream for 
> test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>  output to FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3ablock-0001-1315190405959387081.tmp, 
> state=Writing, dataSize=0, limit=104857600}
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: op_get_file_status += 1  -> 
>  7
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Getting path status for 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
>   
> (test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_)
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  11
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_metadata_requests += 
> 1  ->  12
> 19/02/12 05:33:13 DEBUG s3a.S3AStorageStatistics: object_list_requests += 1  
> ->  5
> 19/02/12 05:33:13 DEBUG s3a.S3AFileSystem: Not Found: 
> s3a://impala-test-uswest2-1/test-warehouse/test_resolution_by_name_daec05d5.db/nested_resolution_by_name_test/nonnullable.parq._COPYING_
> 19/02/12 05:33:13 DEBUG s3a.S3AInputStream: 
> reopen(s3a://impala-test-uswest2-1/test-warehouse/complextypestbl_parquet/nonnullable.parq)
>  for read from new offset range[0-3186], length=4096, streamPosition=0, 
> nextReadPosition=0, policy=normal
> 19/02/12 05:33:13 DEBUG s3a.S3ABlockOutputStream: 
> S3ABlockOutputStream{WriteOperationHelper {bucket=impala-test-uswest2-1}, 
> blockSize=104857600, activeBlock=FileBlock{index=1, 
> destFile=/tmp/hadoop-jenkins/s3a/s3abl