[jira] [Resolved] (IMPALA-8932) impala shell shouldn't retry with kerberos when connecting over http

2019-09-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8932.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> impala shell shouldn't retry with kerberos when connecting over http
> 
>
> Key: IMPALA-8932
> URL: https://issues.apache.org/jira/browse/IMPALA-8932
> Project: IMPALA
> Issue Type: Bug
> Components: Clients
> Affects Versions: Impala 3.3.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Critical
> Fix For: Impala 3.4.0
>
>
> {noformat}
> Error connecting: EOFError, 
> Kerberos ticket found in the credentials cache, retrying the connection with 
> a secure transport.
> Warning: --connect_timeout_ms is currently ignored with HTTP transport.
> Kerberos not supported with HTTP endpoints.
> Error connecting: NotImplementedError, 
> {noformat}
> The NotImplementedError is confusing.
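
One way to read the request above is that the shell should decide up front whether a Kerberos retry makes sense for the chosen transport. The following is only a sketch of that decision, not the impala-shell code: the function name, its parameters, and the "http" protocol label are all hypothetical.

```python
def should_retry_with_kerberos(protocol: str,
                               has_kerberos_ticket: bool,
                               already_using_kerberos: bool) -> bool:
    """Return True if a failed connection is worth retrying over Kerberos.

    Hypothetical helper: the names and the "http" label are assumptions
    made for illustration only.
    """
    # The log in this issue states that Kerberos is not supported with HTTP
    # endpoints, so retrying there can only end in NotImplementedError.
    if protocol == "http":
        return False
    # Otherwise retry only if a ticket exists and Kerberos was not tried yet.
    return has_kerberos_ticket and not already_using_kerberos
```

With a check like this, the first "Error connecting" would be reported as-is for HTTP connections instead of being followed by the confusing NotImplementedError.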



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (IMPALA-8904) Daemons fails fast when statestore has not started up

2019-09-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8904.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Daemons fails fast when statestore has not started up
> -
>
> Key: IMPALA-8904
> URL: https://issues.apache.org/jira/browse/IMPALA-8904
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Fix For: Impala 3.4.0
>
>
> If you start the statestored and the other services at the same time, there 
> is a race between the statestore starting and the other services trying to 
> register with it. If the other services "win" the race, they abort startup 
> because they can't register with the statestore.
> The log looks like this:
> {noformat}
> I0828 00:19:10.46 1 statestore-subscriber.cc:219] Starting statestore subscriber
> I0828 00:19:10.461310 1 thrift-server.cc:451] ThriftServer 'StatestoreSubscriber' started on port: 23000
> I0828 00:19:10.461320 1 statestore-subscriber.cc:247] Registering with statestore
> I0828 00:19:10.461309 299 TAcceptQueueServer.cpp:314] connection_setup_thread_pool_size is set to 2
> I0828 00:19:10.462744 1 statestore-subscriber.cc:253] statestore registration unsuccessful: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
> E0828 00:19:10.462818 1 impalad-main.cc:90] Impalad services did not start correctly, exiting.  Error: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
> Statestore subscriber did not start up.
> {noformat}
> Most management systems will automatically restart failed processes, so 
> typically the impalads will come back up and find the statestore, but the 
> crash loop is unnecessary.
> I propose that the services should retry for a while before giving up (we 
> still want the services to fail when there genuinely isn't a statestore 
> available).
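
The proposal above (retry for a while, then still fail if there genuinely is no statestore) can be sketched as a deadline-bounded retry loop. This is an illustration only, not the Impala implementation: `register_with_retry`, its parameters, and the injectable `clock` and `sleep` are hypothetical names chosen for the example.

```python
import time
from typing import Callable


def register_with_retry(
    register: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call register() until it succeeds or timeout_s has elapsed.

    Hypothetical helper: register() stands in for the statestore
    registration RPC and returns True on success.
    """
    deadline = clock() + timeout_s
    while True:
        if register():
            return True
        # Give up once the deadline has passed, so that a missing
        # statestore still causes startup to fail.
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing the clock and sleep functions in makes the loop easy to exercise in a test without real waiting.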




[jira] [Resolved] (IMPALA-8915) Re-fix # links on /catalog page.

2019-09-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8915.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Re-fix # links on /catalog page.
> 
>
> Key: IMPALA-8915
> URL: https://issues.apache.org/jira/browse/IMPALA-8915
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 3.4.0
> Reporter: Thomas Tauber-Marshall
> Assignee: Tim Armstrong
> Priority: Minor
> Fix For: Impala 3.4.0
>
>
> The patch for IMPALA-8879 unintentionally reverted the fix from IMPALA-8901.




[jira] [Resolved] (IMPALA-3945) Don't allow creation of text tables with nonsensical delimiter and escape character combinations

2019-09-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3945.
---
Resolution: Won't Fix

I don't think this is likely to be worth the effort.

> Don't allow creation of text tables with nonsensical delimiter and escape 
> character combinations
> 
>
> Key: IMPALA-3945
> URL: https://issues.apache.org/jira/browse/IMPALA-3945
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.5.0
> Environment: CentOS 6.7
> Reporter: Yuanhao Luo
> Priority: Minor
> Labels: compatibility, newbie, usability
>
> There are some corner cases for delimiters, all of which are handled in 
> CreateTableStmt.java:analyzeRowFormat(), such as:
> # AnalysisException: Field delimiter and line delimiter have same value
> # Warning: Field delimiter and escape character have same value
> # Warning: Line delimiter and escape character have same value
> I ran a simple test on the last two cases, and the results show that they do 
> not work as expected.
> Detailed logs are below:
> * Normal case
> {noformat}
> [root@nobida147 workspace]# cat text-comma-backslash-newline.txt 
> one,two,3,4
> one\,one,two,3,4
> one\\,two,3,4
> one\\\,one,two,3,4
> one,two,3,4
> [nobida147:21000] > create table text_comma_backslash_newline(col1 string, 
> col2 string, col3 int, col4 int) row format delimited fields terminated by 
> ',' escaped by '\\' lines terminated by '\n';
> Query: create table text_comma_backslash_newline(col1 string, col2 string, 
> col3 int, col4 int) row format delimited fields terminated by ',' escaped by 
> '\\' lines terminated by '\n'
> Query submitted at: 2016-07-25 15:40:25 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: 
> http://0.0.0.0:25000/query_plan?query_id=cc4f0a970ac242ac:a7bde0a84aa49c8c
> ++
> ||
> ++
> ++
> Fetched 0 row(s) in 0.14s
> [nobida147:21000] > load data inpath 
> '/user/root/text-comma-backslash-newline.txt' into table 
> text_comma_backslash_newline;
> Query: load data inpath '/user/root/text-comma-backslash-newline.txt' into 
> table text_comma_backslash_newline
> Query submitted at: 2016-07-25 15:40:38 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: 
> http://0.0.0.0:25000/query_plan?query_id=1f4c908335a41010:1006f8153e8068ab
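> +----------------------------------------------------------+
> | summary                                                  |
> +----------------------------------------------------------+
> | Loaded 1 file(s). Total files in destination location: 1 |
> +----------------------------------------------------------+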
> Fetched 1 row(s) in 5.05s
> [nobida147:21000] > select * from text_comma_backslash_newline;
> Query: select * from text_comma_backslash_newline
> Query submitted at: 2016-07-25 15:40:49 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: 
> http://0.0.0.0:25000/query_plan?query_id=9e473f7fe5822ca4:b663f16106e0f87
> +----------+------+------+------+
> | col1     | col2 | col3 | col4 |
> +----------+------+------+------+
> | one      | two  | 3    | 4    |
> | one,one  | two  | 3    | 4    |
> | one\     | two  | 3    | 4    |
> | one\,one | two  | 3    | 4    |
> | one\\    | two  | 3    | 4    |
> +----------+------+------+------+
> Fetched 5 row(s) in 0.44s
> {noformat}
> As the log above shows, the delimited text parser works as expected.
> * Corner case: Field delimiter and escape character have same value
> {noformat}
> [root@nobida147 workspace]# cat text-at-at-newline.txt 
> one@two@3@4
> one@,one@two@3@4
> one@\@two@3@4
> one@\@,one@two@3@4
> one@\@\@two@3@4
> [nobida147:21000] > create table text_at_at_newline(col1 string, col2 string, 
> col3 int, col4 int) row format delimited fields terminated by '@' escaped by 
> '@' lines terminated by '\n';
> Query: create table text_at_at_newline(col1 string, col2 string, col3 int, 
> col4 int) row format delimited fields terminated by '@' escaped by '@' lines 
> terminated by '\n'
> Query submitted at: 2016-07-25 16:59:23 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: 
> http://0.0.0.0:25000/query_plan?query_id=9d4933d6e0c32dcd:92f6ade14fb545ba
> ++
> ||
> ++
> ++
> WARNINGS: Escape character is the first byte of field delimiter: byte @. 
> Escape character will be ignored
> Fetched 0 row(s) in 0.12s
> [nobida147:21000] > load data inpath '/user/root/text-at-at-newline.txt' into 
> table text_at_at_newline;
> Query: load data inpath '/user/root/text-at-at-newline.txt' into table 
> text_at_at_newline
> Query submitted at: 2016-07-25 16:59:33 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: 
> http://0.0.0.0:25000/query_plan?q

[jira] [Resolved] (IMPALA-7912) Inconsistent and intermittent behavior of queries

2019-09-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7912.
---
Resolution: Not A Bug

> Inconsistent and intermittent behavior of queries
> -
>
> Key: IMPALA-7912
> URL: https://issues.apache.org/jira/browse/IMPALA-7912
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 3.1.0
> Reporter: Yongjun Zhang
> Priority: Major
>
> When investigating IMPALA-5474, which reported the different log messages of 
> two queries, I found inconsistent and intermittent behavior of the queries. 
> I added a line to log state change in client-request-state.cc:
> {code:java}
> void ClientRequestState::UpdateOperationState(
>     TOperationState::type operation_state) {
>   operation_state_ = operation_state;
>   summary_profile_->AddInfoString("Query State",
>       PrintThriftEnum(BeeswaxQueryState()));
>   VLOG_QUERY << "YJDebug UpdateOperationState: "
>              << PrintThriftEnum(BeeswaxQueryState()) << endl;
> }
> {code}
> and a line to log the value returned by ImpalaServer::get_state:
> {code:java}
> beeswax::QueryState::type ImpalaServer::get_state(const QueryHandle& handle) {
>   ..
>   // Take the lock to ensure that if the client sees a query_state == EXCEPTION, it is
>   // guaranteed to see the error query_status.
>   lock_guard l(*request_state->lock());
>   beeswax::QueryState::type query_state = request_state->BeeswaxQueryState();
>   DCHECK_EQ(query_state == beeswax::QueryState::EXCEPTION,
>       !request_state->query_status().ok());
>   VLOG_QUERY << "YJDebug ImpalaServer::get_state: " << query_state << endl;
>   return query_state;
> }
> {code}
> * Query1. select id from bad_column_metadata s;
> A: most of the time:
> {code:java}
> I1129 12:09:39.639384 17555 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: COMPILED
> I1129 12:09:39.639884 17555 impala-beeswax-server.cc:265] YJDebug 
> ImpalaServer::get_state: 2
> I1129 12:09:39.641791 17585 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: RUNNING
> I1129 12:09:39.668946 17586 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: FINISHED
> I1129 12:09:39.740308 17555 impala-beeswax-server.cc:265] YJDebug 
> ImpalaServer::get_state: 4
> I1129 12:09:39.741384 17555 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: EXCEPTION{code}
> We can see that the query_state transitioned from COMPILED to RUNNING to 
> FINISHED then to EXCEPTION.
> B: sometimes, at the beginning after starting impala shell (possibly right 
> after restarting the impala cluster):
> {code}
> I1129 12:15:25.937026 18234 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: COMPILED
> I1129 12:15:25.937563 18234 impala-beeswax-server.cc:265] YJDebug 
> ImpalaServer::get_state: 2
> I1129 12:15:25.952119 18264 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: RUNNING
> I1129 12:15:26.037926 18234 impala-beeswax-server.cc:265] YJDebug 
> ImpalaServer::get_state: 3
> I1129 12:15:26.079480 18265 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: EXCEPTION
> I1129 12:15:26.138288 18234 impala-beeswax-server.cc:265] YJDebug 
> ImpalaServer::get_state: 5
> yzhang@yzhang-pa:~/apache/Impala/logs/cluster$
> {code}
> We can see that the query state transitioned from COMPILED to RUNNING then to 
> EXCEPTION.
> But most of the time, it got into the FINISHED state before getting into the 
> EXCEPTION state, which is why ERROR was reported.
> * Query2. select id, cnt from bad_column_metadata t, (select 1 cnt) u; 
> {code}
> I1129 12:11:42.715994 17679 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: COMPILED
> I1129 12:11:42.716413 17679 impala-beeswax-server.cc:265] YJDebug 
> ImpalaServer::get_state: 2
> I1129 12:11:42.717469 17721 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: RUNNING
> I1129 12:11:42.745760 17722 client-request-state.cc:1232] YJDebug 
> UpdateOperationState: EXCEPTION
> I1129 12:11:42.816792 17679 impala-beeswax-server.cc:265] YJDebug 
> ImpalaServer::get_state: 5
> {code}
> We can see that the query state transitioned from COMPILED to RUNNING then to 
> EXCEPTION. It persistently shows this state transition, and reports WARNING 
> in the end.
> There are two issues here:
>  # Inconsistent behavior of query1 compared with query2: why did query1 reach 
> FINISHED before getting into EXCEPTION?
>  # Intermittent behavior of query1: it sometimes reaches the FINISHED state 
> first and sometimes goes straight to the EXCEPTION state.
> The root cause of these two issues might be the same. Creating this Jira to 
> log the issues.
> See
> https://issues.apache.org/jira/browse/IMPALA-5474?focusedCommentId=16703872&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-
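
The numeric values printed by ImpalaServer::get_state in the logs above line up with the names printed by UpdateOperationState (2 follows COMPILED, 3 follows RUNNING, 4 follows FINISHED, 5 follows EXCEPTION). A small decoder makes the two traces easier to compare. This is a reading aid only: the values 2 to 5 are taken from the logs, while the names for 0 and 1 are an assumption about the rest of the Beeswax QueryState enum, and `decode_states` is a hypothetical helper.

```python
from enum import IntEnum
from typing import Iterable, List


class QueryState(IntEnum):
    # 0 and 1 are assumed; 2..5 match the log lines quoted in this issue.
    CREATED = 0
    INITIALIZED = 1
    COMPILED = 2
    RUNNING = 3
    FINISHED = 4
    EXCEPTION = 5


def decode_states(codes: Iterable[int]) -> List[str]:
    """Translate the integers logged by get_state into state names."""
    return [QueryState(code).name for code in codes]


# States observed by the client for Query1, case A and case B.
case_a = decode_states([2, 4])
case_b = decode_states([2, 3, 5])
```

Decoded this way, case A shows the client observing FINISHED, while in case B it observes RUNNING and then EXCEPTION.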

[jira] [Created] (IMPALA-8933) Ranger column deny policies not respected under certain circumstances

2019-09-09 Thread Kurt Deschler (Jira)
Kurt Deschler created IMPALA-8933:
-

 Summary: Ranger column deny policies not respected under certain circumstances
 Key: IMPALA-8933
 URL: https://issues.apache.org/jira/browse/IMPALA-8933
 Project: IMPALA
 Issue Type: Bug
 Components: Security
 Affects Versions: Impala 3.4.0
 Reporter: Kurt Deschler
 Assignee: Kurt Deschler







[jira] [Created] (IMPALA-8932) impala shell shouldn't retry with kerberos when connecting over http

2019-09-09 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-8932:
-

 Summary: impala shell shouldn't retry with kerberos when connecting over http
 Key: IMPALA-8932
 URL: https://issues.apache.org/jira/browse/IMPALA-8932
 Project: IMPALA
 Issue Type: Bug
 Components: Clients
 Affects Versions: Impala 3.3.0
 Reporter: Tim Armstrong
 Assignee: Tim Armstrong


{noformat}
Error connecting: EOFError, 
Kerberos ticket found in the credentials cache, retrying the connection with a 
secure transport.
Warning: --connect_timeout_ms is currently ignored with HTTP transport.
Kerberos not supported with HTTP endpoints.
Error connecting: NotImplementedError, 
{noformat}

The NotImplementedError is confusing.




[jira] [Resolved] (IMPALA-8931) Fe not generating lineages when event hooks are configured

2019-09-09 Thread bharath v (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-8931.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Fe not generating lineages when event hooks are configured
> --
>
> Key: IMPALA-8931
> URL: https://issues.apache.org/jira/browse/IMPALA-8931
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.2.0
> Reporter: bharath v
> Assignee: bharath v
> Priority: Major
> Fix For: Impala 3.4.0
>
>
> Frontend generates the lineage only when {{--lineage_event_log_dir}} is 
> configured. That is a legacy param and the users are expected to use event 
> hooks to consume lineages. It should also generate lineage events when hooks 
> are configured.
> {noformat}
> public boolean getComputeLineage() {
>   return !Strings.isNullOrEmpty(backendCfg_.lineage_event_log_dir);
> }
> {noformat}
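
The change requested above amounts to widening the predicate so that configured hooks also enable lineage. A minimal sketch of that logic follows, written in Python rather than the frontend's Java. The function name and the `query_event_hook_classes` parameter are assumptions made for illustration, and the actual fix may look different.

```python
from typing import Optional


def compute_lineage(lineage_event_log_dir: Optional[str],
                    query_event_hook_classes: Optional[str]) -> bool:
    """Return True if lineage should be generated.

    Hypothetical sketch: lineage is computed when either the legacy log
    directory or at least one event hook is configured.
    """
    legacy_enabled = bool(lineage_event_log_dir)
    hooks_enabled = bool(query_event_hook_classes)
    return legacy_enabled or hooks_enabled
```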




[jira] [Resolved] (IMPALA-8779) Add RowBatchQueue interface with an implementation backed by a std::queue

2019-09-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8779.
--
Resolution: Won't Fix

Marking this as 'Won't Fix' for now. There does not seem to be a strong need to 
add this in right now, given that there is no other use case for a generic 
{{RowBatch}} queue. The one used in the scan nodes has some unique requirements 
and re-factoring it to use a generic interface does not seem worth it. We can 
re-visit this later if we find a stronger use case for it.

> Add RowBatchQueue interface with an implementation backed by a std::queue
> -
>
> Key: IMPALA-8779
> URL: https://issues.apache.org/jira/browse/IMPALA-8779
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> Add a {{RowBatchQueue}} interface with an implementation backed by a 
> {{std::queue}}. Introducing a generic queue that can buffer {{RowBatch}}-es 
> will help with the implementation of {{BufferedPlanRootSink}}. Rather than 
> tie the {{BufferedPlanRootSink}} to a specific method of queuing row batches, 
> we can use an interface. In future patches, a {{RowBatchQueue}} backed by a 
> {{BufferedTupleStream}} can easily be switched out in 
> {{BufferedPlanRootSink}}.
> We should consider re-factoring the existing {{RowBatchQueue}} to use the new 
> interface. The KRPC receiver does some buffering of {{RowBatch}}-es as well 
> which might benefit from the new RowBatchQueue interface, and some more KRPC 
> buffering might be added in IMPALA-6692.




[jira] [Resolved] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

2019-09-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-8818.
--
Resolution: Fixed

> Replace deque queue with spillable queue in BufferedPlanRootSink
> 
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Fix For: Impala 3.4.0
>
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' 
> mode so that pages are attached to the output {{RowBatch}} in 
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. 
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns 
> false (it returns false if "the unused reservation was not sufficient to add 
> a new page to the stream large enough to fit 'row' and the stream could not 
> increase the reservation to get enough unused reservation"), it should unpin 
> the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if 
> the row still could not be added, then an error must have occurred, perhaps 
> an IO error, in which case return the error and fail the query).
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from 
> [massive-fact-table]}} and scroll through the results without affecting the 
> health of the Impala cluster (assuming they close the query promptly). 
> Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire 
> fact table, and end up spilling all the contents to disk, which can 
> potentially take up a large amount of space. So there need to be 
> restrictions on the memory and disk space used by the {{BufferedTupleStream}} 
> in order to ensure a scan of a massive table does not consume all the memory 
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned 
> memory (perhaps through a new config option 
> {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). 
> The max amount of pinned memory should already be constrained by the 
> reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the 
> number of rows returned by a query, and so it should limit the number of rows 
> buffered by the BTS as well (although it is set to 0 by default). 
> SCRATCH_LIMIT already limits the amount of disk space used for spilling 
> (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it 
> needs to buffer all results in memory. This requires setting an accurate 
> value of {{ResourceProfile#memEstimateBytes_}} in 
> {{PlanRootSink#computeResourceProfile}}. If statistics are available, the 
> estimate can be based on the number of estimated rows returned multiplied by 
> the size of the rows returned. The min reservation should account for a read 
> and write page for the {{BufferedTupleStream}}.
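
The add-then-unpin-then-retry behaviour described under "BufferedTupleStream Usage" above can be sketched as follows. This is a toy model, not Impala code: `FakeStream` is a hypothetical stand-in for {{BufferedTupleStream}} that only models a fixed pinned capacity, and `add_row_or_spill` is an illustrative name.

```python
class FakeStream:
    """Toy stand-in for BufferedTupleStream (hypothetical, for illustration)."""

    def __init__(self, pinned_capacity: int, io_error: bool = False):
        self.pinned_capacity = pinned_capacity
        self.io_error = io_error
        self.pinned = True  # the stream starts off pinned
        self.rows = []

    def add_row(self, row) -> bool:
        # While pinned, adding fails once the reservation is used up.
        if self.pinned and len(self.rows) >= self.pinned_capacity:
            return False
        # Once unpinned, adding only fails on a simulated IO error.
        if not self.pinned and self.io_error:
            return False
        self.rows.append(row)
        return True

    def unpin_stream(self) -> None:
        self.pinned = False


def add_row_or_spill(stream: FakeStream, row) -> None:
    """Add a row, unpinning the stream if the pinned reservation is full."""
    if stream.add_row(row):
        return
    stream.unpin_stream()
    if not stream.add_row(row):
        # Still failing after unpinning means a real error; fail the query.
        raise RuntimeError("could not add row even after unpinning the stream")
```

In this model the stream stays pinned until the first failed add, which matches the intent of spilling only when the reservation is exhausted.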


