[jira] [Resolved] (IMPALA-8932) impala shell shouldn't retry with kerberos when connecting over http
[ https://issues.apache.org/jira/browse/IMPALA-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-8932.
---
Fix Version/s: Impala 3.4.0
Resolution: Fixed

> impala shell shouldn't retry with kerberos when connecting over http
>
> Key: IMPALA-8932
> URL: https://issues.apache.org/jira/browse/IMPALA-8932
> Project: IMPALA
> Issue Type: Bug
> Components: Clients
> Affects Versions: Impala 3.3.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Critical
> Fix For: Impala 3.4.0
>
> {noformat}
> Error connecting: EOFError,
> Kerberos ticket found in the credentials cache, retrying the connection with a secure transport.
> Warning: --connect_timeout_ms is currently ignored with HTTP transport.
> Kerberos not supported with HTTP endpoints.
> Error connecting: NotImplementedError,
> {noformat}
> The NotImplementedError is confusing.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8904) Daemons fails fast when statestore has not started up
[ https://issues.apache.org/jira/browse/IMPALA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-8904.
---
Fix Version/s: Impala 3.4.0
Resolution: Fixed

> Daemons fails fast when statestore has not started up
>
> Key: IMPALA-8904
> URL: https://issues.apache.org/jira/browse/IMPALA-8904
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Fix For: Impala 3.4.0
>
> If you start the statestored and the other services at the same time, there is a race between the statestore starting and the other services trying to register with it. If the other services "win" the race, they abort startup because they can't register with the statestore.
> The log looks like this:
> {noformat}
> I0828 00:19:10.46 1 statestore-subscriber.cc:219] Starting statestore subscriber
> I0828 00:19:10.461310 1 thrift-server.cc:451] ThriftServer 'StatestoreSubscriber' started on port: 23000
> I0828 00:19:10.461320 1 statestore-subscriber.cc:247] Registering with statestore
> I0828 00:19:10.461309 299 TAcceptQueueServer.cpp:314] connection_setup_thread_pool_size is set to 2
> I0828 00:19:10.462744 1 statestore-subscriber.cc:253] statestore registration unsuccessful: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
> E0828 00:19:10.462818 1 impalad-main.cc:90] Impalad services did not start correctly, exiting. Error: RPC Error: Client for statestored:24000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala27TRegisterSubscriberResponseE, send: done
> Statestore subscriber did not start up.
> {noformat}
> Most management systems will automatically restart failed processes, so typically the impalads will come back up and find the statestore, but the crash loop is unnecessary.
> I propose that the services should retry for a while before giving up (we still want the services to fail when there genuinely isn't a statestore available).
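The "retry for a while before giving up" proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions: `RegisterWithRetry`, its parameters, and the callback are hypothetical names and are not taken from the Impala code base or from the actual fix.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not Impala's API): calls `try_register` until it
// succeeds or until another wait would exceed `max_wait`, sleeping
// `retry_interval` between attempts.
// Returns true on success, false if the deadline was reached first.
bool RegisterWithRetry(const std::function<bool()>& try_register,
                       std::chrono::milliseconds max_wait,
                       std::chrono::milliseconds retry_interval) {
  const auto deadline = std::chrono::steady_clock::now() + max_wait;
  while (true) {
    if (try_register()) return true;
    // Give up when sleeping again would take us past the deadline, so a
    // genuinely missing statestore still fails startup.
    if (std::chrono::steady_clock::now() + retry_interval > deadline) {
      return false;
    }
    std::this_thread::sleep_for(retry_interval);
  }
}
```

A caller would pass a callback that performs one registration attempt and treat a `false` return the same way the daemons treat the current immediate failure.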
[jira] [Resolved] (IMPALA-8915) Re-fix # links on /catalog page.
[ https://issues.apache.org/jira/browse/IMPALA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-8915.
---
Fix Version/s: Impala 3.4.0
Resolution: Fixed

> Re-fix # links on /catalog page.
>
> Key: IMPALA-8915
> URL: https://issues.apache.org/jira/browse/IMPALA-8915
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 3.4.0
> Reporter: Thomas Tauber-Marshall
> Assignee: Tim Armstrong
> Priority: Minor
> Fix For: Impala 3.4.0
>
> The patch for IMPALA-8879 unintentionally reverted the fix from IMPALA-8901
[jira] [Resolved] (IMPALA-3945) Don't allow creation of text tables with nonsensical delimiter and escape character combinations
[ https://issues.apache.org/jira/browse/IMPALA-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-3945.
---
Resolution: Won't Fix

I don't think this is likely to be worth the effort.

> Don't allow creation of text tables with nonsensical delimiter and escape character combinations
>
> Key: IMPALA-3945
> URL: https://issues.apache.org/jira/browse/IMPALA-3945
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.5.0
> Environment: CentOS 6.7
> Reporter: Yuanhao Luo
> Priority: Minor
> Labels: compatibility, newbie, usability
>
> There are some corner cases for delimiters. All of them are added in the function CreateTableStmt.java:analyzeRowFormat(), such as:
> # AnalysisException: Field delimiter and line delimiter have same value
> # Warning: Field delimiter and escape character have same value
> # Warning: Line delimiter and escape character have same value
> I ran a simple test on the last two cases, and the results show that they do not work as expected.
> Detailed logs are below:
> * Normal case
> {noformat}
> [root@nobida147 workspace]# cat text-comma-backslash-newline.txt
> one,two,3,4
> one\,one,two,3,4
> one\\,two,3,4
> one\\\,one,two,3,4
> one,two,3,4
> [nobida147:21000] > create table text_comma_backslash_newline(col1 string, col2 string, col3 int, col4 int) row format delimited fields terminated by ',' escaped by '\\' lines terminated by '\n';
> Query: create table text_comma_backslash_newline(col1 string, col2 string, col3 int, col4 int) row format delimited fields terminated by ',' escaped by '\\' lines terminated by '\n'
> Query submitted at: 2016-07-25 15:40:25 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: http://0.0.0.0:25000/query_plan?query_id=cc4f0a970ac242ac:a7bde0a84aa49c8c
> ++
> ||
> ++
> ++
> Fetched 0 row(s) in 0.14s
> [nobida147:21000] > load data inpath '/user/root/text-comma-backslash-newline.txt' into table text_comma_backslash_newline;
> Query: load data inpath '/user/root/text-comma-backslash-newline.txt' into table text_comma_backslash_newline
> Query submitted at: 2016-07-25 15:40:38 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: http://0.0.0.0:25000/query_plan?query_id=1f4c908335a41010:1006f8153e8068ab
> +----------------------------------------------------------+
> | summary                                                  |
> +----------------------------------------------------------+
> | Loaded 1 file(s). Total files in destination location: 1 |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 5.05s
> [nobida147:21000] > select * from text_comma_backslash_newline;
> Query: select * from text_comma_backslash_newline
> Query submitted at: 2016-07-25 15:40:49 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: http://0.0.0.0:25000/query_plan?query_id=9e473f7fe5822ca4:b663f16106e0f87
> +----------+------+------+------+
> | col1     | col2 | col3 | col4 |
> +----------+------+------+------+
> | one      | two  | 3    | 4    |
> | one,one  | two  | 3    | 4    |
> | one\     | two  | 3    | 4    |
> | one\,one | two  | 3    | 4    |
> | one\\    | two  | 3    | 4    |
> +----------+------+------+------+
> Fetched 5 row(s) in 0.44s
> {noformat}
> As the log above shows, the delimited text parser works as expected.
> * Corner case: Field delimiter and escape character have same value
> {noformat}
> [root@nobida147 workspace]# cat text-at-at-newline.txt
> one@two@3@4
> one@,one@two@3@4
> one@\@two@3@4
> one@\@,one@two@3@4
> one@\@\@two@3@4
> [nobida147:21000] > create table text_at_at_newline(col1 string, col2 string, col3 int, col4 int) row format delimited fields terminated by '@' escaped by '@' lines terminated by '\n';
> Query: create table text_at_at_newline(col1 string, col2 string, col3 int, col4 int) row format delimited fields terminated by '@' escaped by '@' lines terminated by '\n'
> Query submitted at: 2016-07-25 16:59:23 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at: http://0.0.0.0:25000/query_plan?query_id=9d4933d6e0c32dcd:92f6ade14fb545ba
> ++
> ||
> ++
> ++
> WARNINGS: Escape character is the first byte of field delimiter: byte @. Escape character will be ignored
> Fetched 0 row(s) in 0.12s
> [nobida147:21000] > load data inpath '/user/root/text-at-at-newline.txt' into table text_at_at_newline;
> Query: load data inpath '/user/root/text-at-at-newline.txt' into table text_at_at_newline
> Query submitted at: 2016-07-25 16:59:33 (Coordinator: http://0.0.0.0:25000)
> Query progress can be monitored at:
> http://0.0.0.0:25000/query_plan?q
[jira] [Resolved] (IMPALA-7912) Inconsistent and intermittent behavior of queries
[ https://issues.apache.org/jira/browse/IMPALA-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-7912.
---
Resolution: Not A Bug

> Inconsistent and intermittent behavior of queries
>
> Key: IMPALA-7912
> URL: https://issues.apache.org/jira/browse/IMPALA-7912
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 3.1.0
> Reporter: Yongjun Zhang
> Priority: Major
>
> When investigating IMPALA-5474, which reported the different log messages of two queries, I found inconsistent and intermittent behavior of the queries.
> I added a line to log state changes in client-request-state.cc:
> {code:java}
> void ClientRequestState::UpdateOperationState(
>     TOperationState::type operation_state) {
>   operation_state_ = operation_state;
>   summary_profile_->AddInfoString("Query State", PrintThriftEnum(BeeswaxQueryState()));
>   VLOG_QUERY << "YJDebug UpdateOperationState: " << PrintThriftEnum(BeeswaxQueryState()) << endl;
> }
> {code}
> and a line to log the value returned by ImpalaServer::get_state:
> {code:java}
> beeswax::QueryState::type ImpalaServer::get_state(const QueryHandle& handle) {
>   ..
>   // Take the lock to ensure that if the client sees a query_state == EXCEPTION, it is
>   // guaranteed to see the error query_status.
>   lock_guard l(*request_state->lock());
>   beeswax::QueryState::type query_state = request_state->BeeswaxQueryState();
>   DCHECK_EQ(query_state == beeswax::QueryState::EXCEPTION,
>       !request_state->query_status().ok());
>   VLOG_QUERY << "YJDebug ImpalaServer::get_state: " << query_state << endl;
>   return query_state;
> }
> {code}
> * Query1. select id from bad_column_metadata s;
> A: most of the time:
> {code:java}
> I1129 12:09:39.639384 17555 client-request-state.cc:1232] YJDebug UpdateOperationState: COMPILED
> I1129 12:09:39.639884 17555 impala-beeswax-server.cc:265] YJDebug ImpalaServer::get_state: 2
> I1129 12:09:39.641791 17585 client-request-state.cc:1232] YJDebug UpdateOperationState: RUNNING
> I1129 12:09:39.668946 17586 client-request-state.cc:1232] YJDebug UpdateOperationState: FINISHED
> I1129 12:09:39.740308 17555 impala-beeswax-server.cc:265] YJDebug ImpalaServer::get_state: 4
> I1129 12:09:39.741384 17555 client-request-state.cc:1232] YJDebug UpdateOperationState: EXCEPTION
> {code}
> We can see that the query state transitioned from COMPILED to RUNNING to FINISHED and then to EXCEPTION.
> B: sometimes, I saw the following at the beginning, after starting impala shell (possibly right after restarting the impala cluster):
> {code}
> I1129 12:15:25.937026 18234 client-request-state.cc:1232] YJDebug UpdateOperationState: COMPILED
> I1129 12:15:25.937563 18234 impala-beeswax-server.cc:265] YJDebug ImpalaServer::get_state: 2
> I1129 12:15:25.952119 18264 client-request-state.cc:1232] YJDebug UpdateOperationState: RUNNING
> I1129 12:15:26.037926 18234 impala-beeswax-server.cc:265] YJDebug ImpalaServer::get_state: 3
> I1129 12:15:26.079480 18265 client-request-state.cc:1232] YJDebug UpdateOperationState: EXCEPTION
> I1129 12:15:26.138288 18234 impala-beeswax-server.cc:265] YJDebug ImpalaServer::get_state: 5
> yzhang@yzhang-pa:~/apache/Impala/logs/cluster$
> {code}
> We can see that the query state transitioned from COMPILED to RUNNING and then to EXCEPTION.
> But most of the time, it got into the FINISHED state before getting into the EXCEPTION state, which is why ERROR was reported.
> * Query2. select id, cnt from bad_column_metadata t, (select 1 cnt) u;
> {code}
> 1129 12:11:42.715994 17679 client-request-state.cc:1232] YJDebug UpdateOperationState: COMPILED
> I1129 12:11:42.716413 17679 impala-beeswax-server.cc:265] YJDebug ImpalaServer::get_state: 2
> I1129 12:11:42.717469 17721 client-request-state.cc:1232] YJDebug UpdateOperationState: RUNNING
> I1129 12:11:42.745760 17722 client-request-state.cc:1232] YJDebug UpdateOperationState: EXCEPTION
> I1129 12:11:42.816792 17679 impala-beeswax-server.cc:265] YJDebug ImpalaServer::get_state: 5
> {code}
> We can see that the query state transitioned from COMPILED to RUNNING and then to EXCEPTION. It consistently shows this state transition, and reports WARNING in the end.
> There are two issues here:
> # Inconsistent behavior of query1 compared with query2. Why did it reach FINISHED before getting into EXCEPTION?
> # Intermittent behavior of query1: it sometimes gets into the FINISHED state and sometimes goes straight to the EXCEPTION state.
> The root cause of these two issues might be the same. Creating this Jira to log the issues.
> See
> https://issues.apache.org/jira/browse/IMPALA-5474?focusedCommentId=16703872&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-
[jira] [Created] (IMPALA-8933) Ranger column deny policies not respected under certain circumstances
Kurt Deschler created IMPALA-8933:
---
Summary: Ranger column deny policies not respected under certain circumstances
Key: IMPALA-8933
URL: https://issues.apache.org/jira/browse/IMPALA-8933
Project: IMPALA
Issue Type: Bug
Components: Security
Affects Versions: Impala 3.4.0
Reporter: Kurt Deschler
Assignee: Kurt Deschler
[jira] [Created] (IMPALA-8932) impala shell shouldn't retry with kerberos when connecting over http
Tim Armstrong created IMPALA-8932:
---
Summary: impala shell shouldn't retry with kerberos when connecting over http
Key: IMPALA-8932
URL: https://issues.apache.org/jira/browse/IMPALA-8932
Project: IMPALA
Issue Type: Bug
Components: Clients
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong

{noformat}
Error connecting: EOFError,
Kerberos ticket found in the credentials cache, retrying the connection with a secure transport.
Warning: --connect_timeout_ms is currently ignored with HTTP transport.
Kerberos not supported with HTTP endpoints.
Error connecting: NotImplementedError,
{noformat}
The NotImplementedError is confusing.
[jira] [Resolved] (IMPALA-8931) Fe not generating lineages when event hooks are configured
[ https://issues.apache.org/jira/browse/IMPALA-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bharath v resolved IMPALA-8931.
---
Fix Version/s: Impala 3.4.0
Resolution: Fixed

> Fe not generating lineages when event hooks are configured
>
> Key: IMPALA-8931
> URL: https://issues.apache.org/jira/browse/IMPALA-8931
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.2.0
> Reporter: bharath v
> Assignee: bharath v
> Priority: Major
> Fix For: Impala 3.4.0
>
> Frontend generates the lineage only when {{--lineage_event_log_dir}} is configured. That is a legacy param and the users are expected to use event hooks to consume lineages. It should also generate lineage events when hooks are configured.
> {noformat}
> public boolean getComputeLineage() {
>   return !Strings.isNullOrEmpty(backendCfg_.lineage_event_log_dir);
> }
> {noformat}
[jira] [Resolved] (IMPALA-8779) Add RowBatchQueue interface with an implementation backed by a std::queue
[ https://issues.apache.org/jira/browse/IMPALA-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-8779.
---
Resolution: Won't Fix

Marking this as 'Won't Fix' for now. There does not seem to be a strong need to add this in right now, given that there is no other use case for a generic {{RowBatch}} queue. The one used in the scan nodes has some unique requirements and re-factoring it to use a generic interface does not seem worth it. We can re-visit this later if we find a stronger use case for it.

> Add RowBatchQueue interface with an implementation backed by a std::queue
>
> Key: IMPALA-8779
> URL: https://issues.apache.org/jira/browse/IMPALA-8779
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> Add a {{RowBatchQueue}} interface with an implementation backed by a {{std::queue}}. Introducing a generic queue that can buffer {{RowBatch}}-es will help with the implementation of {{BufferedPlanRootSink}}. Rather than tie the {{BufferedPlanRootSink}} to a specific method of queuing row batches, we can use an interface. In future patches, a {{RowBatchQueue}} backed by a {{BufferedTupleStream}} can easily be switched out in {{BufferedPlanRootSink}}.
> We should consider re-factoring the existing {{RowBatchQueue}} to use the new interface. The KRPC receiver does some buffering of {{RowBatch}}-es as well which might benefit from the new RowBatchQueue interface, and some more KRPC buffering might be added in IMPALA-6692.
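For readers unfamiliar with the idea, the interface described in this issue might look roughly like the sketch below. It is an assumption-laden illustration: the method names (`AddBatch`, `GetBatch`, `IsEmpty`), the capacity limit, the class name `StdQueueRowBatchQueue`, and the stand-in `RowBatch` struct are all hypothetical and are not taken from the issue or from Impala's source.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>

// Stand-in for Impala's RowBatch, present only so the sketch compiles alone.
struct RowBatch {
  int num_rows = 0;
};

// Hypothetical interface; the method names are assumptions.
class RowBatchQueue {
 public:
  virtual ~RowBatchQueue() = default;
  // Takes ownership of `batch` and returns true if there is room.
  // On false, `batch` is left untouched and still owned by the caller.
  virtual bool AddBatch(std::unique_ptr<RowBatch>&& batch) = 0;
  // Returns the oldest batch, or nullptr if the queue is empty.
  virtual std::unique_ptr<RowBatch> GetBatch() = 0;
  virtual bool IsEmpty() const = 0;
};

// Implementation backed by a std::queue, bounded by a batch count.
class StdQueueRowBatchQueue : public RowBatchQueue {
 public:
  explicit StdQueueRowBatchQueue(std::size_t max_batches)
      : max_batches_(max_batches) {}

  bool AddBatch(std::unique_ptr<RowBatch>&& batch) override {
    if (queue_.size() >= max_batches_) return false;
    queue_.push(std::move(batch));
    return true;
  }

  std::unique_ptr<RowBatch> GetBatch() override {
    if (queue_.empty()) return nullptr;
    std::unique_ptr<RowBatch> batch = std::move(queue_.front());
    queue_.pop();
    return batch;
  }

  bool IsEmpty() const override { return queue_.empty(); }

 private:
  const std::size_t max_batches_;
  std::queue<std::unique_ptr<RowBatch>> queue_;
};
```

Code written against `RowBatchQueue` would not need to change if a different backing store were swapped in later, which is the motivation the issue gives for introducing an interface.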
[jira] [Resolved] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-8818.
---
Resolution: Fixed

> Replace deque queue with spillable queue in BufferedPlanRootSink
>
> Key: IMPALA-8818
> URL: https://issues.apache.org/jira/browse/IMPALA-8818
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Fix For: Impala 3.4.0
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' mode so that pages are attached to the output {{RowBatch}} in {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (i.e. all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns false (it returns false if "the unused reservation was not sufficient to add a new page to the stream large enough to fit 'row' and the stream could not increase the reservation to get enough unused reservation"), it should unpin the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if the row still could not be added, then an error must have occurred, perhaps an IO error, in which case return the error and fail the query).
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from [massive-fact-table]}} and scroll through the results without affecting the health of the Impala cluster (assuming they close the query promptly). Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire fact table and end up spilling all the contents to disk, which can potentially take up a large amount of space. So there need to be restrictions on the memory and disk space used by the {{BufferedTupleStream}} in order to ensure a scan of a massive table does not consume all the memory or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned memory (perhaps through a new config option {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). The max amount of pinned memory should already be constrained by the reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the number of rows returned by a query, and so it should limit the number of rows buffered by the BTS as well (although it is set to 0 by default). SCRATCH_LIMIT already limits the amount of disk space used for spilling (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it needs to buffer all results in memory. This requires setting an accurate value of {{ResourceProfile#memEstimateBytes_}} in {{PlanRootSink#computeResourceProfile}}. If statistics are available, the estimate can be based on the number of estimated rows returned multiplied by the size of the rows returned. The min reservation should account for a read and write page for the {{BufferedTupleStream}}.
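The add-row, unpin, retry sequence described under "BufferedTupleStream Usage" can be illustrated with the sketch below. It assumes a heavily simplified stand-in: `FakeTupleStream` and `AddRowWithSpill` are hypothetical names, rows are plain `int`s, and the real `BufferedTupleStream` methods take different arguments and report errors through a status object, none of which is modelled here.

```cpp
#include <cstddef>

// Simplified stand-in for BufferedTupleStream (an assumption for this sketch).
// While pinned it accepts at most `pinned_capacity_rows` rows; once unpinned
// it accepts rows without limit, as if pages could spill to disk.
class FakeTupleStream {
 public:
  explicit FakeTupleStream(std::size_t pinned_capacity_rows)
      : pinned_capacity_rows_(pinned_capacity_rows) {}

  // Returns false when the stream is pinned and has no room for another row.
  bool AddRow(int /*row*/) {
    if (pinned_ && num_rows_ >= pinned_capacity_rows_) return false;
    ++num_rows_;
    return true;
  }

  void UnpinStream() { pinned_ = false; }
  bool is_pinned() const { return pinned_; }
  std::size_t num_rows() const { return num_rows_; }

 private:
  const std::size_t pinned_capacity_rows_;
  std::size_t num_rows_ = 0;
  bool pinned_ = true;
};

// Hypothetical helper: try to add the row; if that fails while the stream is
// pinned, unpin it and retry once. A false return after the retry stands for
// a real error that should fail the query.
template <typename Stream>
bool AddRowWithSpill(Stream* stream, int row) {
  if (stream->AddRow(row)) return true;
  if (stream->is_pinned()) stream->UnpinStream();
  return stream->AddRow(row);
}
```

In this sketch the stream stays pinned for as long as rows fit, so small result sets never pay the cost of spilling, which matches the "start off as pinned" behaviour the issue asks for.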