[jira] [Resolved] (IMPALA-6787) On large secure clusters the connection setup thread becomes bottleneck at warmup and cause occasional timeout failures

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6787.
---
Resolution: Won't Do

I think this is unlikely to be necessary now that we've converted all of the 
data and control plane RPCs to KRPC (the statestore and catalog still use 
Thrift, but they are less likely to be an issue).

> On large secure clusters the connection setup thread becomes bottleneck at 
> warmup and cause occasional timeout failures
> ---
>
> Key: IMPALA-6787
> URL: https://issues.apache.org/jira/browse/IMPALA-6787
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: rpc
>
> On clusters of 200+ nodes, a single thread is not sufficient and becomes a 
> bottleneck for a while, which appears to cause queries to fail with:
> {code}
> I0401 20:20:55.032140 1806361 thrift-util.cc:123] TSocket::open() connect() Connection timed out
> I0401 20:20:55.032346 1806361 thrift-client.cc:78] Couldn't open transport for va1007.foo.com:22000 (connect() failed: Connection timed out)
> I0401 20:20:55.032364 1806361 thrift-client.cc:94] Unable to connect to va1007.foo.com:22000
> {code}
> {code}
> // Only using one thread here is sufficient for performance, and it avoids potential
> // thread safety issues with the thrift code called in SetupConnection.
> constexpr int CONNECTION_SETUP_POOL_SIZE = 1;
> // New - this is the thread pool used to process the internal accept queue.
> ThreadPool> connection_setup_pool("setup-server", "setup-worker",
>     CONNECTION_SETUP_POOL_SIZE, FLAGS_accepted_cnxn_queue_depth,
>     [this](int tid, const shared_ptr& item) {
>       this->SetupConnection(item);
>     });
> {code}
> {code}
> #0  0x7fd927de8e20 in krb5int_MD5Update () from /lib64/libk5crypto.so.3
> #1  0x7fd927de7bca in k5_md5_hash () from /lib64/libk5crypto.so.3
> #2  0x7fd927e01e32 in krb5int_hmac_keyblock () from /lib64/libk5crypto.so.3
> #3  0x7fd927dfc448 in usage_key.isra.2 () from /lib64/libk5crypto.so.3
> #4  0x7fd927dfc9fc in krb5int_arcfour_decrypt () from /lib64/libk5crypto.so.3
> #5  0x7fd927df97e4 in krb5_k_decrypt () from /lib64/libk5crypto.so.3
> #6  0x7fd927df98bd in krb5_c_decrypt () from /lib64/libk5crypto.so.3
> #7  0x7fd9297191fb in rd_req_decoded_opt () from /lib64/libkrb5.so.3
> #8  0x7fd92971a1da in krb5_rd_req_decoded () from /lib64/libkrb5.so.3
> #9  0x7fd9282371df in kg_accept_krb5 () from /lib64/libgssapi_krb5.so.2
> #10 0x7fd9282388ca in krb5_gss_accept_sec_context_ext () from /lib64/libgssapi_krb5.so.2
> #11 0x7fd928238a29 in krb5_gss_accept_sec_context () from /lib64/libgssapi_krb5.so.2
> #12 0x7fd92822607a in gss_accept_sec_context () from /lib64/libgssapi_krb5.so.2
> #13 0x7fd92653aedc in gssapi_server_mech_step () from /usr/lib64/sasl2/libgssapiv2.so
> #14 0x7fd92bc27b9b in sasl_server_step () from /lib64/libsasl2.so.3
> #15 0x00caf3b1 in sasl::TSaslServer::evaluateChallengeOrResponse(unsigned char const*, unsigned int, unsigned int*) ()
> #16 0x00cb3040 in apache::thrift::transport::TSaslTransport::doSaslNegotiation() ()
> #17 0x00cb1488 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr) ()
> #18 0x00b143c7 in apache::thrift::server::TAcceptQueueServer::SetupConnection(boost::shared_ptr) ()
> #19 0x00b14eb2 in boost::detail::function::void_function_obj_invoker2  boost::shared_ptr const&)#1}, void, int, boost::shared_ptr const&>::invoke(boost::detail::function::function_buffer&, int, boost::shared_ptr const&) ()
> #20 0x00b17d79 in impala::ThreadPool >::WorkerThread(int) ()
> #21 0x00d6049f in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*) ()
> #22 0x00d60c9a in boost::detail::thread_data void (*)(std::string const&, std::string const&, boost::function, impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, boost::_bi::value, boost::_bi::value >, boost::_bi::value, boost::_bi::value*> > > >::run() ()
> #23 0x012d794a in thread_proxy ()
> #24 0x7fd928c7ddc5 in start_thread () from /lib64/libpthread.so.0
> #25 0x7fd9289aaced in clone () from /lib64/libc.so.6
> {code}
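A minimal sketch of the alternative to a single setup thread: a small fixed-size pool in which several workers drain the accept queue, so that one slow Kerberos/SASL handshake (the frames above) does not hold up every other pending connection. This is not Impala's ThreadPool class; SetupPool, Offer and Shutdown are hypothetical names, and the sketch assumes the setup callback is safe to call concurrently, which is exactly the Thrift thread-safety concern raised in the code comment above.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: N workers pull accepted connections from a
// shared queue and run the (possibly slow) setup/handshake for each one.
template <typename T>
class SetupPool {
 public:
  SetupPool(int num_workers, std::function<void(const T&)> setup)
      : setup_(std::move(setup)) {
    for (int i = 0; i < num_workers; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  ~SetupPool() { Shutdown(); }

  // Enqueue one accepted connection for setup.
  void Offer(T item) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push(std::move(item));
    }
    cv_.notify_one();
  }

  // Let workers drain whatever is queued, then stop and join them.
  // Safe to call more than once from the owning thread.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (shutdown_) return;
      shutdown_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      T item;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // shutting down and nothing left to do
        item = std::move(queue_.front());
        queue_.pop();
      }
      setup_(item);  // the expensive handshake runs outside the lock
    }
  }

  std::function<void(const T&)> setup_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<T> queue_;
  bool shutdown_ = false;
  std::vector<std::thread> workers_;
};
```

With one worker this degenerates to the current behaviour; with several, throughput during warmup scales with the number of cores available for the handshakes, at the cost of requiring a thread-safe setup path.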



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IMPALA-7136) Set a default THREAD_RESERVATION_AGGREGATE_LIMIT value

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7136.
---
Resolution: Won't Do

Not sure that this really makes sense with multithreading and other scalability 
improvements we've made.

> Set a default THREAD_RESERVATION_AGGREGATE_LIMIT value
> --
>
> Key: IMPALA-7136
> URL: https://issues.apache.org/jira/browse/IMPALA-7136
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: resource-management, scalability
>
> Similar to IMPALA-7115, we should consider setting a default value for this 
> option that will reject queries that are likely to be problematic because of 
> scalability concerns. The aggregate thread count really depends on 
> scalability, so [~kwho] probably has the best idea of what a realistic limit 
> is here after the KRPC changes.





[jira] [Resolved] (IMPALA-8006) Consider replacing the modulus in KrpcDatastreamSender with fast mod

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8006.
---
Resolution: Won't Do

Per Norbert's comment, this likely isn't high impact.

> Consider replacing the modulus in KrpcDatastreamSender with fast mod 
> -
>
> Key: IMPALA-8006
> URL: https://issues.apache.org/jira/browse/IMPALA-8006
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Michael Ho
>Priority: Major
>  Labels: ramp-up
>
> [~tlipcon] pointed out a potential improvement to the modulus used in our 
> sender for partitioning exchanges: 
> http://www.idryman.org/blog/2017/05/03/writing-a-damn-fast-hash-table-with-tiny-memory-footprints/
>  (Optimizing Division for Hash Table Size).
> We should evaluate its effectiveness and implement it for 
> KrpcDataStreamSender if appropriate.
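A minimal sketch of one well-known way to avoid the division: Lemire's multiply-shift range reduction ("fastrange"). It is not necessarily the exact trick described in the linked post, and the function names are hypothetical rather than taken from KrpcDataStreamSender. Note that it assigns rows to different partitions than hash % n and relies on the high bits of the hash, so it is only a drop-in replacement if every sender feeding the same exchange uses the same mapping and the hash is well mixed.

```cpp
#include <cstdint>

// Conventional reduction: one integer division per row.
inline std::uint32_t ModuloPartition(std::uint32_t hash,
                                     std::uint32_t num_partitions) {
  return hash % num_partitions;
}

// Multiply-shift range reduction: maps a 32-bit hash into
// [0, num_partitions) with one 64-bit multiply and a shift. It uses the
// HIGH bits of the hash, so a poorly mixed hash can skew partitions, and it
// yields a different mapping from hash % num_partitions.
inline std::uint32_t FastRangePartition(std::uint32_t hash,
                                        std::uint32_t num_partitions) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(hash) *
       static_cast<std::uint64_t>(num_partitions)) >> 32);
}
```

For example, with 10 partitions a hash of 0x80000000 maps to partition 5 and 0xFFFFFFFF maps to partition 9. Whether the saved division is measurable in the sender is what the issue proposed to evaluate.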





[jira] [Resolved] (IMPALA-9006) Consolidate the Statestore subscriber's retry logic

2020-12-23 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9006.
---
Resolution: Later

Minor cleanup, don't need a JIRA to track.

> Consolidate the Statestore subscriber's retry logic
> ---
>
> Key: IMPALA-9006
> URL: https://issues.apache.org/jira/browse/IMPALA-9006
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Michael Ho
>Priority: Minor
> Attachments: 76c83e9.diff
>
>
> Currently, a Statestore subscriber starts a separate thread after the initial 
> registration with the Statestore to periodically check whether the Statestore 
> may have failed and to re-register with it if necessary. Similarly, the 
> function {{StatestoreSubscriber::Register()}} also relies on the old Thrift 
> client's retry logic to retry failed RPC attempts to the Statestore. This is 
> needed because the initial registration relies on this retry logic to wait for 
> the Statestore to start up in case an Impala daemon starts before the Statestore. 
> These two retry paths could be consolidated. 
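A minimal sketch of what a consolidated retry path might look like: a single helper with capped exponential backoff that both the initial Register() call (waiting for the Statestore to start) and the failure-detection thread could share, each with its own policy. RetryPolicy and RetryWithBackoff are hypothetical names; this is not the attached 76c83e9.diff or any existing Impala API.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical policy shared by both retry paths.
struct RetryPolicy {
  int max_attempts;                           // <= 0 means retry until success
  std::chrono::milliseconds initial_backoff;  // wait after the first failure
  std::chrono::milliseconds max_backoff;      // cap for the doubling backoff
};

// Calls `attempt` until it returns true or the attempt limit is reached,
// sleeping between tries and doubling the sleep up to max_backoff.
// Returns whether an attempt succeeded; optionally reports the attempt count.
inline bool RetryWithBackoff(const RetryPolicy& policy,
                             const std::function<bool()>& attempt,
                             int* attempts_made = nullptr) {
  std::chrono::milliseconds backoff = policy.initial_backoff;
  int attempts = 0;
  bool ok = false;
  while (true) {
    ++attempts;
    ok = attempt();
    if (ok) break;
    if (policy.max_attempts > 0 && attempts >= policy.max_attempts) break;
    std::this_thread::sleep_for(backoff);
    backoff = std::min(std::chrono::milliseconds(backoff.count() * 2),
                       policy.max_backoff);
  }
  if (attempts_made != nullptr) *attempts_made = attempts;
  return ok;
}
```

For example, the initial registration could use an unbounded policy (keep trying until the Statestore is up), while the re-registration thread could use a bounded one and log each failed round; the retry mechanics would then live in one place instead of being split between the Thrift client and a separate thread.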





[jira] [Resolved] (IMPALA-6763) to_utc_timestamp() doesn't consider daylight saving for all timezones

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6763.
---
Resolution: Cannot Reproduce

IMPALA-3307 fixed many issues with timezone database errors. Note that CDT 
and PDT are no longer supported timezones.

> to_utc_timestamp() doesn't consider daylight saving for all timezones
> -
>
> Key: IMPALA-6763
> URL: https://issues.apache.org/jira/browse/IMPALA-6763
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>
> to_utc_timestamp() does not account for daylight saving time for all of the 
> timezones shown in the example below.
> Query:
> {code}
> with t as (select current_timestamp() as rightnow),
> t1 as (select rightnow, 
> to_utc_timestamp(rightnow, 'PDT') as utc_ts
> from t)
> select t1.rightnow, t1.utc_ts, 
> from_utc_timestamp(utc_ts, 'PST') as pst_ts, 
> from_utc_timestamp(utc_ts, 'PDT') as pdt_ts, 
> from_utc_timestamp(utc_ts, 'MST') as mst_ts, 
> from_utc_timestamp(utc_ts, 'MDT') as mdt_ts, 
> from_utc_timestamp(utc_ts, 'CST') as cst_ts, 
> from_utc_timestamp(utc_ts, 'CDT') as cdt_ts, 
> from_utc_timestamp(utc_ts, 'EST') as est_ts, 
> from_utc_timestamp(utc_ts, 'EDT') as edt_ts
> from t1
> order by 1;
> {code}
>  
>  
> {code:java}
> impalad version 2.9.0-cdh5.12.1
> +---++---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | rightnow | now_int | utc_ts | pst_ts | pdt_ts | mst_ts | mdt_ts | cst_ts | cdt_ts | est_ts | edt_ts | chicago_ts | az_ts | in_ts | in_knox_ts |
> +---++---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | 2018-03-28 14:02:21.210465000 | 1522245741 | 2018-03-28 19:02:21.210465000 | 2018-03-28 12:02:21.210465000 | 2018-03-28 12:02:21.210465000 | 2018-03-28 12:02:21.210465000 | 2018-03-28 13:02:21.210465000 | 2018-03-28 14:02:21.210465000 | 2018-03-28 13:02:21.210465000 | 2018-03-28 14:02:21.210465000 | 2018-03-28 15:02:21.210465000 | 2018-03-28 14:02:21.210465000 | 2018-03-28 12:02:21.210465000 | 2018-03-28 15:02:21.210465000 | 2018-03-28 14:02:21.210465000 |
> +---++---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> impalad version 2.10.0-cdh5.13.x
> +---++---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | rightnow | now_int | utc_ts | pst_ts | pdt_ts | mst_ts | mdt_ts | cst_ts | cdt_ts | est_ts | edt_ts | chicago_ts | az_ts | in_ts | in_knox_ts |
> +---++---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | 2018-03-28 14:19:38.978262000 | 1522246778 | 2018-03-28 19:19:38.978262000 | 2018-03-28 12:19:38.978262000 | 2018-03-28 12:19:38.978262000 | 2018-03-28 12:19:38.978262000 | 2018-03-28 13:19:38.978262000 | 2018-03-28 14:19:38.978

[jira] [Resolved] (IMPALA-6737) bin/bootstrap_development doesn't capture Docker vagaries with Kudu

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6737.
---
Resolution: Later

> bin/bootstrap_development doesn't capture Docker vagaries with Kudu
> ---
>
> Key: IMPALA-6737
> URL: https://issues.apache.org/jira/browse/IMPALA-6737
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Philip Martin
>Assignee: Jim Apple
>Priority: Major
>
> As discussed in the comments of review 
> https://gerrit.cloudera.org/#/c/9085/1//COMMIT_MSG@28, if you use Docker to 
> develop Impala, and you use 'docker commit', you may run into KUDU-1419 and 
> need to learn how to work around it. We should update the documentation notes 
> to reflect that.





[jira] [Resolved] (IMPALA-9310) OrcStringColumnReader could move blob instead of copy

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9310.
---
Resolution: Later

Probably not worth keeping this open for now.

> OrcStringColumnReader could move blob instead of copy
> -
>
> Key: IMPALA-9310
> URL: https://issues.apache.org/jira/browse/IMPALA-9310
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Norbert Luksa
>Priority: Major
>  Labels: orc
>
> Since IMPALA-9226, OrcStringColumnReader copies the blobs of a batch into 
> freshly allocated memory owned by Impala. As Zoltán pointed out in a 
> [review|https://gerrit.cloudera.org/#/c/15051/1/be/src/exec/orc-column-readers.cc@158],
>  a possible improvement is to move the blob instead of copying it, by finding 
> a way to take ownership of the memory allocated by the ORC library.





[jira] [Resolved] (IMPALA-8661) Create randomized tests for stressing the event processor

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8661.
---
Fix Version/s: Impala 3.3.0
   Resolution: Fixed

> Create randomized tests for stressing the event processor
> -
>
> Key: IMPALA-8661
> URL: https://issues.apache.org/jira/browse/IMPALA-8661
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: catalog-v2
> Fix For: Impala 3.3.0
>
>
> We should create pseudo-randomized batches of events to stress the event 
> processor so that we can flush out any bugs. The test could be a JUnit test 
> which generates a randomly sized batch of the supported event types. Once the 
> random batch of events is processed, we should validate that the table matches 
> what is present in HMS.





[jira] [Resolved] (IMPALA-8628) start impala-catalog failed

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8628.
---
Resolution: Not A Bug

> start impala-catalog failed
> ---
>
> Key: IMPALA-8628
> URL: https://issues.apache.org/jira/browse/IMPALA-8628
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.2.0
> Environment: hadoop 3.1 hive 3.1 impala 3.2
>Reporter: alex zhou
>Priority: Major
>
> Hello sir:
>    I have a question about Impala versions. I am using the latest 
> Impala release, 3.2, and I am currently building it. Builds using versions 
> 2.6 and 3.2 both failed. The error message is:
> java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(Lorg/apache/hadoop/hive/conf/HiveConf;Lorg/apache/hadoop/hive/metastore/HiveMetaHookLoader;Ljava/lang/String;)Lorg/apache/hadoop/hive/metastore/IMetaStoreClient;
> at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.(MetaStoreClientPool.java:99)
> at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.(MetaStoreClientPool.java:78)
> at org.apache.impala.catalog.MetaStoreClientPool.initClients(MetaStoreClientPool.java:174)
> at org.apache.impala.catalog.MetaStoreClientPool.(MetaStoreClientPool.java:163)
> at org.apache.impala.catalog.MetaStoreClientPool.(MetaStoreClientPool.java:155)
> at org.apache.impala.catalog.CatalogServiceCatalog.(CatalogServiceCatalog.java:351)
> at org.apache.impala.service.JniCatalog.(JniCatalog.java:119)
> Impalad exiting.
> How can I solve this problem?
> Thank you





[jira] [Resolved] (IMPALA-8422) get problem when start catalogd with the tarballs builded by myself

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8422.
---
Resolution: Cannot Reproduce

> get problem when start catalogd with the tarballs builded by myself
> ---
>
> Key: IMPALA-8422
> URL: https://issues.apache.org/jira/browse/IMPALA-8422
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
> Environment: centos6.5  
>Reporter: Lycan
>Priority: Major
>
> I compiled Impala on Ubuntu 16.04 and ran catalogd on CentOS 6.5. Then I 
> got the following error:
> E0416 19:17:35.646098 153074 logging.cc:147] stderr will be logged to this file.
> java.lang.ClassFormatError: org.apache.impala.common.JniUtil (unrecognized class file version)
> at java.lang.VMClassLoader.defineClass(libgcj.so.10)
> at java.lang.ClassLoader.defineClass(libgcj.so.10)
> at java.security.SecureClassLoader.defineClass(libgcj.so.10)
> at java.net.URLClassLoader.findClass(libgcj.so.10)
> at java.lang.ClassLoader.loadClass(libgcj.so.10)
> at java.lang.ClassLoader.loadClass(libgcj.so.10)
> F0416 19:17:36.217576 153074 init.cc:313] Failed to find JniUtil class.
> . Impalad exiting.
> *** Check failure stack trace: ***
> @ 0x7fd12ac07dac
> @ 0x7fd12ac09651
> @ 0x7fd12ac07786
> @ 0x7fd12ac0ad4d
> @ 0x7fd12d8efef3
> @ 0x7fd12f634d29
> @ 0x697906
> @ 0x7fd129c0a92f
> @ 0x6d6038
> loadFileSystems error:
> ClassFormatError: org.apache.hadoop.fs.FileSystem (unrecognized class file version)java.lang.ClassFormatError: org.apache.hadoop.fs.FileSystem (unrecognized class file version)
> at java.lang.VMClassLoader.defineClass(libgcj.so.10)
> at java.lang.ClassLoader.defineClass(libgcj.so.10)
> at java.security.SecureClassLoader.defineClass(libgcj.so.10)
> at java.net.URLClassLoader.findClass(libgcj.so.10)
> at java.lang.ClassLoader.loadClass(libgcj.so.10)
> at java.lang.ClassLoader.loadClass(libgcj.so.10)
> This is my Java version:
> [root@impala3.2]# java -version
> java version "1.8.0_121"
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)





[jira] [Resolved] (IMPALA-8088) Enhance profile to show dynamically pruned file count

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8088.
---
Resolution: Won't Do

> Enhance profile to show dynamically pruned file count
> -
>
> Key: IMPALA-8088
> URL: https://issues.apache.org/jira/browse/IMPALA-8088
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Janaki Lahorani
>Priority: Major
>  Labels: observability, ramp-up
>
> The profile should be enhanced to show the number of partitions pruned by 
> a runtime filter or min-max filter.





[jira] [Resolved] (IMPALA-8210) Support reading/writing tiny RDBMS tables

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8210.
---
Resolution: Duplicate

> Support reading/writing tiny RDBMS tables
> -
>
> Key: IMPALA-8210
> URL: https://issues.apache.org/jira/browse/IMPALA-8210
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Priority: Major
>
> It'd be quite helpful if Impala could read/write some tiny 
> RDBMS (MySQL/PostgreSQL/SQL Server) tables. Parallelism and efficiency can be 
> ignored since the target tables are all tiny. Some use cases:
>  * Some dimension tables in Hive are snapshots of RDBMS tables. Users want to 
> query the difference between the snapshot in Hive and the latest data in 
> RDBMS.
>  * Users want to run queries joining Hive fact tables and the latest data in 
> RDBMS.
>  * Users want their query results to be ingested into MySQL directly.
> Implementing an "External Data Source" as a generic JDBC wrapper for RDBMS data 
> sources could be a solution. The drawback is that "External Data Source" 
> requires users to create tables in Impala for each RDBMS table they want to 
> access, and users can't list the tables (SHOW TABLES) of a schema (database). 
> There are other solutions that support RDBMS directly, for example 
> [https://www.slideshare.net/liuknag/cloudera-impala-postgre-sql-29025605]





[jira] [Resolved] (IMPALA-8086) Check query option value when set

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8086.
---
Resolution: Won't Do

This is kinda fundamental to how the shell handles query options client-side. 

> Check query option value when set 
> --
>
> Key: IMPALA-8086
> URL: https://issues.apache.org/jira/browse/IMPALA-8086
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Janaki Lahorani
>Priority: Major
>
> When a query option is set to some value, the value is only validated when 
> a query is run. Ideally, it should be validated when SET is called.
> [localhost:21000] functional_kudu> set runtime_filter_mode=On;
> RUNTIME_FILTER_MODE set to On
> [localhost:21000] functional_kudu> select STRAIGHT_JOIN count(*) from decimal_rtf_tbl a join [BROADCAST] decimal_rtf_tbl_tiny_d5_kudu b where a.d5_0 = b.d5_0;
> Query: select STRAIGHT_JOIN count(*) from decimal_rtf_tbl a join [BROADCAST] decimal_rtf_tbl_tiny_d5_kudu b where a.d5_0 = b.d5_0
> Query submitted at: 2018-12-07 20:00:55 (Coordinator: http://janaki-OptiPlex-7050:25000)
> ERROR: Errors parsing query options
> Invalid runtime filter mode 'On'. Valid modes are OFF(0), LOCAL(1) or GLOBAL(2).
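A minimal sketch of validating the value at SET time rather than when a query is submitted. The three modes come from the error message above; ParseRuntimeFilterMode and SessionOptions are hypothetical names, the default value is illustrative only, and the sketch does not model the client-side option handling in the shell that the resolution comment refers to. It requires C++17 for std::optional.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// The three modes named in the error message.
enum class RuntimeFilterMode { OFF = 0, LOCAL = 1, GLOBAL = 2 };

// Hypothetical parser: accepts a mode name (case-insensitive) or its number.
inline std::optional<RuntimeFilterMode> ParseRuntimeFilterMode(std::string value) {
  std::transform(value.begin(), value.end(), value.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  if (value == "OFF" || value == "0") return RuntimeFilterMode::OFF;
  if (value == "LOCAL" || value == "1") return RuntimeFilterMode::LOCAL;
  if (value == "GLOBAL" || value == "2") return RuntimeFilterMode::GLOBAL;
  return std::nullopt;
}

// Hypothetical session state: SET validates immediately and keeps the
// previous value on error, instead of storing the raw string and failing
// later when a query runs.
class SessionOptions {
 public:
  bool SetRuntimeFilterMode(const std::string& value, std::string* error) {
    std::optional<RuntimeFilterMode> mode = ParseRuntimeFilterMode(value);
    if (!mode.has_value()) {
      *error = "Invalid runtime filter mode '" + value +
               "'. Valid modes are OFF(0), LOCAL(1) or GLOBAL(2).";
      return false;
    }
    runtime_filter_mode_ = *mode;
    return true;
  }

  RuntimeFilterMode runtime_filter_mode() const { return runtime_filter_mode_; }

 private:
  RuntimeFilterMode runtime_filter_mode_ = RuntimeFilterMode::GLOBAL;  // illustrative default
};
```

With this approach, `set runtime_filter_mode=On;` would be rejected at the SET statement itself, and the previously valid value would remain in effect.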





[jira] [Resolved] (IMPALA-8082) Save intermediate state and data if applicable

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8082.
---
Resolution: Later

> Save intermediate state and data if applicable
> --
>
> Key: IMPALA-8082
> URL: https://issues.apache.org/jira/browse/IMPALA-8082
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Janaki Lahorani
>Priority: Major
>
> When a query is stalled, it would be beneficial to flush its state, and if 
> needed its data as well, to disk (temporary space) so that the query can be 
> suspended and its resources freed. The query can resume execution at a 
> later point when it becomes un-stalled. The amount of space that can be used 
> should probably be configurable. There should be lifecycle management to 
> clean up this space and abort stalled queries. In reality, this space will 
> be quite big. If it is getting filled up, then there is a problem that needs 
> to be analyzed and addressed; the cause may be in code, or in management and 
> logistics at deployment. Consequently, the necessary tools, logging and 
> diagnostics should be built in tandem.
> When a query crashes, it could potentially affect many queries that are 
> running in that process. It looks like the end user is required to manually 
> restart all of these queries. If there were infrastructure that saved stages, 
> then the non-crashed queries could be restarted from a saved point and 
> finish running without requiring user intervention.





[jira] [Resolved] (IMPALA-7877) Support Hive GenericUDF

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7877.
---
Resolution: Duplicate

> Support Hive GenericUDF
> ---
>
> Key: IMPALA-7877
> URL: https://issues.apache.org/jira/browse/IMPALA-7877
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: eugen yushin
>Priority: Major
>
> Running a Hive UDF that extends GenericUDF results in a ClassCastException. 
> Relevant [code block|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java#L586]:
> {code}
> LOG.debug("Loading UDF '" + udfPath + "' from " + jarPath);
> loader = getClassLoader(jarPath);
> Class c = Class.forName(udfPath, true, loader);
> Class udfClass = c.asSubclass(UDF.class);
> {code}
> Reproduce steps:
> {code}
> create function my_lower(string) returns string location 
> '/path/to/hive-exec-1.1.0-cdh5.15.0.jar' 
> symbol='org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower';
> select my_lower('Some String NOT ALREADY LOWERCASE');
> {code}
> Stack trace:
> {code}
> I1121 11:58:29.509138 29092 Frontend.java:952] Analyzing query: select my_lower('Some String NOT ALREADY LOWERCASE')
> I1121 11:58:29.513121 29092 UdfExecutor.java:581] Loading UDF 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower' from /var/lib/impala/udfs/hive-exec-1.1.0-cdh5.15.0.83728.2.jar
> I1121 11:58:29.515535 29092 jni-util.cc:230] java.lang.ClassCastException: class org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower
> at java.lang.Class.asSubclass(Class.java:3404)
> at org.apache.impala.hive.executor.UdfExecutor.init(UdfExecutor.java:584)
> at org.apache.impala.hive.executor.UdfExecutor.(UdfExecutor.java:217)
> at org.apache.impala.service.FeSupport.NativeEvalExprsWithoutRow(Native Method)
> at org.apache.impala.service.FeSupport.EvalExprsWithoutRow(FeSupport.java:208)
> at org.apache.impala.service.FeSupport.EvalExprWithoutRow(FeSupport.java:163)
> at org.apache.impala.analysis.LiteralExpr.create(LiteralExpr.java:184)
> at org.apache.impala.rewrite.FoldConstantsRule.apply(FoldConstantsRule.java:68)
> at org.apache.impala.rewrite.ExprRewriter.applyRuleBottomUp(ExprRewriter.java:85)
> at org.apache.impala.rewrite.ExprRewriter.applyRuleRepeatedly(ExprRewriter.java:71)
> at org.apache.impala.rewrite.ExprRewriter.rewrite(ExprRewriter.java:55)
> at org.apache.impala.analysis.SelectList.rewriteExprs(SelectList.java:97)
> at org.apache.impala.analysis.SelectStmt.rewriteExprs(SelectStmt.java:894)
> at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:432)
> at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:393)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:962)
> at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:156)
> I1121 11:58:29.523166 29092 status.cc:125] ClassCastException: class org.apache.hadoop.hive.ql.udf.generic.GenericUDFLower
> @   0x96663a  impala::Status::Status()
> @   0xcedfdd  impala::JniUtil::GetJniExceptionMsg()
> @  0x109457f  impala::HiveUdfCall::OpenEvaluator()
> @   0x96d757  impala::ScalarExprEvaluator::Open()
> @   0xbedc2d  Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow
> @ 0x7fc705b49e6d  (unknown)
> Marked as a bug because the docs do not mention this behaviour (since the docs 
> claim that Impala supports Hive UDFs, it should support all Hive UDF formats 
> unless stated otherwise).





[jira] [Resolved] (IMPALA-7767) Add metrics and profiles for automatic invalidation

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7767.
---
Resolution: Later

> Add metrics and profiles for automatic invalidation
> ---
>
> Key: IMPALA-7767
> URL: https://issues.apache.org/jira/browse/IMPALA-7767
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Tianyi Wang
>Assignee: Tianyi Wang
>Priority: Major
>  Labels: observability
>
> The automatic invalidation introduced by IMPALA-7448 lacks supporting metrics 
> and query profile information. These should be added to help users understand 
> whether it's working and to diagnose possible issues.





[jira] [Created] (IMPALA-10405) Consider setting parquet_page_row_count_limit to 20000 by default

2020-12-22 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-10405:
--

 Summary: Consider setting parquet_page_row_count_limit to 20000 by default
 Key: IMPALA-10405
 URL: https://issues.apache.org/jira/browse/IMPALA-10405
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong


PARQUET-1414 ran some experiments with parquet-mr which concluded that this 
setting would enhance page filtering without giving up much compression. It's 
the default there.

We should probably do the same in Impala, because we already have evidence 
that it's better, and doing so avoids it being a confounding factor.





[jira] [Resolved] (IMPALA-8550) Sentry refresh privileges has race conditions

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8550.
---
Resolution: Won't Fix

We removed Sentry support.

> Sentry refresh privileges has race conditions
> -
>
> Key: IMPALA-8550
> URL: https://issues.apache.org/jira/browse/IMPALA-8550
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Recently, I encountered a race condition in {{SentryProxy}}'s 
> refreshSentryAuthorization loop. The race happens when the Sentry server is 
> slow to update its information based on changes in HMS. Consider the following 
> scenario:
>  # An Impala session from user A creates a database/table.
>  # AuthorizationManager calls updateDatabaseOwnerPrivilege 
> [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1159].
>  Note that this adds the user privilege to the Catalog's cache out-of-band 
> (without confirming that Sentry has added this privilege to its database).
>  # Assume that Sentry is slow to update its database of roles/privileges. 
> (Depending on the timing of these events, Sentry being slow isn't strictly 
> required, but it increases the likelihood of the issue.)
>  # The refreshSentryAuthorization loop is triggered based on a configured 
> interval 
> [here|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java#L174].
>  Since Sentry has not yet updated its database with the owner information, this 
> loop will remove the privilege from the Catalog. Any subsequent SQL which 
> requires privileges will fail until Sentry is synced and the refresh loop adds 
> this privilege to the catalog cache again.





[jira] [Resolved] (IMPALA-9553) Test flaky with HBase: PlannerTest.testResourceRequirements

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9553.
---
Resolution: Duplicate

> Test flaky with HBase: PlannerTest.testResourceRequirements
> ---
>
> Key: IMPALA-9553
> URL: https://issues.apache.org/jira/browse/IMPALA-9553
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: flaky-test
>
> The following test has been flaky for the last few days (since 2020.03.23):
> org.apache.impala.planner.PlannerTest.testResourceRequirements
> Section PARALLELPLANS of query:
> select * from functional_hbase.alltypessmall
> Expected:
> Max Per-Host Resource Reservation: Memory=0B Threads=2
> Actual:
> Max Per-Host Resource Reservation: Memory=0B Threads=3





[jira] [Resolved] (IMPALA-9438) error You need to implement atomic operations for this architecture

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9438.
---
Resolution: Fixed

> error You need to implement atomic operations for this architecture
> ---
>
> Key: IMPALA-9438
> URL: https://issues.apache.org/jira/browse/IMPALA-9438
> Project: IMPALA
>  Issue Type: Bug
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Major
>
> When building Impala on the aarch64 platform, the following error was raised:
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:88:2: error: 
> #error You need to implement atomic operations for this architecture
> [  1%] Running thrift compiler on Descriptors.thrift
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:321:8: error: 
> ‘Atomic32’ does not name a type
>  inline Atomic32 Acquire_CompareAndSwap(volatile Atomic32* ptr,
> ^
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:326:8: error: 
> ‘Atomic32’ does not name a type
>  inline Atomic32 Release_CompareAndSwap(volatile Atomic32* ptr,
> ^
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:331:36: error: 
> ‘Atomic32’ does not name a type
>  inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
> ^
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:331:51: error: 
> ‘Atomic32’ has not been declared
>  inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
>^
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h: In function 
> ‘void Acquire_Store(volatile int*, int)’:
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:332:9: error: 
> ‘base::subtle’ has not been declared
>base::subtle::Acquire_Store(ptr, value);
>  ^
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h: At global scope:
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:334:36: error: 
> ‘Atomic32’ does not name a type
>  inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
> ^
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:334:51: error: 
> ‘Atomic32’ has not been declared
>  inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
>^
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h: In function 
> ‘void Release_Store(volatile int*, int)’:
> /home/jenkins/workspace/Impala-ASF/be/src/gutil/atomicops.h:335:16: error: 
> ‘base::subtle’ has not been declared
>return base::subtle::Release_Store(ptr, value);
> ^
> 





[jira] [Resolved] (IMPALA-8722) test_hbase_col_filter failure

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8722.
---
Resolution: Cannot Reproduce

> test_hbase_col_filter failure
> -
>
> Key: IMPALA-8722
> URL: https://issues.apache.org/jira/browse/IMPALA-8722
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Bikramjeet Vig
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: broken-build, flaky
>
> test_hbase_col_filter with the following exec params failed in one of 
> the exhaustive runs:
> {noformat}
> query_test.test_hbase_queries.TestHBaseQueries.test_hbase_col_filter[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> hbase/none]
> {noformat}
> From the logs, it seems the insert query finished around 
> 23:27:42, while the invalidate metadata in Impala ran around 23:27:32.
> I am not sure whether this is due to the log being buffered and written later, but 
> if the insert finished after the invalidate metadata, then Impala probably didn't 
> pick up the new file data, and hence the query that expected 3 rows didn't 
> get any. Note: the insert is run in Hive using 
> "self.run_stmt_in_hive(add_data, username='hdfs')"
> hive server 2 logs:
> {noformat}
> 2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
> exec.TableScanOperator: Initializing operator TS[0]
> 2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
> exec.SelectOperator: Initializing operator SEL[1]
> 2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
> exec.SelectOperator: SELECT 
> struct
> 2019-06-26T23:27:42,456  INFO [LocalJobRunner Map Task Executor #0] 
> exec.FileSinkOperator: Initializing operator FS[2]
> 2019-06-26T23:27:42,465  INFO [LocalJobRunner Map Task Executor #0] 
> hadoop.InternalParquetRecordReader: block read in memory in 17 ms. row count 
> = 2133979
> 2019-06-26T23:27:42,469  INFO [LocalJobRunner Map Task Executor #0] 
> exec.FileSinkOperator: Using serializer : class 
> org.apache.hadoop.hive.hbase.HBaseSerDe[[:key,cf:c:[k, c]:[string, string]]] 
> and formatter : 
> org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat@c73c2ac
> 2019-06-26T23:27:42,471  INFO [LocalJobRunner Map Task Executor #0] 
> exec.FileSinkOperator: New Final Path: FS 
> hdfs://localhost:20500/test-warehouse/test_hbase_col_filter_2598223d.db/_tmp.hbase_col_filter_testkeyx/00_0
> 2019-06-26T23:27:42,479 ERROR [LocalJobRunner Map Task Executor #0] 
> hadoop.ParquetRecordReader: Can not initialize counter due to context is not 
> a instance of TaskInputOutputContext, but is 
> org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
> 2019-06-26T23:27:42,482  INFO [LocalJobRunner Map Task Executor #0] 
> hadoop.InternalParquetRecordReader: RecordReader initialized will read a 
> total of 2142543 records.
> 2019-06-26T23:27:42,482  INFO [LocalJobRunner Map Task Executor #0] 
> hadoop.InternalParquetRecordReader: at row 0. reading next block
> 2019-06-26T23:27:42,496  INFO [ReadOnlyZKClient-localhost:2181@0x4d49ce4e] 
> zookeeper.ZooKeeper: Initiating client connection, 
> connectString=localhost:2181 sessionTimeout=9 
> watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$47/372191955@532dae72
> 2019-06-26T23:27:42,497  INFO 
> [ReadOnlyZKClient-localhost:2181@0x4d49ce4e-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> 2019-06-26T23:27:42,498  INFO 
> [ReadOnlyZKClient-localhost:2181@0x4d49ce4e-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Socket connection established, initiating session, 
> client: /0:0:0:0:0:0:0:1:55090, server: localhost/0:0:0:0:0:0:0:1:2181
> 2019-06-26T23:27:42,499  INFO 
> [ReadOnlyZKClient-localhost:2181@0x4d49ce4e-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Session establishment complete on server 
> localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x16b96a782de00c5, negotiated 
> timeout = 9
> 2019-06-26T23:27:42,503  INFO [LocalJobRunner Map Task Executor #0] 
> exec.FileSinkOperator: FS[2]: records written - 1
> 2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0] 
> exec.MapOperator: MAP[0]: records read - 1
> 2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0] 
> exec.MapOperator: MAP[0]: Total records read - 1. abort - false
> 2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0] 
> exec.MapOperator: DESERIALIZE_ERRORS:0, RECORDS_IN:1, 
> 2019-06-26T23:27:42,504  INFO [LocalJobRunner Map Task Executor #0

[jira] [Resolved] (IMPALA-5353) Add timeline metric for authorization/security checks

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5353.
---
Resolution: Duplicate

> Add timeline metric for authorization/security checks
> -
>
> Key: IMPALA-5353
> URL: https://issues.apache.org/jira/browse/IMPALA-5353
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Peter Ebert
>Priority: Major
>
> It would be helpful for Impala to show the time spent performing authorization in 
> the query profile's timeline.
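As a rough illustration of the request, authorization time can be made visible by bracketing the authorization step with two timeline events, so its cost is the gap between them. This is a minimal Python sketch, not Impala's profile code: the `EventTimeline` class, `analyze_and_authorize`, and the event labels are hypothetical and only mimic the style of a query profile timeline (elapsed time since start, per event).

```python
import time


class EventTimeline:
    """Hypothetical ordered list of (label, elapsed_ns) events, similar in
    spirit to the timeline section of a query profile."""

    def __init__(self, clock=time.monotonic_ns):
        self._clock = clock
        self._start = clock()
        self.events = []

    def mark(self, label):
        # Record the time elapsed since the timeline was started.
        self.events.append((label, self._clock() - self._start))

    def render(self):
        return "\n".join(
            f"  - {label}: {elapsed / 1e6:.3f}ms" for label, elapsed in self.events)


def analyze_and_authorize(authorize, timeline):
    """Hypothetical compilation flow: the authorization cost shows up as the
    gap between the two events."""
    timeline.mark("Analysis finished")
    authorize()
    timeline.mark("Authorization finished")
```

The injectable `clock` keeps the sketch deterministic to test; in real use the default monotonic clock avoids jumps from wall-clock adjustments.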





[jira] [Resolved] (IMPALA-5921) Internal-UDF crashed

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5921.
---
Resolution: Cannot Reproduce

> Internal-UDF crashed
> 
>
> Key: IMPALA-5921
> URL: https://issues.apache.org/jira/browse/IMPALA-5921
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Attila Jeges
>Priority: Major
>  Labels: crash, udf
> Attachments: 150226_resolved.txt, hs_err_pid43750.log
>
>
> Impala crashed and a minidump was written. Here's the resolved stack:
> {code}
> Thread 231 (crashed)
>  0  impalad!impala::ScalarFnCall::EvaluateChildren(impala::ExprContext*, 
> impala::TupleRow const*, std::vector std::allocator >*) [anyval-util.h : 215 + 0x0]
> rax = 0x7f15a5316090   rdx = 0x0008
> rcx = 0x000a   rbx = 0x01c3dfc0
> rsi = 0x   rdi = 0x0010
> rbp = 0x   rsp = 0x7f161fad80d0
>  r8 = 0x7f161fad809cr9 = 0x7f15dca37680
> r10 = 0x   r11 = 0x0001
> r12 = 0x0010   r13 = 0x7f0d9ac4ba20
> r14 = 0x7a4d2aa0   r15 = 0x
> rip = 0x008a3209
> Found by: given as instruction pointer in context
>  1  impalad!impala_udf::StringVal 
> impala::ScalarFnCall::InterpretEval(impala::ExprContext*,
>  impala::TupleRow const*) [scalar-fn-call.cc : 562 + 0x5]
> rbx = 0x7f0d9ac4ba20   rbp = 0x7f0cd921ee40
> rsp = 0x7f161fad8120   r12 = 0x7a4d29a0
> r13 = 0x7c1f09d0   r14 = 0x7f161fad8230
> r15 = 0x7f161fad8250   rip = 0x008acdc6
> Found by: call frame info
>  2  impalad!impala::ScalarFnCall::GetStringVal(impala::ExprContext*, 
> impala::TupleRow const*) [scalar-fn-call.cc : 758 + 0x5]
> rbx = 0x7f0d9ac4ba20   rbp = 0x7f15a5316040
> rsp = 0x7f161fad8140   r12 = 0x7f15b7c77900
> r13 = 0x7c1f09d0   r14 = 0x7f161fad8230
> r15 = 0x7f161fad8250   rip = 0x008a3ca5
> Found by: call frame info
>  3  impalad!impala::ExprContext::GetValue(impala::Expr*, impala::TupleRow 
> const*) [expr-context.cc : 277 + 0xc]
> rbx = 0x7f0d9ac4ba20   rbp = 0x7f15a5316040
> rsp = 0x7f161fad8150   r12 = 0x7f15b7c77900
> r13 = 0x7c1f09d0   r14 = 0x7f161fad8230
> r15 = 0x7f161fad8250   rip = 0x0085ef3c
> Found by: call frame info
>  4  
> impalad!impala::ImpalaServer::QueryExecState::GetRowValue(impala::TupleRow*, 
> std::vector >*, std::vector std::allocator >*) [query-exec-state.cc : 837 + 0x5]
> rbx = 0x005a   rbp = 0x7f0d7390a000
> rsp = 0x7f161fad81a0   r12 = 0x7f15b7c77900
> r13 = 0x7c1f09d0   r14 = 0x7f161fad8230
> r15 = 0x7f161fad8250   rip = 0x00b24e13
> Found by: call frame info
>  5  impalad!impala::ImpalaServer::QueryExecState::FetchRowsInternal(int, 
> impala::ImpalaServer::QueryResultSet*) [query-exec-state.cc : 766 + 0x5]
> rbx = 0x7f0d7390a000   rbp = 0x7f161fad82e0
> rsp = 0x7f161fad81f0   r12 = 0x7f15b80b5200
> r13 = 0x7f161fad8230   r14 = 0x
> r15 = 0x7f161fad8220   rip = 0x00b25bde
> Found by: call frame info
>  6  impalad!impala::ImpalaServer::QueryExecState::FetchRows(int, 
> impala::ImpalaServer::QueryResultSet*) [query-exec-state.cc : 656 + 0x19]
> rbx = 0x7f0d7390a000   rbp = 0x7f161fad82e0
> rsp = 0x7f161fad82e0   r12 = 0x7f161fad83b0
> r13 = 0x7f161fad82f0   r14 = 0x7f15b80b5200
> r15 = 0x7f161fad84c0   rip = 0x00b261c9
> Found by: call frame info
>  7  impalad!impala::ImpalaServer::FetchInternal(impala::TUniqueId const&, 
> int, bool, apache::hive::service::cli::thrift::TFetchResultsResp*) 
> [impala-hs2-server.cc : 524 + 0x5]
> rbx = 0x7f161fad8628   rbp = 0x7f15b80b5200
> rsp = 0x7f161fad8330   r12 = 0x7f161fad83b0
> r13 = 0x7f161fad8470   r14 = 0x7f161fad83c0
> r15 = 0x7f161fad84c0   rip = 0x00b0c672
> Found by: call frame info
>  8  
> impalad!impala::ImpalaServer::FetchResults(apache::hive::service::cli::thrift::TFetchResultsResp&,
>  apache::hive::service::cli::thrift::TFetchResultsReq const&) 
> [impala-hs2-server.cc : 1072 + 0x22]
> rbx = 0x7f161fad8628   rbp = 0x7f161fad84b0
> rsp = 0x7f161fad8440   r12 = 0x7f161fad84e0
> r13 = 0x7f161fad85c8   r14 = 0x7f161fad85d8
> r15 = 0x7f161fad84c0   rip = 0x00b0ccd5
> Found by: call frame info
>  9  
> impalad!apache::hive::service::cli::thrift::TCLIServiceProcessor::process_FetchResults(int,
>  apache::thrift::pr

[jira] [Resolved] (IMPALA-7518) Release 2.12.1 since 2.12.0 fails to compile

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7518.
---
Resolution: Won't Do

> Release 2.12.1 since 2.12.0 fails to compile
> 
>
> Key: IMPALA-7518
> URL: https://issues.apache.org/jira/browse/IMPALA-7518
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Priority: Major
>
> 2.12.0-release failed to compile since it depends on cdh5.16.0-SNAPSHOT. An 
> update to Sentry 1.5.1-cdh5.16.0-SNAPSHOT broke the build. See the error 
> message in IMPALA-7513.
> We should release a compilable version of branch-2.12. There are two choices:
> * cherry-pick the patch in IMPALA-7513
> * downgrade to depend on stable cdh5.15 versions
> I'm in favor of the latter, since future changes in cdh5.16.0-SNAPSHOT 
> then won't break the build.





[jira] [Resolved] (IMPALA-8205) Illegal statistics for numFalse and numTrue

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8205.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Illegal statistics for numFalse and numTrue
> ---
>
> Key: IMPALA-8205
> URL: https://issues.apache.org/jira/browse/IMPALA-8205
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: wuchang
>Assignee: wuchang
>Priority: Major
>  Labels: impala, numFalse, numTrue, statistics
> Fix For: Impala 4.0
>
>
> When Impala computes statistics, it sets *numFalse = -1* and *numTrue = 1* when 
> the statistic is missing.
> A value of *-1* for *numFalse* breaks some query engines such as Presto, which 
> already has a PR that reports and hot-fixes it: 
> [presto-11859|https://github.com/prestodb/presto/pull/11859]
> A value of *1* for *numTrue* is also unreasonable, because we cannot tell whether it 
> is the real numTrue statistic or a missing statistic.
> Previously, *nullCount* also used -1 to indicate its absence, which 
> also caused problems for Presto; Presto had to add a hotfix for 
> it ([presto-11549|https://github.com/prestodb/presto/pull/11549]). Fortunately, 
> Impala has since fixed that bug.
> These statistics should be set to null when they are absent, instead of -1 
> and 1.
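To illustrate why explicit nulls are safer than sentinel values for consumers of these statistics, here is a minimal Python sketch. The `BooleanColumnStats` type and `true_fraction` helper are hypothetical, not Impala's or the Hive metastore's API; a missing statistic is represented as `None`, as the issue proposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BooleanColumnStats:
    # None means "statistic not computed", never a magic number.
    num_trues: Optional[int] = None
    num_falses: Optional[int] = None
    num_nulls: Optional[int] = None


def true_fraction(stats: BooleanColumnStats) -> Optional[float]:
    """Return the fraction of non-null values that are TRUE, or None if unknown."""
    if stats.num_trues is None or stats.num_falses is None:
        return None  # a consumer can fall back to a default selectivity
    total = stats.num_trues + stats.num_falses
    if total == 0:
        return None
    return stats.num_trues / total
```

With the sentinel pair described above (numTrue = 1, numFalse = -1), a consumer computing `num_trues / (num_trues + num_falses)` would divide by zero, because 1 + (-1) = 0; with `None`, the missing case is explicit and cannot be mistaken for real data.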





[jira] [Resolved] (IMPALA-9326) PARTITION BY RANGE clause does not accept DATE constants

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9326.
---
Resolution: Duplicate

This was fixed as part of IMPALA-8800.

> PARTITION BY RANGE clause does not accept DATE constants
> 
>
> Key: IMPALA-9326
> URL: https://issues.apache.org/jira/browse/IMPALA-9326
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Vladimir Verjovkin
>Priority: Major
>  Labels: kudu, ramp-up
>
> In impala-shell.sh, when I submit the DDL command
> {code:java}
> create table kudu_partition_test3 (ts TIMESTAMP, primary key(ts))
> partition by range (ts) 
> ( partition '2018-05-01' <= values < '2018-06-01') stored as kudu;{code}
> it succeeds.
> When I submit the command:
> {code:java}
> create table kudu_partition_test4 (ts DATE, primary key(ts))
> partition by range (ts) 
> ( partition '2018-05-01' <= values < '2018-06-01') stored as kudu;{code}
> or
> {code:java}
> create table kudu_partition_test5 (ts DATE, primary key(ts))
> partition by range (ts) 
> ( partition to_date('2018-05-01') <= values < to_date('2018-06-01')) stored 
> as kudu;{code}
> or
> {code:java}
> create table kudu_partition_test6 (ts DATE, primary key(ts))
> partition by range (ts) 
> ( partition date '2018-05-01' <= values < date '2018-06-01') stored as 
> kudu;{code}
> it fails with the message:
> {code:java}
> ERROR: IllegalStateException: null{code}





[jira] [Resolved] (IMPALA-10250) TestNestedTypes.test_scanner_position fails in an ASAN test

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10250.

Resolution: Duplicate

> TestNestedTypes.test_scanner_position fails in an ASAN test
> ---
>
> Key: IMPALA-10250
> URL: https://issues.apache.org/jira/browse/IMPALA-10250
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build, flaky
>
> TestNestedTypes.test_scanner_position fails in a CORE ASAN job:
> {code:java}
> query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 0 
> | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> orc/def/block]
> query_test.test_nested_types.TestNestedTypes.test_scanner_position[mt_dop: 2 
> | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> orc/def/block] {code}
> The stack traces are the same:
> {code:java}
> query_test/test_nested_types.py:76: in test_scanner_position
> self.run_test_case('QueryTest/nested-types-scanner-position', vector)
> common/impala_test_suite.py:693: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:529: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 0,-1,7300 != 0,-1,9366
> E 0,1,7300 != 0,1,9800
> E 0,NULL,7300 != 0,NULL,9796
> E 1,1,7300 != 1,1,9796
> E 1,2,7300 != 1,2,9800
> E 2,2,7300 != 2,2,9796
> E 2,3,7300 != 2,3,9800
> E 3,NULL,7300 != 3,NULL,9796
> E 4,3,7300 != 4,3,9796
> E 5,NULL,7300 != 5,NULL,9796 {code}





[jira] [Resolved] (IMPALA-10009) test_insert suite's test_insert_bad_expr failed (may be flaky)

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10009.

Resolution: Duplicate

> test_insert suite's  test_insert_bad_expr failed (may be flaky) 
> 
>
> Key: IMPALA-10009
> URL: https://issues.apache.org/jira/browse/IMPALA-10009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Aman Sinha
>Priority: Major
>  Labels: flaky
>
> Seen on a recent build of impala-asf-master-exhaustive-release:
> {noformat}
> TestInsertQueries.test_insert_bad_expr[compression_codec: none | protocol: 
> beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 1, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none-unique_database0] 
> 02:31:45 query_test/test_insert.py:200: in test_insert_bad_expr
> 02:31:45 .get_db_name_from_format(vector.get_value('table_format'))})
> 02:31:45 common/impala_test_suite.py:668: in run_test_case
> 02:31:45 self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> 02:31:45 common/impala_test_suite.py:485: in __verify_exceptions
> 02:31:45 (expected_str, actual_str)
> 02:31:45 E   AssertionError: Unexpected exception string. Expected: Cannot 
> interpret native UDF 'twenty_one_args': number of arguments is more than 20. 
> Codegen is needed. Please set DISABLE_CODEGEN to false.
> 02:31:45 E   Not found in actual: ImpalaBeeswaxException: Query 
> aborted:ExecQueryFInstances rpc query_id=164b40a17ce7407d:aebe998e 
> failed: Exec() rpc failed: Aborted: ExecQueryFInstances RPC to 
> 127.0.0.1:27002 is cancelled in state SENT
> {noformat}
> The failed RPC suggests a transient failure, since all other tests in this 
> build ran fine.





[jira] [Resolved] (IMPALA-10052) Expose daemon health on /healthz endpoint for catalogd and statestored as well

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10052.

Fix Version/s: Impala 4.0
   Resolution: Fixed

[~bikramjeet.vig] just closing this out

> Expose daemon health on /healthz endpoint for catalogd and statestored as well
> --
>
> Key: IMPALA-10052
> URL: https://issues.apache.org/jira/browse/IMPALA-10052
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: observability
> Fix For: Impala 4.0
>
>
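Once catalogd and statestored expose `/healthz` as well, a monitoring script or container probe can check all three daemons the same way. A minimal Python sketch, assuming the default debug webserver ports (25000 for impalad, 25010 for statestored, 25020 for catalogd), plain HTTP, and that a healthy daemon answers `/healthz` with HTTP 200; `is_healthy` and `check_daemons` are hypothetical helper names, and hosts, ports, and TLS settings should be adjusted for a real deployment.

```python
import urllib.error
import urllib.request

# Assumed default debug webserver ports; override for your cluster.
DEFAULT_PORTS = {"impalad": 25000, "statestored": 25010, "catalogd": 25020}


def is_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if http://host:port/healthz answers with HTTP 200."""
    url = f"http://{host}:{port}/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. a 503 from an unhealthy daemon) is a URLError
        # subclass; refused connections and timeouts surface as OSError.
        return False


def check_daemons(host: str, ports=DEFAULT_PORTS) -> dict:
    """Probe each daemon's /healthz endpoint on a single host."""
    return {name: is_healthy(host, port) for name, port in ports.items()}
```

Treating every error as "unhealthy" keeps the probe simple; a caller that needs to distinguish "daemon down" from "daemon reports not ready" could inspect the `HTTPError` code instead.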






[jira] [Resolved] (IMPALA-10311) custom_cluster.test_concurrent_ddls.TestConcurrentDdls query timeout

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10311.

Resolution: Duplicate

> custom_cluster.test_concurrent_ddls.TestConcurrentDdls query timeout
> 
>
> Key: IMPALA-10311
> URL: https://issues.apache.org/jira/browse/IMPALA-10311
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Kurt Deschler
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: broken-build, flaky
>
> [custom_cluster.test_concurrent_ddls.TestConcurrentDdls.test_local_catalog_ddls_with_invalidate_metadata_sync_ddl|https://master-02.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core-s3-data-cache/lastCompletedBuild/testReport/custom_cluster.test_concurrent_ddls/TestConcurrentDdls/test_local_catalog_ddls_with_invalidate_metadata_sync_ddl/]
> h3. Error Message
> AssertionError: Query timeout(60s): insert overwrite table 
> test_local_catalog_ddls_with_invalidate_metadata_sync_ddl_b87f02d6.test_6_part
>  partition(j=2) values (1), (2), (3), (4), (5) assert False
>  





[jira] [Resolved] (IMPALA-3551) Fix ThriftClient initialization code

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3551.
---
Resolution: Won't Fix

I think the current pattern where Open()/OpenWithRetry() checks init_status_ is 
OK and it's not really that important to clean it up.

> Fix ThriftClient initialization code
> 
>
> Key: IMPALA-3551
> URL: https://issues.apache.org/jira/browse/IMPALA-3551
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0
>Reporter: Matthew Jacobs
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: kerberos, ramp-up, rpc, thrift
>
> The ThriftClient constructor has a few issues:
> * All errors need to be handled. The result of 
> {{auth_provider_->WrapClientTransport()}} is ignored.
> * Logic that can fail should be moved to an initialization method. Calling 
> code would need to be updated.
> * Failure conditions should be tested. Right now we don't have confidence in 
> making the above changes.
> This came up in the context of fixing IMPALA-1928, which had to be addressed 
> in a minimal way since we don't have enough test coverage. See this review 
> for some more context: http://gerrit.cloudera.org:8080/#/c/3093/
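The pattern the resolution comment settles on, where the constructor records the result of its fallible step and `Open()`/`OpenWithRetry()` check `init_status_` before doing anything, can be sketched in a language-neutral way. This is a minimal Python model, not the real C++ `ThriftClient`: `Status`, `SketchThriftClient`, and the `wrap_transport` callable are hypothetical, with `wrap_transport` standing in for `auth_provider_->WrapClientTransport()`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Status:
    """Hypothetical stand-in for Impala's Status: OK, or an error message."""
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


class SketchThriftClient:
    def __init__(self, wrap_transport: Callable[[], Status]):
        # A constructor cannot return an error, so instead of ignoring the
        # result of the fallible step, record it for later.
        self.init_status = wrap_transport()
        self.is_open = False

    def open(self) -> Status:
        # Every entry point checks the stored init status first.
        if not self.init_status.ok:
            return self.init_status
        self.is_open = True
        return Status()
```

The alternative proposed in the description, moving the fallible call into a separate initialization method whose status callers must check, avoids half-constructed objects but requires updating every call site, which is the cost the resolution chose not to pay.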





[jira] [Created] (IMPALA-10404) Update docs to reflect RLE_DICTIONARY support

2020-12-22 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-10404:
--

 Summary: Update docs to reflect RLE_DICTIONARY support
 Key: IMPALA-10404
 URL: https://issues.apache.org/jira/browse/IMPALA-10404
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Tim Armstrong
Assignee: Tim Armstrong








[jira] [Resolved] (IMPALA-2753) Investigate performance gains for adding random prefix to file name

2020-12-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2753.
---
Resolution: Won't Fix

This isn't really feasible to fix with Hive's traditional partitioning scheme, 
but it looks like there is a better solution for Iceberg tables.

> Investigate performance gains for adding random prefix to file name
> ---
>
> Key: IMPALA-2753
> URL: https://issues.apache.org/jira/browse/IMPALA-2753
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Perf Investigation
>Affects Versions: Impala 2.5.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: s3
>
> Something I noticed, which is not directly related to Impala, is that the file naming 
> convention HDFS produces is the anti-pattern of what S3 recommends. 
> If we adjust the naming convention, we can outperform Hive when running 
> on S3. 
> {code}
> examplebucket/2013-26-05-15-00-00/cust1234234/photo1.jpg
> examplebucket/2013-26-05-15-00-00/cust3857422/photo2.jpg
> examplebucket/2013-26-05-15-00-00/cust8474937/photo2.jpg
> examplebucket/2013-26-05-15-00-00/cust1248473/photo3.jpg
> ...
> examplebucket/2013-26-05-15-00-01/cust1248473/photo4.jpg
> examplebucket/2013-26-05-15-00-01/cust1248473/photo5.jpg
> examplebucket/2013-26-05-15-00-01/cust1248473/photo6.jpg
> examplebucket/2013-26-05-15-00-01/cust1248473/photo7.jpg
> ...
> {code}
> The sequence pattern in the key names introduces a performance problem. To 
> understand the issue, let’s look at how Amazon S3 stores key names.
> Amazon S3 maintains an index of object key names in each AWS region. Object 
> keys are stored lexicographically across multiple partitions in the index. 
> That is, Amazon S3 stores key names in alphabetical order. The key name 
> dictates which partition the key is stored in. Using a sequential prefix, 
> such as timestamp or an alphabetical sequence, increases the likelihood that 
> Amazon S3 will target a specific partition for a large number of your keys, 
> overwhelming the I/O capacity of the partition. If you introduce some 
> randomness in your key name prefixes, the key names, and therefore the I/O 
> load, will be distributed across more than one partition.
> If you anticipate that your workload will consistently exceed 100 requests 
> per second, you should avoid sequential key names. If you must use sequential 
> numbers or date and time patterns in key names, add a random prefix to the 
> key name. The randomness of the prefix more evenly distributes key names 
> across multiple index partitions. Examples of introducing randomness are 
> provided later in this topic.
> http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
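The idea in the quoted AWS guidance can be sketched as deriving a short, deterministic hash prefix from each file's path, so that lexicographically adjacent names are spread across the key space. This is a minimal Python sketch; `prefixed_key`, the 4-hex-character prefix, and the choice of SHA-256 are assumptions for illustration, not anything Impala or Hive implements. AWS has since raised S3 per-prefix request rates and withdrawn the advice to randomize prefixes, so the quoted guidance may matter less today.

```python
import hashlib


def prefixed_key(directory: str, file_name: str, prefix_len: int = 4) -> str:
    """Return an object key with a short hash-derived prefix (hypothetical scheme).

    The prefix is deterministic (same path, same key), so readers can recompute
    it, but sequential names no longer sort next to each other.
    """
    path = f"{directory}/{file_name}"
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{path}"
```

Putting the hash in front of the directory is also what makes this awkward for Hive-style tables: files of one partition no longer share a common key prefix, so a partition can no longer be listed by its directory prefix, which fits the resolution comment above.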





[jira] [Resolved] (IMPALA-9966) Add missing breaks in SetQueryOption

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9966.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Add missing breaks in SetQueryOption
> 
>
> Key: IMPALA-9966
> URL: https://issues.apache.org/jira/browse/IMPALA-9966
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Product Backlog
>Reporter: Bikramjeet Vig
>Assignee: Tim Armstrong
>Priority: Minor
> Fix For: Impala 4.0
>
>
> Missing break statements in SetQueryOption can cause the switch statement to 
> fall through to the default case, ultimately hitting a DCHECK().
> Currently these two query options don't have break statements at the end of 
> their case:
> RESOURCE_TRACE_RATIO
> BROADCAST_BYTES_LIMIT





[jira] [Resolved] (IMPALA-8838) Impala wrote audit log with missing statement_type

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8838.
---
Resolution: Cannot Reproduce

> Impala wrote audit log with missing statement_type
> --
>
> Key: IMPALA-8838
> URL: https://issues.apache.org/jira/browse/IMPALA-8838
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.9.0
>Reporter: Tim Armstrong
>Priority: Major
>
> We saw an audit log with a missing statement_type, where it should have been 
> QUERY. Filing a bug to see if this reoccurs and if there is a pattern to it 
> (we don't have a way to reproduce or debug now).
> {noformat}
> {
>   "serviceType": "IMPALA", 
>   "serviceName": "impala", 
>   "extraValues": {
> "12345678912345": {
>   "status": "", 
>   "impersonator": null, 
>   "start_time": "2019-01-01 00:00:00.0", 
>   "network_address": "123.123.123.123:12345", 
>   "authorization_failure": false, 
>   "sql_statement": "SELECT NDV_NO_FINALIZE(col) AS col, CAST(-1 as 
> BIGINT), 8, CAST(8 as DOUBLE), COUNT(col), ... FROM table WHERE 
> (day='2019-01-01') GROUP BY day",
>   "session_id\\ ": "xx:xx", 
>   "query_id": "xxx:xx", 
>   "catalog_objects": [
> {
>   "privilege": "VIEW_METADATA", 
>   "object_type": "", 
>   "name": "_impala_builtins"
> }, 
> {
>   "privilege": "SELECT", 
>   " object_type": "", 
>   "name": "table"
> }
>   ], 
>   "statement_type": "", 
>   "user": "u...@realm.net"
> }
>   }
> }
> {noformat}
> statement_type is printed here:
> https://github.com/cloudera/Impala/blob/cdh5-2.9.0_5.12.2/be/src/service/impala-server.cc#L474
> It calls out to the function which prints an enum 
> here: https://github.com/cloudera/Impala/blob/cdh5-2.9.0_5.12.2/be/src/util/debug-util.cc#L68.
>  The only way it can produce an empty string is if the enum value is 
> out-of-range, which shouldn't be possible unless we're reading an 
> uninitialised value or the memory is somehow corrupted. However, all the 
> surrounding fields in the TExecRequest object look like they were written out 
> to the audit log correctly.
> The code has changed a bit in master because of the thrift version upgrade, 
> but it is still equivalent as far as I can see.
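The reasoning above, that an empty `statement_type` can only come from an out-of-range enum value, can be illustrated with a small model of an enum-to-name lookup that falls back to an empty string. This is a minimal Python sketch; the `TStmtType` members and values and the `print_enum` helper are hypothetical and only mimic the kind of name table an enum printer consults, not the exact code in `debug-util.cc`.

```python
from enum import IntEnum


class TStmtType(IntEnum):
    """Hypothetical subset of a statement-type enum."""
    QUERY = 0
    DDL = 1
    DML = 2
    EXPLAIN = 3


# Value-to-name table of the kind an enum printer looks values up in.
_VALUES_TO_NAMES = {member.value: member.name for member in TStmtType}


def print_enum(value: int) -> str:
    """Return the enum's name, or "" if the value is out of range."""
    return _VALUES_TO_NAMES.get(value, "")
```

Every valid value maps to a non-empty name, so an empty string in the audit log points at a value outside the table (for example an uninitialised field or corrupted memory) rather than at a legitimate statement type.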





[jira] [Resolved] (IMPALA-8789) Include script to trigger graceful shutdown in docker containers

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8789.
---
Fix Version/s: Impala 3.3.0
   Resolution: Fixed

> Include script to trigger graceful shutdown in docker containers
> 
>
> Key: IMPALA-8789
> URL: https://issues.apache.org/jira/browse/IMPALA-8789
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> We should include a utility script in the docker containers to trigger a 
> graceful shutdown by sending SIGRTMIN to all impalads.
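The script itself is not shown in the issue. Below is a minimal Python sketch of the described behaviour. It assumes a Linux container in which each impalad appears in /proc with the command name impalad. The function names are hypothetical. The kill function is injectable only so that the sketch can be exercised without signalling real processes.

```python
import os
import signal


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or comm is unreadable
    return sorted(pids)


def request_graceful_shutdown(name="impalad", proc_root="/proc", kill=os.kill):
    """Send SIGRTMIN (Linux only) to every process called name.

    Returns the PIDs that were signalled.
    """
    pids = find_pids_by_name(name, proc_root)
    for pid in pids:
        kill(pid, signal.SIGRTMIN)
    return pids
```

Inside a container, calling request_graceful_shutdown() with sufficient privileges would signal every impalad it can find. Matching on /proc/<pid>/comm avoids accidentally signalling shell processes whose command lines merely mention impalad.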





[jira] [Resolved] (IMPALA-8958) OutOfMemoryError : Compressed class space

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8958.
---
Resolution: Cannot Reproduce

> OutOfMemoryError : Compressed class space
> -
>
> Key: IMPALA-8958
> URL: https://issues.apache.org/jira/browse/IMPALA-8958
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: hqbhoho
>Priority: Major
> Attachments: f39df03636aab574edf387906cfd5f8.png
>
>
> When I use Impala with a Hive UDF, the UDF works through
> {{impala-shell -k -i worker01}}, but through {{impala-shell -k -i worker02}}
> it fails with {{OutOfMemoryError: Compressed class space}}.
> After I restart the impalad on worker02 it works again, but after a while the
> UDF fails on worker02 again.





[jira] [Resolved] (IMPALA-9236) Ported native-toolchain to work on aarch64

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9236.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Ported native-toolchain to work on aarch64
> --
>
> Key: IMPALA-9236
> URL: https://issues.apache.org/jira/browse/IMPALA-9236
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Major
> Fix For: Impala 4.0
>
>
> Make native-toolchain work on the aarch64 platform.





[jira] [Resolved] (IMPALA-10144) Add a statement of platforms that Impala runs on

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10144.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Add a statement of platforms that Impala runs on
> 
>
> Key: IMPALA-10144
> URL: https://issues.apache.org/jira/browse/IMPALA-10144
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Major
> Fix For: Impala 4.0
>
>
> Now that Impala can be built and pass all tests on arm64, it would be good to
> add a statement that Impala runs on the arm64 platform.





[jira] [Resolved] (IMPALA-9780) All FE tests should explicitly set/unset the test flag

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-9780.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> All FE tests should explicitly set/unset the test flag
> --
>
> Key: IMPALA-9780
> URL: https://issues.apache.org/jira/browse/IMPALA-9780
> Project: IMPALA
>  Issue Type: Test
>Reporter: Quanlong Huang
>Priority: Major
> Fix For: Impala 4.0
>
>
> To avoid test failures like IMPALA-9743, instead of depending on the default 
> value of the test flag, all FE tests should explicitly set/unset the test 
> flag.





[jira] [Resolved] (IMPALA-10134) Implement ds_hll_union_f()

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10134.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Implement ds_hll_union_f()
> --
>
> Key: IMPALA-10134
> URL: https://issues.apache.org/jira/browse/IMPALA-10134
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Adam Tamas
>Assignee: Fucun Chu
>Priority: Major
> Fix For: Impala 4.0
>
>
> Implement ds_hll_union_f() and make sure it behaves similarly to the Hive version.





[jira] [Resolved] (IMPALA-8614) Parameterize query_test/test_kudu.py to run with/without HMS integration enabled

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8614.
---
Resolution: Later

>  Parameterize query_test/test_kudu.py to run with/without HMS integration 
> enabled
> -
>
> Key: IMPALA-8614
> URL: https://issues.apache.org/jira/browse/IMPALA-8614
> Project: IMPALA
>  Issue Type: Test
>Reporter: Hao Hao
>Priority: Major
>
> Certain tests in query_test/test_kudu.py can be parameterized to run with and
> without HMS integration enabled, to improve test coverage.





[jira] [Resolved] (IMPALA-7474) Tool to identify CPU bottlenecks

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7474.
---
Resolution: Won't Fix

> Tool to identify CPU bottlenecks
> 
>
> Key: IMPALA-7474
> URL: https://issues.apache.org/jira/browse/IMPALA-7474
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.12.0
>Reporter: Anuj Phadke
>Assignee: Anuj Phadke
>Priority: Major
>  Labels: supportability
>
> We run into a number of issues where Impala hangs or query performance
> suffers because of very high CPU usage.
> A tool which periodically collects stacks from Impala (when enabled) and
> prints calls with high CPU usage would be very useful for debugging such
> issues.
> Running this tool should ideally incur minimal overhead on impalad
> while collecting the stacks.
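One common way to build such a tool is to sample thread stacks repeatedly and count which frames appear most often. The following is a minimal Python sketch of only the aggregation step. It assumes the stacks have already been collected and are given as lists of frame names, innermost frame first. The input format, function name and frame names are hypothetical.

```python
from collections import Counter


def hottest_frames(samples, top_n=3):
    """Count the innermost frame of each sampled stack and return the top_n
    (frame, count) pairs; frames that dominate the samples are likely where
    the CPU time is going."""
    counts = Counter(stack[0] for stack in samples if stack)
    return counts.most_common(top_n)


# Made-up samples: each inner list is one stack, innermost frame first.
samples = [
    ["hash_probe", "hash_join_next", "exec_fragment"],
    ["hash_probe", "hash_join_next", "exec_fragment"],
    ["memcpy", "serialize_batch", "send_batch"],
    [],  # an empty sample (e.g. the thread exited) is ignored
]
print(hottest_frames(samples))  # [('hash_probe', 2), ('memcpy', 1)]
```

Counting only the innermost frame approximates self time. Counting every frame in each stack instead would approximate inclusive time.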





[jira] [Resolved] (IMPALA-7263) Impala Doc: Cancel Shutdown of impalad

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7263.
---
Resolution: Won't Do

> Impala Doc: Cancel Shutdown of impalad
> --
>
> Key: IMPALA-7263
> URL: https://issues.apache.org/jira/browse/IMPALA-7263
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alexandra Rodoni
>Priority: Major
>  Labels: future_release_doc, impala_user_docs_open
>






[jira] [Resolved] (IMPALA-7248) Quiesce impalad without shutting down

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7248.
---
Resolution: Later

> Quiesce impalad without shutting down
> -
>
> Key: IMPALA-7248
> URL: https://issues.apache.org/jira/browse/IMPALA-7248
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Priority: Minor
>
> Follow-on from IMPALA-1760 - support quiescing the impala daemon without 
> shutting it down.





[jira] [Resolved] (IMPALA-7249) Cancel shutdown of impalad

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7249.
---
Resolution: Later

> Cancel shutdown of impalad
> --
>
> Key: IMPALA-7249
> URL: https://issues.apache.org/jira/browse/IMPALA-7249
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Priority: Minor
>
> Following on from IMPALA-1760, it could be useful to cancel shutdown for some 
> use cases.
> An extension would be to allow extending the deadline.





[jira] [Resolved] (IMPALA-7262) Impala Doc: Quiesce impalad without shutting down

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7262.
---
Resolution: Won't Do

> Impala Doc: Quiesce impalad without shutting down
> -
>
> Key: IMPALA-7262
> URL: https://issues.apache.org/jira/browse/IMPALA-7262
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alexandra Rodoni
>Priority: Major
>  Labels: future_release_doc, impala_user_docs_open
>






[jira] [Resolved] (IMPALA-6992) Upgrade Impala's PostgreSQL JDBC driver

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6992.
---
Resolution: Duplicate

> Upgrade Impala's PostgreSQL JDBC driver
> ---
>
> Key: IMPALA-6992
> URL: https://issues.apache.org/jira/browse/IMPALA-6992
> Project: IMPALA
>  Issue Type: Task
>Reporter: Philip Martin
>Priority: Major
>
> We use a pretty old JDBC driver in the Impala build. 





[jira] [Resolved] (IMPALA-6616) UUIDs for query id's never seem to start with 0

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6616.
---
Resolution: Not A Bug

We use boost's UUID generator. UUIDs are not completely random - there is a 
version field. That is what we are likely seeing here - 
https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)

> UUIDs for query id's never seem to start with 0
> ---
>
> Key: IMPALA-6616
> URL: https://issues.apache.org/jira/browse/IMPALA-6616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Philip Martin
>Priority: Minor
>
> In working with a bucket of Impala profiles, I was surprised to find that I 
> had no queries whose ID began with "0". I would have expected the id space to 
> be uniformly distributed.
> Here's a simple reproduction where I run 200 queries and find that none of 
> them had a query id that began with 0.
> {code}
> (for i in $(seq 20); do
>   impala-shell.sh --query 'select 1;select 2;select 3;select 4;select 5;select 6;select 7; select 8; select 9;select 10;' |&
>     grep -o 'query_id=.*' | awk -F= '{ print $2 }' | cut -c 1
> done) | sort | uniq -c
>  12 1
>   8 2
>  14 3
>  10 4
>  17 5
>  17 6
>  12 7
>  13 8
>  12 9
>   7 a
>  14 b
>  18 c
>  11 d
>  20 e
>  15 f
> {code}
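To see which bits of a version 4 UUID are fixed, here is a short Python sketch that uses the standard uuid module. It assumes that boost's random generator follows the same RFC 4122 layout as Python's uuid.uuid4(). It does not reproduce how Impala formats query ids.

```python
import uuid


def fixed_nibbles(u: uuid.UUID) -> tuple:
    """Return the version and variant nibbles of u's 32-digit hex form."""
    h = u.hex  # lowercase hex, no dashes, always 32 characters
    return h[12], h[16]


for _ in range(1000):
    version, variant = fixed_nibbles(uuid.uuid4())
    assert version == "4"     # version field: bits fixed to 0100
    assert variant in "89ab"  # RFC 4122 variant: top two bits fixed to 10
print("fixed nibbles confirmed at hex positions 12 and 16")
```

The two fixed nibbles are at positions 12 and 16 of the undashed hex string. These are the first digits of the third and fourth dash-separated groups.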





[jira] [Resolved] (IMPALA-6460) More flexible memory-based admission control policies

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6460.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> More flexible memory-based admission control policies
> -
>
> Key: IMPALA-6460
> URL: https://issues.apache.org/jira/browse/IMPALA-6460
> Project: IMPALA
>  Issue Type: Epic
>  Components: Distributed Exec
>Affects Versions: Impala 2.11.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: admission-control
> Fix For: Impala 4.0
>
>
> Currently there are only two ways to decide how much memory to reserve for
> each query in memory-based admission control:
> * Using the memory estimates, which often makes bad decisions (e.g. huge 
> overestimates) and doesn't have any enforcement on the backend
> * Using a static pool or user-set mem_limit, which is very difficult to set 
> to a reasonable value for all queries.
> The memory reservation work will allow us to come up with more powerful and 
> flexible policies.





[jira] [Resolved] (IMPALA-7389) Admission control should set aside less memory on dedicated coordinator if coordinator fragment is lightweight

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7389.
---
Resolution: Duplicate

I guess I filed two versions of this; the other one is IMPALA-7486.

> Admission control should set aside less memory on dedicated coordinator if 
> coordinator fragment is lightweight
> --
>
> Key: IMPALA-7389
> URL: https://issues.apache.org/jira/browse/IMPALA-7389
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: admission-control, resource-management
>
> The current admission control treats all backends symmetrically and sets
> aside the query's mem_limit on every backend. This makes sense for now, given
> that we have the same mem_limit setting for all backends.
> One case where this could be somewhat problematic is if you have dedicated
> coordinators with less memory than the executors, because the coordinator's
> process memory limit will be used up by admitted queries before the executors' limits are.
> If you have multiple coordinators and queries are distributed between them 
> this is relatively unlikely to become a problem. If you have a single 
> coordinator this is more of an issue.
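Here is a back-of-the-envelope Python sketch of the effect described above. It assumes that every admitted query runs on every backend and reserves the same mem_limit on each one. The backend names and memory sizes are made up for illustration.

```python
GiB = 1024 ** 3


def max_concurrent_queries(backend_mem: dict, per_query_mem_limit: int) -> int:
    """How many queries fit when each one reserves per_query_mem_limit on
    every backend: the smallest backend runs out first."""
    return min(mem // per_query_mem_limit for mem in backend_mem.values())


backends = {
    "coordinator": 32 * GiB,  # dedicated coordinator with less memory
    "executor-1": 128 * GiB,
    "executor-2": 128 * GiB,
}
print(max_concurrent_queries(backends, 8 * GiB))  # 4, limited by the coordinator
```

Each executor could hold 16 such queries, but only 4 are admitted because the coordinator's 32 GiB is fully committed first.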





[jira] [Resolved] (IMPALA-6465) Add more startup logging

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6465.
---
Resolution: Later

> Add more startup logging
> 
>
> Key: IMPALA-6465
> URL: https://issues.apache.org/jira/browse/IMPALA-6465
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.12.0
>Reporter: Thomas Tauber-Marshall
>Priority: Major
>
> I recently encountered a problem where Impala was failing to start up on 
> SLES12 due to an apparent kernel bug that was causing a crash when launching 
> the JVM through JNI.
> It was difficult to diagnose when an impalad was experiencing this problem, 
> in part because we don't have any logging around this particular part of 
> startup. We could improve this by adding more logging.





[jira] [Resolved] (IMPALA-7356) Stress test for memory-based admission control

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7356.
---
Fix Version/s: Impala 3.1.0
   Resolution: Fixed

This was mostly finished.

> Stress test for memory-based admission control
> --
>
> Key: IMPALA-7356
> URL: https://issues.apache.org/jira/browse/IMPALA-7356
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: admission-control
> Fix For: Impala 3.1.0
>
>
> We should extend the existing stress test to have a new mode designed to test 
> memory-based admission control, where the stress test framework does not try 
> to throttle memory consumption but instead relies on Impala doing so. 
> The required changes would be:
> * A mode to disable throttling
> * Options for stricter pass conditions - queries should not fail with OOM 
> even if the stress test tries to submit way too many queries. 
> ** However, admission control (AC) queue timeouts may be OK.
> * Investigation into the logic for choosing which query to run next and when 
> - does that need to change?





[jira] [Resolved] (IMPALA-6420) LOCAL_FILESYSTEM missing from LOCATION statments in tests

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6420.
---
Resolution: Cannot Reproduce

> LOCAL_FILESYSTEM missing from LOCATION statments in tests
> -
>
> Key: IMPALA-6420
> URL: https://issues.apache.org/jira/browse/IMPALA-6420
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.5.0
>Reporter: Zach Amsden
>Priority: Minor
>
> Not running Impala tests during snapshot creation exposes a missing 
> LOCAL_FILESYSTEM string in tests.  This means metastore snapshots on which 
> tests are run actually paper over other test bugs.





[jira] [Resolved] (IMPALA-6365) Impala does not read back the numrows value set manually for kudu table.

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6365.
---
Resolution: Cannot Reproduce

> Impala does not read back the numrows value set manually for kudu table.
> 
>
> Key: IMPALA-6365
> URL: https://issues.apache.org/jira/browse/IMPALA-6365
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.8.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>
> Query: create TABLE pk_inline
> (
>   col1 BIGINT PRIMARY KEY,
>   col2 STRING,
>   col3 BOOLEAN
> ) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU
> Fetched 0 row(s) in 1.25s
> [nightly511-unsecure-2.gce.cloudera.com:21000] > show table stats pk_inline;
> Query: show table stats pk_inline
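> +--------+-----------+----------+---------------------------------------------+------------+
> | # Rows | Start Key | Stop Key | Leader Replica                              | # Replicas |
> +--------+-----------+----------+---------------------------------------------+------------+
> | -1     |           | 0001     | nightly511-unsecure-1.gce.cloudera.com:7050 | 3          |
> | -1     | 0001      |          | nightly511-unsecure-1.gce.cloudera.com:7050 | 3          |
> +--------+-----------+----------+---------------------------------------------+------------+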
> [nightly511-unsecure-2.gce.cloudera.com:21000] > alter table pk_inline set tblproperties ('numRows'='3', 'STATS_GENERATED_VIA_STATS_TASK'='true');
> Query: alter table pk_inline set tblproperties ('numRows'='3', 'STATS_GENERATED_VIA_STATS_TASK'='true')
> +----------------+
> | summary        |
> +----------------+
> | Updated table. |
> +----------------+
> Fetched 1 row(s) in 0.23s
> [nightly511-unsecure-2.gce.cloudera.com:21000] > show table stats pk_inline;
> Query: show table stats pk_inline
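> +--------+-----------+----------+---------------------------------------------+------------+
> | # Rows | Start Key | Stop Key | Leader Replica                              | # Replicas |
> +--------+-----------+----------+---------------------------------------------+------------+
> | -1     |           | 0001     | nightly511-unsecure-1.gce.cloudera.com:7050 | 3          |
> | -1     | 0001      |          | nightly511-unsecure-1.gce.cloudera.com:7050 | 3          |
> +--------+-----------+----------+---------------------------------------------+------------+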
> Fetched 2 row(s) in 0.05s





[jira] [Resolved] (IMPALA-6192) Add checksums for RPC sidecar payloads

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6192.
---
Resolution: Later

> Add checksums for RPC sidecar payloads
> --
>
> Key: IMPALA-6192
> URL: https://issues.apache.org/jira/browse/IMPALA-6192
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Lars Volker
>Priority: Major
>  Labels: rpc
>
> We should add some form of checksum to sidecar data in RPCs.
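The issue does not specify an algorithm. As one possibility, below is a minimal Python sketch that frames a sidecar payload with a CRC-32 computed by zlib and verifies it on receipt. The framing format and function names are hypothetical. A real implementation inside KRPC might choose a different checksum (for example CRC-32C) and a different header layout.

```python
import struct
import zlib


def attach_checksum(payload: bytes) -> bytes:
    """Prefix payload with its CRC-32 as a 4-byte big-endian integer."""
    return struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF) + payload


def verify_checksum(framed: bytes) -> bytes:
    """Strip and check the CRC-32 prefix; raise ValueError on mismatch."""
    if len(framed) < 4:
        raise ValueError("frame too short to contain a checksum")
    (expected,) = struct.unpack(">I", framed[:4])
    payload = framed[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("sidecar checksum mismatch")
    return payload
```

The sender would call attach_checksum() before adding the sidecar, and the receiver would call verify_checksum() before deserializing it. The receiver could then fail the RPC instead of processing corrupted data.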





[jira] [Resolved] (IMPALA-6166) Investigate requiring support for newer instruction sets

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6166.
---
Resolution: Duplicate

> Investigate requiring support for newer instruction sets
> 
>
> Key: IMPALA-6166
> URL: https://issues.apache.org/jira/browse/IMPALA-6166
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Reporter: Jim Apple
>Priority: Major
>  Labels: compatibility, incompatibility
>
> There may be a performance benefit to compiling with flags allowing the
> compiler to generate code for newer instruction sets like SSE4.1, AVX, CLMUL, 
> and so on.
> There will probably be code simplification, as well. See cpu-info.h.
> We might consider doing this only at a new major version. It's also worth 
> investigating (a) when the last chips without these features were shipped, 
> and (b) whether Xen, VirtualBox, and so on expose the true instruction sets
> available via /proc/cpuinfo, since sometimes these are masked out.
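For point (b), here is a small Python sketch that reads the flags lines of /proc/cpuinfo and reports which of the named instruction sets are not advertised on every CPU. The flag spellings (sse4_1, sse4_2, avx, and pclmulqdq for CLMUL) are the ones used by x86 Linux kernels. The required set is only an example.

```python
REQUIRED_FLAGS = {"sse4_1", "sse4_2", "avx", "pclmulqdq"}  # example set only


def common_cpu_flags(cpuinfo_text: str) -> set:
    """Flags advertised by every processor entry in /proc/cpuinfo text."""
    per_cpu = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            per_cpu.append(set(value.split()))
    return set.intersection(*per_cpu) if per_cpu else set()


def missing_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> list:
    """Required flags that at least one CPU does not advertise, sorted."""
    return sorted(set(required) - common_cpu_flags(cpuinfo_text))
```

On an x86 Linux host, missing_flags(open("/proc/cpuinfo").read()) lists the gaps. Inside a VM, it reports what the hypervisor exposes to the guest, which is the masking question raised in (b).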





[jira] [Resolved] (IMPALA-6161) single_node_perf_run performance has high variability

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6161.
---
Resolution: Later

> single_node_perf_run performance has high variability
> -
>
> Key: IMPALA-6161
> URL: https://issues.apache.org/jira/browse/IMPALA-6161
> Project: IMPALA
>  Issue Type: Bug
>  Components: Perf Investigation
>Reporter: Jim Apple
>Priority: Major
>
> single_node_perf_run has different run characteristics for the queries
> depending on which order they are run in. See
> https://jenkins.impala.io/view/Experimental/job/perf-AB-test/148/parameters/
> https://jenkins.impala.io/view/Experimental/job/perf-AB-test/149/parameters/
> https://jenkins.impala.io/view/Experimental/job/perf-AB-test/151/parameters/
> https://jenkins.impala.io/view/Experimental/job/perf-AB-test/152/parameters/
> When the order of the hashes changes, the performance changes as well.





[jira] [Resolved] (IMPALA-6090) Default value for columns with schema evolution

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6090.
---
Resolution: Cannot Reproduce

> Default value for columns with schema evolution
> ---
>
> Key: IMPALA-6090
> URL: https://issues.apache.org/jira/browse/IMPALA-6090
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Zsolt Herczeg
>Priority: Minor
>
> Currently, if a query references a column which is not present for a row,
> that row is silently ignored. In the case of schema evolution where new columns
> have been added, it's not possible to query the new columns and the old data
> at the same time.
> Impala could have a default value for each column, which is returned when
> that column is missing, instead of ignoring the row. This would allow a
> single query to include the old data and contain the newly added columns
> where they're present.





[jira] [Resolved] (IMPALA-6095) Microbenchmark for debugging KRPC issues

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6095.
---
Resolution: Later

> Microbenchmark for debugging KRPC issues
> 
>
> Key: IMPALA-6095
> URL: https://issues.apache.org/jira/browse/IMPALA-6095
> Project: IMPALA
>  Issue Type: Task
>  Components: Distributed Exec
>Reporter: Michael Ho
>Priority: Minor
>
> In order to simulate the performance, and possibly the behavior in the face of
> different types of faults (e.g. network partition, high packet loss), when
> running with KRPC, it'd be beneficial to create standalone client and
> server applications which use the KRPC libraries so we can generate an
> artificial workload without running any queries. There is already
> [rpc-bench.cc|https://github.com/apache/kudu/blob/master/src/kudu/rpc/rpc-bench.cc]
> in the Kudu code base, which we can look into modifying for our own purposes
> and possibly contributing back to Kudu if applicable.





[jira] [Resolved] (IMPALA-6079) bootstrap_system/development improvements

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6079.
---
Resolution: Later

> bootstrap_system/development improvements
> -
>
> Key: IMPALA-6079
> URL: https://issues.apache.org/jira/browse/IMPALA-6079
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Philip Martin
>Priority: Major
>
> In https://gerrit.cloudera.org/c/8262/, we discussed some additional 
> improvements that can be made to the  
> {{bin/bootstrap_{development,system}.sh}} scripts. Specifically, it would be 
> useful to de-dupe their documentation and clarify it, and maybe have ways to
> execute parts of these scripts via command line options.





[jira] [Resolved] (IMPALA-5764) Prepare Impala for multiple component packages

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5764.
---
Resolution: Won't Fix

> Prepare Impala for multiple component packages
> --
>
> Key: IMPALA-5764
> URL: https://issues.apache.org/jira/browse/IMPALA-5764
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.8.0
>Reporter: Zach Amsden
>Assignee: Zach Amsden
>Priority: Major
>
> Impala currently hard-codes package locations and components that are pulled 
> separately from the native toolchain.  We'll want to move to a system that 
> allows more flexibility in what packages are pulled and lets users override 
> the default choices.





[jira] [Resolved] (IMPALA-4208) Remove CDH_MAJOR_VERSION variable, adapt config paths

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4208.
---
Resolution: Fixed

> Remove CDH_MAJOR_VERSION variable, adapt config paths
> -
>
> Key: IMPALA-4208
> URL: https://issues.apache.org/jira/browse/IMPALA-4208
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
>Reporter: Lars Volker
>Assignee: Zach Amsden
>Priority: Major
>  Labels: asf
>
> I'm creating this separate issue related to IMPALA-4047 to track removal of 
> the CDH_MAJOR_VERSION variable from the following files:
> bin/impala-config.sh
> bin/start-impala-cluster.py
> testdata/cluster/admin
> tests/comparison/cluster.py
> Where the variable is used to parameterize a configuration path (e.g.
> {{cdh$CDH_MAJOR_VERSION}}), the config path shall be renamed to remove the
> string 'cdh'.





[jira] [Resolved] (IMPALA-5151) Adding partition on impala over a table with old metadata

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5151.
---
Resolution: Fixed

We added support for picking up such changes automatically in IMPALA-7954.

> Adding partition on impala over a table with old metadata 
> --
>
> Key: IMPALA-5151
> URL: https://issues.apache.org/jira/browse/IMPALA-5151
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Goldszmit
>Priority: Major
>
> 1) Connect with beeline and execute:
> CREATE TABLE test(id int,description string) PARTITIONED BY (year string) ;
> 2) Connect to Impala and execute:
> invalidate metadata test;
> desc test;
> 3) From beeline, change the table structure:
> ALTER TABLE test ADD COLUMNS  (missing_col string );
> 4) At this point Impala holds old metadata and does not know about the new
> column "missing_col". When executing the SQL statement:
> alter table test add partition (year='2017');
> Impala will change the structure of table test, basically reverting any
> changes that have been made to the table since Impala last ran invalidate
> metadata.
> 5) From the beeline session, execute the command:
> desc test
> The new field "missing_col" no longer appears.





[jira] [Resolved] (IMPALA-4952) Differentiate between resolved and unresolved IP addresses

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4952.
---
Resolution: Won't Fix

Nice cleanup idea but we don't need to keep it open.

> Differentiate between resolved and unresolved IP addresses
> --
>
> Key: IMPALA-4952
> URL: https://issues.apache.org/jira/browse/IMPALA-4952
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Priority: Major
>  Labels: scheduler
>
> KRPC (IMPALA-2567) requires resolved IP addresses. Since RPCs happen 
> frequently, and IP resolution can be expensive, it makes sense to retain 
> resolved addresses where possible. 
> The scheduler already knows the resolved address for every backend. However, 
> the logic is very complex, and there's not really a good way to know at
> compile time whether or not we're using a resolved address.
> We can address this by adding a new type - perhaps {{ResolvedAddress : public 
> TNetworkAddress}}, and requiring that {{RpcMgr::GetProxy()}} etc take a 
> {{ResolvedAddress}}. That way the compiler will complain if we don't prove to 
> it that the address is resolved. We can make it hard to construct a 
> {{ResolvedAddress}} without actually resolving some string into an IPv4
> address.
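The proposal relies on C++ compile-time type checking, which Python can only approximate. The following minimal sketch uses a frozen dataclass that refuses anything other than an ipaddress.IPv4Address and is normally built through a resolve() factory. The names ResolvedAddress and get_proxy mirror the issue, but the code is hypothetical. With these annotations, a static checker such as mypy would flag code that passes a plain host string where a ResolvedAddress is expected.

```python
import ipaddress
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedAddress:
    """An IPv4 address and port that are known to have been resolved."""
    ip: ipaddress.IPv4Address
    port: int

    def __post_init__(self):
        # Reject hostnames or raw strings: only an already-parsed IPv4Address
        # may be stored, so holding a ResolvedAddress implies resolution happened.
        if not isinstance(self.ip, ipaddress.IPv4Address):
            raise TypeError("ResolvedAddress requires an IPv4Address; "
                            "use ResolvedAddress.resolve(host, port)")

    @classmethod
    def resolve(cls, host: str, port: int) -> "ResolvedAddress":
        # gethostbyname() returns a dotted-quad IPv4 string or raises socket.gaierror.
        return cls(ipaddress.IPv4Address(socket.gethostbyname(host)), port)


def get_proxy(addr: ResolvedAddress) -> str:
    """Hypothetical stand-in for an RPC proxy factory that needs a resolved address."""
    return f"proxy to {addr.ip}:{addr.port}"
```

Name resolution happens once, in resolve(). Later RPC calls can reuse the stored address without looking it up again.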





[jira] [Resolved] (IMPALA-4875) Bound the max size of statestore UpdateState() RPCs

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4875.
---
Resolution: Won't Do

> Bound the max size of statestore UpdateState() RPCs
> ---
>
> Key: IMPALA-4875
> URL: https://issues.apache.org/jira/browse/IMPALA-4875
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Priority: Major
>
> Although IMPALA-4874 will help mitigate the effects of large messages, the 
> Statestore can still theoretically send very large messages when making 
> {{UpdateState()}} RPCs.
> We can bound the size of those payloads by only including a few topic 
> entries, up to a desired message size. This works well if those entries are 
> split up by different topic versions - we just send up to version K <= 
> topic_version on each UpdateState() call. 
> Situations where a single topic version is larger than the max size are
> trickier. We can still stream them to clients, but the clients have to buffer 
> them until they get the complete update, at which point they can deliver the 
> full update.
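One way to picture the proposed batching is the C++ sketch below. {{TopicEntry}} and {{BatchByVersion()}} are hypothetical simplifications, not the statestore's real types, and payload size is approximated as key plus value bytes. Batches are only cut between topic versions, so each batch can be applied as a complete delta up to some version K. A version that is larger than the limit on its own still becomes one oversized batch; that is the case that would need the client-side buffering described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified topic entry, tagged with the topic version at
// which it was last updated.
struct TopicEntry {
  std::string key;
  std::string value;
  int64_t version;
};

// Splits 'entries' (sorted by ascending version) into batches whose payload
// stays within max_bytes, only placing boundaries between different versions.
std::vector<std::vector<TopicEntry>> BatchByVersion(
    const std::vector<TopicEntry>& entries, size_t max_bytes) {
  std::vector<std::vector<TopicEntry>> batches;
  std::vector<TopicEntry> current;
  size_t current_bytes = 0;
  size_t i = 0;
  while (i < entries.size()) {
    // Find the run [i, j) of entries that share entries[i].version.
    size_t j = i;
    size_t group_bytes = 0;
    while (j < entries.size() && entries[j].version == entries[i].version) {
      group_bytes += entries[j].key.size() + entries[j].value.size();
      ++j;
    }
    // Close the current batch if this whole version would not fit into it.
    if (!current.empty() && current_bytes + group_bytes > max_bytes) {
      batches.push_back(std::move(current));
      current.clear();
      current_bytes = 0;
    }
    current.insert(current.end(), entries.begin() + i, entries.begin() + j);
    current_bytes += group_bytes;
    i = j;
  }
  if (!current.empty()) batches.push_back(std::move(current));
  return batches;
}
```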





[jira] [Resolved] (IMPALA-4140) Increase default threadpool size for statestore heartbeat and topic update

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4140.
---
Resolution: Later

We already split out priority topic updates from the main topic updates. With 
the local catalog as well, we reduced the amount of data sent via these updates 
significantly. So we likely don't need to tweak this any more.

> Increase default threadpool size for statestore heartbeat and topic update
> --
>
> Key: IMPALA-4140
> URL: https://issues.apache.org/jira/browse/IMPALA-4140
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 2.2
>Reporter: Juan Yu
>Priority: Major
>
> Default threadpool size for statestore heartbeat and topic update is 10. For 
> a large cluster this could be too small and prevent membership request and 
> topic update tasks from being processed in a timely manner.
> We should probably increase the default size to 2-3 times the number of cores.
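A sketch of a core-proportional default, assuming the 3x multiplier from the upper end of the proposal and keeping today's 10 threads as a floor. {{DefaultStatestorePoolSize()}} is a hypothetical helper, not an existing Impala flag or function.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical default sizing for the statestore's heartbeat and topic-update
// thread pools: 'multiplier' threads per core, but never fewer than the
// current fixed default of 10.
int DefaultStatestorePoolSize(unsigned num_cores, int multiplier = 3,
                              int floor_size = 10) {
  // std::thread::hardware_concurrency() returns 0 when the count is unknown.
  if (num_cores == 0) return floor_size;
  return std::max(floor_size, static_cast<int>(num_cores) * multiplier);
}

// Applies the rule to the current host.
int DefaultStatestorePoolSizeForThisHost() {
  return DefaultStatestorePoolSize(std::thread::hardware_concurrency());
}
```

The floor keeps small hosts from getting fewer threads than they do today.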





[jira] [Resolved] (IMPALA-4070) ClientCache::DoRpc() doesn't always return the right error for timeouts

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4070.
---
Resolution: Later

> ClientCache::DoRpc() doesn't always return the right error for timeouts
> ---
>
> Key: IMPALA-4070
> URL: https://issues.apache.org/jira/browse/IMPALA-4070
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.7.0
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
>
> In {{DoRpc()}}:
> {code}
>  if (IsRecvTimeoutTException(e)) {
> return Status(TErrorCode::RPC_RECV_TIMEOUT, strings::Substitute(
> "Client $0 timed-out during recv call.", 
> TNetworkAddressToString(address_)));
>   }
>   // 
>   const Status& status = Reopen();
>   if (!status.ok()) {
> if (retry_is_safe != NULL) *retry_is_safe = true;
> return Status(TErrorCode::RPC_CLIENT_CONNECT_FAILURE, 
> status.GetDetail());
>   }
>   try {
> (client_->*f)(*response, request);
>   } catch (apache::thrift::TException& e) {
> // By this point the RPC really has failed.
> // TODO: Revisit this logic later. It's possible that the new connection
> // works but we hit timeout here.
> // *** COULD BE A TIMEOUT EXCEPTION *
> // *** NEEDS IsRecvTimeoutTException()*
> return Status(TErrorCode::RPC_GENERAL_ERROR, e.what());
>   }
> {code}
> This mostly affects test code that tries to trigger a timeout and confirms 
> that one happened.
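A self-contained C++ sketch of one possible fix: route the retry's catch site through a classification helper, so a timeout on the retry after {{Reopen()}} is reported as a receive timeout rather than a general error. {{TransportException}}, {{RpcError}} and the matched message text are stand-ins assumed for illustration; the code above uses Thrift's {{TException}} and Impala's {{IsRecvTimeoutTException()}}.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

// Stand-ins for Impala's error codes and Thrift's TException (assumptions).
enum class RpcError { OK, RECV_TIMEOUT, CONNECT_FAILURE, GENERAL_ERROR };

struct TransportException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Assumed detection rule: a receive timeout is recognised from the message
// text, in the spirit of IsRecvTimeoutTException(). The string is illustrative.
bool LooksLikeRecvTimeout(const std::exception& e) {
  return std::strstr(e.what(), "EAGAIN (timed out)") != nullptr;
}

// Single mapping from exception to error code.
RpcError ClassifyRpcException(const std::exception& e) {
  return LooksLikeRecvTimeout(e) ? RpcError::RECV_TIMEOUT : RpcError::GENERAL_ERROR;
}

// Calls 'rpc'; on a non-timeout failure, reopens the connection and retries once.
template <typename Rpc, typename Reopen>
RpcError DoRpcSketch(Rpc rpc, Reopen reopen) {
  try {
    rpc();
    return RpcError::OK;
  } catch (const TransportException& e) {
    if (LooksLikeRecvTimeout(e)) return RpcError::RECV_TIMEOUT;
  }
  if (!reopen()) return RpcError::CONNECT_FAILURE;
  try {
    rpc();
    return RpcError::OK;
  } catch (const TransportException& e) {
    // Unlike the original, a timeout on the retry is reported as a timeout.
    return ClassifyRpcException(e);
  }
}
```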





[jira] [Resolved] (IMPALA-4069) Introduce startup option to create and cache backend connections on startup

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4069.
---
Resolution: Won't Fix

KRPC basically solved this issue. We could potentially create connections as 
soon as we're aware of another executor via the statestore but I don't think 
it's super-important.

> Introduce startup option to create and cache backend connections on startup
> ---
>
> Key: IMPALA-4069
> URL: https://issues.apache.org/jira/browse/IMPALA-4069
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.5.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: scalability
>
> Add an impalad startup flag specifying the number of connections per backend to 
> create and cache. 
> After startup, impala-server.backends.client-cache.total-clients should 
> reflect the number of backends x cached connections per backend. 
> [~j...@cloudera.com] description of the problem
> {code}
> Internal Impala network connections between nodes for query execution are not 
> multiplexed. This means that as the number of queries increases, the number of 
> network connections between Impala executors increases. With more nodes, 
> the combination of query bursts and number of executors can lead to lots of 
> new connection attempts. For example, a query with 10+ joins on a 100-node 
> cluster could require 1000+ simultaneous connections on the coordinator. When 
> the spike is too high or there is not enough CPU available to handle 
> the bursts, connections fail. 
> The total number of connections does not seem to be the issue, but there is 
> currently a practical limit on how many new TCP connection requests can 
> arrive at once. 
> Impala caches backend connections and reuses them later. With the cache, the 
> only simultaneous new connection requests are those above the previously 
> established maximum.
> {code}
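A sketch of the proposed pre-warming, assuming connections are opened for each backend known at startup. {{PrewarmedClientCache}}, {{Connection}} and the per-backend count are hypothetical; a real implementation would open actual transports and take the count from the proposed startup flag.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical connection handle; a real one would wrap a Thrift transport.
struct Connection {
  std::string backend;
};

// Opens 'per_backend' connections to every backend up front, so a later burst
// of queries mostly reuses them instead of issuing many simultaneous connects.
class PrewarmedClientCache {
 public:
  void Prewarm(const std::vector<std::string>& backends, int per_backend) {
    for (const auto& backend : backends) {
      for (int i = 0; i < per_backend; ++i) {
        idle_[backend].push_back(std::make_unique<Connection>(Connection{backend}));
      }
    }
  }

  // Hands out an idle cached connection if there is one, else opens a new one.
  std::unique_ptr<Connection> Get(const std::string& backend) {
    auto& pool = idle_[backend];
    if (!pool.empty()) {
      std::unique_ptr<Connection> conn = std::move(pool.back());
      pool.pop_back();
      return conn;
    }
    ++new_connections_;
    return std::make_unique<Connection>(Connection{backend});
  }

  // Returns a connection to the cache for later reuse.
  void Release(std::unique_ptr<Connection> conn) {
    const std::string backend = conn->backend;
    idle_[backend].push_back(std::move(conn));
  }

  // After Prewarm() this equals #backends x per_backend.
  size_t TotalIdle() const {
    size_t total = 0;
    for (const auto& entry : idle_) total += entry.second.size();
    return total;
  }

  int new_connections() const { return new_connections_; }

 private:
  std::map<std::string, std::vector<std::unique_ptr<Connection>>> idle_;
  int new_connections_ = 0;
};
```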





[jira] [Resolved] (IMPALA-3160) Queries may not get cancelled if cancellation pool hits MAX_CANCELLATION_QUEUE_SIZE

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3160.
---
Resolution: Won't Fix

Seems like this is largely an academic issue at this point so no point keeping 
it open.

> Queries may not get cancelled if cancellation pool hits 
> MAX_CANCELLATION_QUEUE_SIZE
> ---
>
> Key: IMPALA-3160
> URL: https://issues.apache.org/jira/browse/IMPALA-3160
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.5.0
>Reporter: Sailesh Mukil
>Assignee: Thomas Tauber-Marshall
>Priority: Minor
>
> The ImpalaServer::MembershipCallback() function determines whether any backends are 
> down from the topic updates from the statestore. It also cancels all the 
> queries that are already in flight on these failed backends after comparing 
> the failed backends from the topic update to those in the 
> query_locations_ map, which maps backends to the queries running on them.
> If the cancellation queue is too large (tracked by 
> MAX_CANCELLATION_QUEUE_SIZE), we do not cancel the queries, hoping that by the 
> next heartbeat the cancellation queue frees up so we can retry the 
> cancellation of these queries.
> However, by that point we have already removed the failed backend from the 
> query_locations_ map. So the next heartbeat will never find this backend to 
> cancel the queries running on it.
> {code:java}
> // Maps from query id (to be cancelled) to a list of failed Impalads that are
> // the cause of the cancellation.
> map > queries_to_cancel; // : LOCAL MAP
> {
>   // Build a list of queries that are running on failed hosts (as evidenced by their
>   // absence from the membership list).
>   // TODO: crash-restart failures can give false negatives for failed Impala daemons.
>   lock_guard l(query_locations_lock_);
>   QueryLocations::const_iterator loc_entry = query_locations_.begin();
>   while (loc_entry != query_locations_.end()) {
>     if (current_membership.find(loc_entry->first) == current_membership.end()) {
>       unordered_set::const_iterator query_id = loc_entry->second.begin();
>       // Add failed backend locations to all queries that ran on that backend.
>       for(; query_id != loc_entry->second.end(); ++query_id) {
>         vector& failed_hosts = queries_to_cancel[*query_id];
>         failed_hosts.push_back(loc_entry->first);
>       }
>       exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first);
>       // We can remove the location wholesale once we know backend's failed. To do so
>       // safely during iteration, we have to be careful not to invalidate the current
>       // iterator, so copy the iterator to do the erase(..) and advance the original.
>       QueryLocations::const_iterator failed_backend = loc_entry;
>       ++loc_entry;
>       // : WE ERASE THE ENTRY FROM THE GLOBAL MAP HERE.
>       query_locations_.erase(failed_backend);
>     } else {
>       ++loc_entry;
>     }
>   }
> }
> if (cancellation_thread_pool_->GetQueueSize() + queries_to_cancel.size() >
>     MAX_CANCELLATION_QUEUE_SIZE) {
>   // Ignore the cancellations - we'll be able to process them on the next heartbeat
>   // instead.
>   LOG_EVERY_N(WARNING, 60) << "Cancellation queue is full";
>   // : WE DON'T CANCEL HERE AND BY THE NEXT HEARTBEAT, WE WON'T FIND THE FAILED BACKEND AGAIN.
> }
> {code}
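One way to avoid losing these cancellations is to erase a failed backend from {{query_locations_}} only after its queries have actually been queued. The C++ sketch below illustrates that ordering with hypothetical simplified types (strings for backends and query ids, a vector standing in for the cancellation thread pool's queue). It ignores locking and connection cleanup, and it is not the change that was made in Impala.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified types: backends and query ids are plain strings.
using Backend = std::string;
using QueryId = std::string;

// A failed backend is erased from query_locations_ only after its queries
// have been handed to the cancellation queue. If the queue is too full,
// nothing is erased, so the next membership update finds it again.
class MembershipTracker {
 public:
  explicit MembershipTracker(size_t max_queue_size) : max_queue_size_(max_queue_size) {}

  void AddQuery(const Backend& backend, const QueryId& query) {
    query_locations_[backend].insert(query);
  }

  void MembershipCallback(const std::set<Backend>& live_backends) {
    // Query id -> failed backends that caused the cancellation.
    std::map<QueryId, std::vector<Backend>> queries_to_cancel;
    std::vector<Backend> failed_backends;
    for (const auto& entry : query_locations_) {
      if (live_backends.count(entry.first) > 0) continue;
      failed_backends.push_back(entry.first);
      for (const auto& query : entry.second) queries_to_cancel[query].push_back(entry.first);
    }
    if (queue_.size() + queries_to_cancel.size() > max_queue_size_) {
      return;  // Defer: the failed backends stay in query_locations_.
    }
    for (const auto& entry : queries_to_cancel) queue_.push_back(entry.first);
    // Only now is it safe to forget the failed backends.
    for (const auto& backend : failed_backends) query_locations_.erase(backend);
  }

  bool Tracks(const Backend& backend) const { return query_locations_.count(backend) > 0; }
  size_t QueueSize() const { return queue_.size(); }
  void DrainQueue() { queue_.clear(); }  // Simulates the pool processing its queue.

 private:
  size_t max_queue_size_;
  std::map<Backend, std::set<QueryId>> query_locations_;
  std::vector<QueryId> queue_;  // Stand-in for the cancellation thread pool's queue.
};
```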





[jira] [Resolved] (IMPALA-1161) Hive Server 2 error log gives inconsistent results on backend progress

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1161.
---
Resolution: Cannot Reproduce

Query lifecycle has changed a lot since this was filed. I'll close it, since it's 
probably been fixed.

> Hive Server 2 error log gives inconsistent results on backend progress
> --
>
> Key: IMPALA-1161
> URL: https://issues.apache.org/jira/browse/IMPALA-1161
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 1.4
>Reporter: Abdullah Yousufi
>Priority: Minor
>
> There appears to be some lag between a query's execution and its log results.
> For example, running the same select query for a certain table sometimes 
> accurately returns '100% Complete (3 out of 3),' but it might also return a 
> partial completion, such as '66% Complete (2 out of 3)' or '33% Complete (1 
> out of 3),' even though the results are complete each time. 
> Impyla's implementation of execute is synchronous, so it waits until the 
> query's execution status is in the finish state. Retrieving the error log 
> after this point gives inconsistent backend progress results, though waiting 
> a second before fetching the log or retrieving the error log a bit later, 
> such as after fetching the results, fixes (or perhaps masks) the issue.
> This issue also occurs in Hue.
> This is also difficult to reproduce, though I've found creating a table with 
> 10 to 30 rows and continuously fetching all rows generally results in the 
> issue.





[jira] [Resolved] (IMPALA-1781) Impala does not always start-up under IPv6

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1781.
---
Resolution: Cannot Reproduce

> Impala does not always start-up under IPv6
> --
>
> Key: IMPALA-1781
> URL: https://issues.apache.org/jira/browse/IMPALA-1781
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.1.1
>Reporter: Miklos Christine
>Priority: Minor
>  Labels: usability
>
> We are repeatedly seeing these connection issues in the Impala daemon logs 
> due to IPv6 bindings. 
> {code}
> I0508 10:44:48.967094 15132 thrift-util.cc:98] TSocket::read() recv()  :::192.4.26.92 Port: 35911>Connection reset by peer
> I0508 10:44:48.967281 15132 thrift-util.cc:98] TThreadedServer client died: 
> ECONNRESET
> Log file created at: 2014/05/08 10:45:29
> Running on machine: 
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> I0508 10:45:29.374027 16720 init.cc:72] impalad version 1.2.4 RELEASE (
> {code}





[jira] [Resolved] (IMPALA-811) add metrics identifying communication errors

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-811.
--
Resolution: Won't Fix

Good idea, but too open-ended to be actionable.

> add metrics identifying communication errors
> 
>
> Key: IMPALA-811
> URL: https://issues.apache.org/jira/browse/IMPALA-811
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 1.2.3
>Reporter: Chris Leroy
>Priority: Minor
>  Labels: observability
>
> This is a request for metrics in the various Impala processes that count 
> communication failures between components.
> I'd love to see counters for Impala Daemon errors communicating with HDFS, 
> other Impala Daemons, the Catalog Server, the State Store, etc. Ditto for the 
> State Store and Catalog Server's communications.
> I realize this is a slightly open-ended request. I'm happy to help identify a 
> specific list of communication targets if I can be of help.
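A minimal sketch of lock-free per-peer error counters in C++. The {{Peer}} categories and the {{comm-errors.*}} metric names are hypothetical; real counters would also need to be registered with each process's metrics infrastructure to become visible.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical categories of communication peers from the request above.
enum class Peer { HDFS, IMPALAD, CATALOG_SERVER, STATESTORE, NUM_PEERS };

// One relaxed atomic counter per peer: cheap to bump from any RPC thread and
// safe to read concurrently for reporting.
class CommErrorMetrics {
 public:
  void Increment(Peer peer) {
    counts_[Index(peer)].fetch_add(1, std::memory_order_relaxed);
  }

  int64_t Get(Peer peer) const {
    return counts_[Index(peer)].load(std::memory_order_relaxed);
  }

  // Hypothetical metric names under which the counters could be exported.
  static std::string MetricName(Peer peer) {
    switch (peer) {
      case Peer::HDFS: return "comm-errors.hdfs";
      case Peer::IMPALAD: return "comm-errors.impalad";
      case Peer::CATALOG_SERVER: return "comm-errors.catalog-server";
      case Peer::STATESTORE: return "comm-errors.statestore";
      case Peer::NUM_PEERS: break;
    }
    return "comm-errors.unknown";
  }

 private:
  static size_t Index(Peer peer) { return static_cast<size_t>(peer); }
  std::array<std::atomic<int64_t>, static_cast<size_t>(Peer::NUM_PEERS)> counts_{};
};
```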





[jira] [Resolved] (IMPALA-1694) Improve logging in the catalog

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1694.
---
Resolution: Later

> Improve logging in the catalog
> --
>
> Key: IMPALA-1694
> URL: https://issues.apache.org/jira/browse/IMPALA-1694
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.1
>Reporter: Dimitris Tsirogiannis
>Priority: Major
>  Labels: catalog-server, impala
>
> The existing logging in the catalog is not always very useful when trying to 
> resolve catalog issues, especially due to concurrent operations. We should 
> modify logging in the catalog to:
> 1. Record all high level operations (e.g. DDL statements, reset 
> metadata/refresh stmts, etc) that are executed in the catalog. 
> 2. Remove "spurious" logging statements that aren't very useful or assign 
> them to higher logging levels (trace).





[jira] [Resolved] (IMPALA-8406) Failed REFRESH can partially modify table without bumping version number

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8406.
---
Fix Version/s: Impala 4.0
 Assignee: Quanlong Huang
   Resolution: Fixed

> Failed REFRESH can partially modify table without bumping version number
> 
>
> Key: IMPALA-8406
> URL: https://issues.apache.org/jira/browse/IMPALA-8406
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Todd Lipcon
>Assignee: Quanlong Huang
>Priority: Major
> Fix For: Impala 4.0
>
>
> Currently, various incremental operations in the catalogd modify Table 
> objects in place, including REFRESH, which modifies each partition. In this 
> case, if one partition fails to refresh (e.g. due to incorrect partitions or 
> some other file access problem), other partitions can still be modified, 
> either because they were modified first (in a non-parallel operation) or 
> modified in parallel (for REFRESH).
> In this case, the REFRESH operation will throw an Exception back to the user, 
> but in fact it has modified the catalog entry. The version number, however, 
> is not bumped, which breaks some invariants of the catalog that an object 
> doesn't change without changing version numbers.
> This also produces some unexpected behavior such as:
> - SHOW FILES IN t;
> - REFRESH t; -- gets a failure
> - SHOW FILES in t; -- see the same result as originally
> - ALTER TABLE t SET UNCACHED; -- bumps the version number due to unrelated 
> operation
> - SHOW FILES IN t; -- the set of files has changed due to the earlier 
> partially-complete REFRESH
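A sketch of an all-or-nothing REFRESH in C++, assuming copy-on-write table objects behind a {{shared_ptr}}: every partition is reloaded into a private copy, and the copy is published with a bumped version only if all partitions succeed. {{Table}}, {{Partition}}, {{FileLister}} and {{RefreshAtomically()}} are hypothetical simplifications of the catalogd's Java classes, not the change that fixed this issue.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified catalog objects.
struct Partition {
  std::vector<std::string> files;
};

struct Table {
  int64_t version = 0;
  std::map<std::string, Partition> partitions;
};

// Lists the current files of one partition; throws on access errors.
using FileLister = std::vector<std::string> (*)(const std::string& partition_name);

// Reloads every partition into a private copy and publishes the copy, with a
// bumped version, only if all partitions succeed. If list_files throws,
// 'published' still points at the old, unmodified table.
// Assumes the caller holds the table's lock while swapping.
void RefreshAtomically(std::shared_ptr<const Table>& published, FileLister list_files) {
  auto updated = std::make_shared<Table>(*published);
  for (auto& [name, partition] : updated->partitions) {
    partition.files = list_files(name);
  }
  updated->version = published->version + 1;
  published = std::move(updated);
}
```

Because readers only ever see a fully refreshed table or the old one, a visible change always comes with a new version number.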





[jira] [Resolved] (IMPALA-2807) During insert operation impala creates too many files for a table size < block size

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2807.
---
Resolution: Won't Fix

IMPALA-8125 added an option to control this. Maybe we could be smarter about 
picking parallelism to trade-off file size, but I think we need a clearer 
problem to solve.

> During insert operation impala creates too many files for a table size < 
> block size
> ---
>
> Key: IMPALA-2807
> URL: https://issues.apache.org/jira/browse/IMPALA-2807
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Perf Investigation
>Affects Versions: Impala 2.3.0
>Reporter: Dileep Kumar
>Priority: Major
>
> When loading the "customer" table from a TPC-DS-based schema, the total number of 
> files created is 20 (equal to the number of Impala nodes in the cluster).
> The total size of this table is 204.2 MiB, which can fit in a single block, 
> but it occupies 20 blocks in this case.
> When the same insert command was run with a single impalad in the cluster, 
> a single block held all the table data and only one HDFS file was 
> created.





[jira] [Resolved] (IMPALA-3578) S3: Consider allowing table-sink to stage in HDFS when writing to S3

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3578.
---
Resolution: Won't Fix

Having HDFS and S3 co-exist is an unusual architecture; not worth doing.

> S3: Consider allowing table-sink to stage in HDFS when writing to S3
> 
>
> Key: IMPALA-3578
> URL: https://issues.apache.org/jira/browse/IMPALA-3578
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Perf Investigation
>Affects Versions: Impala 2.6.0
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>Priority: Minor
>  Labels: performance, s3
>
> If users do not want to skip the staging step on INSERTs to S3, we could 
> allow the table sink to stage the temporary files in HDFS (if available) and 
> make the coordinator move the files to S3 on FinalizeSuccessfulInsert().
> This could improve the performance of INSERTs to S3, since writes to HDFS are 
> currently faster than writes to S3. Currently, when we do not skip the staging 
> step, the sinks write to a temporary location in S3 and the coordinator copies 
> these files to the final location in S3 (as S3 doesn't support the rename() 
> operation). So this would reduce the number of writes to S3 from 2 to 1 
> per file.





[jira] [Resolved] (IMPALA-2936) Introduce table level auto create/update statistics

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2936.
---
Resolution: Won't Fix

This assumed a particular design for the epic, will just close the subtasks to 
clean things up.

> Introduce table level auto create/update statistics
> ---
>
> Key: IMPALA-2936
> URL: https://issues.apache.org/jira/browse/IMPALA-2936
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>
> Add table level property for maintaining statistics.





[jira] [Resolved] (IMPALA-2938) Introduce resource pool/queue to run maintenance operations

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2938.
---
Resolution: Won't Fix

This assumed a particular design for the epic, will just close the subtasks to 
clean things up.

> Introduce resource pool/queue to run maintenance operations
> ---
>
> Key: IMPALA-2938
> URL: https://issues.apache.org/jira/browse/IMPALA-2938
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: scheduler, supportability
>
> Operations like compute/update stats and file compaction can be 
> resource-intensive; for a better user experience, these soon-to-be-automated 
> operations should run in their own resource pool.





[jira] [Resolved] (IMPALA-2942) Track history for statistics create/update

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2942.
---
Resolution: Later

This assumed a particular design for the epic, will just close the subtasks to 
clean things up.

> Track history for statistics create/update
> --
>
> Key: IMPALA-2942
> URL: https://issues.apache.org/jira/browse/IMPALA-2942
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Affects Versions: Impala 2.2, impala 2.3
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: supportability, usability
>
> At the database/table/partition level, keep track of statistics operations so 
> that users can validate the freshness of stats.
> Each operation should record:
> * Database name
> * Table Name
> * Partition ID
> * Statistics creation time Start/Stop
> * Duration 
> * Row count from statistics before/after
> * Resource pool used
> * Time the operation waited in the resource pool before being scheduled
> * Amount of data read
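A sketch of one history record per statistics operation, with one field per item in the list above. Field names, types and the -1 "unknown" convention are assumptions; the duration is derived from the start and stop times rather than stored separately.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical record of one statistics create/update operation.
struct StatsOperationRecord {
  std::string database;
  std::string table;
  std::string partition_id;  // Empty for table-level operations.
  std::chrono::system_clock::time_point start_time;
  std::chrono::system_clock::time_point end_time;
  int64_t row_count_before = -1;  // -1 means "unknown".
  int64_t row_count_after = -1;
  std::string resource_pool;
  std::chrono::milliseconds queue_wait{0};  // Time spent waiting in the pool.
  int64_t bytes_read = 0;

  // Derived from start/stop so the two can never disagree.
  std::chrono::milliseconds Duration() const {
    return std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);
  }
};
```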





[jira] [Resolved] (IMPALA-2941) Introduce trigger to update stale stats

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2941.
---
Resolution: Later

This assumed a particular design for the epic, will just close the subtasks to 
clean things up.

> Introduce trigger to update stale stats
> ---
>
> Key: IMPALA-2941
> URL: https://issues.apache.org/jira/browse/IMPALA-2941
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: supportability
>
> Statistics should be updated when about 20% of the data in a table changes.
> 20% will be the default on a database level and can be overridden on a table 
> level.
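A sketch of the proposed staleness rule, assuming changes are measured in rows (the proposal doesn't specify a unit). The database-level default of 20% applies unless a table-level threshold is set. {{EffectiveThreshold()}} and {{StatsAreStale()}} are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// A table-level threshold, if present, overrides the database-level default.
double EffectiveThreshold(std::optional<double> table_threshold,
                          double db_threshold = 0.20) {
  return table_threshold.value_or(db_threshold);
}

// True once the fraction of changed rows reaches the effective threshold.
bool StatsAreStale(int64_t rows_at_last_stats, int64_t rows_changed_since,
                   std::optional<double> table_threshold = std::nullopt,
                   double db_threshold = 0.20) {
  if (rows_at_last_stats <= 0) return true;  // No usable stats yet.
  double changed_fraction = static_cast<double>(rows_changed_since) /
                            static_cast<double>(rows_at_last_stats);
  return changed_fraction >= EffectiveThreshold(table_threshold, db_threshold);
}
```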





[jira] [Resolved] (IMPALA-2935) Introduce database level auto create/update statistics

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2935.
---
Resolution: Won't Do

This assumed a particular design for the epic, will just close the subtasks to 
clean things up.

> Introduce database level auto create/update statistics
> --
>
> Key: IMPALA-2935
> URL: https://issues.apache.org/jira/browse/IMPALA-2935
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>
> Add database level property to update/create statistics. 
> INotify should be used for HDFS; I'm not sure whether there is something 
> similar for S3, etc.
> https://issues.apache.org/jira/browse/HDFS-6634





[jira] [Resolved] (IMPALA-744) Return estimated run time in the Explain plan

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-744.
--
Resolution: Later


> Return estimated run time in the Explain plan
> -
>
> Key: IMPALA-744
> URL: https://issues.apache.org/jira/browse/IMPALA-744
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 1.2.3
>Reporter: Alan Choi
>Priority: Minor
>
> Having a rough estimate of the query runtime from the explain plan can be very 
> useful. Knowing whether a query will return in seconds or in hours can help our 
> users identify potential problems in their query.





[jira] [Resolved] (IMPALA-452) Add support for string concatenation operator using || construct

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-452.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Add support for string concatenation operator using || construct
> 
>
> Key: IMPALA-452
> URL: https://issues.apache.org/jira/browse/IMPALA-452
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 1.0.1
>Reporter: Hari Sekhon
>Assignee: Martin Zink
>Priority: Major
>  Labels: built-in-function, incompatibility, ramp-up, 
> sql-language, tpc-ds
> Fix For: Impala 4.0
>
>
> User has requested that we support || for string concatenation, otherwise 
> they are forced to use the concat function.
> Thanks
> Hari





[jira] [Resolved] (IMPALA-4489) Remove the ident_or_default non-terminal from the parser

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4489.
---
Resolution: Won't Fix

> Remove the ident_or_default non-terminal from the parser
> 
>
> Key: IMPALA-4489
> URL: https://issues.apache.org/jira/browse/IMPALA-4489
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Dimitris Tsirogiannis
>Priority: Major
>  Labels: incompatibility
>
> IMPALA-3726 introduced the DEFAULT keyword. To avoid breaking applications 
> that use "DEFAULT" as an identifier for databases, tables and columns, the 
> non-terminal ident_or_default was introduced to replace IDENT in the parser. 
> Moving forward, we should disallow the use of DEFAULT and remove 
> ident_or_default. This would be a backward-incompatible change.





[jira] [Resolved] (IMPALA-4958) Simplify binary predicates in the FE

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4958.
---
Resolution: Later

> Simplify binary predicates in the FE
> 
>
> Key: IMPALA-4958
> URL: https://issues.apache.org/jira/browse/IMPALA-4958
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Lars Volker
>Priority: Major
>  Labels: ramp-up
>
> The frontend currently allows redundant expressions like {{a < 5 and a < 3}}. 
> We should simplify these. We should also find contradictions like {{a < 3 and 
> a > 5}} and reduce them to {{FALSE}}.
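A sketch of the idea for the simplest case: conjuncts of {{<}}, {{<=}}, {{>}}, {{>=}} comparisons between one integer column and integer literals, intersected as inclusive ranges. The types and the {{Simplify()}} helper are hypothetical and the scope is an assumption. It also assumes literals are not at the extremes of {{long}}, and it treats a contradiction as {{FALSE}}, which is safe in a WHERE clause, where NULL and FALSE both reject the row.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// One comparison "col op value" against an integer literal.
struct Comparison {
  std::string op;  // "<", "<=", ">", ">="
  long value;
};

// Inclusive range of values that satisfy all conjuncts seen so far.
struct Range {
  long lo = std::numeric_limits<long>::min();
  long hi = std::numeric_limits<long>::max();
  bool empty() const { return lo > hi; }
};

Range Intersect(const std::vector<Comparison>& conjuncts) {
  Range r;
  for (const auto& c : conjuncts) {
    if (c.op == "<") r.hi = std::min(r.hi, c.value - 1);
    else if (c.op == "<=") r.hi = std::min(r.hi, c.value);
    else if (c.op == ">") r.lo = std::max(r.lo, c.value + 1);
    else if (c.op == ">=") r.lo = std::max(r.lo, c.value);
  }
  return r;
}

// Renders the simplified predicate for column 'col', or FALSE on contradiction.
std::string Simplify(const std::string& col, const std::vector<Comparison>& conjuncts) {
  Range r = Intersect(conjuncts);
  if (r.empty()) return "FALSE";
  std::vector<std::string> parts;
  if (r.lo != std::numeric_limits<long>::min()) parts.push_back(col + " >= " + std::to_string(r.lo));
  if (r.hi != std::numeric_limits<long>::max()) parts.push_back(col + " <= " + std::to_string(r.hi));
  if (parts.empty()) return "TRUE";
  return parts.size() == 1 ? parts[0] : parts[0] + " AND " + parts[1];
}
```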





[jira] [Resolved] (IMPALA-4973) Convert UnionStmt class into to SetOperationStmt

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4973.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

Commit ea3f073881783baed17cea2d8bb718038cdfba8a in impala's branch 
refs/heads/master from Shant Hovsepian
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ea3f073 ]

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.

A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case of union
only operands to ensure compatibility with previous union planning
behavior.

Tests:

Added parser and analyzer tests.
Ensured no test failures or plan changes for union tests.
Added TPC-DS queries 14,38,87 to functional and planner tests.
Added functional tests test_intersect test_except
New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Reviewed-on: http://gerrit.cloudera.org:8080/16123
Tested-by: Impala Public Jenkins 
Reviewed-by: Aman Sinha 
Reviewed-by: Tim Armstrong 
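The precedence choice in the commit message above matters because the same expression can give different results. The C++ sketch below models rows as strings and evaluates DISTINCT set operations strictly left to right; it shows the semantics the rewrite has to preserve, not the join-based rewrite itself, and {{Apply()}} and {{EvalLeftToRight()}} are hypothetical helpers.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Rows are modelled as strings; a std::set also gives DISTINCT semantics.
using Rows = std::set<std::string>;

enum class SetOp { UNION, INTERSECT, EXCEPT };

// DISTINCT semantics of one set operation.
Rows Apply(const Rows& left, SetOp op, const Rows& right) {
  Rows out;
  switch (op) {
    case SetOp::UNION:
      out = left;
      out.insert(right.begin(), right.end());
      break;
    case SetOp::INTERSECT:  // Keep left rows that have a match on the right.
      for (const auto& row : left) {
        if (right.count(row) > 0) out.insert(row);
      }
      break;
    case SetOp::EXCEPT:  // Keep left rows that have no match on the right.
      for (const auto& row : left) {
        if (right.count(row) == 0) out.insert(row);
      }
      break;
  }
  return out;
}

// Evaluates "first op1 r1 op2 r2 ..." strictly left to right, i.e. with all
// operators at the same precedence.
Rows EvalLeftToRight(Rows first, const std::vector<std::pair<SetOp, Rows>>& rest) {
  for (const auto& step : rest) first = Apply(first, step.first, step.second);
  return first;
}
```

For A = {1,2,3}, B = {3,4} and C = {3}, A UNION B INTERSECT C yields {3} when evaluated left to right, but {1,2,3} under the SQL-standard precedence that binds INTERSECT first.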


> Convert UnionStmt class into to SetOperationStmt
> 
>
> Key: IMPALA-4973
> URL: https://issues.apache.org/jira/browse/IMPALA-4973
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Taras Bobrovytsky
>Priority: Major
> Fix For: Impala 4.0
>
>
> SetOperationStmt should be a new class that will be used to represent UNION 
> DISTINCT and UNION ALL operators. A SetOperationStmt should be transformed into 
> a QueryStmt during the analysis rewrite phase (StmtRewriter). The resulting 
> query should capture DISTINCT by an appropriate placement of DISTINCT in the 
> Select list.





[jira] [Resolved] (IMPALA-5727) Join Order Optimization time increases non-linearly with the number of tables

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5727.
---
Resolution: Later

Too open ended.

> Join Order Optimization time increases non-linearly with the number of tables
> -
>
> Key: IMPALA-5727
> URL: https://issues.apache.org/jira/browse/IMPALA-5727
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Alan Choi
>Priority: Major
>  Labels: performance, planner
>
> The planning time (believed to be in join order optimization) increases 
> non-linearly with increasing number of tables in the join. By increasing the 
> number of tables in the join from 5 to 10, the planning time increases from 
> 200ms to 700+ms.
> For queries over small data, 700+ms of planning time is significant.





[jira] [Resolved] (IMPALA-5726) Impala Planning time on Kudu table is much longer than HDFS table

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5726.
---
Resolution: Cannot Reproduce

Not really enough info. IMPALA-9903 is something that improved Kudu planning 
time.

> Impala Planning time on Kudu table is much longer than HDFS table
> -
>
> Key: IMPALA-5726
> URL: https://issues.apache.org/jira/browse/IMPALA-5726
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Alan Choi
>Priority: Major
>  Labels: kudu
>
> The query planning time for Kudu tables is much longer than for HDFS tables. In a 
> query that combines 38 Kudu tables with "union all" (i.e. select * from kudu_table union 
> all select * from... union all select * from kudu_table), planning 
> took 1.2 sec. When switched to HDFS tables, planning time is much lower.





[jira] [Resolved] (IMPALA-5637) Hive will start parsing "EXTERNAL" tbl property as case insensitive

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5637.
---
Resolution: Won't Do

> Hive will start parsing "EXTERNAL" tbl property as case insensitive
> ---
>
> Key: IMPALA-5637
> URL: https://issues.apache.org/jira/browse/IMPALA-5637
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 2.10.0
>Reporter: Matthew Jacobs
>Priority: Minor
>
> Newer versions of Hive (3.0 and 2.4) will start handling the EXTERNAL table 
> property in a case-insensitive manner (HIVE-16324).
> While Impala master is currently tested against Hive 1.1, we should check 
> whether any code paths will break after this change, i.e. where Impala 
> might assume that "EXTERNAL" is upper-case per the behavior of Hive before 
> HIVE-16324, and make sure the code works as expected.
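One way to make such code paths robust is to compare the property case-insensitively. The following is a minimal Python sketch, not Impala's frontend code: `is_external_table` is a hypothetical helper, the table parameters are assumed to be a plain string-to-string map, and treating both the key and the value case-insensitively is an assumption about how far the HIVE-16324 change reaches.

```python
from typing import Mapping


def is_external_table(table_params: Mapping[str, str]) -> bool:
    """Return True if the table parameters mark the table as EXTERNAL.

    Hypothetical helper. Both the key ("EXTERNAL") and its value ("TRUE")
    are compared case-insensitively (an assumption), so "external"="true"
    is accepted as well as the upper-case spelling.
    """
    for key, value in table_params.items():
        if key.upper() == "EXTERNAL" and value.strip().upper() == "TRUE":
            return True
    return False
```

With this check, metadata written as `EXTERNAL=TRUE` and as `external=true` is treated the same way.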





[jira] [Resolved] (IMPALA-6145) Assign runtime filter ids lazily

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6145.
---
Resolution: Won't Fix

> Assign runtime filter ids lazily
> 
>
> Key: IMPALA-6145
> URL: https://issues.apache.org/jira/browse/IMPALA-6145
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Thomas Tauber-Marshall
>Priority: Minor
>
> Currently, when generating runtime filters, we iterate over the plan tree, 
> create a filter for each equi-join predicate of each hash join node, and 
> assign them ascending ids.
> Not all of the created filters end up assigned to a target scan node, 
> and those that aren't assigned are left out of the plan, resulting in gaps in 
> the sequence of filter ids in the explain output, which could be confusing to 
> users.
> It would be better to assign filter ids only to filters that end up in the 
> final plan, so that the max filter id in the plan is equal to the total 
> number of filters in the plan.
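The lazy scheme described above can be sketched as a two-phase pass: build candidate filters without ids, then number only those that received at least one target. This is a hypothetical Python model (`RuntimeFilter`, `assign_ids_lazily`, and the string-valued predicates and targets are all illustrative), not the planner's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuntimeFilter:
    """A candidate filter generated from one equi-join predicate (illustrative)."""
    join_predicate: str
    targets: List[str] = field(default_factory=list)  # target scan nodes
    filter_id: Optional[int] = None  # stays None until the filter is kept


def assign_ids_lazily(candidates: List[RuntimeFilter]) -> List[RuntimeFilter]:
    """Keep only filters with at least one target and number them consecutively.

    Ids are handed out after target assignment, so dropped candidates never
    consume an id and the kept filters' ids have no gaps.
    """
    kept = [f for f in candidates if f.targets]
    for next_id, f in enumerate(kept):
        f.filter_id = next_id
    return kept
```

Here the second of three candidates has no target, so the two kept filters are numbered 0 and 1 rather than 0 and 2.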





[jira] [Resolved] (IMPALA-6440) Impala cannot read / write HBase tables when metadata is created with newer versions of Hive

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6440.
---
Resolution: Cannot Reproduce

> Impala cannot read / write HBase tables when metadata is created with newer 
> versions of Hive
> 
>
> Key: IMPALA-6440
> URL: https://issues.apache.org/jira/browse/IMPALA-6440
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Zach Amsden
>Assignee: Adrian Ng
>Priority: Major
>
> Due to https://issues.apache.org/jira/browse/HIVE-18366, the way we fetch 
> table properties needs to be changed. Ideally the change should be backwards 
> compatible so that both newer and older versions of Hive can be used.
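A backwards-compatible lookup could try the newer property location first and fall back to the older one. This is a hedged Python sketch: the ticket does not spell out which keys HIVE-18366 affects, so `lookup_table_property` and the key names used with it are hypothetical placeholders, and checking SerDe parameters before table parameters is an assumption.

```python
from typing import Mapping, Optional, Sequence


def lookup_table_property(
    keys: Sequence[str],
    serde_params: Mapping[str, str],
    table_params: Mapping[str, str],
) -> Optional[str]:
    """Return the value of the first key found, trying keys newest-first.

    Hypothetical helper. For each key, SerDe parameters are checked before
    table parameters (an assumption about where newer Hive versions store
    the property). Returns None if no candidate key is present.
    """
    for key in keys:
        for params in (serde_params, table_params):
            value = params.get(key)
            if value is not None:
                return value
    return None
```

Listing keys newest-first means metadata written by newer Hive versions wins, while metadata from older versions still resolves through the fallback key.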





[jira] [Resolved] (IMPALA-6445) Whitespace should be stripped or detected in kudu master addresses metadata

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6445.
---
Fix Version/s: Impala 2.12.0
   Resolution: Fixed

> Whitespace should be stripped or detected in kudu master addresses metadata
> ---
>
> Key: IMPALA-6445
> URL: https://issues.apache.org/jira/browse/IMPALA-6445
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Todd Lipcon
>Priority: Major
> Fix For: Impala 2.12.0
>
>
> Currently the Kudu master list metadata is split on ',' and fed directly 
> through to Kudu. This means that if a user specifies a list such as "m1, m2, 
> m3" with spaces after the commas, it will pass those hosts on to Kudu as 
> "m1", " m2", and " m3". Two of those three hostnames are invalid, so 
> Kudu will only be able to connect when m1 is the active master.
> We should either strip those spaces or detect this case and throw an error on 
> the bad metadata. (I prefer stripping.)
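Both options from the ticket (strip, or reject) fit in one small parser. A minimal Python sketch: `parse_kudu_master_addresses` and its `strict` flag are hypothetical, and rejecting empty entries (for example from a trailing comma) is an extra assumption not stated in the ticket.

```python
from typing import List


def parse_kudu_master_addresses(value: str, strict: bool = False) -> List[str]:
    """Split a comma-separated master list into host entries.

    Default: strip whitespace around each entry (the preferred option).
    strict=True: raise ValueError on surrounding whitespace instead.
    Empty entries always raise (an assumption, e.g. "m1,,m2").
    """
    addresses = []
    for raw in value.split(","):
        addr = raw.strip()
        if not addr:
            raise ValueError(f"empty master address in {value!r}")
        if strict and addr != raw:
            raise ValueError(f"whitespace around master address {raw!r}")
        addresses.append(addr)
    return addresses
```

With the default behaviour, `"m1, m2, m3"` yields `["m1", "m2", "m3"]`, so all three masters are usable.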





[jira] [Resolved] (IMPALA-10210) Avoid authentication for connection from a trusted domain over http

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10210.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Avoid authentication for connection from a trusted domain over http
> ---
>
> Key: IMPALA-10210
> URL: https://issues.apache.org/jira/browse/IMPALA-10210
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Critical
> Fix For: Impala 4.0
>
>
> Add the ability to skip authentication over HTTP for both HS2 and the 
> Impala debug web service.
> The current idea is to still require that the client specify a username in 
> its request via a basic auth header so Impala can attribute the connection to 
> that username.
> This change should also add the ability to use the "X-Forwarded-For" header 
> to get the real client IP address in case proxies are used in between.
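The two header-handling pieces can be sketched as below. This is an illustrative Python model, not Impala's HTTP server code: the function names are hypothetical, header names are assumed to be already normalized to the spellings used here (real HTTP header names are case-insensitive), and taking the left-most X-Forwarded-For entry as the client follows the usual convention that each proxy appends the address it received the request from.

```python
import base64
from typing import Mapping, Optional


def username_from_basic_auth(headers: Mapping[str, str]) -> Optional[str]:
    """Extract the user name from an 'Authorization: Basic <base64>' header.

    The password part is ignored: for trusted-domain connections the header
    is only used to attribute the connection to a user.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic" or not token.strip():
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    user, sep, _password = decoded.partition(":")
    return user if sep and user else None


def client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """Return the originating client IP, preferring X-Forwarded-For.

    Falls back to the TCP peer address when the header is absent or empty.
    """
    first = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    return first or peer_ip
```

Because X-Forwarded-For is supplied by the client, a server would normally honour it only when the direct peer is a trusted proxy.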





[jira] [Resolved] (IMPALA-6923) Update/Cleanup $IMPALA_HOME/tests/benchmark folder

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6923.
---
Resolution: Fixed

> Update/Cleanup $IMPALA_HOME/tests/benchmark folder
> ---
>
> Key: IMPALA-6923
> URL: https://issues.apache.org/jira/browse/IMPALA-6923
> Project: IMPALA
>  Issue Type: Task
>Reporter: nithya
>Assignee: nithya
>Priority: Major
>
> Out of the 3 scripts in the benchmark folder (report_benchmark_results.py, 
> create_database.py and perf_result_datastore.py), only 
> report_benchmark_results.py is currently used upstream, to generate a 
> report comparing performance benchmark numbers between two given runs of the 
> performance tests. The other two scripts contain code that inserts 
> metrics from a given performance test run into a database on a specified Impala 
> instance. However, these scripts depend on internal resources to produce a 
> meaningful interpretation of these metrics, and those resources are not 
> available to the external Apache community, so these scripts are being removed. 
> As part of removing them, report_benchmark_results.py needs to be cleaned up to 
> remove any code that refers to them.
>  





[jira] [Resolved] (IMPALA-6207) Avro - new column added from impala does not show up in describe on impala

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6207.
---
Resolution: Not A Problem

I believe this is the intended behaviour when the Avro schema is specified 
explicitly (here via avro.schema.url).

> Avro - new column added from impala does not show up in describe on impala
> --
>
> Key: IMPALA-6207
> URL: https://issues.apache.org/jira/browse/IMPALA-6207
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.8.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>
> When a new column is added to an Avro table with ALTER TABLE, it does not show 
> up in an immediately following DESCRIBE.
> {code}
> [host-10-17-101-175.coe.cloudera.com:21000] > create table avro_1 stored as avro tblproperties('avro.schema.url'='/user/admin/test_schema.avsc');
> Query: create table avro_1 stored as avro tblproperties('avro.schema.url'='/user/admin/test_schema.avsc')
> WARNINGS: Ignoring column definitions in favor of Avro schema.
> The Avro schema has 2 column(s) but 0 column definition(s) were given.
> Fetched 0 row(s) in 0.05s
> [host-10-17-101-175.coe.cloudera.com:21000] > describe avro_1;
> Query: describe avro_1
> +------+--------+-------------------+
> | name | type   | comment           |
> +------+--------+-------------------+
> | a    | int    | from deserializer |
> | b    | string | from deserializer |
> +------+--------+-------------------+
> Fetched 2 row(s) in 3.74s
> {code}
> New column added:
> {code}
> [host-10-17-101-175.coe.cloudera.com:21000] > alter table avro_1 add columns (c int) ;
> Query: alter table avro_1 add columns (c int)
> Fetched 0 row(s) in 0.10s
> [host-10-17-101-175.coe.cloudera.com:21000] > describe avro_1;
> Query: describe avro_1
> +------+--------+-------------------+
> | name | type   | comment           |
> +------+--------+-------------------+
> | a    | int    | from deserializer |
> | b    | string | from deserializer |
> +------+--------+-------------------+
> Fetched 2 row(s) in 0.00s
> {code}





[jira] [Resolved] (IMPALA-6797) Update Sentry version for ULP

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6797.
---
Resolution: Won't Do

> Update Sentry version for ULP
> -
>
> Key: IMPALA-6797
> URL: https://issues.apache.org/jira/browse/IMPALA-6797
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Adam Holley
>Priority: Major
>
> Update to the Sentry version that supports ULP.  TBD





[jira] [Resolved] (IMPALA-7133) Beeswax methods return Default TException instead of real exception

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7133.
---
Resolution: Won't Fix

We're moving away from Beeswax, so this is not worth fixing at this point (see 
IMPALA-10074).

> Beeswax methods return Default TException instead of real exception
> ---
>
> Key: IMPALA-7133
> URL: https://issues.apache.org/jira/browse/IMPALA-7133
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Fredy Wijaya
>Priority: Major
>  Labels: usability
>
> An instance of this bug is that we see a "Default TException" error instead 
> of "Query Id ... Not Found" because of the following issue:
> * get_results_metadata() does not declare that it throws BeeswaxException (it 
> declares throwing QueryNotFoundException) 
> * But our implementation actually throws BeeswaxException in the "Query Id ... 
> Not Found" case
> * The Thrift C++ server has some logic that calls .what() on unknown 
> exceptions and wraps them into a TException
> * On 3.x, we have THRIFT-727 in Thrift 0.9.3, which implements .what(), so 
> the original error message is present, albeit wrapped in some garbage.
> * On 2.x, .what() does not return any useful information, so we just get a 
> weird empty message
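The failure mode in the list can be modelled in a few lines of Python. This is a simplified simulation, not Thrift's server code: `TException` here is a local stand-in class, and `dispatch` is a hypothetical helper that re-raises only exception types the interface declares and wraps everything else, falling back to a default message when the original exception has none (the 2.x symptom).

```python
class TException(Exception):
    """Local stand-in for Thrift's generic exception (not thrift.Thrift.TException)."""


class QueryNotFoundException(Exception):
    """The exception type the interface declares for this call."""


class BeeswaxException(Exception):
    """The exception type the implementation actually raises."""


def dispatch(handler, declared):
    """Invoke handler(); pass declared exceptions through, wrap all others.

    `declared` is a tuple of exception classes the interface declares.
    Undeclared exceptions reach the client only as a TException carrying
    the original message, or a default text when that message is empty.
    """
    try:
        return handler()
    except declared:
        raise  # declared types reach the client unchanged
    except Exception as e:
        # Fallback text mirrors the "Default TException" error in the ticket.
        raise TException(str(e) or "Default TException") from e
```

In this model, adding `BeeswaxException` to `declared` would let the real exception and its message reach the client unchanged.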





[jira] [Resolved] (IMPALA-8080) Improve planner to use disk attributes when applicable

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8080.
---
Resolution: Later

> Improve planner to use disk attributes when applicable
> --
>
> Key: IMPALA-8080
> URL: https://issues.apache.org/jira/browse/IMPALA-8080
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Janaki Lahorani
>Priority: Major
>
> The disk seek time can vary depending on the underlying storage (magnetic 
> disk, SSD, flash, etc.), and the scan performance can change depending on the 
> presence of a buffer cache. Plan generation, degree of parallelism, and 
> scheduling should take these aspects into account to decide on the appropriate 
> course of action.





[jira] [Resolved] (IMPALA-8709) Add Damerau-Levenshtein edit distance built-in function

2020-12-21 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8709.
---
Resolution: Fixed

> Add Damerau-Levenshtein edit distance built-in function
> ---
>
> Key: IMPALA-8709
> URL: https://issues.apache.org/jira/browse/IMPALA-8709
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Greg Rahn
>Assignee: Greg Rahn
>Priority: Major
>  Labels: built-in-function
> Fix For: Impala 3.4.0
>
>
> Algorithm: restricted Damerau-Levenshtein distance (optimal string alignment)
>  [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance]
> References:
>  
> [https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_functions_expressions_fuzzy_funcs.html]
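The restricted variant named above can be computed with the standard dynamic-programming recurrence for Levenshtein distance plus one extra case for adjacent transpositions. A minimal Python sketch, not Impala's built-in implementation; `osa_distance` is a hypothetical name.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of two
    adjacent characters, with the restriction that no substring is edited
    more than once.
    """
    n, m = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]
```

Because no substring may be edited twice, `osa_distance("CA", "ABC")` is 3, whereas the unrestricted Damerau-Levenshtein distance is 2 (CA to AC by transposition, then AC to ABC by insertion).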




