[jira] [Assigned] (IMPALA-12356) Partition created by INSERT will make the next ALTER_PARTITION event on it always treated as self-event

2023-09-11 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K reassigned IMPALA-12356:
--

Assignee: Venugopal Reddy K

> Partition created by INSERT will make the next ALTER_PARTITION event on it 
> always treated as self-event
> ---
>
> Key: IMPALA-12356
> URL: https://issues.apache.org/jira/browse/IMPALA-12356
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Venugopal Reddy K
>Priority: Critical
>  Labels: ramp-up
>
> In Impala, create a partitioned table and create one partition in it using 
> {*}INSERT{*}:
> {noformat}
> create table my_part (i int) partitioned by (p int) stored as parquet;
> insert into my_part partition(p=0) values (0),(1),(2);
> show partitions my_part
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+----------------------------------------------------+-----------+
> | p     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                           | EC Policy |
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+----------------------------------------------------+-----------+
> | 0     | -1    | 1      | 358B | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/my_part/p=0  | NONE      |
> | Total | -1    | 1      | 358B | 0B           |                   |         |                   |                                                    |           |
> +-------+-------+--------+------+--------------+-------------------+---------+-------------------+----------------------------------------------------+-----------+
> {noformat}
> In Hive, describe the partition. We can see parameters of 
> "impala.events.catalogServiceId" and "impala.events.catalogVersion" added by 
> Impala. This is ok.
> {noformat}
> hive> desc formatted my_part partition(p=0);
> +------------------------------------+----------------------------------------------------+-----------------------------------+
> | col_name                           | data_type                                          | comment                           |
> +------------------------------------+----------------------------------------------------+-----------------------------------+
> | i                                  | int                                                |                                   |
> |                                    | NULL                                               | NULL                              |
> | # Partition Information            | NULL                                               | NULL                              |
> | # col_name                         | data_type                                          | comment                           |
> | p                                  | int                                                |                                   |
> |                                    | NULL                                               | NULL                              |
> | # Detailed Partition Information   | NULL                                               | NULL                              |
> | Partition Value:                   | [0]                                                | NULL                              |
> | Database:                          | default                                            | NULL                              |
> | Table:                             | my_part                                            | NULL                              |
> | CreateTime:                        | Wed Aug 09 15:24:50 CST 2023                       | NULL                              |
> | LastAccessTime:                    | UNKNOWN                                            | NULL                              |
> | Location:                          | hdfs://localhost:20500/test-warehouse/my_part/p=0  | NULL                              |
> | Partition Parameters:              | NULL                                               | NULL                              |
> |                                    | impala.events.catalogServiceId                     | eab33ebb8a14cfd:8b2bdc12df3568df  |
> |                                    | impala.events.catalogVersion                       | 1882                              |
> |                                    | numFiles                                           | 1                                 |
> |                                    | 

[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-11 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763986#comment-17763986
 ] 

Maxwell Guo commented on IMPALA-12402:
--

Sorry, this is my first time using Gerrit to push code. I have used the same 
Change-Id again. [~MikaelSmith] Thanks for the reminder.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
>
> When the cluster contains many databases and tables (such as more than 
> 10 tables), and we restart the impalad, CatalogdMetaProvider's local cache_ 
> needs to go through a loading process. 
> As we know, Google's Guava cache has its concurrencyLevel set to 4 by 
> default, 
> but with many tables the loading process takes more time and 
> increases the probability of lock contention, see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here; the first is the concurrency 
> level of the cache.
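For illustration, a minimal sketch of the proposed knob using Guava's CacheBuilder (the flag name and plumbing here are hypothetical, not Impala's actual implementation):
{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class MetaCacheSketch {
  // Hypothetical startup setting standing in for the proposed configuration;
  // Guava's default concurrencyLevel is 4.
  private static final int CACHE_CONCURRENCY_LEVEL =
      Integer.getInteger("catalogd_meta_cache_concurrency_level", 4);

  public static Cache<String, Object> buildCache(long maxEntries) {
    return CacheBuilder.newBuilder()
        // More segments reduce lock contention while many tables load concurrently.
        .concurrencyLevel(CACHE_CONCURRENCY_LEVEL)
        .maximumSize(maxEntries)
        .recordStats()
        .build();
  }
}
{code}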






[jira] [Created] (IMPALA-12444) PROCESSING_COST_MIN_THREADS can get ignored by scan fragment.

2023-09-11 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-12444:
-

 Summary: PROCESSING_COST_MIN_THREADS can get ignored by scan 
fragment.
 Key: IMPALA-12444
 URL: https://issues.apache.org/jira/browse/IMPALA-12444
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 4.2.0
Reporter: Riza Suminto
Assignee: Riza Suminto


There is a bug in PlanFragment.java where a scan fragment might not honor the 
PROCESSING_COST_MIN_THREADS set by the user, even if the total number of scan 
ranges would allow it.

The frontend planner also needs a sanity check that 
PROCESSING_COST_MIN_THREADS <= MAX_FRAGMENT_INSTANCES_PER_NODE.
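A minimal sketch of such a check (method and variable names are hypothetical, not the actual PlanFragment.java code):
{code:java}
public class ParallelismSanityCheckSketch {
  // Clamp the user-requested minimum thread count to the per-node instance cap,
  // so the scan fragment never plans more instances than a node may run.
  static int effectiveMinThreads(int processingCostMinThreads,
      int maxFragmentInstancesPerNode) {
    return Math.min(processingCostMinThreads, maxFragmentInstancesPerNode);
  }

  public static void main(String[] args) {
    // e.g. PROCESSING_COST_MIN_THREADS=8 with MAX_FRAGMENT_INSTANCES_PER_NODE=4 -> 4
    System.out.println(effectiveMinThreads(8, 4));
  }
}
{code}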






[jira] [Updated] (IMPALA-9118) Add debug page for in-flight DDLs in catalogd

2023-09-11 Thread Abhishek Rawat (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rawat updated IMPALA-9118:
---
Priority: Critical  (was: Major)

> Add debug page for in-flight DDLs in catalogd
> -
>
> Key: IMPALA-9118
> URL: https://issues.apache.org/jira/browse/IMPALA-9118
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: observability, supportability
> Attachments: Selection_082.png
>
>
> In a busy cluster, it's possible that many DDL/DML queries remain in the 
> CREATED state for several minutes. Especially when sync_ddl=true is used, 
> tens of minutes are also possible. They may be waiting for the ExecDdl RPC to 
> catalogd to finish.
> It'd be helpful for debugging DDL/DML hangs if we could show the in-flight 
> DDLs in catalogd. I think the following fields are important:
>  * thread id
>  * coordinator
>  * db name / table name
>  * ddl type, e.g. AddPartition, DropTable, CreateTable, etc. More types 
> [here|https://github.com/apache/impala/blob/3.3.0/common/thrift/JniCatalog.thrift#L31].
>  * last event, e.g. waiting for table lock, got table lock, loading file 
> metadata, waiting for sync ddl version etc.
>  * start time
>  * time elapsed
>  * (optional) params link to show the TDdlExecRequest in json format
> It'd be better to also include running REFRESH/INVALIDATE METADATA commands 
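For illustration, a minimal sketch of the per-DDL record such a debug page could render, covering the fields listed above (class and field names are hypothetical, not Impala's actual code):
{code:java}
public class InFlightDdlEntry {
  long threadId;        // catalogd thread executing the DDL
  String coordinator;   // coordinator that issued the ExecDdl RPC
  String dbName;
  String tableName;
  String ddlType;       // e.g. ADD_PARTITION, DROP_TABLE, CREATE_TABLE
  String lastEvent;     // e.g. "waiting for table lock", "loading file metadata"
  long startTimeMs;     // when catalogd started executing the DDL

  long elapsedMs() { return System.currentTimeMillis() - startTimeMs; }
}
{code}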






[jira] [Created] (IMPALA-12443) Add catalog timeline for all DDL profiles

2023-09-11 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-12443:
---

 Summary: Add catalog timeline for all DDL profiles
 Key: IMPALA-12443
 URL: https://issues.apache.org/jira/browse/IMPALA-12443
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang


We've added the catalog timeline to profiles of CreateTable statements 
(IMPALA-12024). This Jira targets adding such a timeline to all other DDL 
profiles.
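A minimal sketch of the idea, assuming an EventSequence-style timeline object like the one used for CreateTable in IMPALA-12024 (names here are illustrative and may differ from the actual API):
{code:java}
public class CatalogTimelineSketch {
  /** Simplified stand-in for a profile timeline that records labeled events. */
  static class Timeline {
    private final long startNs = System.nanoTime();
    void markEvent(String label) {
      System.out.printf("%s: %.3f ms%n", label, (System.nanoTime() - startNs) / 1e6);
    }
  }

  /** Example DDL path emitting timeline markers that would end up in the profile. */
  static void alterTable(Timeline timeline) {
    timeline.markEvent("Got catalog lock");
    // ... apply the change in the Hive Metastore ...
    timeline.markEvent("Applied alteration in Metastore");
    // ... reload table/file metadata ...
    timeline.markEvent("Reloaded table metadata");
  }
}
{code}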






[jira] [Commented] (IMPALA-12411) TSAN ThreadSanitizer: data race during expr-test teardown

2023-09-11 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763968#comment-17763968
 ] 

Michael Smith commented on IMPALA-12411:


Ok, this looks like a quirk of the ImpaladQueryExecutor, which is only used in 
expr-test.cc.

ImpaladQueryExecutor closes the previous query when you start the next query. 
It relies on the destructor to close the last query. expr-test creates 
ImpaladQueryExecutor as a global as part of setting up an in-memory cluster 
(with Statestore and InProcessImpalaServer). So the last query isn't guaranteed 
to be closed until global destruction, which leads to a race with other global 
destruction.

This is pretty easy to fix with TearDownTestCase.

> TSAN ThreadSanitizer: data race during expr-test teardown
> -
>
> Key: IMPALA-12411
> URL: https://issues.apache.org/jira/browse/IMPALA-12411
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.3.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
> Attachments: expr-test-tsan-failure.log
>
>
> The racing threads are
> {code:java}
> 20:14:05   Read of size 8 at 0x0a8d3348 by main thread:
> 20:14:05 #0 std::vector std::allocator >::~vector() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:680:54
>  (unifiedbetests+0x3fcd9b9)
> 20:14:05 #1 
> impala::TGetJvmMemoryMetricsResponse::~TGetJvmMemoryMetricsResponse() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.cpp:4158:1
>  (unifiedbetests+0x3fc1397)
> 20:14:05 #2 impala::JvmMetricCache::~JvmMetricCache() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.h:170:7
>  (unifiedbetests+0x4b2989d)
> 20:14:05 #3 at_exit_wrapper(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p7/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:361:31
>  (unifiedbetests+0x21b3554)
> {code}
> and
> {code:java}
> 20:14:05   Previous write of size 8 at 0x0a8d3348 by thread T586:
> 20:14:05 #0 std::vector std::allocator 
> >::_M_erase_at_end(impala::TJvmMemoryPool*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1798:30
>  (unifiedbetests+0x4afabcc)
> 20:14:05 #1 std::vector std::allocator >::clear() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1499:9
>  (unifiedbetests+0x4afa4b4)
> 20:14:05 #2 unsigned int 
> impala::TGetJvmMemoryMetricsResponse::read(apache::thrift::protocol::TProtocol*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.tcc:5673:32
>  (unifiedbetests+0x4afa21d)
> 20:14:05 #3 impala::Status 
> impala::DeserializeThriftMsg(unsigned 
> char const*, unsigned int*, bool, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/thrift-util.h:136:23
>  (unifiedbetests+0x4af9da6)
> 20:14:05 #4 impala::Status 
> impala::DeserializeThriftMsg(JNIEnv_*, 
> _jbyteArray*, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/jni-thrift-util.h:61:3
>  (unifiedbetests+0x4af9c62)
> 20:14:05 #5 impala::Status 
> impala::JniCall::ObjectToResult(_jobject*,
>  impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:493:3
>  (unifiedbetests+0x4af9b24)
> 20:14:05 #6 impala::Status 
> impala::JniCall::Call(impala::TGetJvmMemoryMetricsResponse*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:486:3
>  (unifiedbetests+0x4af92a6)
> 20:14:05 #7 
> impala::JniUtil::GetJvmMemoryMetrics(impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.cc:299:72
>  (unifiedbetests+0x4af89a3)
> 20:14:05 #8 impala::JvmMetricCache::GrabMetricsIfNecessary() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:294:19
>  (unifiedbetests+0x4b2780d)
> 20:14:05 #9 impala::JvmMetricCache::GetCounterMetric(long 
> (*)(impala::TGetJvmMemoryMetricsResponse const&)) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:305:3
>  (unifiedbetests+0x4b27711)
> 

[jira] [Work started] (IMPALA-12411) TSAN ThreadSanitizer: data race during expr-test teardown

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12411 started by Michael Smith.
--
> TSAN ThreadSanitizer: data race during expr-test teardown
> -
>
> Key: IMPALA-12411
> URL: https://issues.apache.org/jira/browse/IMPALA-12411
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.3.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
> Attachments: expr-test-tsan-failure.log
>
>
> The racing threads are
> {code:java}
> 20:14:05   Read of size 8 at 0x0a8d3348 by main thread:
> 20:14:05 #0 std::vector std::allocator >::~vector() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:680:54
>  (unifiedbetests+0x3fcd9b9)
> 20:14:05 #1 
> impala::TGetJvmMemoryMetricsResponse::~TGetJvmMemoryMetricsResponse() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.cpp:4158:1
>  (unifiedbetests+0x3fc1397)
> 20:14:05 #2 impala::JvmMetricCache::~JvmMetricCache() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.h:170:7
>  (unifiedbetests+0x4b2989d)
> 20:14:05 #3 at_exit_wrapper(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p7/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:361:31
>  (unifiedbetests+0x21b3554)
> {code}
> and
> {code:java}
> 20:14:05   Previous write of size 8 at 0x0a8d3348 by thread T586:
> 20:14:05 #0 std::vector std::allocator 
> >::_M_erase_at_end(impala::TJvmMemoryPool*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1798:30
>  (unifiedbetests+0x4afabcc)
> 20:14:05 #1 std::vector std::allocator >::clear() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1499:9
>  (unifiedbetests+0x4afa4b4)
> 20:14:05 #2 unsigned int 
> impala::TGetJvmMemoryMetricsResponse::read(apache::thrift::protocol::TProtocol*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.tcc:5673:32
>  (unifiedbetests+0x4afa21d)
> 20:14:05 #3 impala::Status 
> impala::DeserializeThriftMsg(unsigned 
> char const*, unsigned int*, bool, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/thrift-util.h:136:23
>  (unifiedbetests+0x4af9da6)
> 20:14:05 #4 impala::Status 
> impala::DeserializeThriftMsg(JNIEnv_*, 
> _jbyteArray*, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/jni-thrift-util.h:61:3
>  (unifiedbetests+0x4af9c62)
> 20:14:05 #5 impala::Status 
> impala::JniCall::ObjectToResult(_jobject*,
>  impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:493:3
>  (unifiedbetests+0x4af9b24)
> 20:14:05 #6 impala::Status 
> impala::JniCall::Call(impala::TGetJvmMemoryMetricsResponse*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:486:3
>  (unifiedbetests+0x4af92a6)
> 20:14:05 #7 
> impala::JniUtil::GetJvmMemoryMetrics(impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.cc:299:72
>  (unifiedbetests+0x4af89a3)
> 20:14:05 #8 impala::JvmMetricCache::GrabMetricsIfNecessary() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:294:19
>  (unifiedbetests+0x4b2780d)
> 20:14:05 #9 impala::JvmMetricCache::GetCounterMetric(long 
> (*)(impala::TGetJvmMemoryMetricsResponse const&)) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:305:3
>  (unifiedbetests+0x4b27711)
> 20:14:05 #10 impala::JvmMemoryCounterMetric::GetValue() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:270:41
>  (unifiedbetests+0x4b276bf)
> 20:14:05 #11 
> impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, 
> impala::TExecPlanFragmentInfo const&)::$_3::operator()() const 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/runtime/query-state.cc:185:50
>  (unifiedbetests+0x455bc15)
> 20:14:05 #12 
> boost::detail::function::function_obj_invoker0  const*, 

[jira] [Commented] (IMPALA-12411) TSAN ThreadSanitizer: data race during expr-test teardown

2023-09-11 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763966#comment-17763966
 ] 

Michael Smith commented on IMPALA-12411:


From the stack, somewhere a query was started via QueryExecMgr::StartQuery, 
which spawned a thread to run QueryExecMgr::ExecuteQueryHelper, and that query 
didn't finish until we were shutting down. It then tried to get a metric value 
while updating query state, which accessed the JvmMetricCache at the same time 
it was shutting down. Not sure I have enough info to determine where the query 
came from.

> TSAN ThreadSanitizer: data race during expr-test teardown
> -
>
> Key: IMPALA-12411
> URL: https://issues.apache.org/jira/browse/IMPALA-12411
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.3.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
> Attachments: expr-test-tsan-failure.log
>
>
> The racing threads are
> {code:java}
> 20:14:05   Read of size 8 at 0x0a8d3348 by main thread:
> 20:14:05 #0 std::vector std::allocator >::~vector() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:680:54
>  (unifiedbetests+0x3fcd9b9)
> 20:14:05 #1 
> impala::TGetJvmMemoryMetricsResponse::~TGetJvmMemoryMetricsResponse() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.cpp:4158:1
>  (unifiedbetests+0x3fc1397)
> 20:14:05 #2 impala::JvmMetricCache::~JvmMetricCache() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.h:170:7
>  (unifiedbetests+0x4b2989d)
> 20:14:05 #3 at_exit_wrapper(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p7/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:361:31
>  (unifiedbetests+0x21b3554)
> {code}
> and
> {code:java}
> 20:14:05   Previous write of size 8 at 0x0a8d3348 by thread T586:
> 20:14:05 #0 std::vector std::allocator 
> >::_M_erase_at_end(impala::TJvmMemoryPool*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1798:30
>  (unifiedbetests+0x4afabcc)
> 20:14:05 #1 std::vector std::allocator >::clear() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1499:9
>  (unifiedbetests+0x4afa4b4)
> 20:14:05 #2 unsigned int 
> impala::TGetJvmMemoryMetricsResponse::read(apache::thrift::protocol::TProtocol*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.tcc:5673:32
>  (unifiedbetests+0x4afa21d)
> 20:14:05 #3 impala::Status 
> impala::DeserializeThriftMsg(unsigned 
> char const*, unsigned int*, bool, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/thrift-util.h:136:23
>  (unifiedbetests+0x4af9da6)
> 20:14:05 #4 impala::Status 
> impala::DeserializeThriftMsg(JNIEnv_*, 
> _jbyteArray*, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/jni-thrift-util.h:61:3
>  (unifiedbetests+0x4af9c62)
> 20:14:05 #5 impala::Status 
> impala::JniCall::ObjectToResult(_jobject*,
>  impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:493:3
>  (unifiedbetests+0x4af9b24)
> 20:14:05 #6 impala::Status 
> impala::JniCall::Call(impala::TGetJvmMemoryMetricsResponse*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:486:3
>  (unifiedbetests+0x4af92a6)
> 20:14:05 #7 
> impala::JniUtil::GetJvmMemoryMetrics(impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.cc:299:72
>  (unifiedbetests+0x4af89a3)
> 20:14:05 #8 impala::JvmMetricCache::GrabMetricsIfNecessary() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:294:19
>  (unifiedbetests+0x4b2780d)
> 20:14:05 #9 impala::JvmMetricCache::GetCounterMetric(long 
> (*)(impala::TGetJvmMemoryMetricsResponse const&)) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:305:3
>  (unifiedbetests+0x4b27711)
> 20:14:05 #10 impala::JvmMemoryCounterMetric::GetValue() 
> 

[jira] [Commented] (IMPALA-12411) TSAN ThreadSanitizer: data race during expr-test teardown

2023-09-11 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763964#comment-17763964
 ] 

Michael Smith commented on IMPALA-12411:


Something calls QueryState::Init while at_exit_wrapper is called. I'll take a 
look at the test and see if something's not waiting for a query to finish.

> TSAN ThreadSanitizer: data race during expr-test teardown
> -
>
> Key: IMPALA-12411
> URL: https://issues.apache.org/jira/browse/IMPALA-12411
> Project: IMPALA
>  Issue Type: Bug
>  Components: be
>Affects Versions: Impala 4.3.0
>Reporter: Andrew Sherman
>Assignee: Michael Smith
>Priority: Critical
> Attachments: expr-test-tsan-failure.log
>
>
> The racing threads are
> {code:java}
> 20:14:05   Read of size 8 at 0x0a8d3348 by main thread:
> 20:14:05 #0 std::vector std::allocator >::~vector() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:680:54
>  (unifiedbetests+0x3fcd9b9)
> 20:14:05 #1 
> impala::TGetJvmMemoryMetricsResponse::~TGetJvmMemoryMetricsResponse() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.cpp:4158:1
>  (unifiedbetests+0x3fc1397)
> 20:14:05 #2 impala::JvmMetricCache::~JvmMetricCache() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.h:170:7
>  (unifiedbetests+0x4b2989d)
> 20:14:05 #3 at_exit_wrapper(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p7/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:361:31
>  (unifiedbetests+0x21b3554)
> {code}
> and
> {code:java}
> 20:14:05   Previous write of size 8 at 0x0a8d3348 by thread T586:
> 20:14:05 #0 std::vector std::allocator 
> >::_M_erase_at_end(impala::TJvmMemoryPool*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1798:30
>  (unifiedbetests+0x4afabcc)
> 20:14:05 #1 std::vector std::allocator >::clear() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/lib/gcc/x86_64-pc-linux-gnu/10.4.0/../../../../include/c++/10.4.0/bits/stl_vector.h:1499:9
>  (unifiedbetests+0x4afa4b4)
> 20:14:05 #2 unsigned int 
> impala::TGetJvmMemoryMetricsResponse::read(apache::thrift::protocol::TProtocol*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/generated-sources/gen-cpp/Frontend_types.tcc:5673:32
>  (unifiedbetests+0x4afa21d)
> 20:14:05 #3 impala::Status 
> impala::DeserializeThriftMsg(unsigned 
> char const*, unsigned int*, bool, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/thrift-util.h:136:23
>  (unifiedbetests+0x4af9da6)
> 20:14:05 #4 impala::Status 
> impala::DeserializeThriftMsg(JNIEnv_*, 
> _jbyteArray*, impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/rpc/jni-thrift-util.h:61:3
>  (unifiedbetests+0x4af9c62)
> 20:14:05 #5 impala::Status 
> impala::JniCall::ObjectToResult(_jobject*,
>  impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:493:3
>  (unifiedbetests+0x4af9b24)
> 20:14:05 #6 impala::Status 
> impala::JniCall::Call(impala::TGetJvmMemoryMetricsResponse*)
>  
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.h:486:3
>  (unifiedbetests+0x4af92a6)
> 20:14:05 #7 
> impala::JniUtil::GetJvmMemoryMetrics(impala::TGetJvmMemoryMetricsResponse*) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/jni-util.cc:299:72
>  (unifiedbetests+0x4af89a3)
> 20:14:05 #8 impala::JvmMetricCache::GrabMetricsIfNecessary() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:294:19
>  (unifiedbetests+0x4b2780d)
> 20:14:05 #9 impala::JvmMetricCache::GetCounterMetric(long 
> (*)(impala::TGetJvmMemoryMetricsResponse const&)) 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:305:3
>  (unifiedbetests+0x4b27711)
> 20:14:05 #10 impala::JvmMemoryCounterMetric::GetValue() 
> /data/jenkins/workspace/impala-cdw-master-core-tsan/repos/Impala/be/src/util/memory-metrics.cc:270:41
>  (unifiedbetests+0x4b276bf)
> 20:14:05 #11 
> impala::QueryState::Init(impala::ExecQueryFInstancesRequestPB const*, 
> impala::TExecPlanFragmentInfo const&)::$_3::operator()() const 
> 

[jira] [Updated] (IMPALA-12442) Avoid running stress tests twice

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-12442:
---
Description: 
run-tests.py runs tests in phases. From its "main" block
{quote}
First run[s] query tests that need to be executed serially: -m execute_serially
Run[s] the stress tests: -m stress
Run[s] the remaining query tests in parallel: -m not execute_serially and not 
stress
{quote}

Most of Impala's stress tests are marked with both "stress" and 
"execute_serially", which means for example that 
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/338/consoleText runs 
{{metadata/test_recursive_listing.py::TestRecursiveListing::test_large_staging_dirs}}
 even though exploration_strategy=core jobs skip stress tests.

When we run exhaustive tests, it runs the stress tests twice. Marking the test 
with both stress and execute_serially also means they'll be run serially during 
the serial run, and concurrently during the stress test.


  was:
run-tests.py runs tests in phases. From its "main" block
{quote}
First run[s] query tests that need to be executed serially: -m execute_serially
Run[s] the stress tests: -m stress
Run[s] the remaining query tests in parallel: -m not execute_serially and not 
stress
{quote}

Most of Impala's stress tests are marked with both "stress" and 
"execute_serially", which means for example that 
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/338/consoleText runs 
{{metadata/test_recursive_listing.py::TestRecursiveListing::test_large_staging_dirs}}
 even though exploration_strategy=core jobs skip stress tests.

When we run exhaustive tests, it runs the stress tests twice.



> Avoid running stress tests twice
> 
>
> Key: IMPALA-12442
> URL: https://issues.apache.org/jira/browse/IMPALA-12442
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Priority: Major
>
> run-tests.py runs tests in phases. From its "main" block
> {quote}
> First run[s] query tests that need to be executed serially: -m 
> execute_serially
> Run[s] the stress tests: -m stress
> Run[s] the remaining query tests in parallel: -m not execute_serially and not 
> stress
> {quote}
> Most of Impala's stress tests are marked with both "stress" and 
> "execute_serially", which means for example that 
> https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/338/consoleText runs 
> {{metadata/test_recursive_listing.py::TestRecursiveListing::test_large_staging_dirs}}
>  even though exploration_strategy=core jobs skip stress tests.
> When we run exhaustive tests, it runs the stress tests twice. Marking the 
> test with both stress and execute_serially also means they'll be run serially 
> during the serial run, and concurrently during the stress test.






[jira] [Updated] (IMPALA-12442) Avoid running stress tests twice

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-12442:
---
Description: 
run-tests.py runs tests in phases. From its "main" block
{quote}
First run[s] query tests that need to be executed serially: -m execute_serially
Run[s] the stress tests: -m stress
Run[s] the remaining query tests in parallel: -m not execute_serially and not 
stress
{quote}

Most of Impala's stress tests are marked with both "stress" and 
"execute_serially", which means for example that 
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/338/consoleText runs 
{{metadata/test_recursive_listing.py::TestRecursiveListing::test_large_staging_dirs}}
 even though exploration_strategy=core jobs skip stress tests.

When we run exhaustive tests, it runs the stress tests twice.


  was:
run-tests.py runs tests in phases. From its "main" block
{quote}
First run[s] query tests that need to be executed serially: -m execute_serially
Run[s] the stress tests: -m stress
Run[s] the remaining query tests in parallel: -m not execute_serially and not 
stress
{quote}

Most of Impala's stress tests are marked with both "stress" and 
"execute_serially", which means for example that 
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/338/consoleText runs 
{{metadata/test_recursive_listing.py::TestRecursiveListing::test_large_staging_dirs}}
 even though exploration_strategy=core jobs skip stress tests.



> Avoid running stress tests twice
> 
>
> Key: IMPALA-12442
> URL: https://issues.apache.org/jira/browse/IMPALA-12442
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Priority: Major
>
> run-tests.py runs tests in phases. From its "main" block
> {quote}
> First run[s] query tests that need to be executed serially: -m 
> execute_serially
> Run[s] the stress tests: -m stress
> Run[s] the remaining query tests in parallel: -m not execute_serially and not 
> stress
> {quote}
> Most of Impala's stress tests are marked with both "stress" and 
> "execute_serially", which means for example that 
> https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/338/consoleText runs 
> {{metadata/test_recursive_listing.py::TestRecursiveListing::test_large_staging_dirs}}
>  even though exploration_strategy=core jobs skip stress tests.
> When we run exhaustive tests, it runs the stress tests twice.






[jira] [Created] (IMPALA-12442) Avoid running stress tests twice

2023-09-11 Thread Michael Smith (Jira)
Michael Smith created IMPALA-12442:
--

 Summary: Avoid running stress tests twice
 Key: IMPALA-12442
 URL: https://issues.apache.org/jira/browse/IMPALA-12442
 Project: IMPALA
  Issue Type: Task
Reporter: Michael Smith


run-tests.py runs tests in phases. From its "main" block
{quote}
First run[s] query tests that need to be executed serially: -m execute_serially
Run[s] the stress tests: -m stress
Run[s] the remaining query tests in parallel: -m not execute_serially and not 
stress
{quote}

Most of Impala's stress tests are marked with both "stress" and 
"execute_serially", which means for example that 
https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/338/consoleText runs 
{{metadata/test_recursive_listing.py::TestRecursiveListing::test_large_staging_dirs}}
 even though exploration_strategy=core jobs skip stress tests.







[jira] [Resolved] (IMPALA-12432) Keep LdapKerberosImpalaShellTest* compatible with older guava versions

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12432.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

This was a test-only change.

> Keep LdapKerberosImpalaShellTest* compatible with older guava versions
> --
>
> Key: IMPALA-12432
> URL: https://issues.apache.org/jira/browse/IMPALA-12432
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> LdapKerberosImpalaShellTestBase.java and LdapKerberosImpalaShellTest.java use 
> the ImmutableMap.of function with 8+ pairs. Older versions of guava like 
> 28.1-jre do not have ImmutableMap.of() for that number of arguments.
> Since we often want to use the guava version that the underlying Hadoop/Hive 
> use, it can be useful for compatibility to be able to build against older 
> guava (like 28.1-jre).
> Most other code is fine, so if we switch these locations to use 
> ImmutableMap.builder(), then the whole codebase can compile 
> with the older guava (while remaining forward compatible as well).
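For illustration, a minimal sketch of the builder form the description proposes (the entries shown are placeholders, not the real test configuration):
{code:java}
import com.google.common.collect.ImmutableMap;

public class FlagMapSketch {
  // ImmutableMap.builder() accepts any number of entries and is available in
  // older guava versions such as 28.1-jre, unlike the large ImmutableMap.of overloads.
  static final ImmutableMap<String, String> FLAGS =
      ImmutableMap.<String, String>builder()
          .put("flag_one", "value_one")
          .put("flag_two", "value_two")
          .put("flag_three", "value_three")
          .build();
}
{code}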






[jira] [Updated] (IMPALA-12432) Keep LdapKerberosImpalaShellTest* compatible with older guava versions

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-12432:
---
Issue Type: Task  (was: Improvement)

> Keep LdapKerberosImpalaShellTest* compatible with older guava versions
> --
>
> Key: IMPALA-12432
> URL: https://issues.apache.org/jira/browse/IMPALA-12432
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Joe McDonnell
>Priority: Major
>
> LdapKerberosImpalaShellTestBase.java and LdapKerberosImpalaShellTest.java use 
> the ImmutableMap.of function with 8+ pairs. Older versions of guava like 
> 28.1-jre do not have ImmutableMap.of() for that number of arguments.
> Since we often want to use the guava version that the underlying Hadoop/Hive 
> use, it can be useful for compatibility to be able to build against older 
> guava (like 28.1-jre).
> Most other code is fine, so if we switch these locations to use 
> ImmutableMap.builder(), then the whole codebase can compile 
> with the older guava (while remaining forward compatible as well).






[jira] [Assigned] (IMPALA-12441) Simplify local toolchain development

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith reassigned IMPALA-12441:
--

Assignee: Michael Smith

> Simplify local toolchain development
> 
>
> Key: IMPALA-12441
> URL: https://issues.apache.org/jira/browse/IMPALA-12441
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Testing updates to https://github.com/cloudera/native-toolchain currently 
> takes some work. 
> https://cwiki.apache.org/confluence/display/IMPALA/Building+native-toolchain+from+scratch+and+using+with+Impala
>  mentions it's for advanced users only, and the instructions don't quite work 
> anymore. They also don't work very well when switching between different 
> branches.
> For aarch64 builds we make this somewhat simpler. Expand on that to make it 
> easier to work with local native-toolchain checkouts.






[jira] [Work started] (IMPALA-12441) Simplify local toolchain development

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12441 started by Michael Smith.
--
> Simplify local toolchain development
> 
>
> Key: IMPALA-12441
> URL: https://issues.apache.org/jira/browse/IMPALA-12441
> Project: IMPALA
>  Issue Type: Task
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
>
> Testing updates to https://github.com/cloudera/native-toolchain currently 
> takes some work. 
> https://cwiki.apache.org/confluence/display/IMPALA/Building+native-toolchain+from+scratch+and+using+with+Impala
>  mentions it's for advanced users only, and the instructions don't quite work 
> anymore. They also don't work very well when switching between different 
> branches.
> For aarch64 builds we make this somewhat simpler. Expand on that to make it 
> easier to work with local native-toolchain checkouts.






[jira] [Created] (IMPALA-12441) Simplify local toolchain development

2023-09-11 Thread Michael Smith (Jira)
Michael Smith created IMPALA-12441:
--

 Summary: Simplify local toolchain development
 Key: IMPALA-12441
 URL: https://issues.apache.org/jira/browse/IMPALA-12441
 Project: IMPALA
  Issue Type: Task
Reporter: Michael Smith


Testing updates to https://github.com/cloudera/native-toolchain currently takes 
some work. 
https://cwiki.apache.org/confluence/display/IMPALA/Building+native-toolchain+from+scratch+and+using+with+Impala
 mentions it's for advanced users only, and the instructions don't quite work 
anymore. They also don't work very well when switching between different 
branches.

For aarch64 builds we make this somewhat simpler. Expand on that to make it 
easier to work with local native-toolchain checkouts.






[jira] [Closed] (IMPALA-3192) Toolchain build should be able to use prebuilt artifacts

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith closed IMPALA-3192.
-
Resolution: Won't Do

This is tricky to do reliably, and hasn't been a priority. We make changes to 
toolchain builds besides version updates, and we would have to identify whether 
the changes require rebuilding any particular package.

> Toolchain build should be able to use prebuilt artifacts
> 
>
> Key: IMPALA-3192
> URL: https://issues.apache.org/jira/browse/IMPALA-3192
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.5.0
>Reporter: casey
>Priority: Minor
>
> The toolchain build should have an option (maybe the default) to only build 
> what isn't already available for download. Currently, if you want to build 
> the toolchain locally it builds everything. I think the most common use case 
> for a local build is when you want to add something. In that case, you don't 
> want to redo the work of building existing components, they can just be 
> downloaded.
> This would also help avoid issues like 
> https://issues.cloudera.org/browse/IMPALA-3191






[jira] [Commented] (IMPALA-10086) SqlCastException when comparing char with varchar

2023-09-11 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763857#comment-17763857
 ] 

Michael Smith commented on IMPALA-10086:


Regression for anyone upgrading from Impala 3.2.0.

> SqlCastException when comparing char with varchar
> -
>
> Key: IMPALA-10086
> URL: https://issues.apache.org/jira/browse/IMPALA-10086
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Michael Smith
>Priority: Critical
>  Labels: newbie, ramp-up
>
> {noformat}
> [localhost:21000] default> select 'expected 2',count(*) from ax where cast(t 
> as string) = cast('a ' as varchar(10));
> +--+--+
> | 'expected 2' | count(*) |
> +--+--+
> | expected 2   | 2|
> +--+--+
> Fetched 1 row(s) in 0.44s
> [localhost:21000] default> create table chartbl (c char(10));
> +-+
> | summary |
> +-+
> | Table has been created. |
> +-+
> Fetched 1 row(s) in 0.23s
> [localhost:21000] default> select * from chartbl where c = cast('test' as 
> varchar(10));
> ERROR: SqlCastException: targetType=VARCHAR(*) type=VARCHAR(10)
> {noformat}
> Also using the functional dataset:
> {noformat}
> [localhost:21000] functional> select * from chars_tiny where cs = vc;
> ERROR: SqlCastException: targetType=VARCHAR(*) type=VARCHAR(5)
> {noformat}






[jira] [Updated] (IMPALA-10086) SqlCastException when comparing char with varchar

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith updated IMPALA-10086:
---
Priority: Critical  (was: Major)

> SqlCastException when comparing char with varchar
> -
>
> Key: IMPALA-10086
> URL: https://issues.apache.org/jira/browse/IMPALA-10086
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 4.0.0, Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Michael Smith
>Priority: Critical
>  Labels: newbie, ramp-up
>
> {noformat}
> [localhost:21000] default> select 'expected 2',count(*) from ax where cast(t 
> as string) = cast('a ' as varchar(10));
> +--+--+
> | 'expected 2' | count(*) |
> +--+--+
> | expected 2   | 2|
> +--+--+
> Fetched 1 row(s) in 0.44s
> [localhost:21000] default> create table chartbl (c char(10));
> +-+
> | summary |
> +-+
> | Table has been created. |
> +-+
> Fetched 1 row(s) in 0.23s
> [localhost:21000] default> select * from chartbl where c = cast('test' as 
> varchar(10));
> ERROR: SqlCastException: targetType=VARCHAR(*) type=VARCHAR(10)
> {noformat}
> Also using the functional dataset:
> {noformat}
> [localhost:21000] functional> select * from chars_tiny where cs = vc;
> ERROR: SqlCastException: targetType=VARCHAR(*) type=VARCHAR(5)
> {noformat}






[jira] [Resolved] (IMPALA-12383) Aggregation with num_nodes=1 and limit returns too many rows

2023-09-11 Thread Michael Smith (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-12383.

Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> Aggregation with num_nodes=1 and limit returns too many rows
> 
>
> Key: IMPALA-12383
> URL: https://issues.apache.org/jira/browse/IMPALA-12383
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 4.1.0
>Reporter: Michael Smith
>Assignee: Michael Smith
>Priority: Major
> Fix For: Impala 4.3.0
>
>
> With {{set num_nodes=1}} to select SingleNodePlanner, aggregations return too 
> many rows:
> {code}
> > select distinct l_orderkey from tpch.lineitem limit 10;
> ...
> Fetched 16 row(s) in 0.12s
> > select ss_cdemo_sk from tpcds.store_sales group by ss_cdemo_sk limit 3;
> ...
> Fetched 7 row(s) in 0.14s
> {code}
> This looks like it's caused by changes in IMPALA-2581, which attempts to push 
> down limits to pre-aggregation. In SingleNodePlanner, there is no 
> pre-aggregation, which the patch appears to have failed to account for.






[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-11 Thread Michael Smith (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763850#comment-17763850
 ] 

Michael Smith commented on IMPALA-12402:


You keep pushing up new commits with different Change-Ids. That creates busywork 
for us and makes it hard to track your changes. Please use the same Change-Id 
for new revisions of work on a ticket.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
>
> When the cluster contains many databases and tables (such as more than 
> 10 tables), and we restart the impalad, CatalogdMetaProvider's local cache_ 
> needs to go through a loading process. 
> As we know, Google's Guava cache has its concurrencyLevel set to 4 by 
> default, 
> but with many tables the loading process takes more time and 
> increases the probability of lock contention, see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
>  
> So we propose to add some configurations here; the first is the concurrency 
> level of the cache.






[jira] [Assigned] (IMPALA-12440) Don't write profiles for set commands

2023-09-11 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari reassigned IMPALA-12440:
--

Assignee: Jason Fehr

> Don't write profiles for set commands
> 
>
> Key: IMPALA-12440
> URL: https://issues.apache.org/jira/browse/IMPALA-12440
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Manish Maheshwari
>Assignee: Jason Fehr
>Priority: Major
>
> Don't write profiles for set commands






[jira] [Created] (IMPALA-12440) Don't write profiles for set commands

2023-09-11 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-12440:
--

 Summary: Don't write profiles for set commands
 Key: IMPALA-12440
 URL: https://issues.apache.org/jira/browse/IMPALA-12440
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Manish Maheshwari


Don't write profiles for set commands






[jira] [Commented] (IMPALA-12430) Optimize sending rows within the same process

2023-09-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763737#comment-17763737
 ] 

ASF subversion and git services commented on IMPALA-12430:
--

Commit fb2d2b27641a95f51b6789639fab73b60abd7bc5 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fb2d2b276 ]

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch is moved to a separate file to make the commonly
  included row-batch.h lighter.
- TestObservability.test_global_exchange_counters had to be changed
  as skipping compression changed metric ExchangeScanRatio. Also added
  a sleep to the test query because it was flaky on my machine (it
  doesn't seem flaky in jenkins runs, probably my CPU is faster).

See the Jira for more details on tasks that could be skipped in
intra-process RowBatch transfer. Of these, compression is both
the most expensive and the easiest to avoid.

Note that it may also make sense to skip compression if the target
is not in the same process but resides on the same host. This setup is
not typical in production environments AFAIK, and it would complicate
testing compression as impalad processes often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.59    | -4.95%     | 2.37       | -2.51%         |
+----------+-----------------------+---------+------------+------------+----------------+

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Reviewed-on: http://gerrit.cloudera.org:8080/20462
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 


> Optimize sending rows within the same process
> -
>
> Key: IMPALA-12430
> URL: https://issues.apache.org/jira/browse/IMPALA-12430
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: performance
>
> Currently sending row batches to exchange nodes always goes through KRPC even 
> if the sender and receiver are within the same process.
> This means that the following work is done without actually being necessary:
> sender:
> 1. serialize RowBatch to a single buffer
> 2. compress the buffer with LZ4
> 3. send the buffer as a sidecar in KRPC
> receiver:
> 4. fetch buffer from KRPC
> 5. decompress the buffer
> 6. convert the buffer to RowBatch
> Ideally a single deep copy from the sender's RowBatch to the destination's 
> RowBatch is enough (this is needed to cleanup the memory referenced in the 
> original RowBatch during send).
> The most expensive part is 2, the compression with LZ4 (decompression is much 
> faster) and can be avoided with minimal changes.






[jira] [Created] (IMPALA-12439) Impala Daemon gets stuck on random executors

2023-09-11 Thread Evgeniy (Jira)
Evgeniy created IMPALA-12439:


 Summary: Impala Daemon gets stuck on random executors
 Key: IMPALA-12439
 URL: https://issues.apache.org/jira/browse/IMPALA-12439
 Project: IMPALA
  Issue Type: Bug
  Components: Distributed Exec
Affects Versions: Impala 3.4.0
Reporter: Evgeniy
 Attachments: resolved_420a96bf.txt, resolved_d7750c55.txt

Hi!

In our cluster we face the following problem periodically:

1. A query fails with an error like "Exec() rpc failed: Timed out: 
ExecQueryFInstances RPC to :27000 timed out after 300.000s". Each 
time the problem appears, the affected node may be different.

2. We have analyzed minidumps of the Impala daemon from two different cases 
(the resolved minidumps are attached). It seems that the Impala daemon is 
stuck cancelling a query fragment:

Thread 244
 0  libpthread-2.17.so + 0xba35
    rax = 0xfe00   rdx = 0x0002
    rcx = 0x   rbx = 0x7cd81b10
    rsi = 0x0080   rdi = 0x7cd81b14
    rbp = 0x7f7ba5ae8580   rsp = 0x7f7ba5ae8520
     r8 = 0x7cd81b00    r9 = 0x
    r10 = 0x   r11 = 0x0246
    r12 = 0xeafe6400   r13 = 0x7f7ba5ae85c0
    r14 = 0x7f845b7287d0   r15 = 0x7f7ba5ae8660
    rip = 0x7f845b727a35
    Found by: given as instruction pointer in context
 1  impalad!impala::QueryState::Cancel() + 0xdb
    rbp = 0x7f7ba5ae8600   rsp = 0x7f7ba5ae8590
    rip = 0x011791bb
    Found by: previous frame's frame pointer
 2  
impalad!impala::ControlService::CancelQueryFInstances(impala::CancelQueryFInstancesRequestPB
 const*, impala::CancelQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) + 
0x177
    rbx = 0x7f8458e136a0   rbp = 0x7f7ba5ae8780
    rsp = 0x7f7ba5ae8610   r12 = 0x7f7ba5ae8720
    r13 = 0x7f7ba5ae86a0   rip = 0x01218f77
    Found by: call frame info
 3  impalad!kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) + 
0x17c
    rbx = 0x15e4e460   rbp = 0x7f7ba5ae87e0
    rsp = 0x7f7ba5ae8790   r12 = 0x0007a6bf8ee0
    r13 = 0x14f86740   r14 = 0x14f86f00
    r15 = 0x14f87480   rip = 0x01788ffc
    Found by: call frame info
 4  impalad!impala::ImpalaServicePool::RunThread() + 0x1be
    rbx = 0x7f84000d   rbp = 0x7f7ba5ae88a0
    rsp = 0x7f7ba5ae87f0   r12 = 0x18b30f80
    r13 = 0x   r14 = 0x0051
    r15 = 0x7f84000d   rip = 0x010dbdee
    Found by: call frame info
 5  impalad!impala::Thread::SuperviseThread(std::string const&, std::string 
const&, boost::function, impala::ThreadDebugInfo const*, 
impala::Promise*) + 0x30b
    rbx = 0x7f7ba5ae8970   rbp = 0x7f7ba5ae8be0
    rsp = 0x7f7ba5ae88b0   r12 = 0x7ffed2cdb298
    r13 = 0x0592ee20   r14 = 0x7f7ba5ae8910
    r15 = 0x7f8458e136a0   rip = 0x01435f8b
    Found by: call frame info
 6  impalad!boost::detail::thread_data, 
impala::ThreadDebugInfo const*, impala::Promise*), boost::_bi::list5, 
boost::_bi::value, boost::_bi::value >, 
boost::_bi::value, 
boost::_bi::value*> > > >::run() 
+ 0x7a
    rbx = 0x15e34e00   rbp = 0x7f7ba5ae8c40
    rsp = 0x7f7ba5ae8bf0   r12 = 0x7f7ba5ae8c00
    r13 = 0x01435c80   r14 = 0x
    r15 = 0x7f7ba5ae9700   rip = 0x01436e5a
    Found by: call frame info
 7  impalad!thread_proxy + 0xea
    rbx = 0x15e34e00   rbp = 0x
    rsp = 0x7f7ba5ae8c50   r12 = 0x7f7ba5ae8c50
    r13 = 0x00801000   r14 = 0x
    r15 = 0x7f7ba5ae9700   rip = 0x01c18e1a
    Found by: call frame info
 8  libpthread-2.17.so + 0x7ea5
    rbx = 0x   rbp = 0x
    rsp = 0x7f7ba5ae8ca0   r12 = 0x
    r13 = 0x00801000   r14 = 0x
    r15 = 0x7f7ba5ae9700   rip = 0x7f845b723ea5
    Found by: call frame info
 9  libc-2.17.so + 0xfeb0d
    rsp = 0x7f7ba5ae8d40   rip = 0x7f8458321b0d
    Found by: stack scanning



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12419) TestIcebergTable.test_migrated_table_field_id_resolution fails in S3 build

2023-09-11 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763691#comment-17763691
 ] 

Steve Loughran commented on IMPALA-12419:
-

The test run hasn't included any credentials; it's not running in EC2 either.
{code}
rm: impala-test-uswest2-2: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
com.amazonaws.SdkClientException: Unable to load AWS credentials from 
environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY 
(or AWS_SECRET_ACCESS_KEY))

{code}
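
For illustration only (this is not the fix for the build, which needs credentials wired into the test environment): a minimal Java sketch, under stated assumptions, of how the S3A credential chain quoted above can be satisfied programmatically via the fs.s3a.* keys that SimpleAWSCredentialsProvider reads. The bucket name matches the one in the log; the class name and env-var wiring are illustrative, and the sketch assumes the two AWS environment variables are actually exported.
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ACredentialCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // SimpleAWSCredentialsProvider reads these keys; setting them is the
    // programmatic equivalent of exporting AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
    conf.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"));
    conf.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"));
    // With temporary credentials, also provide the session token so
    // TemporaryAWSCredentialsProvider can pick it up:
    // conf.set("fs.s3a.session.token", System.getenv("AWS_SESSION_TOKEN"));

    // If the credentials are valid, this no longer fails with NoAuthWithAWSException.
    try (FileSystem fs = FileSystem.get(URI.create("s3a://impala-test-uswest2-2/"), conf)) {
      System.out.println("exists: " + fs.exists(new Path("/test-warehouse")));
    }
  }
}
{code}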



> TestIcebergTable.test_migrated_table_field_id_resolution fails in S3 build
> --
>
> Key: IMPALA-12419
> URL: https://issues.apache.org/jira/browse/IMPALA-12419
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: Wenzhe Zhou
>Priority: Major
>
> TestIcebergTable.test_migrated_table_field_id_resolution fails in S3 build
> {code:java}
> query_test/test_iceberg.py:246: in test_migrated_table_field_id_resolution
> "iceberg_migrated_alter_test", "parquet")
> common/file_utils.py:58: in create_iceberg_table_from_directory
> check_call(['hdfs', 'dfs', '-rm', '-f', '-r', hdfs_dir])
> /data/jenkins/workspace/impala-cdw-master-core-s3/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190:
>  in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hdfs', 'dfs', '-rm', '-f', '-r', 
> '/test-warehouse/iceberg_migrated_alter_test']' returned non-zero exit status 
> 1
> {code}
> Standard Error
> {code:java}
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_migrated_table_field_id_resolution_eb4581e8` 
> CASCADE;
> -- 2023-09-04 03:37:39,538 INFO MainThread: Started query 
> 4149ca931eb6d16c:97d680a8
> SET 
> client_identifier=query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_migrated_table_field_id_resolution_eb4581e8`;
> -- 2023-09-04 03:37:45,054 INFO MainThread: Started query 
> 3d4215586e7766ad:333cb5bd
> -- 2023-09-04 03:37:45,356 INFO MainThread: Created database 
> "test_migrated_table_field_id_resolution_eb4581e8" for test ID 
> "query_test/test_iceberg.py::TestIcebergTable::()::test_migrated_table_field_id_resolution[protocol:
>  beeswax | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]"
> Picked up JAVA_TOOL_OPTIONS:  
> -javaagent:/data/jenkins/workspace/impala-cdw-master-core-s3/repos/Impala/fe/target/dependency/jamm-0.4.0.jar
> 23/09/04 03:37:46 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 23/09/04 03:37:46 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 23/09/04 03:37:46 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 23/09/04 03:37:46 INFO Configuration.deprecation: No unit for 
> fs.s3a.connection.request.timeout(0) assuming SECONDS
> 23/09/04 03:37:48 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 23/09/04 03:37:48 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 23/09/04 03:37:48 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.
> 23/09/04 03:37:48 WARN fs.FileSystem: Failed to initialize fileystem 
> s3a://impala-test-uswest2-2: java.nio.file.AccessDeniedException: 
> impala-test-uswest2-2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: 
> No AWS Credentials provided by TemporaryAWSCredentialsProvider 
> SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
> IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to 
> load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or 
> AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
> rm: impala-test-uswest2-2: 
> 

[jira] [Updated] (IMPALA-12406) OPTIMIZE statement as an alias for INSERT OVERWRITE

2023-09-11 Thread Noemi Pap-Takacs (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noemi Pap-Takacs updated IMPALA-12406:
--
Description: 
If an Iceberg table is frequently updated/written to in small batches, a lot of 
small files are created. This decreases read performance. Similarly, frequent 
row-level deletes contribute to this problem by creating delete files which 
have to be merged on read.

Currently INSERT OVERWRITE is used as a workaround to rewrite and compact 
Iceberg tables.

The OPTIMIZE statement offers a new syntax and an Iceberg-specific solution to 
this problem.

This patch introduces the new syntax as an alias for INSERT OVERWRITE.
{code:java}
Syntax: OPTIMIZE TABLE ;{code}

  was:
If an Iceberg table is frequently updated/written to in small batches, a lot of 
small files are created. This decreases read performance. Similarly, frequent 
row-level deletes contribute to this problem by creating delete files which 
have to be merged on read.

Currently INSERT OVERWRITE is used as a workaround to rewrite and compact 
Iceberg tables.

OPTIMIZE statement offers a new syntax and an Iceberg specific solution to this 
problem.

This patch introduces the new syntax as an alias for INSERT OVERWRITE.
{code:java}
Syntax: OPTIMIZE [TABLE] ;{code}


> OPTIMIZE statement as an alias for INSERT OVERWRITE
> ---
>
> Key: IMPALA-12406
> URL: https://issues.apache.org/jira/browse/IMPALA-12406
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Noemi Pap-Takacs
>Assignee: Noemi Pap-Takacs
>Priority: Major
>  Labels: impala-iceberg
>
> If an Iceberg table is frequently updated/written to in small batches, a lot 
> of small files are created. This decreases read performance. Similarly, 
> frequent row-level deletes contribute to this problem by creating delete 
> files which have to be merged on read.
> Currently INSERT OVERWRITE is used as a workaround to rewrite and compact 
> Iceberg tables.
> The OPTIMIZE statement offers a new syntax and an Iceberg-specific solution to 
> this problem.
> This patch introduces the new syntax as an alias for INSERT OVERWRITE.
> {code:java}
> Syntax: OPTIMIZE TABLE ;{code}
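
For illustration, a minimal sketch (a sketch under assumptions, not part of the patch itself) of what the alias amounts to from a client. The coordinator host, the table name default.ice_tbl and the JDBC connection string are hypothetical; it assumes Impala's HiveServer2-compatible endpoint on port 21050 and the Hive JDBC driver on the classpath. Either statement rewrites and compacts the Iceberg table; the new OPTIMIZE syntax simply aliases the existing INSERT OVERWRITE workaround.
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class OptimizeIcebergTableExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical coordinator host; assumes an unauthenticated HS2 endpoint.
    String url = "jdbc:hive2://coordinator-host:21050/default;auth=noSasl";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement()) {
      // New syntax introduced by this patch (alias for INSERT OVERWRITE):
      stmt.execute("OPTIMIZE TABLE default.ice_tbl");
      // The pre-existing workaround it aliases: rewrite the table over itself,
      // merging small data files and applying delete files.
      // stmt.execute("INSERT OVERWRITE TABLE default.ice_tbl SELECT * FROM default.ice_tbl");
    }
  }
}
{code}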






[jira] [Assigned] (IMPALA-12438) Write View Expanded Query to Impala Profiles

2023-09-11 Thread Manish Maheshwari (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Maheshwari reassigned IMPALA-12438:
--

Assignee: Jason Fehr

> Write View Expanded Query to Impala Profiles
> 
>
> Key: IMPALA-12438
> URL: https://issues.apache.org/jira/browse/IMPALA-12438
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Manish Maheshwari
>Assignee: Jason Fehr
>Priority: Major
>
> When running queries with multi-level views, Impala profiles currently do not 
> log the expanded query with views replaced. This makes it harder to 
> understand the actual tables involved in the query and to understand the 
> complexity of the views used.
> A change would be to write into the query profile both the original query and 
> the same query with all the views replaced with their actual definitions, 
> e.g.: 
> {code:java}
> Analyzed query: SELECT * from db.table limit 10
> Expanded query: Select * from (select d.c1, d.c2, e.c3 from d inner join 
> (select c3 from f inner join  ) e on d.c1 = e.c2 left join .   ) 
> limit 10 {code}






[jira] [Created] (IMPALA-12438) Write View Expanded Query to Impala Profiles

2023-09-11 Thread Manish Maheshwari (Jira)
Manish Maheshwari created IMPALA-12438:
--

 Summary: Write View Expanded Query to Impala Profiles
 Key: IMPALA-12438
 URL: https://issues.apache.org/jira/browse/IMPALA-12438
 Project: IMPALA
  Issue Type: Improvement
Reporter: Manish Maheshwari


When running queries with multi-level views, Impala profiles currently do not 
log the expanded query with views replaced. This makes it harder to understand 
the actual tables involved in the query and to understand the complexity of 
the views used.

A change would be to write into the query profile both the original query and 
the same query with all the views replaced with their actual definitions, 
e.g.: 
{code:java}
Analyzed query: SELECT * from db.table limit 10
Expanded query: Select * from (select d.c1, d.c2, e.c3 from d inner join 
(select c3 from f inner join  ) e on d.c1 = e.c2 left join .   ) limit 
10 {code}






[jira] [Commented] (IMPALA-12402) Add some configurations for CatalogdMetaProvider's cache_

2023-09-11 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763589#comment-17763589
 ] 

Maxwell Guo commented on IMPALA-12402:
--

[~MikaelSmith] Thank you for your review, I have updated the code again.

> Add some configurations for CatalogdMetaProvider's cache_
> -
>
> Key: IMPALA-12402
> URL: https://issues.apache.org/jira/browse/IMPALA-12402
> Project: IMPALA
>  Issue Type: Improvement
>  Components: fe
>Reporter: Maxwell Guo
>Assignee: Maxwell Guo
>Priority: Minor
>  Labels: pull-request-available
>
> When the cluster contains many databases and tables (such as more than 
> 10 tables), and we restart the impalad, the CatalogdMetaProvider's local 
> cache_ needs to go through a loading process. 
> As we know, Google Guava cache's concurrencyLevel is set to 4 by 
> default, but if there are many tables the loading process will take more 
> time and increase the probability of lock contention, see 
> [here|https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L437].
> So we propose to add some configurations here; the first is the concurrency 
> level of the cache.
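
For context, a minimal Guava sketch showing the concurrencyLevel knob the description refers to; a higher level lets the cache use more internal segments, which reduces lock contention when many entries are loaded concurrently after a restart. This is not the actual CatalogdMetaProvider code: the key/value types, the size and TTL bounds, and the value 16 are all hypothetical.
{code:java}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class MetaCacheSketch {
  public static void main(String[] args) {
    // Guava's default concurrencyLevel is 4; making it configurable (16 here,
    // a hypothetical value) allows more segments and less write contention
    // when a large number of tables is loaded.
    Cache<String, Object> cache = CacheBuilder.newBuilder()
        .concurrencyLevel(16)                      // proposed to be configurable
        .maximumSize(100_000)                      // hypothetical capacity bound
        .expireAfterAccess(60, TimeUnit.MINUTES)   // hypothetical expiry
        .recordStats()
        .build();

    cache.put("db1.tbl1", new Object());
    System.out.println(cache.stats());
  }
}
{code}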


