[jira] [Commented] (IMPALA-13202) KRPC flags used by libkudu_client.so can't be configured
[ https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863987#comment-17863987 ]

Quanlong Huang commented on IMPALA-13202:
-----------------------------------------

Debugging in gdb, I can verify that libkudu_client.so is using its own methods and flags. There are two variables named FLAGS_rpc_max_message_size:
{code:cpp}
(gdb) info variables FLAGS_rpc_max_message_size$
All variables matching regular expression "FLAGS_rpc_max_message_size$":

File /home/quanlong/workspace/Impala/be/src/kudu/rpc/transfer.cc:
48:     google::int64 fLI64::FLAGS_rpc_max_message_size;

File /mnt/source/kudu/kudu-e742f86f6d/src/kudu/rpc/transfer.cc:
46:     google::int64 fLI64::FLAGS_rpc_max_message_size;{code}
The second one comes from libkudu_client.so. The current Kudu version used in the Impala master branch is e742f86f6d (corresponding to the kudu-1.17 release). Here is where the flag is used:
{code:cpp}
102 Status InboundTransfer::ReceiveBuffer(Socket* socket, faststring* extra_4) {
...
130   if (PREDICT_FALSE(total_length_ > FLAGS_rpc_max_message_size)) {
131     return Status::NetworkError(Substitute(
132         "RPC frame had a length of $0, but we only support messages up to $1 bytes "
133         "long.", total_length_, FLAGS_rpc_max_message_size));
134   }{code}
[https://github.com/apache/kudu/blob/e742f86f6d8e687dd02d9891f33e068477163016/src/kudu/rpc/transfer.cc#L130]

Add a breakpoint in that source file at the line where the code uses this flag:
{noformat}
(gdb) b /mnt/source/kudu/kudu-e742f86f6d/src/kudu/rpc/transfer.cc:130{noformat}
Continue in gdb and run the query in Impala. When the breakpoint is hit:
{code:cpp}
Thread 276 "rpc reactor-250" hit Breakpoint 1, kudu::rpc::InboundTransfer::ReceiveBuffer (this=0xd03cfc0, socket=0x14b4ed20, extra_4=0x7fc72c74e8e0) at /mnt/source/kudu/kudu-e742f86f6d/src/kudu/rpc/transfer.cc:130
130       if (PREDICT_FALSE(total_length_ > FLAGS_rpc_max_message_size)) {
(gdb) x/i $pc
=> 0x7fc7dd68bf49:      cmp    %r9,%rdx
(gdb) p $rdx
$1 = 53477464
(gdb) p $r9
$2 = 52428800{code}
The instruction compares two registers, and their values match what we see in the error message. 52428800 is the unmodified default value of FLAGS_rpc_max_message_size. Looking at the preceding assembly code, register r9 is loaded from memory address 0x7fc7ddd631d8, which is the hidden variable FLAGS_rpc_max_message_size:
{code:java}
lea    0x6d729e(%rip),%rdx        # 0x7fc7ddd631d8 <_ZN5fLI6426FLAGS_rpc_max_message_sizeE>
mov    (%rax),%ecx
mov    (%rdx),%r9
bswap  %ecx
lea    0x4(%rcx),%edi
mov    %edi,%edx
mov    %edi,0x38(%rbx)
cmp    %r9,%rdx
{code}
Printing the variable by name shows the global one used in impalad, but printing the value at the address used by libkudu_client.so shows 52428800:
{noformat}
(gdb) p FLAGS_rpc_max_message_size
$25 = 2147483647
(gdb) p *((int64_t*)0x7fc7ddd631d8)
$26 = 52428800{noformat}

> KRPC flags used by libkudu_client.so can't be configured
> --------------------------------------------------------
>
>                 Key: IMPALA-13202
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13202
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Priority: Critical
>         Attachments: data.parquet
>
> Impala integrates with KRPC by porting the KRPC code into the Impala code base. Flags and methods of KRPC are defined as GLOBAL in the impalad executable. libkudu_client.so is also compiled from the same KRPC code and has duplicate flags and methods defined as HIDDEN.
> To be specific, both the impalad executable and libkudu_client.so have the symbol for kudu::rpc::InboundTransfer::ReceiveBuffer():
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
>      8: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
>  81380: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep ReceiveBuffer
>   1601: 00086e4a   108 FUNC    LOCAL  DEFAULT   12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
>  11905: 001fec60  2076 FUNC    LOCAL  HIDDEN    12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ c++filt _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*)
> {noformat}
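The addresses from the gdb session can be cross-checked against the readelf offsets in the issue description. The following is a minimal sketch, assuming the running impalad loaded the same libkudu_client.so build that readelf inspected and that symbol values in that library are offsets from its load address; the constant and helper names are made up for illustration.

```python
# Values copied from the gdb session and the readelf output in this thread.
VAR_ADDR = 0x7FC7DDD631D8    # runtime address that %r9 was loaded from
PC_AT_CMP = 0x7FC7DD68BF49   # address of the cmp instruction at the breakpoint
LEA_DISP = 0x6D729E          # RIP-relative displacement in the lea instruction

VAR_OFFSET = 0x008D61D8      # FLAGS_rpc_max_message_size in libkudu_client.so
FUNC_OFFSET = 0x001FEC60     # InboundTransfer::ReceiveBuffer in libkudu_client.so
FUNC_SIZE = 2076

FRAME_LENGTH = 53477464      # $rdx
LIB_LIMIT = 52428800         # $r9, the copy inside libkudu_client.so
IMPALAD_LIMIT = 2147483647   # the GLOBAL copy printed by name in gdb


def load_base(runtime_addr: int, file_offset: int) -> int:
    """Load address implied by a runtime address and its offset in the library."""
    return runtime_addr - file_offset


def pc_inside_function(pc: int, base: int, func_offset: int, func_size: int) -> bool:
    """True if pc falls inside [base + func_offset, base + func_offset + func_size)."""
    start = base + func_offset
    return start <= pc < start + func_size


base = load_base(VAR_ADDR, VAR_OFFSET)
print("implied load base:", hex(base), "page aligned:", base % 0x1000 == 0)
print("breakpoint pc inside the library's ReceiveBuffer:",
      pc_inside_function(PC_AT_CMP, base, FUNC_OFFSET, FUNC_SIZE))
# For RIP-relative addressing the target is (address of next instruction) + displacement.
print("instruction following the lea:", hex(VAR_ADDR - LEA_DISP))
print("frame rejected by library copy:", FRAME_LENGTH > LIB_LIMIT)
print("frame rejected by impalad copy:", FRAME_LENGTH > IMPALAD_LIMIT)
```

Under those assumptions the implied load base is page aligned and the breakpoint address lands inside the library's own ReceiveBuffer, which is consistent with the conclusion that the check reads the library's private copy of the flag.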
[jira] [Commented] (IMPALA-13202) KRPC flags used by libkudu_client.so can't be configured
[ https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863983#comment-17863983 ]

Quanlong Huang commented on IMPALA-13202:
-----------------------------------------

YoungJi Nam figured out a way to reproduce the issue using the attached [^data.parquet]. We need the following Kudu configs:
{code:java}
--unlock_unsafe_flags=true
--max_cell_size_bytes=1073741824
--max_cfile_block_size=1073741824{code}
In the Impala dev env, add them to testdata/cluster/cdh7/node-*/etc/kudu/tserver.conf.

Then start the Impala cluster with the following configs:
{code:java}
-kudu_mutation_buffer_size=56477399
-kudu_error_buffer_size=56477399
-rpc_max_message_size=2147483647{code}
In the Impala dev env, the command is:
{code:java}
bin/start-impala-cluster.py -r --impalad_args="-kudu_mutation_buffer_size=56477399 -kudu_error_buffer_size=56477399 -rpc_max_message_size=2147483647"{code}
Create a Parquet table and a Kudu table:
{code:sql}
create external table test_parquet (str string) stored as parquet;
create table test_kudu_large (
  id int,
  str string,
  primary key(id)
) stored as kudu;{code}
Put the file [^data.parquet] into the location of the test_parquet table and REFRESH the table. Then INSERT the value into the Kudu table:
{code:sql}
insert into test_kudu_large select 1, str from test_parquet;{code}
Running a SELECT query on the Kudu table then produces the error messages in the impalad logs:
{code:sql}
select * from test_kudu_large;{code}

> KRPC flags used by libkudu_client.so can't be configured
> --------------------------------------------------------
>
>                 Key: IMPALA-13202
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13202
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Priority: Critical
>         Attachments: data.parquet
>
> Impala integrates with KRPC by porting the KRPC code into the Impala code base. Flags and methods of KRPC are defined as GLOBAL in the impalad executable. libkudu_client.so is also compiled from the same KRPC code and has duplicate flags and methods defined as HIDDEN.
> To be specific, both the impalad executable and libkudu_client.so have the symbol for kudu::rpc::InboundTransfer::ReceiveBuffer():
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
>      8: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
>  81380: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep ReceiveBuffer
>   1601: 00086e4a   108 FUNC    LOCAL  DEFAULT   12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
>  11905: 001fec60  2076 FUNC    LOCAL  HIDDEN    12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ c++filt _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*)
> {noformat}
> KRPC flags like rpc_max_message_size are also defined in both the impalad executable and libkudu_client.so:
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep FLAGS_rpc_max_message_size
>  14380: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
>  80396: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  81399: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
> 117873: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep FLAGS_rpc_max_message_size
>  11882: 008d61e1     1 OBJECT  LOCAL  HIDDEN    27 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  11906: 008d61d8     8 OBJECT  LOCAL  DEFAULT   27 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ c++filt _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> fLI64::FLAGS_rpc_max_message_size {noformat}
> libkudu_client.so uses its own methods and flags. The flags are HIDDEN, so they can't be modified by Impala code. E.g. IMPALA-4874 bumps FLAGS_rpc_max_message_size to 2GB in RpcMgr::Init(), but the HIDDEN variable FLAGS_rpc_max_message_size used in libkudu_client.so still has the default value of 50MB (52428800). We've seen error messages like this in the master branch:
> {code:java}
> I0708 10:23:31.784974 2943 meta_cache.cc:294] c243bda4702a5ab9:0ba93d240001] tablet 0c8f3446538449ee9d3df5056afe775e: replica
[jira] [Updated] (IMPALA-13202) KRPC flags used by libkudu_client.so can't be configured
[ https://issues.apache.org/jira/browse/IMPALA-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13202:
------------------------------------
    Attachment: data.parquet

> KRPC flags used by libkudu_client.so can't be configured
> --------------------------------------------------------
>
>                 Key: IMPALA-13202
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13202
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Priority: Critical
>         Attachments: data.parquet
>
> Impala integrates with KRPC by porting the KRPC code into the Impala code base. Flags and methods of KRPC are defined as GLOBAL in the impalad executable. libkudu_client.so is also compiled from the same KRPC code and has duplicate flags and methods defined as HIDDEN.
> To be specific, both the impalad executable and libkudu_client.so have the symbol for kudu::rpc::InboundTransfer::ReceiveBuffer():
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
>      8: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
>  81380: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep ReceiveBuffer
>   1601: 00086e4a   108 FUNC    LOCAL  DEFAULT   12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
>  11905: 001fec60  2076 FUNC    LOCAL  HIDDEN    12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> $ c++filt _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
> kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*)
> {noformat}
> KRPC flags like rpc_max_message_size are also defined in both the impalad executable and libkudu_client.so:
> {noformat}
> $ readelf -s --wide be/build/latest/service/impalad | grep FLAGS_rpc_max_message_size
>  14380: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
>  80396: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  81399: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
> 117873: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep FLAGS_rpc_max_message_size
>  11882: 008d61e1     1 OBJECT  LOCAL  HIDDEN    27 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
>  11906: 008d61d8     8 OBJECT  LOCAL  DEFAULT   27 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> $ c++filt _ZN5fLI6426FLAGS_rpc_max_message_sizeE
> fLI64::FLAGS_rpc_max_message_size {noformat}
> libkudu_client.so uses its own methods and flags. The flags are HIDDEN, so they can't be modified by Impala code. E.g. IMPALA-4874 bumps FLAGS_rpc_max_message_size to 2GB in RpcMgr::Init(), but the HIDDEN variable FLAGS_rpc_max_message_size used in libkudu_client.so still has the default value of 50MB (52428800). We've seen error messages like this in the master branch:
> {code:java}
> I0708 10:23:31.784974 2943 meta_cache.cc:294] c243bda4702a5ab9:0ba93d240001] tablet 0c8f3446538449ee9d3df5056afe775e: replica e0e1db54dab74f208e37ea1b975595e5 (127.0.0.1:31202) has failed: Network error: TS failed: RPC frame had a length of 53477464, but we only support messages up to 52428800 bytes long.{code}
> CC [~joemcdonnell] [~wzhou] [~aserbin]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13202) KRPC flags used by libkudu_client.so can't be configured
Quanlong Huang created IMPALA-13202:
---------------------------------------

             Summary: KRPC flags used by libkudu_client.so can't be configured
                 Key: IMPALA-13202
                 URL: https://issues.apache.org/jira/browse/IMPALA-13202
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Quanlong Huang


Impala integrates with KRPC by porting the KRPC code into the Impala code base. Flags and methods of KRPC are defined as GLOBAL in the impalad executable. libkudu_client.so is also compiled from the same KRPC code and has duplicate flags and methods defined as HIDDEN.

To be specific, both the impalad executable and libkudu_client.so have the symbol for kudu::rpc::InboundTransfer::ReceiveBuffer():
{noformat}
$ readelf -s --wide be/build/latest/service/impalad | grep ReceiveBuffer
     8: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
 81380: 022f5c88  1936 FUNC    GLOBAL DEFAULT   13 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
$ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep ReceiveBuffer
  1601: 00086e4a   108 FUNC    LOCAL  DEFAULT   12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE.cold
 11905: 001fec60  2076 FUNC    LOCAL  HIDDEN    12 _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
$ c++filt _ZN4kudu3rpc15InboundTransfer13ReceiveBufferEPNS_6SocketEPNS_10faststringE
kudu::rpc::InboundTransfer::ReceiveBuffer(kudu::Socket*, kudu::faststring*)
{noformat}
KRPC flags like rpc_max_message_size are also defined in both the impalad executable and libkudu_client.so:
{noformat}
$ readelf -s --wide be/build/latest/service/impalad | grep FLAGS_rpc_max_message_size
 14380: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
 80396: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
 81399: 06006741     1 OBJECT  GLOBAL DEFAULT   30 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
117873: 06006738     8 OBJECT  GLOBAL DEFAULT   30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
$ readelf -s --wide toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/debug/lib/libkudu_client.so | grep FLAGS_rpc_max_message_size
 11882: 008d61e1     1 OBJECT  LOCAL  HIDDEN    27 _ZN3fLB44FLAGS_rpc_max_message_size_enable_validationE
 11906: 008d61d8     8 OBJECT  LOCAL  DEFAULT   27 _ZN5fLI6426FLAGS_rpc_max_message_sizeE
$ c++filt _ZN5fLI6426FLAGS_rpc_max_message_sizeE
fLI64::FLAGS_rpc_max_message_size {noformat}
libkudu_client.so uses its own methods and flags. The flags are HIDDEN, so they can't be modified by Impala code. E.g. IMPALA-4874 bumps FLAGS_rpc_max_message_size to 2GB in RpcMgr::Init(), but the HIDDEN variable FLAGS_rpc_max_message_size used in libkudu_client.so still has the default value of 50MB (52428800). We've seen error messages like this in the master branch:
{code:java}
I0708 10:23:31.784974 2943 meta_cache.cc:294] c243bda4702a5ab9:0ba93d240001] tablet 0c8f3446538449ee9d3df5056afe775e: replica e0e1db54dab74f208e37ea1b975595e5 (127.0.0.1:31202) has failed: Network error: TS failed: RPC frame had a length of 53477464, but we only support messages up to 52428800 bytes long.{code}
CC [~joemcdonnell] [~wzhou] [~aserbin]
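The readelf comparison above can be automated. The sketch below is only illustrative: it assumes the usual eight-column layout of `readelf -s --wide` rows (Num, Value, Size, Type, Bind, Vis, Ndx, Name), and the names `Symbol`, `parse_symbol_line` and `visible_to_other_modules` are hypothetical helpers, not part of Impala or Kudu.

```python
from typing import NamedTuple, Optional


class Symbol(NamedTuple):
    num: int
    value: int
    size: int
    type: str
    bind: str
    vis: str
    ndx: str
    name: str


def parse_symbol_line(line: str) -> Optional[Symbol]:
    """Parse one row of `readelf -s --wide`; return None for anything else."""
    parts = line.split()
    if len(parts) != 8 or not parts[0].endswith(":") or not parts[0][:-1].isdigit():
        return None
    num, value, size, typ, bind, vis, ndx, name = parts
    return Symbol(int(num[:-1]), int(value, 16), int(size), typ, bind, vis, ndx, name)


def visible_to_other_modules(sym: Symbol) -> bool:
    """A LOCAL or HIDDEN symbol cannot be bound to from another module."""
    return sym.bind in ("GLOBAL", "WEAK") and sym.vis in ("DEFAULT", "PROTECTED")


# Two rows taken from the output above, one per binary.
impalad_row = "14380: 06006738 8 OBJECT GLOBAL DEFAULT 30 _ZN5fLI6426FLAGS_rpc_max_message_sizeE"
kudu_row = "11906: 008d61d8 8 OBJECT LOCAL DEFAULT 27 _ZN5fLI6426FLAGS_rpc_max_message_sizeE"

for label, row in (("impalad", impalad_row), ("libkudu_client.so", kudu_row)):
    sym = parse_symbol_line(row)
    print(label, sym.bind, sym.vis, "externally visible:", visible_to_other_modules(sym))
```

Both rows describe the same mangled name, but only the impalad copy is reported as externally visible, which matches the observation that setting the flag in impalad leaves the library's copy untouched.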
[jira] [Created] (IMPALA-13200) Auto refresh on S3 tables based on S3 notification
Quanlong Huang created IMPALA-13200:
---------------------------------------

             Summary: Auto refresh on S3 tables based on S3 notification
                 Key: IMPALA-13200
                 URL: https://issues.apache.org/jira/browse/IMPALA-13200
             Project: IMPALA
          Issue Type: New Feature
          Components: Catalog
            Reporter: Quanlong Huang


S3 Event Notifications can be used to get updates on new files or file deletions:
[https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html]

Snowflake uses them to auto-refresh external tables:
https://docs.snowflake.com/en/user-guide/tables-external-s3

Other object stores such as Google Cloud Storage and Azure Blob Storage have similar notification mechanisms:
https://docs.snowflake.com/en/user-guide/tables-external-gcs
https://docs.snowflake.com/en/user-guide/tables-external-azure

CC [~mylogi...@gmail.com] [~hemanth619] [~VenuReddy] [~ngangam]
[jira] [Updated] (IMPALA-13170) InconsistentMetadataFetchException due to database dropped when showing databases
[ https://issues.apache.org/jira/browse/IMPALA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13170:
------------------------------------
    Priority: Critical  (was: Major)

> InconsistentMetadataFetchException due to database dropped when showing databases
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-13170
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13170
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.4.0
>            Reporter: Yida Wu
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.5.0
>
> Using impalad 3.4.0, an InconsistentMetadataFetchException occurs when running "show databases" in Impala while simultaneously executing "drop database" in Hive to drop the newly created database.
> The steps are:
> 1. Create a database (Hive)
> 2. Create tables (Hive)
> 3. Drop the tables (Hive)
> 4. Run "show databases" (Impala) while dropping the database (Hive)
> Logs in impalad:
> {code:java}
> I0610 02:18:32.435815 278475 CatalogdMetaProvider.java:1354] 1:2] Invalidated objects in cache: [list of database names, HMS_METADATA for DB test_hive]
> I0610 02:18:32.436224 278475 jni-util.cc:288] 1:2] org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find TCatalogObject(type:DATABASE, catalog_version:0, db:TDatabase(db_name:test_hive))
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:424)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:185)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:643)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:638)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:521)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDb(CatalogdMetaProvider.java:635)
>     at org.apache.impala.catalog.local.LocalDb.getMetaStoreDb(LocalDb.java:91)
>     at org.apache.impala.catalog.local.LocalDb.getOwnerUser(LocalDb.java:294)
>     at org.apache.impala.service.Frontend.getDbs(Frontend.java:1066)
>     at org.apache.impala.service.JniFrontend.getDbs(JniFrontend.java:301)
> I0610 02:18:32.436257 278475 status.cc:129] 1:2] InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find TCatalogObject(type:DATABASE, catalog_version:0,
> {code}
> Logs in catalogd:
> {code:java}
> I0610 02:18:16.190133 222885 MetastoreEvents.java:505] EventId: 141467532 EventType: CREATE_DATABASE Successfully added database test_hive
> ...
> I0610 02:18:32.276082 222885 MetastoreEvents.java:516] EventId: 141467562 EventType: DROP_DATABASE Creating event 141467562 of type DROP_DATABASE on database test_hive
> I0610 02:18:32.277876 222885 MetastoreEvents.java:254] Total number of events received: 6 Total number of events filtered out: 0
> I0610 02:18:32.277910 222885 MetastoreEvents.java:258] Incremented skipped metric to 2564
> I0610 02:18:32.279537 222885 MetastoreEvents.java:505] EventId: 141467562 EventType: DROP_DATABASE Removed Database test_hive
> {code}
> The case is similar to IMPALA-9441. We may want to handle the error in a better way in Frontend.getDbs().
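The race described in this issue can be modelled in a few lines. The following is a hedged sketch of one possible way to tolerate it, written in Python rather than Impala's Java; `DbNotFound`, `FakeCatalog` and `list_dbs` are hypothetical stand-ins, and the issue itself does not say whether skipping or retrying is the preferred handling in Frontend.getDbs().

```python
from typing import Callable, Dict, Iterable, List


class DbNotFound(Exception):
    """Stand-in for InconsistentMetadataFetchException on a missing database."""


class FakeCatalog:
    """Toy catalog: names can be listed first and loaded later, as in the stack trace."""

    def __init__(self, dbs: Dict[str, str]):
        self.dbs = dict(dbs)

    def list_names(self) -> List[str]:
        return sorted(self.dbs)

    def load_db(self, name: str) -> str:
        if name not in self.dbs:
            raise DbNotFound(name)
        return self.dbs[name]


def list_dbs(names: Iterable[str], load_db: Callable[[str], str]) -> List[str]:
    """Load each listed database, skipping ones dropped after the names were listed."""
    loaded = []
    for name in names:
        try:
            loaded.append(load_db(name))
        except DbNotFound:
            # The database vanished between listing and loading; leave it out
            # of the result instead of failing the whole statement.
            continue
    return loaded


catalog = FakeCatalog({"default": "owner_a", "test_hive": "owner_b"})
names = catalog.list_names()      # "show databases" starts in Impala
del catalog.dbs["test_hive"]      # "drop database" lands from Hive
print(list_dbs(names, catalog.load_db))
```

In this toy model the statement returns the surviving databases instead of raising, at the cost of silently omitting the one that was dropped mid-listing.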
[jira] [Updated] (IMPALA-13170) InconsistentMetadataFetchException due to database dropped when showing databases
[ https://issues.apache.org/jira/browse/IMPALA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13170:
------------------------------------
    Priority: Major  (was: Critical)

> InconsistentMetadataFetchException due to database dropped when showing databases
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-13170
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13170
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.4.0
>            Reporter: Yida Wu
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: Impala 4.5.0
>
> Using impalad 3.4.0, an InconsistentMetadataFetchException occurs when running "show databases" in Impala while simultaneously executing "drop database" in Hive to drop the newly created database.
> The steps are:
> 1. Create a database (Hive)
> 2. Create tables (Hive)
> 3. Drop the tables (Hive)
> 4. Run "show databases" (Impala) while dropping the database (Hive)
> Logs in impalad:
> {code:java}
> I0610 02:18:32.435815 278475 CatalogdMetaProvider.java:1354] 1:2] Invalidated objects in cache: [list of database names, HMS_METADATA for DB test_hive]
> I0610 02:18:32.436224 278475 jni-util.cc:288] 1:2] org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find TCatalogObject(type:DATABASE, catalog_version:0, db:TDatabase(db_name:test_hive))
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:424)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:185)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:643)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:638)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:521)
>     at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDb(CatalogdMetaProvider.java:635)
>     at org.apache.impala.catalog.local.LocalDb.getMetaStoreDb(LocalDb.java:91)
>     at org.apache.impala.catalog.local.LocalDb.getOwnerUser(LocalDb.java:294)
>     at org.apache.impala.service.Frontend.getDbs(Frontend.java:1066)
>     at org.apache.impala.service.JniFrontend.getDbs(JniFrontend.java:301)
> I0610 02:18:32.436257 278475 status.cc:129] 1:2] InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find TCatalogObject(type:DATABASE, catalog_version:0,
> {code}
> Logs in catalogd:
> {code:java}
> I0610 02:18:16.190133 222885 MetastoreEvents.java:505] EventId: 141467532 EventType: CREATE_DATABASE Successfully added database test_hive
> ...
> I0610 02:18:32.276082 222885 MetastoreEvents.java:516] EventId: 141467562 EventType: DROP_DATABASE Creating event 141467562 of type DROP_DATABASE on database test_hive
> I0610 02:18:32.277876 222885 MetastoreEvents.java:254] Total number of events received: 6 Total number of events filtered out: 0
> I0610 02:18:32.277910 222885 MetastoreEvents.java:258] Incremented skipped metric to 2564
> I0610 02:18:32.279537 222885 MetastoreEvents.java:505] EventId: 141467562 EventType: DROP_DATABASE Removed Database test_hive
> {code}
> The case is similar to IMPALA-9441. We may want to handle the error in a better way in Frontend.getDbs().
[jira] [Updated] (IMPALA-13192) Impala Coordinator stuck and Full GC when execute query from nested temporary table.
[ https://issues.apache.org/jira/browse/IMPALA-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13192:
------------------------------------
    Priority: Critical  (was: Major)

> Impala Coordinator stuck and Full GC when execute query from nested temporary table.
> ------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13192
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13192
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>         Environment: impalad version 4.3.0-RELEASE RELEASE (build 14bb13e67e48742df72f9e1dd73be15ec7ba31bd)
>            Reporter: LiuYuan
>            Priority: Critical
>
> 1. Create a table as below:
>
> {code:java}
> CREATE TABLE trunck_info (
>   user_id BIGINT,
>   truck_length DOUBLE,
>   length_type STRING,
>   point_km DOUBLE,
>   estimate_mileage DOUBLE,
>   dep_rate DOUBLE,
>   line_day_cnt_01 BIGINT,
>   line_ly_cnt_01 BIGINT,
>   line_day_cnt_30 BIGINT,
>   line_ly_cnt_30 BIGINT,
>   line_day_cnt_60 BIGINT,
>   line_ly_cnt_60 BIGINT,
>   num_all_60 BIGINT,
>   num_est_60 BIGINT,
>   num_est_order_60 BIGINT,
>   num_act_60 BIGINT,
>   num_inh_60 BIGINT,
>   num_all_30 BIGINT,
>   num_est_30 BIGINT,
>   num_est_order_30 BIGINT,
>   num_act_30 BIGINT,
>   num_inh_30 BIGINT,
>   conn_num_60 BIGINT,
>   conn_num_30 BIGINT,
>   hp_num_60 INT,
>   hp_num_30 INT,
>   bzj_num INT,
>   feidan8_num_60 BIGINT,
>   feidan8_num_30 INT,
>   ts_num_60 BIGINT,
>   ts_num_30 INT,
>   new_mile_point_60 BIGINT,
>   new_mile_point_30 BIGINT
> )
> WITH SERDEPROPERTIES ('serialization.format'='1')
> STORED AS TEXTFILE {code}
>
> 2. Query from nested temporary tables; the coordinator hangs and goes into full GC:
>
> {panel:title=hung.sql}
> with t1 as (
>   select user_id
>         ,nvl(num_inh_60,0)+nvl(conn_num_60,0)+nvl(new_mile_point_60,0) as score_all
>         ,nvl(conn_num_60,0)+nvl(new_mile_point_60,0) as score_noinh
>   from trunck_info
> )
> ,t2 as (
>   select user_id
>         ,score_noinh + score_inh as score_all
>         ,score_noinh
>   from (
>     select user_id
>           ,score_noinh
>           ,case when score_all >= 800 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 600 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 450 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 300 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all > 0 then if(score_all*0.5 >= 450,450,score_all*0.5)
>            end as score_inh
>     from t1
>     where score_noinh > 0
>   ) a
> )
> ,t3 as (
>   select user_id
>         ,score_noinh + score_inh as score_all
>         ,score_noinh
>   from (
>     select user_id
>           ,score_noinh
>           ,case when score_all >= 800 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 600 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 450 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 300 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all > 0 then if(score_all*0.5 >= 450,450,score_all*0.5)
>            end as score_inh
>     from t2
>     where score_noinh > 0
>   ) a
> )
> ,t4 as (
>   select user_id
>         ,score_noinh + score_inh as score_all
>         ,score_noinh
>   from (
>     select user_id
>           ,score_noinh
>           ,case when score_all >= 800 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 600 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 450 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 300 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all > 0 then if(score_all*0.5 >= 450,450,score_all*0.5)
>            end as score_inh
>     from t3
>     where score_noinh > 0
>   ) a
> )
> ,t5 as (
>   select user_id
>         ,score_noinh + score_inh as score_all
>         ,score_noinh
>   from (
>     select user_id
>           ,score_noinh
>           ,case when score_all >= 800 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 600 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 450 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all >= 300 then if(score_all*0.5 >= 450,450,score_all*0.5)
>                 when score_all > 0 then if(score_all*0.5 >= 450,450,score_all*0.5)
>            end as score_inh
>     from t4
>     where score_noinh > 0
>   ) a
> )
> ,t6 as (
>   select user_id
>         ,score_noinh +
[jira] [Updated] (IMPALA-13193) RuntimeFilter on parquet dictionary should evaluate null values
[ https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13193:
    Affects Version/s: Impala 4.4.0
                       Impala 4.3.0
                       Impala 4.1.2
                       Impala 4.1.1
                       Impala 4.2.0
                       Impala 4.1.0

> RuntimeFilter on parquet dictionary should evaluate null values
>
> Key: IMPALA-13193
> URL: https://issues.apache.org/jira/browse/IMPALA-13193
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0, Impala 4.4.0
> Reporter: Quanlong Huang
> Priority: Critical
>
> IMPALA-10910 and IMPALA-5509 introduced an optimization that evaluates runtime filters on Parquet dictionary values. If none of the values pass the check, the whole row group is skipped. However, NULL values are not included in the Parquet dictionary, so a runtime filter that accepts NULL values might incorrectly reject a row group when none of its dictionary values pass the check.
> Here are steps to reproduce the bug:
> {code:sql}
> create table parq_tbl (id bigint, name string) stored as parquet;
> insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
> create table dim_tbl (name string);
> insert into dim_tbl values (NULL);
> select * from parq_tbl p join dim_tbl d
>   on COALESCE(p.name, '') = COALESCE(d.name, '');{code}
> The SELECT query should return 2 rows, but it currently returns 0 rows.
> A workaround is to disable this optimization:
> {code:sql}
> set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13193) RuntimeFilter on parquet dictionary should evaluate null values
Quanlong Huang created IMPALA-13193:

    Summary: RuntimeFilter on parquet dictionary should evaluate null values
    Key: IMPALA-13193
    URL: https://issues.apache.org/jira/browse/IMPALA-13193
    Project: IMPALA
    Issue Type: Bug
    Components: Backend
    Reporter: Quanlong Huang

IMPALA-10910 and IMPALA-5509 introduced an optimization that evaluates runtime filters on Parquet dictionary values. If none of the values pass the check, the whole row group is skipped. However, NULL values are not included in the Parquet dictionary, so a runtime filter that accepts NULL values might incorrectly reject a row group when none of its dictionary values pass the check.
Here are steps to reproduce the bug:
{code:sql}
create table parq_tbl (id bigint, name string) stored as parquet;
insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
create table dim_tbl (name string);
insert into dim_tbl values (NULL);
select * from parq_tbl p join dim_tbl d
  on COALESCE(p.name, '') = COALESCE(d.name, '');{code}
The SELECT query should return 2 rows, but it currently returns 0 rows.
A workaround is to disable this optimization:
{code:sql}
set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code}
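
The NULL-handling gap in this report can be illustrated with a minimal Python sketch. It is not Impala's implementation: the function names, the `has_nulls` flag, and the modelling of the runtime filter as a plain predicate are all assumptions made for illustration. The predicate mirrors the `COALESCE(name, '')` join key from the reproduction, where the build side contains only NULL.

```python
def accepts(name):
    """Hypothetical runtime filter for the repro: the build side holds only
    NULL, so the only accepted join key is COALESCE(NULL, '') == ''."""
    key = "" if name is None else name
    return key in {""}


def buggy_row_group_may_match(dict_values, accept):
    """Models the reported behaviour: only dictionary entries are probed,
    and NULL is never a dictionary entry."""
    return any(accept(v) for v in dict_values)


def row_group_may_match(dict_values, has_nulls, accept):
    """Hypothetical correction: also probe NULL when the row group is known
    to contain NULL values."""
    if any(accept(v) for v in dict_values):
        return True
    return has_nulls and accept(None)


# The row group in the repro has one dictionary entry ("abc") plus NULLs.
print(buggy_row_group_may_match(["abc"], accepts))   # row group skipped
print(row_group_may_match(["abc"], True, accepts))   # row group kept
```

With the extra NULL probe, the row group is skipped only when neither a dictionary value nor NULL can pass the filter.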
[jira] [Updated] (IMPALA-5509) Runtime filter : Extend runtime filter to support Dictionary values
[ https://issues.apache.org/jira/browse/IMPALA-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-5509:
    Fix Version/s: Impala 4.1.0

> Runtime filter : Extend runtime filter to support Dictionary values
>
> Key: IMPALA-5509
> URL: https://issues.apache.org/jira/browse/IMPALA-5509
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.9.0
> Reporter: Alan Choi
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: performance, runtime-filters
> Fix For: Impala 4.1.0
>
> A runtime filter on a single column can be run against the dictionary values in Parquet to enable efficient block filtering.
[jira] [Resolved] (IMPALA-13170) InconsistentMetadataFetchException due to database dropped when showing databases
[ https://issues.apache.org/jira/browse/IMPALA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-13170.
    Fix Version/s: Impala 4.5.0
    Resolution: Fixed

> InconsistentMetadataFetchException due to database dropped when showing databases
>
> Key: IMPALA-13170
> URL: https://issues.apache.org/jira/browse/IMPALA-13170
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 3.4.0
> Reporter: Yida Wu
> Assignee: Quanlong Huang
> Priority: Major
> Fix For: Impala 4.5.0
>
> Using impalad 3.4.0, an InconsistentMetadataFetchException occurs when running "show databases" in Impala while simultaneously executing "drop database" in Hive to drop a newly created database.
> Steps:
> 1. Create the database (Hive)
> 2. Create tables (Hive)
> 3. Drop the tables (Hive)
> 4. Run show databases (Impala) while dropping the database (Hive)
> Logs in Impalad:
> {code:java}
> I0610 02:18:32.435815 278475 CatalogdMetaProvider.java:1354] 1:2] Invalidated objects in cache: [list of database names, HMS_METADATA for DB test_hive]
> I0610 02:18:32.436224 278475 jni-util.cc:288] 1:2] org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find TCatalogObject(type:DATABASE, catalog_version:0, db:TDatabase(db_name:test_hive))
>
>   at org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:424)
>   at org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:185)
>   at org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:643)
>   at org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:638)
>   at org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:521)
>   at org.apache.impala.catalog.local.CatalogdMetaProvider.loadDb(CatalogdMetaProvider.java:635)
>   at org.apache.impala.catalog.local.LocalDb.getMetaStoreDb(LocalDb.java:91)
>   at org.apache.impala.catalog.local.LocalDb.getOwnerUser(LocalDb.java:294)
>   at org.apache.impala.service.Frontend.getDbs(Frontend.java:1066)
>   at org.apache.impala.service.JniFrontend.getDbs(JniFrontend.java:301)
> I0610 02:18:32.436257 278475 status.cc:129] 1:2] InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find TCatalogObject(type:DATABASE, catalog_version:0,
> {code}
> Logs in Catalog:
> {code:java}
> I0610 02:18:16.190133 222885 MetastoreEvents.java:505] EventId: 141467532 EventType: CREATE_DATABASE Successfully added database test_hive
> ...
> I0610 02:18:32.276082 222885 MetastoreEvents.java:516] EventId: 141467562 EventType: DROP_DATABASE Creating event 141467562 of type DROP_DATABASE on database test_hive
> I0610 02:18:32.277876 222885 MetastoreEvents.java:254] Total number of events received: 6 Total number of events filtered out: 0
> I0610 02:18:32.277910 222885 MetastoreEvents.java:258] Incremented skipped metric to 2564
> I0610 02:18:32.279537 222885 MetastoreEvents.java:505] EventId: 141467562 EventType: DROP_DATABASE Removed Database test_hive
> {code}
> The case is similar to IMPALA-9441. We may want to handle the error in a better way in Frontend.getDbs().
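
One way to picture "handling the error in a better way" is to treat a database that disappears between listing the names and loading its metadata as simply absent from the result. The Python sketch below is an illustration only: it is not the change that resolved this issue, and every name in it (`InconsistentMetadataFetchError`, `list_databases`, `make_loader`) is hypothetical. It models the call path in the stack trace, where getDbs() loads each database's owner after the name list has been fetched.

```python
class InconsistentMetadataFetchError(Exception):
    """Hypothetical stand-in for the Java InconsistentMetadataFetchException."""


def list_databases(db_names, load_owner):
    """Return (name, owner) pairs, skipping databases that were dropped
    after their names were listed (assumed handling, for illustration)."""
    result = []
    for name in db_names:
        try:
            owner = load_owner(name)
        except InconsistentMetadataFetchError:
            # The database vanished concurrently: omit it instead of
            # failing the whole "show databases" request.
            continue
        result.append((name, owner))
    return result


def make_loader(existing):
    """Build a loader over a dict of databases that still exist."""
    def load_owner(name):
        if name not in existing:
            raise InconsistentMetadataFetchError(
                "Fetching DATABASE failed. Could not find " + name)
        return existing[name]
    return load_owner


# "test_hive" was listed, then dropped in Hive before its metadata was loaded.
loader = make_loader({"default": "hive"})
print(list_databases(["default", "test_hive"], loader))
```

The trade-off in this sketch is that a concurrently dropped database silently disappears from the listing, which matches what a later "show databases" would return anyway.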
[jira] [Resolved] (IMPALA-9441) TestHS2.test_get_schemas is flaky in local catalog mode
[ https://issues.apache.org/jira/browse/IMPALA-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-9441.
    Fix Version/s: Impala 4.5.0
    Resolution: Fixed

> TestHS2.test_get_schemas is flaky in local catalog mode
>
> Key: IMPALA-9441
> URL: https://issues.apache.org/jira/browse/IMPALA-9441
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Sahil Takiar
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 4.5.0
>
> Saw this once on a ubuntu-16.04-dockerised-tests job:
> {code:java}
> Error Message
> hs2/hs2_test_suite.py:63: in add_session
>     lambda: fn(self))
> hs2/hs2_test_suite.py:44: in add_session_helper
>     fn()
> hs2/hs2_test_suite.py:63: in <lambda>
>     lambda: fn(self))
> hs2/test_hs2.py:423: in test_get_schemas
>     TestHS2.check_response(get_schemas_resp)
> hs2/hs2_test_suite.py:131: in check_response
>     assert response.status.statusCode == expected_status_code
> E   assert 3 == 0
> E    +  where 3 = 3
> E    +    where 3 = TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3).statusCode
> E    +      where TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3) = TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3)
> E    +        where TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3) = TGetSchemasResp(status=TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_i...nHandle(hasResultSet=False, modifiedRowCount=None, operationType=3, operationId=THandleIdentifier(secret='', guid=''))).status
> Stacktrace
> hs2/hs2_test_suite.py:63: in add_session
>     lambda: fn(self))
> hs2/hs2_test_suite.py:44: in add_session_helper
>     fn()
> hs2/hs2_test_suite.py:63: in <lambda>
>     lambda: fn(self))
> hs2/test_hs2.py:423: in test_get_schemas
>     TestHS2.check_response(get_schemas_resp)
> hs2/hs2_test_suite.py:131: in check_response
>     assert response.status.statusCode == expected_status_code
> E   assert 3 == 0
> E    +  where 3 = 3
> E    +    where 3 = TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3).statusCode
> E    +      where TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3) = TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3)
> E    +        where TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_impala_2201_e794b8f' not found\n", sqlState='HY000', infoMessages=None, statusCode=3) = TGetSchemasResp(status=TStatus(errorCode=None, errorMessage="DatabaseNotFoundException: Database 'test_compute_stats_i...nHandle(hasResultSet=False, modifiedRowCount=None, operationType=3, operationId=THandleIdentifier(secret='', guid=''))).status
> {code}
[jira] [Commented] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
[ https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860997#comment-17860997 ]

Quanlong Huang commented on IMPALA-13161:

Uploaded a fix for review: https://gerrit.cloudera.org/c/21559/

> impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
>
> Key: IMPALA-13161
> URL: https://issues.apache.org/jira/browse/IMPALA-13161
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0.0, Impala 4.4.0
> Reporter: nyq
> Assignee: Quanlong Huang
> Priority: Critical
>
> Impala version: 4.0.0
> Problem:
> impalad crashes when querying a text table that has a 3GB data file containing only '\x00' characters
> Steps:
> python -c 'f=open("impala_0_3gb.data.csv", "wb");tmp=b"\x00"*1024*1024*3; [f.write(tmp) for i in range(1024)] ;f.close()'
> create table impala_0_3gb (id int)
> hdfs dfs -put impala_0_3gb.data.csv /user/hive/warehouse/impala_0_3gb/
> refresh impala_0_3gb
> select count(1) from impala_0_3gb
> Errors:
> Wrote minidump to 1dcf110f-5a2e-49a2-be4eb7a5-4709ed19.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x0181861c, pid=956182, tid=0x7fc6b340e700
> #
> # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0)
> # Java VM: OpenJDK 64-Bit Server VM
> # Problematic frame:
> # C [impalad+0x141861c] impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid956182.log
> #
> #
> C [impalad+0x141861c] impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> C [impalad+0x136fe11] impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x1a1
> C [impalad+0x137100e] impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x3be
> C [impalad+0x13721ac] impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x12c
> C [impalad+0x131cdfc] impala::HdfsScanner::ProcessSplit()+0x19c
> C [impalad+0x1443e17] impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, impala::io::ScanRange*, long*)+0x7e7
> C [impalad+0x1447001] impala::HdfsScanNode::ScannerThread(bool, long)+0x541
[jira] [Commented] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
[ https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860808#comment-17860808 ]

Quanlong Huang commented on IMPALA-13161:

Got the stacktrace in gdb:
{noformat}
Thread 304 "impalad" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f0abfe7a700 (LWP 10172)]
0x023d4273 in impala::DelimitedTextParser::ReturnCurrentColumn (this=0xde37f40) at /home/quanlong/workspace/Impala/be/src/exec/delimited-text-parser.h:113
113 bool ReturnCurrentColumn() const {
(gdb) bt
#0 0x023d4273 in impala::DelimitedTextParser::ReturnCurrentColumn (this=0xde37f40) at /home/quanlong/workspace/Impala/be/src/exec/delimited-text-parser.h:113
#1 impala::DelimitedTextParser::AddColumn (field_locations=0x0, num_fields=0x7f0abfe78824, next_column_start=0x7f0abfe78828, len=0, this=0xde37f40) at /home/quanlong/workspace/Impala/be/src/exec/delimited-text-parser.inline.h:62
#2 impala::DelimitedTextParser::ParseSse (this=this@entry=0xde37f40, max_tuples=max_tuples@entry=1, remaining_len=remaining_len@entry=0x7f0abfe78718, byte_buffer_ptr=byte_buffer_ptr@entry=0xd074f88, row_end_locations=row_end_locations@entry=0xcf5, field_locations=0x0, num_tuples=0x7f0abfe78a80, num_fields=0x7f0abfe78824, next_column_start=0x7f0abfe78828) at /home/quanlong/workspace/Impala/be/src/exec/delimited-text-parser.inline.h:189
#3 0x023d4981 in impala::DelimitedTextParser::ParseFieldLocations (this=0xde37f40, max_tuples=max_tuples@entry=1, remaining_len=, byte_buffer_ptr=byte_buffer_ptr@entry=0xd074f88, row_end_locations=0xcf5, field_locations=0x0, num_tuples=0x7f0abfe78a80, num_fields=0x7f0abfe78824, next_column_start=0x7f0abfe78828) at /home/quanlong/workspace/Impala/be/src/common/status.h:105
#4 0x02057247 in impala::HdfsTextScanner::ProcessRange (this=this@entry=0xd074dc0, row_batch=row_batch@entry=0x1618f760, num_tuples=num_tuples@entry=0x7f0abfe78a80) at /home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/bits/stl_vector.h:1168
#5 0x0205961f in impala::HdfsTextScanner::FinishScanRange (this=this@entry=0xd074dc0, row_batch=row_batch@entry=0x1618f760) at /home/quanlong/workspace/Impala/be/src/exec/text/hdfs-text-scanner.cc:361
#6 0x02059d6d in impala::HdfsTextScanner::GetNextInternal (this=0xd074dc0, row_batch=0x1618f760) at /home/quanlong/workspace/Impala/be/src/exec/text/hdfs-text-scanner.cc:491
#7 0x01b34223 in impala::HdfsScanner::ProcessSplit (this=0xd074dc0) at /home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:421
...
{noformat}
It crashed in ReturnCurrentColumn(), which has only one line:
{code:cpp}
110 /// Will we return the current column to the query?
111 /// Hive allows cols at the end of the table that are not in the schema. We'll
112 /// just ignore those columns
113 bool ReturnCurrentColumn() const {
114   return column_idx_ < num_cols_ && is_materialized_col_[column_idx_];
115 }
{code}
The type of column_idx_ is int, but it has overflowed and become negative (-2147483648), so the bounds check passes and is_materialized_col_ is read at a negative index:
{noformat}
(gdb) p *this
$1 = {xmm_tuple_search_ = {3338, 0}, xmm_delim_search_ = {3338, 0}, xmm_escape_search_ = {5216405793391866985, 5651570509107196769}, is_materialized_col_ = 0x798b740, num_tuple_delims_ = 2, num_delims_ = 3, num_cols_ = 1, num_partition_keys_ = 0, column_idx_ = -2147483648, last_row_delim_offset_ = -1, low_mask_ = {0 }, high_mask_ = {0 }, field_delim_ = 0 '\000', process_escapes_ = false, escape_char_ = 0 '\000', collection_item_delim_ = 0 '\000', tuple_delim_ = 10 '\n', current_column_has_escape_ = false, last_char_is_escape_ = false, unfinished_tuple_ = true}
(gdb) p/x column_idx_
$2 = 0x80000000{noformat}
I think the overflow happens here: https://github.com/apache/impala/blob/333902afcccb8a45c25ae558cc67ceb719bccbfc/be/src/exec/delimited-text-parser.inline.h#L74
\x00 is treated as the default field delimiter, so every byte of the file starts a new column, and the number of columns overflows the int type.

> impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
>
> Key: IMPALA-13161
> URL: https://issues.apache.org/jira/browse/IMPALA-13161
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0.0, Impala 4.4.0
> Reporter: nyq
> Assignee: Quanlong Huang
> Priority:
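
The arithmetic behind the crash can be checked with a short Python sketch. This is an illustration, not Impala code: `wrapped_int32` and `return_current_column` are hypothetical helpers, and the sketch assumes the usual two's-complement wrap-around that the gdb output shows (signed overflow is formally undefined behaviour in C++). It uses `ctypes.c_int32`, which performs no overflow checking, to model a 32-bit counter.

```python
import ctypes

INT32_MAX = 2**31 - 1
# The reported file is 3 GiB of '\x00' bytes, each one parsed as a field
# delimiter, so more delimiters are seen than a 32-bit int can count.
FILE_SIZE = 3 * 1024 * 1024 * 1024


def wrapped_int32(count):
    """Value held by a 32-bit two's-complement counter after `count`
    increments from zero (assumed wrap-around behaviour)."""
    return ctypes.c_int32(count).value


def return_current_column(column_idx, num_cols, is_materialized_col):
    """Python mirror of the one-line C++ check. C++ would read memory at a
    negative offset; here that case is reported as an IndexError."""
    if not column_idx < num_cols:
        return False
    if column_idx < 0:
        raise IndexError("negative column index: %d" % column_idx)
    return is_materialized_col[column_idx]


print(FILE_SIZE > INT32_MAX)
print(wrapped_int32(INT32_MAX + 1))
```

Because the wrapped value is negative, `column_idx_ < num_cols_` still holds with `num_cols_ = 1`, which is why the array access is reached at all. A guard on the lower bound, or a wider counter, would avoid the out-of-bounds read in this model.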
[jira] [Assigned] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
[ https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned IMPALA-13161:
    Assignee: Quanlong Huang

> impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
>
> Key: IMPALA-13161
> URL: https://issues.apache.org/jira/browse/IMPALA-13161
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0.0, Impala 4.4.0
> Reporter: nyq
> Assignee: Quanlong Huang
> Priority: Critical
>
> Impala version: 4.0.0
> Problem:
> impalad crashes when querying a text table that has a 3GB data file containing only '\x00' characters
> Steps:
> python -c 'f=open("impala_0_3gb.data.csv", "wb");tmp=b"\x00"*1024*1024*3; [f.write(tmp) for i in range(1024)] ;f.close()'
> create table impala_0_3gb (id int)
> hdfs dfs -put impala_0_3gb.data.csv /user/hive/warehouse/impala_0_3gb/
> refresh impala_0_3gb
> select count(1) from impala_0_3gb
> Errors:
> Wrote minidump to 1dcf110f-5a2e-49a2-be4eb7a5-4709ed19.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x0181861c, pid=956182, tid=0x7fc6b340e700
> #
> # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0)
> # Java VM: OpenJDK 64-Bit Server VM
> # Problematic frame:
> # C [impalad+0x141861c] impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /tmp/hs_err_pid956182.log
> #
> #
> C [impalad+0x141861c] impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x7cc
> C [impalad+0x136fe11] impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x1a1
> C [impalad+0x137100e] impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x3be
> C [impalad+0x13721ac] impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x12c
> C [impalad+0x131cdfc] impala::HdfsScanner::ProcessSplit()+0x19c
> C [impalad+0x1443e17] impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, impala::io::ScanRange*, long*)+0x7e7
> C [impalad+0x1447001] impala::HdfsScanNode::ScannerThread(bool, long)+0x541
[jira] [Assigned] (IMPALA-13120) Failed table loads are not tried to load again even though hive metastore is UP
[ https://issues.apache.org/jira/browse/IMPALA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned IMPALA-13120:
    Assignee: Venugopal Reddy K

> Failed table loads are not tried to load again even though hive metastore is UP
>
> Key: IMPALA-13120
> URL: https://issues.apache.org/jira/browse/IMPALA-13120
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Venugopal Reddy K
> Assignee: Venugopal Reddy K
> Priority: Major
>
> *Description:*
> If the metastore is down when a table load is triggered, catalogd creates a new IncompleteTable instance with cause=TableLoadingException and updates the catalog with a new version. On the coordinator/impalad, StmtMetadataLoader loadTables(), which has been waiting for the table load to complete, treats the table as loaded (with a failed load). Then, during the analyzer's table resolve step, if the table is incomplete, TableLoadingException is thrown to the user.
> Note: an IncompleteTable with a non-null cause is considered loaded.
> *From then on, queries on the table don't trigger a table load (at StmtMetadataLoader), since the table is an IncompleteTable with a non-null cause (i.e., TableLoadingException). Even after the metastore comes back up, queries continue to fail with the same TableLoadingException:*
> {{CAUSED BY: TableLoadingException: Failed to load metadata for table: default.t1. Running 'invalidate metadata default.t1' may resolve this problem.}}
> {{CAUSED BY: MetaException: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)}}
> *At present, an explicit invalidate metadata is the only way to recover the table from this state.* *Queries executed after the metastore is up should succeed without the need for an explicit invalidate metadata.*
> *Steps to Reproduce:*
> # Create a table from Hive and insert some data into it.
> # Bring down the Hive metastore process.
> # Run a query on Impala that triggers the table load. The query fails with TableLoadingException.
> # Bring up the Hive metastore process.
> # Run the query on Impala again. It still fails with the same TableLoadingException.
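
The "considered loaded" behaviour described in this report can be sketched in a few lines of Python. This is a model for illustration, not Impala's Java code: the class and function names are hypothetical, and `needs_load_with_retry` shows only one possible remedy (retrying when the recorded cause looks transient), not a fix the project has adopted.

```python
class IncompleteTable:
    """Hypothetical model of a table entry whose load has not succeeded."""

    def __init__(self, name, cause=None):
        self.name = name
        self.cause = cause  # the exception recorded by a failed load, if any

    def is_loaded(self):
        # As described in the report: an IncompleteTable with a non-null
        # cause is considered loaded.
        return self.cause is not None


def needs_load(table):
    """Models the reported behaviour: a failed load is never retried."""
    return not table.is_loaded()


def needs_load_with_retry(table, is_transient):
    """Assumed alternative: also reload when the recorded failure is
    transient, for example a metastore connection error."""
    return not table.is_loaded() or is_transient(table.cause)


failed = IncompleteTable("default.t1", cause=ConnectionError("Connection refused"))
print(needs_load(failed))
print(needs_load_with_retry(failed, lambda c: isinstance(c, ConnectionError)))
```

In this model the first check never asks for a reload once a cause is recorded, which matches the symptom that only an explicit invalidate metadata clears the state.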
[jira] [Commented] (IMPALA-12141) IllegalMonitorStateException when trying to release the table lock
[ https://issues.apache.org/jira/browse/IMPALA-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860738#comment-17860738 ] Quanlong Huang commented on IMPALA-12141: - [~VenuReddy], [~hemanth619] Do you want to take this? > IllegalMonitorStateException when trying to release the table lock > -- > > Key: IMPALA-12141 > URL: https://issues.apache.org/jira/browse/IMPALA-12141 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Priority: Critical > > We saw event-processor went into the ERROR state due to an > IllegalMonitorStateException: > {noformat} > I0504 12:28:45.272922 189771 MetastoreEvents.java:401] EventId: 56369449 > EventType: INSERT Incremented events skipped counter to 283902 > I0504 12:28:45.272941 189771 MetastoreEvents.java:401] EventId: 56369449 > EventType: INSERT Not processing the event as it is a self-event > I0504 14:28:45.283041 189771 MetastoreEvents.java:412] EventId: 56369450 > EventType: INSERT Received exception Error during self-event evaluation for > table xxx. due to lock contention. 
Ignoring self-event evaluation > E0504 16:28:45.286149 189771 MetastoreEventsProcessor.java:684] Unexpected > exception received while processing event > Java exception follows: > java.lang.IllegalMonitorStateException > at > java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:372) > at > java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1302) > at > java.base/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1147) > at org.apache.impala.catalog.Table.releaseWriteLock(Table.java:262) > at > org.apache.impala.service.CatalogOpExecutor.reloadPartitionIfExists(CatalogOpExecutor.java:3788) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartition(MetastoreEvents.java:633) > at > org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.processPartitionInserts(MetastoreEvents.java:851) > at > org.apache.impala.catalog.events.MetastoreEvents$InsertEvent.process(MetastoreEvents.java:835) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:346) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:772) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:670) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at > java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > E0504 16:28:45.286345 
189771 MetastoreEventsProcessor.java:795] Notification > event is null > {noformat} > It's due to the following try-clause: > {code:java} > try { > tryWriteLock(table, reason); // throws InternalException if timeout (2h) to > get write lock > ... > return numOfPartsReloaded; > } catch (TableLoadingException e) { > ... > } catch (InternalException e) { > throw new CatalogException( > "Could not acquire lock on the table " + table.getFullName(), e); > } finally { > UnlockWriteLockIfErronouslyLocked(); > table.releaseWriteLock(); > } > {code} > https://github.com/apache/impala/blob/3608ab25f13708b1ba73b0f81abe37c1cda4e342/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4604-L4641 > tryWriteLock() will wait until timeout(2h) to get the table write lock. If > fails, it throws an InternalException. The finally-clause of > releaseWriteLock() is always invoked so it fails by lock not held by current > thread. > {code:java} > private void tryWriteLock(Table tbl, String operation) throws > InternalException { > String type = tbl instanceof View ? "view" : "table"; > if (!catalog_.tryWriteLock(tbl)) { > throw new InternalException(String.format("Error %s (for) %s %s due to > " + > "lock contention.", operation, type, tbl.getFullName())); > } > }{code} > https://github.com/apache/impala/blob/3608ab25f13708b1ba73b0f81abe37c1cda4e342/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L7309-L7323 > We should check if the lock is held by the current
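A minimal Java sketch of the failure mode described in IMPALA-12141 and one possible guard. It assumes a plain ReentrantReadWriteLock stands in for the table lock. The class and method names (GuardedUnlockSketch, unconditionalUnlockThrows, releaseIfHeld) are hypothetical and not taken from the Impala code base; only ReentrantReadWriteLock and isWriteLockedByCurrentThread() are real JDK APIs. This illustrates the idea of checking lock ownership before unlocking and is not the actual fix.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class GuardedUnlockSketch {
  // Hypothetical stand-in for the per-table lock.
  static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

  /** Mimics the finally-clause: unlock without checking ownership. */
  static boolean unconditionalUnlockThrows() {
    try {
      LOCK.writeLock().unlock();
      return false;
    } catch (IllegalMonitorStateException e) {
      // Thrown because the current thread does not hold the write lock.
      return true;
    }
  }

  /** Guarded variant: unlock only if this thread holds the write lock. */
  static boolean releaseIfHeld() {
    if (LOCK.isWriteLockedByCurrentThread()) {
      LOCK.writeLock().unlock();
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // The lock was never acquired, as when tryWriteLock() times out.
    System.out.println("unconditional unlock throws: " + unconditionalUnlockThrows());
    System.out.println("guarded release did unlock: " + releaseIfHeld());
  }
}
```

With a guard like this in the finally-clause, a timeout in acquiring the lock would surface as the original InternalException instead of being masked by IllegalMonitorStateException.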
[jira] [Commented] (IMPALA-12461) Avoid write lock on the table during self-event detection
[ https://issues.apache.org/jira/browse/IMPALA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860280#comment-17860280 ] Quanlong Huang commented on IMPALA-12461: - [~gsaihemanth] I think this is not resolved yet since partition-level events are not handled in commit 78b9285da457c6853e513f3852730867d4dbe632. > Avoid write lock on the table during self-event detection > - > > Key: IMPALA-12461 > URL: https://issues.apache.org/jira/browse/IMPALA-12461 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Critical > Fix For: Impala 4.5.0 > > > Saw some callstacks like this: > {code} > at > org.apache.impala.catalog.CatalogServiceCatalog.tryLock(CatalogServiceCatalog.java:468) > at > org.apache.impala.catalog.CatalogServiceCatalog.tryWriteLock(CatalogServiceCatalog.java:436) > at > org.apache.impala.catalog.CatalogServiceCatalog.evaluateSelfEvent(CatalogServiceCatalog.java:1008) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.isSelfEvent(MetastoreEvents.java:609) > at > org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process(MetastoreEvents.java:1942) > {code} > At this point it has already been checked that the event comes from Impala based > on the service id, and now we are checking the table's self-event list. Taking the > table lock can be problematic as other DDLs may take the write lock at the same > time.
[jira] [Created] (IMPALA-13178) Flush the metadata cache to remote storage instead of just invalidating them in full GCs
Quanlong Huang created IMPALA-13178: --- Summary: Flush the metadata cache to remote storage instead of just invalidating them in full GCs Key: IMPALA-13178 URL: https://issues.apache.org/jira/browse/IMPALA-13178 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Quanlong Huang Assignee: Quanlong Huang When invalidate_tables_on_memory_pressure is enabled, catalogd will invalidate 10% (configured by invalidate_tables_fraction_on_memory_pressure) of the tables if the old gen usage of JVM still exceeds 60% (configured by invalidate_tables_gc_old_gen_full_threshold) after a full GC. Later if the table is used again, catalogd will try to load its metadata. The loading process could also lead to OOM (see IMPALA-13117). On the other hand, the metadata might have no changes so it's a waste to evict and reload them again. Fetching all the partitions from HMS and file listing on the storage are expensive. It'd be better to flush out the metadata cache of a table instead of just invalidating it. If there are no more invalidates (either implicit ones from HMS event processing or explicit ones from user commands) on the table, we can reuse the flushed metadata. They can be flushed to the remote storage (e.g. HDFS/Ozone/S3) so catalogd has unlimited space to use. We can consider just flushing out the encodedFileDescriptors (the file metadata) and incremental stats which are usually the majority of the metadata cache. Or use a well-defined format (e.g. Iceberg manifest files) so we can incrementally flush the metadata even with catalog changes (DDL/DMLs). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
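The flush-and-reuse idea in IMPALA-13178 can be sketched as below. This is an illustration under stated assumptions, not a design from the ticket: an in-memory map stands in for the remote storage (HDFS/Ozone/S3), the metadata is an opaque byte array, and all names (FlushedMetadataSketch, flush, invalidate, tryReuse) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class FlushedMetadataSketch {
  // Assumption: a plain map stands in for remote storage.
  private final Map<String, byte[]> remoteStore = new HashMap<>();

  /** Called instead of a plain invalidate when memory pressure evicts a table. */
  void flush(String tableName, byte[] serializedMetadata) {
    remoteStore.put(tableName, serializedMetadata.clone());
  }

  /** Implicit (HMS event) or explicit (user command) invalidate drops the flushed copy. */
  void invalidate(String tableName) {
    remoteStore.remove(tableName);
  }

  /** Returns the flushed metadata if still valid; empty means a full reload is needed. */
  Optional<byte[]> tryReuse(String tableName) {
    return Optional.ofNullable(remoteStore.get(tableName));
  }

  public static void main(String[] args) {
    FlushedMetadataSketch cache = new FlushedMetadataSketch();
    cache.flush("db.t1", new byte[] {1, 2, 3});
    System.out.println("reusable after flush: " + cache.tryReuse("db.t1").isPresent());
    cache.invalidate("db.t1");
    System.out.println("reusable after invalidate: " + cache.tryReuse("db.t1").isPresent());
  }
}
```

The point of the sketch is the validity rule: a flushed copy is reusable only while no invalidate has touched the table since the flush.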
[jira] [Commented] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions
[ https://issues.apache.org/jira/browse/IMPALA-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859823#comment-17859823 ] Quanlong Huang commented on IMPALA-13117: - Ideally the overhead of metadata loading, i.e. temp objects created during metadata loading, should be negligible compared to the HdfsTable itself. However, a heap dump taken during metadata loading reveals that we hold the FileDescriptor objects until the parallel file metadata loading finishes. !Selection_125.png|width=561,height=365! Note that the table has a small-files issue, so the memory space is mostly occupied by file metadata. Each FileDescriptor object takes 256B. The encodedFileDescriptor (the byte array inside it) takes just 160B. The FileDescriptors are unwrapped only after the loads on all partitions have finished: [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L161] [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1585-L1586] This introduces a 60% memory overhead during metadata loading compared to the actual space needed to cache the metadata. We should unwrap the FileDescriptors as soon as they are generated. > Improve the heap usage during metadata loading and DDL/DML executions > - > > Key: IMPALA-13117 > URL: https://issues.apache.org/jira/browse/IMPALA-13117 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Labels: catalog-2024 > Attachments: Selection_125.png > > > The JVM heap size of catalogd is not just used by the metadata cache. The > in-progress metadata loading threads and DDL/DML executions also create temp > objects, which introduce spikes in the heap usage. 
We should improve the > heap usage in this part, especially when the metadata loading is slow due to > external slowness (e.g. listing files on S3). > CC [~mylogi...@gmail.com]
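The 60% figure in the IMPALA-13117 comment follows from the two sizes it quotes (256B per FileDescriptor object, 160B per encoded byte array). A small Java sketch of that arithmetic, where the class and method names are hypothetical:

```java
final class WrapperOverheadSketch {
  /** Extra memory held by the wrapper, relative to the encoded bytes alone. */
  static double overheadRatio(long wrappedBytes, long encodedBytes) {
    return (double) (wrappedBytes - encodedBytes) / encodedBytes;
  }

  public static void main(String[] args) {
    // Sizes quoted in the comment: 256B wrapped, 160B encoded.
    double ratio = overheadRatio(256, 160);
    System.out.printf("overhead = %.0f%%%n", ratio * 100);
  }
}
```

In other words, each wrapper kept alive until all partitions finish loading holds 96 extra bytes on top of the 160 bytes that end up in the cache.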
[jira] [Updated] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions
[ https://issues.apache.org/jira/browse/IMPALA-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13117: Attachment: Selection_125.png > Improve the heap usage during metadata loading and DDL/DML executions > - > > Key: IMPALA-13117 > URL: https://issues.apache.org/jira/browse/IMPALA-13117 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Labels: catalog-2024 > Attachments: Selection_125.png > > > The JVM heap size of catalogd is not just used by the metadata cache. The > in-progress metadata loading threads and DDL/DML executions also create temp > objects, which introduce spikes in the heap usage. We should improve the > heap usage in this part, especially when the metadata loading is slow due to > external slowness (e.g. listing files on S3). > CC [~mylogi...@gmail.com]
[jira] [Updated] (IMPALA-13177) Compress encodedFileDescriptors inside the same partition
[ https://issues.apache.org/jira/browse/IMPALA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13177: Description: File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save some memory space. Especially in the case of small files issue, the memory footprint of the metadata cache is occupied by encodedFileDescriptors. An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB. 80% of it is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Codes: [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723] Files of that table are created by Spark jobs. Here are some file names inside the same partition: {noformat} part-0-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-1-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-2-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-3-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-4-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-5-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-6-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-7-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-8-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-9-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-00010-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-00011-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 
part-00012-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-00013-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-00014-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 part-00015-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000 {noformat} By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant memory space in this case. Compressing all of them inside the same table might be even better, but it impacts the performance when coordinator loading specific partitions from catalogd. We can consider only do this for partitions whose number of files exceeds a threshold (e.g. 10). was: File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save some memory space. Especially in the case of small files issue, the memory footprint of the metadata cache is occupied by encodedFileDescriptors. An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB. 80% of it is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Codes: [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723] Files of that table are created by Spark jobs. An example file name: part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000 Here are some file names inside the same partition: !Selection_124.png|width=410,height=172! By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant memory space in this case. Compressing all of them inside the same table might be even better, but it impacts the performance when coordinator loading specific partitions from catalogd. 
We can consider only do this for partitions whose number of files exceeds a threshold (e.g. 10). > Compress encodedFileDescriptors inside the same partition > - > > Key: IMPALA-13177 > URL: https://issues.apache.org/jira/browse/IMPALA-13177 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Labels: catalog-2024 > Attachments: Selection_124.png > > > File names under a table usually share some substrings, e.g. query id, job > id, task id, etc. We can compress them to save some memory space. Especially > in the case of small files issue, the memory footprint of the metadata cache > is occupied by encodedFileDescriptors. > An experiment shows that an HdfsTable with 67708 partitions and 3167561 files > on S3
[jira] [Updated] (IMPALA-13177) Compress encodedFileDescriptors inside the same partition
[ https://issues.apache.org/jira/browse/IMPALA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13177: Description: File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save some memory space. Especially in the case of small files issue, the memory footprint of the metadata cache is occupied by encodedFileDescriptors. An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB. 80% of it is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Codes: [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723] Files of that table are created by Spark jobs. An example file name: part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000 Here are some file names inside the same partition: !Selection_124.png|width=410,height=172! By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant memory space in this case. Compressing all of them inside the same table might be even better, but it impacts the performance when coordinator loading specific partitions from catalogd. We can consider only do this for partitions whose number of files exceeds a threshold (e.g. 10). was: File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save some memory space. Especially in the case of small files issue, the memory footprint of the metadata cache is occupied by encodedFileDescriptors. An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB. 80% of it is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. 
Codes: [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723] Files of that table are created by Spark jobs. An example file name: part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000 Here are some file names inside the same partition: !Selection_124.png|width=410,height=172! By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant memory space in this case. Compressing all of them inside the same table might be even better, but it impacts the performance when coordinator loading specific partitions from catalogd. > Compress encodedFileDescriptors inside the same partition > - > > Key: IMPALA-13177 > URL: https://issues.apache.org/jira/browse/IMPALA-13177 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Labels: catalog-2024 > Attachments: Selection_124.png > > > File names under a table usually share some substrings, e.g. query id, job > id, task id, etc. We can compress them to save some memory space. Especially > in the case of small files issue, the memory footprint of the metadata cache > is occupied by encodedFileDescriptors. > An experiment shows that an HdfsTable with 67708 partitions and 3167561 files > on S3 takes 605MB. 80% of it is spent in encodedFileDescriptors. Each > encodedFileDescriptor is a byte array that takes 160B. Codes: > [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723] > Files of that table are created by Spark jobs. An example file name: > part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000 > Here are some file names inside the same partition: > !Selection_124.png|width=410,height=172! 
> By compressing the encodedFileDescriptors inside the same partition, we > should be able to save a significant amount of memory in this case. Compressing > all of them inside the same table might be even better, but it hurts > performance when the coordinator loads specific partitions from catalogd. > We can consider only doing this for partitions whose number of files exceeds a > threshold (e.g. 10).
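A rough Java sketch of the per-partition compression idea, under several assumptions: it compresses only the file names (the real encodedFileDescriptors are byte arrays carrying more than the name), it uses java.util.zip.Deflater as one possible codec, and the generated names use a zero-padded "part-%05d" pattern modeled on the Spark file names quoted in the ticket. The class name, method names, and MIN_FILES_TO_COMPRESS are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class PartitionNameCompressionSketch {
  // Threshold suggested in the ticket ("e.g. 10"); the constant name is made up.
  static final int MIN_FILES_TO_COMPRESS = 10;

  /** Joins the names with '\n' and deflates them as one block. */
  static byte[] compress(List<String> names) {
    byte[] raw = String.join("\n", names).getBytes(StandardCharsets.UTF_8);
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Inverse of compress(): inflates the block and splits it back into names. */
  static List<String> decompress(byte[] compressed) throws DataFormatException {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!inflater.finished()) {
      int n = inflater.inflate(buf);
      if (n == 0 && inflater.needsInput()) break; // truncated input; stop instead of spinning
      out.write(buf, 0, n);
    }
    inflater.end();
    return Arrays.asList(new String(out.toByteArray(), StandardCharsets.UTF_8).split("\n"));
  }

  /** Generates sample names sharing the long UUID suffix, as in the ticket's listing. */
  static List<String> sampleNames(int count) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      names.add(String.format(
          "part-%05d-14015d2b-b534-4747-8c42-c83a7af0f006-71fda97e-a41d-488f-aa15-6fd9112b6c5b.c000", i));
    }
    return names;
  }

  public static void main(String[] args) throws DataFormatException {
    List<String> names = sampleNames(16);
    int rawSize = String.join("\n", names).getBytes(StandardCharsets.UTF_8).length;
    if (names.size() > MIN_FILES_TO_COMPRESS) {
      byte[] packed = compress(names);
      System.out.println("raw bytes: " + rawSize + ", compressed bytes: " + packed.length);
      System.out.println("round trip ok: " + decompress(packed).equals(names));
    }
  }
}
```

Compressing per partition keeps decompression local to the partitions a coordinator asks for, which is the trade-off the ticket raises against table-wide compression.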
[jira] [Updated] (IMPALA-13177) Compress encodedFileDescriptors inside the same partition
[ https://issues.apache.org/jira/browse/IMPALA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13177: Labels: catalog-2024 (was: ) > Compress encodedFileDescriptors inside the same partition > - > > Key: IMPALA-13177 > URL: https://issues.apache.org/jira/browse/IMPALA-13177 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Labels: catalog-2024 > Attachments: Selection_124.png > > > File names under a table usually share some substrings, e.g. query id, job > id, task id, etc. We can compress them to save some memory space. Especially > in the case of small files issue, the memory footprint of the metadata cache > is occupied by encodedFileDescriptors. > An experiment shows that an HdfsTable with 67708 partitions and 3167561 files > on S3 takes 605MB. 80% of it is spent in encodedFileDescriptors. Each > encodedFileDescriptor is a byte array that takes 160B. Codes: > [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723] > Files of that table are created by Spark jobs. An example file name: > part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000 > Here are some file names inside the same partition: > !Selection_124.png|width=410,height=172! > By compressing the encodedFileDescriptors inside the same partition, we > should be able to save a significant memory space in this case. Compressing > all of them inside the same table might be even better, but it impacts the > performance when coordinator loading specific partitions from catalogd. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13177) Compress encodedFileDescriptors inside the same partition
Quanlong Huang created IMPALA-13177: --- Summary: Compress encodedFileDescriptors inside the same partition Key: IMPALA-13177 URL: https://issues.apache.org/jira/browse/IMPALA-13177 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Quanlong Huang Assignee: Quanlong Huang Attachments: Selection_124.png File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save some memory space. Especially in the case of small files issue, the memory footprint of the metadata cache is occupied by encodedFileDescriptors. An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB. 80% of it is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Codes: https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723 Files of that table are created by Spark jobs. An example file name: part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000 Here are some file names inside the same partition: By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant memory space in this case. Compressing all of them inside the same table might be even better, but it impacts the performance when coordinator loading specific partitions from catalogd. -- This message was sent by Atlassian Jira (v8.20.10#820010)
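As a sanity check on the numbers in the IMPALA-13177 description, the per-file footprint can be derived from the totals it quotes (605MB, 80%, 3167561 files, 67708 partitions). The sketch below assumes nothing beyond those figures; whether "MB" means 10^6 or 2^20 bytes is not stated in the ticket, so both are computed. The class and method names are hypothetical.

```java
final class PerFileFootprintSketch {
  /** Bytes attributed to each file, given a total size and the fraction spent on files. */
  static double bytesPerFile(double totalBytes, double fraction, long numFiles) {
    return totalBytes * fraction / numFiles;
  }

  public static void main(String[] args) {
    long files = 3_167_561L;
    long partitions = 67_708L;
    double binary = bytesPerFile(605.0 * 1024 * 1024, 0.8, files);   // MB read as MiB
    double decimal = bytesPerFile(605.0 * 1000 * 1000, 0.8, files);  // MB read as 10^6 bytes
    System.out.printf("bytes per file (MiB reading): %.1f%n", binary);
    System.out.printf("bytes per file (10^6 reading): %.1f%n", decimal);
    System.out.printf("average files per partition: %.1f%n", (double) files / partitions);
  }
}
```

Under the MiB reading the result lands at roughly 160 bytes per file, in line with the 160B per encodedFileDescriptor quoted in the description; the table averages roughly 47 files per partition.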
[jira] [Updated] (IMPALA-13177) Compress encodedFileDescriptors inside the same partition
[ https://issues.apache.org/jira/browse/IMPALA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13177:
    Description:
File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save memory. Especially when a table has the small-files issue, the memory footprint of the metadata cache is dominated by encodedFileDescriptors.

An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB, 80% of which is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Code:
[https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723]

Files of that table are created by Spark jobs. An example file name:
part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000

Here are some file names inside the same partition:
!Selection_124.png|width=410,height=172!

By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant amount of memory in this case. Compressing all of them inside the same table might be even better, but it would impact performance when a coordinator loads specific partitions from catalogd.

  was:
File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save memory. Especially when a table has the small-files issue, the memory footprint of the metadata cache is dominated by encodedFileDescriptors.

An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB, 80% of which is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Code:
https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723

Files of that table are created by Spark jobs. An example file name:
part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000

Here are some file names inside the same partition:
!Selection_124.png!

By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant amount of memory in this case. Compressing all of them inside the same table might be even better, but it would impact performance when a coordinator loads specific partitions from catalogd.

> Compress encodedFileDescriptors inside the same partition
> ---------------------------------------------------------
>
> Key: IMPALA-13177
> URL: https://issues.apache.org/jira/browse/IMPALA-13177
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments: Selection_124.png
>
>
> File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save memory. Especially when a table has the small-files issue, the memory footprint of the metadata cache is dominated by encodedFileDescriptors.
> An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB, 80% of which is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Code:
> [https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723]
> Files of that table are created by Spark jobs. An example file name:
> part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000
> Here are some file names inside the same partition:
> !Selection_124.png|width=410,height=172!
> By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant amount of memory in this case. Compressing all of them inside the same table might be even better, but it would impact performance when a coordinator loads specific partitions from catalogd.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
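The idea in IMPALA-13177 can be sketched with a small, self-contained Java program. This is only an illustration under stated assumptions: it compresses plain file names joined by newlines, whereas the real encodedFileDescriptors are opaque byte arrays, and the class name `FileNameBatchCompression`, its methods, and the generated sample names are all hypothetical. It uses `java.util.zip.Deflater` simply because it ships with the JDK; the issue does not say which codec would be used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.UUID;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: compress all file names of one partition as a single batch. */
public class FileNameBatchCompression {
  /** Deflates the newline-joined file names into one byte array. */
  public static byte[] compress(List<String> names) {
    byte[] input = String.join("\n", names).getBytes(StandardCharsets.UTF_8);
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Inverse of compress(); assumes no file name is empty or contains a newline. */
  public static List<String> decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        // Guard against spinning forever on truncated input.
        if (n == 0 && !inflater.finished()
            && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("Truncated or unsupported input");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException("Corrupt compressed data", e);
    } finally {
      inflater.end();
    }
    String joined = new String(out.toByteArray(), StandardCharsets.UTF_8);
    if (joined.isEmpty()) return new ArrayList<>();
    return new ArrayList<>(Arrays.asList(joined.split("\n")));
  }

  /** Generates made-up names that share a job id, loosely modeled on the example name. */
  public static List<String> sampleNames(int count) {
    Random rnd = new Random(42);  // fixed seed keeps the demo deterministic
    String jobId = "f7e5265d-5a63-4477-8954-ac6cbaef553b";
    List<String> names = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      String taskId = new UUID(rnd.nextLong(), rnd.nextLong()).toString();
      names.add("part-" + i + "-" + jobId + "-" + taskId + ".c000");
    }
    return names;
  }

  public static void main(String[] args) {
    List<String> names = sampleNames(1000);
    int rawBytes = String.join("\n", names).getBytes(StandardCharsets.UTF_8).length;
    byte[] compressed = compress(names);
    System.out.println("raw bytes:        " + rawBytes);
    System.out.println("compressed bytes: " + compressed.length);
    System.out.println("round trip ok:    " + decompress(compressed).equals(names));
  }
}
```

Batching per partition keeps the unit of decompression small, which matches the concern in the description that table-wide compression could slow down loading of specific partitions.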
[jira] [Updated] (IMPALA-13177) Compress encodedFileDescriptors inside the same partition
[ https://issues.apache.org/jira/browse/IMPALA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13177:
    Description:
File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save memory. Especially when a table has the small-files issue, the memory footprint of the metadata cache is dominated by encodedFileDescriptors.

An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB, 80% of which is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Code:
https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723

Files of that table are created by Spark jobs. An example file name:
part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000

Here are some file names inside the same partition:
!Selection_124.png!

By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant amount of memory in this case. Compressing all of them inside the same table might be even better, but it would impact performance when a coordinator loads specific partitions from catalogd.

  was:
File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save memory. Especially when a table has the small-files issue, the memory footprint of the metadata cache is dominated by encodedFileDescriptors.

An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB, 80% of which is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Code:
https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723

Files of that table are created by Spark jobs. An example file name:
part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000

Here are some file names inside the same partition:

By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant amount of memory in this case. Compressing all of them inside the same table might be even better, but it would impact performance when a coordinator loads specific partitions from catalogd.

> Compress encodedFileDescriptors inside the same partition
> ---------------------------------------------------------
>
> Key: IMPALA-13177
> URL: https://issues.apache.org/jira/browse/IMPALA-13177
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments: Selection_124.png
>
>
> File names under a table usually share some substrings, e.g. query id, job id, task id, etc. We can compress them to save memory. Especially when a table has the small-files issue, the memory footprint of the metadata cache is dominated by encodedFileDescriptors.
> An experiment shows that an HdfsTable with 67708 partitions and 3167561 files on S3 takes 605MB, 80% of which is spent in encodedFileDescriptors. Each encodedFileDescriptor is a byte array that takes 160B. Code:
> https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L723
> Files of that table are created by Spark jobs. An example file name:
> part-6-f7e5265d-5a63-4477-8954-ac6cbaef553b-face6153-588c-4b44-a277-2836396bc57a.c000
> Here are some file names inside the same partition:
> !Selection_124.png!
> By compressing the encodedFileDescriptors inside the same partition, we should be able to save a significant amount of memory in this case. Compressing all of them inside the same table might be even better, but it would impact performance when a coordinator loads specific partitions from catalogd.
[jira] [Commented] (IMPALA-13170) InconsistentMetadataFetchException due to database dropped when showing databases
[ https://issues.apache.org/jira/browse/IMPALA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859619#comment-17859619 ] Quanlong Huang commented on IMPALA-13170: - Uploaded a patch for review: https://gerrit.cloudera.org/#/c/21546/ > InconsistentMetadataFetchException due to database dropped when showing > databases > - > > Key: IMPALA-13170 > URL: https://issues.apache.org/jira/browse/IMPALA-13170 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.4.0 >Reporter: Yida Wu >Assignee: Quanlong Huang >Priority: Major > > Using impalad 3.4.0, an InconsistentMetadataFetchException occurs when > running "show databases" in Impala while simultaneously executing "drop > database" to drop the newly created database in Hive. > Step is: > 1, Creates database (Hive) > 2, Creates tables (Hive) > 3, Drops tables (Hive) > 4, Run show databases (Impala) Drop database (Hive) > Logs in Impalad: > {code:java} > I0610 02:18:32.435815 278475 CatalogdMetaProvider.java:1354] 1:2] > Invalidated objects in cache: [list of database names, HMS_METADATA for DB > test_hive] > I0610 02:18:32.436224 278475 jni-util.cc:288] 1:2] > org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching > DATABASE failed. 
Could not find TCatalogObject(type:DATABASE, > catalog_version:0, db:TDatabase(db_name:test_hive)) > > > > at > org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:424) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:185) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:643) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:638) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:521) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadDb(CatalogdMetaProvider.java:635) > at org.apache.impala.catalog.local.LocalDb.getMetaStoreDb(LocalDb.java:91) > at org.apache.impala.catalog.local.LocalDb.getOwnerUser(LocalDb.java:294) > at org.apache.impala.service.Frontend.getDbs(Frontend.java:1066) > at org.apache.impala.service.JniFrontend.getDbs(JniFrontend.java:301) > I0610 02:18:32.436257 278475 status.cc:129] 1:2] > InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find > TCatalogObject(type:DATABASE, catalog_version:0, > {code} > Logs in Catalog: > {code:java} > I0610 02:18:16.190133 222885 MetastoreEvents.java:505] EventId: 141467532 > EventType: CREATE_DATABASE Successfully added database test_hive > ... > I0610 02:18:32.276082 222885 MetastoreEvents.java:516] EventId: 141467562 > EventType: DROP_DATABASE Creating event 141467562 of type DROP_DATABASE on > database test_hive > I0610 02:18:32.277876 222885 MetastoreEvents.java:254] Total number of events > received: 6 Total number of events filtered out: 0 > I0610 02:18:32.277910 222885 MetastoreEvents.java:258] Incremented skipped > metric to 2564 > I0610 02:18:32.279537 222885 MetastoreEvents.java:505] EventId: 141467562 > EventType: DROP_DATABASE Removed Database test_hive > {code} > The case is similar to IMPALA-9441. 
We may want to handle the error in a > better way in Frontend.getDbs().
[jira] [Resolved] (IMPALA-12979) Wildcard in CLASSPATH might not work in the RPM package
[ https://issues.apache.org/jira/browse/IMPALA-12979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-12979. - Resolution: Fixed > Wildcard in CLASSPATH might not work in the RPM package > --- > > Key: IMPALA-12979 > URL: https://issues.apache.org/jira/browse/IMPALA-12979 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.4.2 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 3.4.2 > > > I tried deploying the RPM package of Impala-3.4.2 (commit 8e9c5a5) on CentOS > 7.9 and found launching catalogd failed by the following error (in > catalogd.INFO): > {noformat} > Wrote minidump to > /var/log/impala-minidumps/catalogd/5e3c8819-0593-4943-555addbc-665470ad.dmp > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x02baf14c, pid=156082, tid=0x7fec0dce59c0 > # > # JRE version: Java(TM) SE Runtime Environment (8.0_141-b15) (build > 1.8.0_141-b15) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [catalogd+0x27af14c] > llvm::SCEVAddRecExpr::getNumIterationsInRange(llvm::ConstantRange const&, > llvm::ScalarEvolution&) const+0x73c > # > # Core dump written. Default location: /opt/impala/core or core.156082 > # > # An error report file with more information is saved as: > # /tmp/hs_err_pid156082.log > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # The crash happened outside the Java Virtual Machine in native code. > # See problematic frame for where to report the bug. 
> # {noformat} > There are other logs in catalogd.ERROR > {noformat} > Log file created at: 2024/04/08 04:49:28 > Running on machine: ccycloud-1.quanlong.root.comops.site > Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg > E0408 04:49:28.979386 158187 logging.cc:146] stderr will be logged to this > file. > Wrote minidump to > /var/log/impala-minidumps/catalogd/6c3f550c-be96-4a5b-61171aac-0de15155.dmp > could not find method getRootCauseMessage from class (null) with signature > (Ljava/lang/Throwable;)Ljava/lang/String; > could not find method getStackTrace from class (null) with signature > (Ljava/lang/Throwable;)Ljava/lang/String; > FileSystem: loadFileSystems failed error: > (unable to get root cause for java.lang.NoClassDefFoundError) > (unable to get stack trace for java.lang.NoClassDefFoundError){noformat} > Resolving the minidump shows me the following stacktrace: > {noformat} > (gdb) bt > #0 0x02baf14c in ?? () > #1 0x02baee24 in getJNIEnv () > #2 0x02bacb71 in hdfsBuilderConnect () > #3 0x012e6ae2 in impala::JniUtil::InitLibhdfs() () > #4 0x012e7897 in impala::JniUtil::Init() () > #5 0x00be9297 in impala::InitCommonRuntime(int, char**, bool, > impala::TestInfo::Mode) () > #6 0x00bb604a in CatalogdMain(int, char**) () > #7 0x00b33f97 in main (){noformat} > It indicates something wrong in initializing the JVM. Here are the env vars: > {noformat} > Environment Variables: > JAVA_HOME=/usr/java/jdk1.8.0_141 > CLASSPATH=/opt/impala/conf:/opt/impala/jar/* > PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin > LD_LIBRARY_PATH=/opt/impala/lib/:/usr/java/jdk1.8.0_141/jre/lib/amd64/server:/usr/java/jdk1.8.0_141/jre/lib/amd64 > SHELL=/bin/bash{noformat} > We use wildcard "*" in the classpath which seems to be the cause. The issue > was resolved after using explicit paths in the classpath. 
> Here is what I changed in bin/impala-env.sh:
> {code:bash}
> #export CLASSPATH="/opt/impala/conf:/opt/impala/jar/*"
> CLASSPATH=/opt/impala/conf
> for jar in /opt/impala/jar/*.jar; do
>   CLASSPATH="$CLASSPATH:$jar"
> done
> export CLASSPATH
> {code}
[jira] [Commented] (IMPALA-13170) InconsistentMetadataFetchException due to database dropped when showing databases
[ https://issues.apache.org/jira/browse/IMPALA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856633#comment-17856633 ]

Quanlong Huang commented on IMPALA-13170:

[~baggio000] The exception from JniFrontend.getCatalogMetrics() should be resolved after IMPALA-8675 (see IMPALA-11409). After IMPALA-8675, local-catalog mode coordinators no longer update the db and table counts in getCatalogMetrics(), so they avoid hitting this.

The current issue happens when SHOW DATABASES wants to check visibility for the current user. To be specific, Frontend.getDbs() only invokes db.getOwnerUser() when 'needsAuthChecks' is true:
{code:java}
public List<? extends FeDb> getDbs(PatternMatcher matcher, User user)
    throws InternalException {
  List<? extends FeDb> dbs = getCatalog().getDbs(matcher);
  boolean needsAuthChecks = authzFactory_.getAuthorizationConfig().isEnabled()
      && !userHasAccessForWholeServer(user);
  // Filter out the databases the user does not have permissions on.
  if (needsAuthChecks) {
    Iterator<? extends FeDb> iter = dbs.iterator();
    List<Future<Boolean>> pendingCheckTasks = Lists.newArrayList();
    while (iter.hasNext()) {
      FeDb db = iter.next();
      pendingCheckTasks.add(checkAuthorizationPool_.submit(
          new CheckAuthorization(db.getName(), null, db.getOwnerUser(), user)));  // <-- Calls db.getOwnerUser() here
    }
    filterUnaccessibleElements(pendingCheckTasks, dbs);
  }
  return dbs;
}{code}
[https://github.com/apache/impala/blob/6632fd00e17867c9f8f40d6905feafa049368a98/fe/src/main/java/org/apache/impala/service/Frontend.java#L1429]

In local-catalog mode, db.getOwnerUser() can trigger a new catalog RPC to fetch the metadata of the db. If a db exists when the coordinator calls getCatalog().getDbs(matcher) and is then dropped in catalogd before the coordinator calls db.getOwnerUser(), the error occurs.

A workaround is to retry the SHOW DATABASES command.
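The race described above, and the retry workaround, can be sketched with a toy model. Everything here is hypothetical and is not Impala's API: `ToyCatalog` is an in-memory map standing in for catalogd, `InconsistentFetchException` stands in for InconsistentMetadataFetchException, and the `afterListing` hook exists only so the demo can drop a database at the racy point between listing and owner lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the list-then-lookup race and a bounded retry around it. */
public class ShowDatabasesRetrySketch {
  /** Stand-in for InconsistentMetadataFetchException. */
  public static class InconsistentFetchException extends RuntimeException {
    public InconsistentFetchException(String msg) { super(msg); }
  }

  /** Toy catalog mapping db name to owner; not Impala's real catalog interface. */
  public static class ToyCatalog {
    public final Map<String, String> owners = new ConcurrentHashMap<>();

    public List<String> listDbs() {
      List<String> names = new ArrayList<>(owners.keySet());
      Collections.sort(names);
      return names;
    }

    public String getOwner(String db) {
      String owner = owners.get(db);
      if (owner == null) throw new InconsistentFetchException("Could not find " + db);
      return owner;
    }
  }

  /** One pass: list names, then look up each owner. Throws if a db vanishes in between. */
  public static List<String> visibleDbs(ToyCatalog catalog, String user, Runnable afterListing) {
    List<String> dbs = catalog.listDbs();
    afterListing.run();  // demo hook: a concurrent DROP DATABASE would land here
    List<String> visible = new ArrayList<>();
    for (String db : dbs) {
      if (catalog.getOwner(db).equals(user)) visible.add(db);
    }
    return visible;
  }

  /** Retries the whole pass so the second attempt sees a fresh listing. */
  public static List<String> visibleDbsWithRetry(
      ToyCatalog catalog, String user, Runnable afterListing, int maxAttempts) {
    if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
    InconsistentFetchException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return visibleDbs(catalog, user, afterListing);
      } catch (InconsistentFetchException e) {
        last = e;  // the listing was stale; try again from the start
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    ToyCatalog catalog = new ToyCatalog();
    catalog.owners.put("default", "alice");
    catalog.owners.put("test_hive", "alice");
    Runnable dropDb = () -> catalog.owners.remove("test_hive");
    System.out.println(visibleDbsWithRetry(catalog, "alice", dropDb, 2));
  }
}
```

Retrying the whole pass mirrors the manual workaround of rerunning SHOW DATABASES. Whether the eventual patch retries, skips the missing database, or does something else is not stated in this comment.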
> InconsistentMetadataFetchException due to database dropped when showing > databases > - > > Key: IMPALA-13170 > URL: https://issues.apache.org/jira/browse/IMPALA-13170 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.4.0 >Reporter: Yida Wu >Assignee: Quanlong Huang >Priority: Major > > Using impalad 3.4.0, an InconsistentMetadataFetchException occurs when > running "show databases" in Impala while simultaneously executing "drop > database" to drop the newly created database in Hive. > Step is: > 1, Creates database (Hive) > 2, Creates tables (Hive) > 3, Drops tables (Hive) > 4, Run show databases (Impala) Drop database (Hive) > Logs in Impalad: > {code:java} > I0610 02:18:32.435815 278475 CatalogdMetaProvider.java:1354] 1:2] > Invalidated objects in cache: [list of database names, HMS_METADATA for DB > test_hive] > I0610 02:18:32.436224 278475 jni-util.cc:288] 1:2] > org.apache.impala.catalog.local.InconsistentMetadataFetchException: Fetching > DATABASE failed. 
Could not find TCatalogObject(type:DATABASE, > catalog_version:0, db:TDatabase(db_name:test_hive)) > > > > at > org.apache.impala.catalog.local.CatalogdMetaProvider.sendRequest(CatalogdMetaProvider.java:424) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.access$100(CatalogdMetaProvider.java:185) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:643) > at > org.apache.impala.catalog.local.CatalogdMetaProvider$2.call(CatalogdMetaProvider.java:638) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadWithCaching(CatalogdMetaProvider.java:521) > at > org.apache.impala.catalog.local.CatalogdMetaProvider.loadDb(CatalogdMetaProvider.java:635) > at org.apache.impala.catalog.local.LocalDb.getMetaStoreDb(LocalDb.java:91) > at org.apache.impala.catalog.local.LocalDb.getOwnerUser(LocalDb.java:294) > at org.apache.impala.service.Frontend.getDbs(Frontend.java:1066) > at org.apache.impala.service.JniFrontend.getDbs(JniFrontend.java:301) > I0610 02:18:32.436257 278475 status.cc:129] 1:2] > InconsistentMetadataFetchException: Fetching DATABASE failed. Could not find > TCatalogObject(type:DATABASE, catalog_version:0, > {code} > Logs in Catalog: > {code:java} > I0610 02:18:16.190133 222885 MetastoreEvents.java:505] EventId: 141467532 > EventType: CREATE_DATABASE Successfully added database test_hive > ... > I0610 02:18:32.276082 222885 MetastoreEvents.java:516] EventId: 141467562 > EventType:
[jira] [Updated] (IMPALA-12051) Propagate analytic tuple predicates of outer-joined InlineView
[ https://issues.apache.org/jira/browse/IMPALA-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-12051:
    Target Version: Impala 4.1.3

> Propagate analytic tuple predicates of outer-joined InlineView
> --------------------------------------------------------------
>
> Key: IMPALA-12051
> URL: https://issues.apache.org/jira/browse/IMPALA-12051
> Project: IMPALA
> Issue Type: Bug
> Reporter: ZhuMinghui
> Assignee: ZhuMinghui
> Priority: Major
> Fix For: Impala 4.3.0
> Attachments: image-2023-04-07-11-57-13-571.png, image-2023-04-07-11-57-59-883.png
>
>
> In some cases, directly pushing down predicates that reference the analytic tuple into an inline view leads to incorrect query results, for example with this SQL:
> {code:java}
> WITH detail_measure AS (
>   SELECT
>     *
>   FROM
>     (
>       VALUES
>         (
>           1 AS `isqbiuar`,
>           1 AS `bgsfrbun`,
>           1 AS `result_type`,
>           1 AS `bjuzzevg`
>         ),
>         (2, 2, 2, 2)
>     ) a
> ),
> order_measure_sql0 AS (
>   SELECT
>     row_number() OVER (
>       ORDER BY
>         row_number_0 DESC NULLS LAST,
>         isqbiuar ASC NULLS LAST
>     ) AS `row_number_0`,
>     `isqbiuar`
>   FROM
>     (
>       VALUES
>         (1 AS `row_number_0`, 1 AS `isqbiuar`),
>         (2, 2)
>     ) b
> )
> SELECT
>   detail_measure.`isqbiuar` AS `isqbiuar`,
>   detail_measure.`bgsfrbun` AS `bgsfrbun`,
>   detail_measure.`result_type` AS `result_type`,
>   detail_measure.`bjuzzevg` AS `bjuzzevg`,
>   `row_number_0` AS `row_number_0`
> FROM
>   detail_measure
>   LEFT JOIN order_measure_sql0 ON order_measure_sql0.isqbiuar = detail_measure.isqbiuar
> WHERE
>   row_number_0 BETWEEN 1
>   AND 1
> ORDER BY
>   `row_number_0` ASC NULLS LAST,
>   `bgsfrbun` ASC NULLS LAST{code}
> The current query result is:
> !image-2023-04-07-11-57-13-571.png!
> The correct query result is:
> !image-2023-04-07-11-57-59-883.png!
[jira] [Updated] (IMPALA-12051) Propagate analytic tuple predicates of outer-joined InlineView
[ https://issues.apache.org/jira/browse/IMPALA-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-12051:
    Fix Version/s: Impala 4.3.0

> Propagate analytic tuple predicates of outer-joined InlineView
> --------------------------------------------------------------
>
> Key: IMPALA-12051
> URL: https://issues.apache.org/jira/browse/IMPALA-12051
> Project: IMPALA
> Issue Type: Bug
> Reporter: ZhuMinghui
> Assignee: ZhuMinghui
> Priority: Major
> Fix For: Impala 4.3.0
> Attachments: image-2023-04-07-11-57-13-571.png, image-2023-04-07-11-57-59-883.png
>
>
> In some cases, directly pushing down predicates that reference the analytic tuple into an inline view leads to incorrect query results, for example with this SQL:
> {code:java}
> WITH detail_measure AS (
>   SELECT
>     *
>   FROM
>     (
>       VALUES
>         (
>           1 AS `isqbiuar`,
>           1 AS `bgsfrbun`,
>           1 AS `result_type`,
>           1 AS `bjuzzevg`
>         ),
>         (2, 2, 2, 2)
>     ) a
> ),
> order_measure_sql0 AS (
>   SELECT
>     row_number() OVER (
>       ORDER BY
>         row_number_0 DESC NULLS LAST,
>         isqbiuar ASC NULLS LAST
>     ) AS `row_number_0`,
>     `isqbiuar`
>   FROM
>     (
>       VALUES
>         (1 AS `row_number_0`, 1 AS `isqbiuar`),
>         (2, 2)
>     ) b
> )
> SELECT
>   detail_measure.`isqbiuar` AS `isqbiuar`,
>   detail_measure.`bgsfrbun` AS `bgsfrbun`,
>   detail_measure.`result_type` AS `result_type`,
>   detail_measure.`bjuzzevg` AS `bjuzzevg`,
>   `row_number_0` AS `row_number_0`
> FROM
>   detail_measure
>   LEFT JOIN order_measure_sql0 ON order_measure_sql0.isqbiuar = detail_measure.isqbiuar
> WHERE
>   row_number_0 BETWEEN 1
>   AND 1
> ORDER BY
>   `row_number_0` ASC NULLS LAST,
>   `bgsfrbun` ASC NULLS LAST{code}
> The current query result is:
> !image-2023-04-07-11-57-13-571.png!
> The correct query result is:
> !image-2023-04-07-11-57-59-883.png!
[jira] [Updated] (IMPALA-13150) Possible buffer overflow in StringVal::CopyFrom()
[ https://issues.apache.org/jira/browse/IMPALA-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13150:
    Fix Version/s: Impala 4.5.0

> Possible buffer overflow in StringVal::CopyFrom()
> -------------------------------------------------
>
> Key: IMPALA-13150
> URL: https://issues.apache.org/jira/browse/IMPALA-13150
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
> Fix For: Impala 4.5.0
>
>
> In {{StringVal::CopyFrom()}}, we take the 'len' parameter as a {{size_t}}, which is usually a 64-bit unsigned integer. We pass it to the constructor of {{StringVal}}, which takes it as an {{int}}, which is usually a 32-bit signed integer. The constructor then allocates memory for the length using the {{int}} value, but back in {{CopyFrom()}}, we copy the buffer with the {{size_t}} length. If {{size_t}} is indeed 64 bits and {{int}} is 32 bits, and the value is truncated, we may copy more bytes than we have allocated for the destination. See
> https://github.com/apache/impala/blob/ce8078204e5995277f79e226e26fe8b9eaca408b/be/src/udf/udf.cc#L546
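The arithmetic behind the truncation can be illustrated with a short sketch. Note the assumptions: the affected code is C++, and this Java analogue only mimics the 64-bit to 32-bit narrowing with a `long` to `int` cast. The class and method names are hypothetical, the guard is one possible check rather than the actual fix, and Java itself would raise an exception on an out-of-bounds array copy instead of overflowing a buffer.

```java
/** Hypothetical sketch of a 64-bit length being narrowed to a 32-bit int. */
public class LengthTruncationSketch {
  /** Mimics handing a 64-bit length to a parameter declared as a 32-bit int. */
  public static int truncatedLength(long len) {
    return (int) len;  // keeps only the low 32 bits
  }

  /** Guarded variant: refuses lengths that a 32-bit signed int cannot represent. */
  public static int checkedLength(long len) {
    if (len < 0 || len > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("length " + len + " does not fit in an int");
    }
    return (int) len;
  }

  public static void main(String[] args) {
    long requested = (1L << 32) + 16;  // 4 GiB plus 16 bytes
    int allocated = truncatedLength(requested);
    System.out.println("requested copy length: " + requested);
    System.out.println("allocated length:      " + allocated);
    try {
      checkedLength(requested);
    } catch (IllegalArgumentException e) {
      System.out.println("guard rejected it:     " + e.getMessage());
    }
  }
}
```

In this example the allocation would be sized from the truncated value while the copy would still use the full 64-bit length, which is the mismatch the issue describes.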
[jira] [Commented] (IMPALA-13160) Impala query stuck after query from special partition 'hour=0' and 'hour=00' which hour type is int
[ https://issues.apache.org/jira/browse/IMPALA-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855779#comment-17855779 ] Quanlong Huang commented on IMPALA-13160: - CC [~mylogi...@gmail.com] [~VenuReddy] [~hemanth619] [~ngangam] for more thoughts. > Impala query stuck after query from special partition 'hour=0' and 'hour=00' > which hour type is int > --- > > Key: IMPALA-13160 > URL: https://issues.apache.org/jira/browse/IMPALA-13160 > Project: IMPALA > Issue Type: Bug > Components: Catalog, fe >Affects Versions: Impala 3.4.0, Impala 4.3.0 >Reporter: LiuYuan >Priority: Critical > > 1.When create table as below: > {code:java} > CREATE TABLE hive_partition.two_partition ( > id INT, > name STRING > ) > PARTITIONED BY ( > day INT, > hour INT > ) > WITH SERDEPROPERTIES ('serialization.format'='1') > STORED AS ORC > LOCATION 'hdfs://ly-pfs/hive/hive_partition/two_partition'{code} > 2.Then create dir as below: > > {code:java} > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=0 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=00 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=01 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=02 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=03 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=04 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=05 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=06 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=07 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=08 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=09 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=1 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=10 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=11 > 
hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=12 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=13 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=14 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=15 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=16 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=17 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=18 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=19 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=2 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=20 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=21 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=22 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=23 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=3 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=4 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=5 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=6 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=7 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=8 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=9{code} > > 3. 
Execute REFRESH hive_partition.two_partition multiple times on Impala 3.4.0. The total number of partitions grows after each refresh: from 34 to 74 after three refreshes.
>
> {code:java}
> I0617 17:01:36.244355 18605 CatalogServiceCatalog.java:2225] Refreshing table metadata: hive_partition.two_partition
> I0617 17:01:38.033699 18605 HdfsTable.java:995] Reloading metadata for table definition and all partition(s) of hive_partition.two_partition (REFRESH issued by root)
> I0617 17:01:39.245016 18605 ParallelFileMetadataLoader.java:147] Loading file and block metadata for 10 paths for table hive_partition.two_partition using a thread pool of size 10
> I0617 17:01:39.336242 18605 HdfsTable.java:690] Loaded file and block metadata for hive_partition.two_partition partitions: day=20240613/hour=0, day=20240613/hour=1, day=20240613/hour=2, and 7 others. Time taken: 91.234ms
> I0617 17:01:39.336658 18605
[jira] [Commented] (IMPALA-13160) Impala query stuck after query from special partition 'hour=0' and 'hour=00' which hour type is int
[ https://issues.apache.org/jira/browse/IMPALA-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855778#comment-17855778 ]

Quanlong Huang commented on IMPALA-13160:

I can reproduce the issue now. The key is that the partitions must be created by Hive. When running SHOW PARTITIONS in Hive, I can see duplicated partitions (e.g. hour=00 duplicates hour=0, hour=01 duplicates hour=1):
{noformat}
+-----------------------+
|       partition       |
+-----------------------+
| day=20240613/hour=0   |
| day=20240613/hour=00  |
| day=20240613/hour=01  |
| day=20240613/hour=02  |
| day=20240613/hour=03  |
| day=20240613/hour=04  |
| day=20240613/hour=05  |
| day=20240613/hour=06  |
| day=20240613/hour=07  |
| day=20240613/hour=08  |
| day=20240613/hour=09  |
| day=20240613/hour=1   |
| day=20240613/hour=10  |
| day=20240613/hour=11  |
| day=20240613/hour=12  |
| day=20240613/hour=13  |
| day=20240613/hour=14  |
| day=20240613/hour=15  |
| day=20240613/hour=16  |
| day=20240613/hour=17  |
| day=20240613/hour=18  |
| day=20240613/hour=19  |
| day=20240613/hour=2   |
| day=20240613/hour=20  |
| day=20240613/hour=21  |
| day=20240613/hour=22  |
| day=20240613/hour=23  |
| day=20240613/hour=3   |
| day=20240613/hour=4   |
| day=20240613/hour=5   |
| day=20240613/hour=6   |
| day=20240613/hour=7   |
| day=20240613/hour=8   |
| day=20240613/hour=9   |
+-----------------------+
34 rows selected (0.103 seconds){noformat}
However, partitions are not referenced correctly in the query. E.g. inserting a row into hour=00 actually inserts into hour=0:
{code:sql}
hive> insert into hive_partition.two_partition partition(day=20240613, hour=00) select 1, 'name';
{code}
The file is created as 'hdfs://localhost:20500/test-warehouse/hive_partition.db/two_partition/day=20240613/hour=0/00_0', which is under the partition dir of hour=0.

Using local-catalog mode in Impala can fix the hanging issue. However, query results could be unexpected. This seems to be a gray area of both Hive and Impala.
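One way to see why hour=0 and hour=00 collide is to group the raw directory values by the integer they parse to. The sketch below assumes that an INT partition key is compared by its parsed value rather than by its directory string. That assumption is consistent with the INSERT behavior above, but it is not a description of Hive's or Impala's actual code paths, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: distinct directory names that parse to the same INT value. */
public class PartitionValueCollisionSketch {
  /** Groups raw directory values (e.g. "0", "00") by the int they parse to. */
  public static Map<Integer, List<String>> groupByIntValue(List<String> rawValues) {
    Map<Integer, List<String>> groups = new TreeMap<>();
    for (String raw : rawValues) {
      groups.computeIfAbsent(Integer.parseInt(raw), k -> new ArrayList<>()).add(raw);
    }
    return groups;
  }

  /** Keeps only the int values that are backed by more than one directory name. */
  public static Map<Integer, List<String>> collisions(List<String> rawValues) {
    Map<Integer, List<String>> result = new TreeMap<>();
    for (Map.Entry<Integer, List<String>> e : groupByIntValue(rawValues).entrySet()) {
      if (e.getValue().size() > 1) result.put(e.getKey(), e.getValue());
    }
    return result;
  }

  /** The hour values from the issue: 0..23 plus the zero-padded 00..09. */
  public static List<String> hourDirsFromIssue() {
    List<String> values = new ArrayList<>();
    for (int h = 0; h <= 23; h++) values.add(String.valueOf(h));
    for (int h = 0; h <= 9; h++) values.add("0" + h);
    return values;
  }

  public static void main(String[] args) {
    List<String> dirs = hourDirsFromIssue();
    System.out.println("directories:     " + dirs.size());
    System.out.println("distinct values: " + groupByIntValue(dirs).size());
    System.out.println("collisions:      " + collisions(dirs));
  }
}
```

Under that assumption, the 34 directories listed above cover only 24 distinct hour values, so any structure keyed by the typed value sees ten pairs of partitions competing for the same key.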
> Impala query stuck after query from special partition 'hour=0' and 'hour=00' > which hour type is int > --- > > Key: IMPALA-13160 > URL: https://issues.apache.org/jira/browse/IMPALA-13160 > Project: IMPALA > Issue Type: Bug > Components: Catalog, fe >Affects Versions: Impala 3.4.0, Impala 4.3.0 >Reporter: LiuYuan >Priority: Critical > > 1.When create table as below: > {code:java} > CREATE TABLE hive_partition.two_partition ( > id INT, > name STRING > ) > PARTITIONED BY ( > day INT, > hour INT > ) > WITH SERDEPROPERTIES ('serialization.format'='1') > STORED AS ORC > LOCATION 'hdfs://ly-pfs/hive/hive_partition/two_partition'{code} > 2.Then create dir as below: > > {code:java} > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=0 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=00 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=01 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=02 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=03 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=04 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=05 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=06 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=07 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=08 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=09 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=1 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=10 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=11 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=12 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=13 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=14 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=15 > 
hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=16 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=17 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=18 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=19 >
[jira] [Updated] (IMPALA-13160) Impala query stuck after query from special partition 'hour=0' and 'hour=00' which hour type is int
[ https://issues.apache.org/jira/browse/IMPALA-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13160: Priority: Critical (was: Major) > Impala query stuck after query from special partition 'hour=0' and 'hour=00' > which hour type is int > --- > > Key: IMPALA-13160 > URL: https://issues.apache.org/jira/browse/IMPALA-13160 > Project: IMPALA > Issue Type: Bug > Components: Catalog, fe >Affects Versions: Impala 3.4.0, Impala 4.3.0 >Reporter: LiuYuan >Priority: Critical > > 1.When create table as below: > {code:java} > CREATE TABLE hive_partition.two_partition ( > id INT, > name STRING > ) > PARTITIONED BY ( > day INT, > hour INT > ) > WITH SERDEPROPERTIES ('serialization.format'='1') > STORED AS ORC > LOCATION 'hdfs://ly-pfs/hive/hive_partition/two_partition'{code} > 2.Then create dir as below: > > {code:java} > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=0 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=00 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=01 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=02 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=03 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=04 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=05 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=06 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=07 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=08 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=09 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=1 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=10 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=11 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=12 > 
hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=13 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=14 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=15 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=16 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=17 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=18 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=19 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=2 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=20 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=21 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=22 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=23 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=3 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=4 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=5 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=6 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=7 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=8 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=9{code} > > 3. 
Execute REFRESH hive_partition.two_partition several times > on Impala 3.4.0. The total number of partitions grows after each refresh: from > 34 to 74 after three refreshes > > {code:java} > I0617 17:01:36.244355 18605 CatalogServiceCatalog.java:2225] Refreshing table > metadata: hive_partition.two_partition > I0617 17:01:38.033699 18605 HdfsTable.java:995] Reloading metadata for table > definition and all partition(s) of hive_partition.two_partition (REFRESH > issued by root) > I0617 17:01:39.245016 18605 ParallelFileMetadataLoader.java:147] Loading file > and block metadata for 10 paths for table hive_partition.two_partition using > a thread pool of size 10 > I0617 17:01:39.336242 18605 HdfsTable.java:690] Loaded file and block > metadata for hive_partition.two_partition partitions: day=20240613/hour=0, > day=20240613/hour=1, day=20240613/hour=2, and 7 others. Time taken: 91.234ms > I0617 17:01:39.336658 18605 ParallelFileMetadataLoader.java:147] Refreshing > file and block metadata for 34 paths for
[jira] [Updated] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
[ https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13161: Component/s: Backend (was: be) > impalad crash -- impala::DelimitedTextParser::ParseFieldLocations > --- > > Key: IMPALA-13161 > URL: https://issues.apache.org/jira/browse/IMPALA-13161 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 4.0.0, Impala 4.4.0 >Reporter: nyq >Priority: Critical > > Impala version: 4.0.0 > Problem: > impalad crash, by operating a text table, which has a 3GB data file that only > contains '\x00' char > Steps: > python -c 'f=open("impala_0_3gb.data.csv", "wb");tmp="\x00"*1024*1024*3; > [f.write(tmp) for i in range(1024)] ;f.close()' > create table impala_0_3gb (id int) > hdfs dfs -put impala_0_3gb.data.csv /user/hive/warehouse/impala_0_3gb/ > refresh impala_0_3gb > select count(1) from impala_0_3gb > Errors: > Wrote minidump to 1dcf110f-5a2e-49a2-be4eb7a5-4709ed19.dmp > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0181861c, pid=956182, tid=0x7fc6b340e700 > # > # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0) > # Java VM: OpenJDK 64-Bit Server VM > # Problematic frame: > # C [impalad+0x141861c] > impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, > char**, impala::FieldLocation*, int*, int*, char**)+0x7cc > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /tmp/hs_err_pid956182.log > # > # > C [impalad+0x141861c] > impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, > char**, impala::FieldLocation*, int*, int*, char**)+0x7cc > C [impalad+0x136fe11] > impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x1a1 > C [impalad+0x137100e] > impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x3be > C [impalad+0x13721ac] > impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x12c > C [impalad+0x131cdfc] impala::HdfsScanner::ProcessSplit()+0x19c > C [impalad+0x1443e17] > impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, > impala::io::ScanRange*, long*)+0x7e7 > C [impalad+0x1447001] impala::HdfsScanNode::ScannerThread(bool, long)+0x541 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
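The one-liner in the reproduction steps writes a `str` to a file opened in "wb" mode, which works on Python 2 but raises a TypeError on Python 3. A minimal Python 3 sketch of the same step might look like this; `write_nul_file` and its parameters are hypothetical names introduced for illustration, and the demo call uses tiny sizes so it is cheap to run.

```python
import os
import tempfile


def write_nul_file(path, chunk_bytes=3 * 1024 * 1024, chunks=1024):
    """Write chunks * chunk_bytes NUL bytes to path; return the bytes written."""
    chunk = b"\x00" * chunk_bytes  # bytes literal, as required by "wb" mode
    written = 0
    with open(path, "wb") as f:
        for _ in range(chunks):
            written += f.write(chunk)
    return written


# Small demo: 3 chunks of 4 bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "small.data.csv")
    print(write_nul_file(demo_path, chunk_bytes=4, chunks=3))
```

Calling `write_nul_file("impala_0_3gb.data.csv")` with the defaults writes 1024 chunks of 3 MiB each, matching the 3 GB file described in the report.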
[jira] [Updated] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
[ https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13161: Affects Version/s: Impala 4.4.0 > impalad crash -- impala::DelimitedTextParser::ParseFieldLocations > --- > > Key: IMPALA-13161 > URL: https://issues.apache.org/jira/browse/IMPALA-13161 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.0.0, Impala 4.4.0 >Reporter: nyq >Priority: Critical > > Impala version: 4.0.0 > Problem: > impalad crash, by operating a text table, which has a 3GB data file that only > contains '\x00' char > Steps: > python -c 'f=open("impala_0_3gb.data.csv", "wb");tmp="\x00"*1024*1024*3; > [f.write(tmp) for i in range(1024)] ;f.close()' > create table impala_0_3gb (id int) > hdfs dfs -put impala_0_3gb.data.csv /user/hive/warehouse/impala_0_3gb/ > refresh impala_0_3gb > select count(1) from impala_0_3gb > Errors: > Wrote minidump to 1dcf110f-5a2e-49a2-be4eb7a5-4709ed19.dmp > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0181861c, pid=956182, tid=0x7fc6b340e700 > # > # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0) > # Java VM: OpenJDK 64-Bit Server VM > # Problematic frame: > # C [impalad+0x141861c] > impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, > char**, impala::FieldLocation*, int*, int*, char**)+0x7cc > # > # Failed to write core dump. Core dumps have been disabled. 
To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /tmp/hs_err_pid956182.log > # > # > C [impalad+0x141861c] > impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, > char**, impala::FieldLocation*, int*, int*, char**)+0x7cc > C [impalad+0x136fe11] > impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x1a1 > C [impalad+0x137100e] > impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x3be > C [impalad+0x13721ac] > impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x12c > C [impalad+0x131cdfc] impala::HdfsScanner::ProcessSplit()+0x19c > C [impalad+0x1443e17] > impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, > impala::io::ScanRange*, long*)+0x7e7 > C [impalad+0x1447001] impala::HdfsScanNode::ScannerThread(bool, long)+0x541
[jira] [Commented] (IMPALA-13161) impalad crash -- impala::DelimitedTextParser::ParseFieldLocations
[ https://issues.apache.org/jira/browse/IMPALA-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855619#comment-17855619 ] Quanlong Huang commented on IMPALA-13161: - [~nyq] Thanks for reporting this! I can still reproduce the crash in the master branch (commit cce6b349f). {noformat} C [impalad+0x1fc3283] impala::Status impala::DelimitedTextParser::ParseSse(int, long*, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x293 C [impalad+0x1fc3991] impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**)+0x1c9 C [impalad+0x1c48c45] impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x257 C [impalad+0x1c4b01d] impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x178b C [impalad+0x1c4b76b] impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x457 C [impalad+0x1725d41] impala::HdfsScanner::ProcessSplit()+0xcf C [impalad+0x181b0a4] impala::HdfsScanNode::ProcessSplit(std::vector > const&, impala::MemPool*, impala::io::ScanRange*, long*)+0xc00 C [impalad+0x181be8a] impala::HdfsScanNode::ScannerThread(bool, long)+0x508 C [impalad+0x181c583] impala::ClientRequestState::LogAuditRecord(impala::Status const&)+0x6b3 C [impalad+0x165525a] impala::Thread::SuperviseThread {noformat} > impalad crash -- impala::DelimitedTextParser::ParseFieldLocations > --- > > Key: IMPALA-13161 > URL: https://issues.apache.org/jira/browse/IMPALA-13161 > Project: IMPALA > Issue Type: Bug > Components: be >Affects Versions: Impala 4.0.0 >Reporter: nyq >Priority: Critical > > Impala version: 4.0.0 > Problem: > impalad crash, by operating a text table, which has a 3GB data file that only > contains '\x00' char > Steps: > python -c 'f=open("impala_0_3gb.data.csv", "wb");tmp="\x00"*1024*1024*3; > [f.write(tmp) for i in range(1024)] ;f.close()' > create table impala_0_3gb (id int) > hdfs dfs -put impala_0_3gb.data.csv /user/hive/warehouse/impala_0_3gb/ > refresh impala_0_3gb > 
select count(1) from impala_0_3gb > Errors: > Wrote minidump to 1dcf110f-5a2e-49a2-be4eb7a5-4709ed19.dmp > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0181861c, pid=956182, tid=0x7fc6b340e700 > # > # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0) > # Java VM: OpenJDK 64-Bit Server VM > # Problematic frame: > # C [impalad+0x141861c] > impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, > char**, impala::FieldLocation*, int*, int*, char**)+0x7cc > # > # Failed to write core dump. Core dumps have been disabled. To enable core > dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /tmp/hs_err_pid956182.log > # > # > C [impalad+0x141861c] > impala::DelimitedTextParser::ParseFieldLocations(int, long, char**, > char**, impala::FieldLocation*, int*, int*, char**)+0x7cc > C [impalad+0x136fe11] > impala::HdfsTextScanner::ProcessRange(impala::RowBatch*, int*)+0x1a1 > C [impalad+0x137100e] > impala::HdfsTextScanner::FinishScanRange(impala::RowBatch*)+0x3be > C [impalad+0x13721ac] > impala::HdfsTextScanner::GetNextInternal(impala::RowBatch*)+0x12c > C [impalad+0x131cdfc] impala::HdfsScanner::ProcessSplit()+0x19c > C [impalad+0x1443e17] > impala::HdfsScanNode::ProcessSplit(std::vector std::allocator > const&, impala::MemPool*, > impala::io::ScanRange*, long*)+0x7e7 > C [impalad+0x1447001] impala::HdfsScanNode::ScannerThread(bool, long)+0x541 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13160) Impala query stuck after query from special partition 'hour=0' and 'hour=00' which hour type is int
[ https://issues.apache.org/jira/browse/IMPALA-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855590#comment-17855590 ] Quanlong Huang commented on IMPALA-13160: - [~liuyuan43] Thanks for reporting this! Unfortunately, I can't reproduce the issue using your steps. Did you run ALTER TABLE RECOVER PARTITIONS before the REFRESH? If we just run REFRESH after creating the table and hdfs dirs, the table will still have 0 partitions. I can't reproduce the issue even after running ALTER TABLE RECOVER PARTITIONS. Please share the commit hash of your version. A complete version string like this helps: {code:java} impalad version 4.5.0-SNAPSHOT DEBUG (build cce6b349f1103c167e2e9ef49fa181ede301b94f){code} You can find it in the WebUI. > Impala query stuck after query from special partition 'hour=0' and 'hour=00' > which hour type is int > --- > > Key: IMPALA-13160 > URL: https://issues.apache.org/jira/browse/IMPALA-13160 > Project: IMPALA > Issue Type: Bug > Components: Catalog, fe >Affects Versions: Impala 3.4.0, Impala 4.3.0 >Reporter: LiuYuan >Priority: Major > > 1.When create table as below: > {code:java} > CREATE TABLE hive_partition.two_partition ( > id INT, > name STRING > ) > PARTITIONED BY ( > day INT, > hour INT > ) > WITH SERDEPROPERTIES ('serialization.format'='1') > STORED AS ORC > LOCATION 'hdfs://ly-pfs/hive/hive_partition/two_partition'{code} > 2.Then create dir as below: > > {code:java} > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=0 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=00 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=01 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=02 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=03 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=04 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=05 > 
hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=06 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=07 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=08 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=09 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=1 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=10 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=11 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=12 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=13 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=14 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=15 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=16 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=17 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=18 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=19 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=2 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=20 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=21 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=22 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=23 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=3 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=4 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=5 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=6 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=7 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=8 > hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=9{code} > > 3. 
Execute REFRESH hive_partition.two_partition several times > on Impala 3.4.0. The total number of partitions grows after each refresh: from > 34 to 74 after three refreshes > > {code:java} > I0617 17:01:36.244355 18605 CatalogServiceCatalog.java:2225] Refreshing table > metadata: hive_partition.two_partition > I0617 17:01:38.033699 18605 HdfsTable.java:995] Reloading metadata for table > definition and all partition(s) of
[jira] [Updated] (IMPALA-11648) validate-java-pom-versions.sh should skip pom.xml in toolchain
[ https://issues.apache.org/jira/browse/IMPALA-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-11648: Fix Version/s: Impala 3.4.2 > validate-java-pom-versions.sh should skip pom.xml in toolchain > -- > > Key: IMPALA-11648 > URL: https://issues.apache.org/jira/browse/IMPALA-11648 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 4.2.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Blocker > Fix For: Impala 3.4.2, Impala 4.2.0, Impala 4.1.1 > > > Building the RC1 tarball of 4.1.1 release failed by > bin/validate-java-pom-versions.sh: > {noformat} > Check for Java pom.xml versions FAILED > Expected 4.1.1-RELEASE > Not found in: > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/accumulo-handler/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/beeline/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/classification/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/cli/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/common/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/contrib/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/druid-handler/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hbase-handler/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/core/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/hcatalog-pig-adapter/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/pom.xml > > 
/root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/server-extensions/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/streaming/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/webhcat/java-client/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hcatalog/webhcat/svr/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/hplsql/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/impala/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/catalogd-unit/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-serde/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-udf1/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-udf2/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-classloader-util/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/custom-udfs/udf-vectorized-badexample/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/hcatalog-unit/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/hive-blobstore/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/hive-jmh/pom.xml > > 
/root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/hive-minikdc/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/hive-unit-hadoop2/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/hive-unit/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/qtest-accumulo/pom.xml > > /root/apache-impala-4.1.1/toolchain/cdp_components-23144489/hive-3.1.3000.7.2.15.0-88/itests/qtest-druid/pom.xml > >
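The idea in the issue title, skipping pom.xml files under the toolchain directory, can be sketched as follows. The real bin/validate-java-pom-versions.sh is a shell script whose contents are not shown in this thread, so this Python version is only an illustration; `find_pom_files` and `skip_dirs` are hypothetical names.

```python
import os


def find_pom_files(root, skip_dirs=("toolchain",)):
    """Return sorted paths of pom.xml files under root, pruning skip_dirs."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        if "pom.xml" in filenames:
            found.append(os.path.join(dirpath, "pom.xml"))
    return sorted(found)
```

Note that this sketch prunes any directory named `toolchain` at any depth. A stricter variant could compare against the absolute toolchain path instead.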
[jira] [Assigned] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
[ https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-13077: --- Assignee: (was: Quanlong Huang) > Equality predicate on partition column and uncorrelated subquery doesn't > reduce the cardinality estimate > > > Key: IMPALA-13077 > URL: https://issues.apache.org/jira/browse/IMPALA-13077 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Priority: Critical > > Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. > Consider the following query: > {code:sql} > select xxx from part_tbl > where part_key=(select ... from dim_tbl); > {code} > Its query plan is a JoinNode with two ScanNodes. When estimating the > cardinality of the JoinNode, the planner is not aware that 'part_key' is the > partition column and the cardinality of the JoinNode should not be larger > than the max row count across partitions. > The recent work in IMPALA-12018 (Consider runtime filter for cardinality > reduction) helps in some cases since there are runtime filters on the > partition column. But there are still some cases that we overestimate the > cardinality. For instance, 'ss_sold_date_sk' is the only partition key of > tpcds.store_sales. 
The following query > {code:sql} > select count(*) from tpcds.store_sales > where ss_sold_date_sk=( > select min(d_date_sk) + 1000 from tpcds.date_dim);{code} > has query plan: > {noformat} > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 | > | Per-Host Resource Estimates: Memory=243MB | > | | > | PLAN-ROOT SINK | > | | | > | 09:AGGREGATE [FINALIZE] | > | | output: count:merge(*) | > | | row-size=8B cardinality=1| > | | | > | 08:EXCHANGE [UNPARTITIONED] | > | | | > | 04:AGGREGATE| > | | output: count(*) | > | | row-size=8B cardinality=1| > | | | > | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]| > | | hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 | > | | runtime filters: RF000 <- min(d_date_sk) + 1000 | > | | row-size=4B cardinality=2.88M < Should be max(numRows) across > partitions > | | | > | |--07:EXCHANGE [BROADCAST] | > | | || > | | 06:AGGREGATE [FINALIZE] | > | | | output: min:merge(d_date_sk) | > | | | row-size=4B cardinality=1 | > | | || > | | 05:EXCHANGE [UNPARTITIONED] | > | | || > | | 02:AGGREGATE | > | | | output: min(d_date_sk)| > | | | row-size=4B cardinality=1 | > | | || > | | 01:SCAN HDFS [tpcds.date_dim]| > | | HDFS partitions=1/1 files=1 size=9.84MB | > | | row-size=4B cardinality=73.05K| > | | | > | 00:SCAN HDFS [tpcds.store_sales]| > |HDFS partitions=1824/1824 files=1824 size=346.60MB | > |runtime filters: RF000 -> ss_sold_date_sk| > |row-size=4B cardinality=2.88M| > +-+{noformat} > CC [~boroknagyz], [~rizaon] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
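The bound described in the issue (the join output should not exceed the largest per-partition row count when the equality predicate is on the only partition key) can be written down as a small sketch. This is not Impala planner code; `cap_join_cardinality` is a hypothetical helper, and the per-partition row counts in the example are made up for illustration.

```python
def cap_join_cardinality(join_estimate, partition_row_counts):
    """Cap a join cardinality estimate at the largest single-partition row count.

    An equality predicate between the sole partition key and a scalar
    subquery can match at most one partition, so no more rows than the
    biggest partition can come out of the join.
    """
    if not partition_row_counts:
        # No per-partition stats available: keep the original estimate.
        return join_estimate
    return min(join_estimate, max(partition_row_counts))


# Hypothetical per-partition row counts; only the 2.88M estimate is from the plan.
print(cap_join_cardinality(2_880_000, [1_200, 1_900, 1_500]))
```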
[jira] [Created] (IMPALA-13154) Some tables are missing in Top-N Tables with Highest Memory Requirements
Quanlong Huang created IMPALA-13154:
---------------------------------------

Summary: Some tables are missing in Top-N Tables with Highest Memory Requirements
Key: IMPALA-13154
URL: https://issues.apache.org/jira/browse/IMPALA-13154
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang

In the /catalog page of the catalogd WebUI, there is a table for "Top-N Tables with Highest Memory Requirements". However, not all tables are counted there. E.g. after starting catalogd, run a DESCRIBE on a table to trigger metadata loading on it. When it's done, the table is not shown in the WebUI.

The cause is that the list is only updated in HdfsTable.getTHdfsTable() when 'type' is ThriftObjectType.FULL:
[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2457-L2459]

This used to be the place that all code paths using the table would go through. However, we've since made a number of optimizations that avoid getting the FULL thrift object of the table. We should move the code that updates the list of largest tables somewhere that all table usages can reach, e.g. after loading the metadata of the table, we can update its estimatedMetadataSize.
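As a rough illustration of the proposed direction, the sketch below updates a largest-tables tracker from the metadata load path rather than from one particular serialization path. It is a toy model in Python, not the catalogd implementation; `TopNTables` and `on_metadata_loaded` are hypothetical names.

```python
import heapq


class TopNTables:
    """Track the n tables with the largest estimated metadata size."""

    def __init__(self, n):
        self.n = n
        self.sizes = {}  # table name -> latest estimated size in bytes

    def on_metadata_loaded(self, table_name, estimated_size):
        # Called from the load path, so every loaded table is counted,
        # no matter which thrift object type is requested later.
        self.sizes[table_name] = estimated_size

    def top(self):
        """Return up to n (name, size) pairs, largest first."""
        return heapq.nlargest(self.n, self.sizes.items(), key=lambda kv: kv[1])


tracker = TopNTables(n=2)
tracker.on_metadata_loaded("db.a", 10)
tracker.on_metadata_loaded("db.b", 30)
tracker.on_metadata_loaded("db.c", 20)
print(tracker.top())
```

The sketch keeps every table's size in a dict for simplicity. A bounded structure would be preferable for catalogs with many tables.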
[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
[ https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853924#comment-17853924 ] Quanlong Huang commented on IMPALA-13152: - Assigning this to [~rizaon] who knows more about this. > IllegalStateException in computing processing cost when there are predicates > on analytic output columns > --- > > Key: IMPALA-13152 > URL: https://issues.apache.org/jira/browse/IMPALA-13152 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Riza Suminto >Priority: Major > > Saw an error in the following query when COMPUTE_PROCESSING_COST is on: > {code:sql} > create table tbl (a int, b int, c int); > set COMPUTE_PROCESSING_COST=1; > explain select a, b from ( > select a, b, c, > row_number() over(partition by a order by b desc) as latest > from tbl > )b > WHERE latest=1 > ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid! > {code} > Exception in the logs: > {noformat} > I0611 13:04:37.192874 28004 jni-util.cc:321] > 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: > Processing cost of PlanNode 01:TOP-N is invalid!
> at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047) > at > org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287) > at > org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932) > at > org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat} > Don't see the error if removing the predicate "latest=1". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
Quanlong Huang created IMPALA-13152:
---
Summary: IllegalStateException in computing processing cost when there are predicates on analytic output columns
Key: IMPALA-13152
URL: https://issues.apache.org/jira/browse/IMPALA-13152
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Quanlong Huang
Assignee: Riza Suminto

Saw an error in the following query when is on:
{code:sql}
create table tbl (a int, b int, c int);
set COMPUTE_PROCESSING_COST=1;
explain select a, b from (
  select a, b, c,
    row_number() over(partition by a order by b desc) as latest
  from tbl
)b
WHERE latest=1
ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
{code}
Exception in the logs:
{noformat}
I0611 13:04:37.192874 28004 jni-util.cc:321] 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
  at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
  at org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
  at org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
  at org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
  at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
  at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
  at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
  at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
  at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
  at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
The error does not occur when the predicate "latest=1" is removed.
[jira] [Commented] (IMPALA-13093) Insert into Huawei OBS table failed
[ https://issues.apache.org/jira/browse/IMPALA-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853843#comment-17853843 ]

Quanlong Huang commented on IMPALA-13093:

It seems adding this to hdfs-site.xml can also fix the issue:
{code:xml}
<property>
  <name>fs.obs.file.visibility.enable</name>
  <value>true</value>
</property>
{code}
I'll check whether OBS returns the real block size.

CC [~michaelsmith] [~eyizoha]

> Insert into Huawei OBS table failed
> ---
>
> Key: IMPALA-13093
> URL: https://issues.apache.org/jira/browse/IMPALA-13093
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Inserting into a table that uses Huawei OBS (Object Storage Service) as the storage fails with the following error:
> {noformat}
> Query: insert into test_obs1 values (1, 'abc')
> ERROR: Failed to get info on temporary HDFS file: obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory {noformat}
> Looking into the logs:
> {noformat}
> I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] Failed to get info on temporary HDFS file: obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory
>   @ 0xfc6d44 impala::Status::Status()
>   @ 0x1c42020 impala::HdfsTableSink::CreateNewTmpFile()
>   @ 0x1c44357 impala::HdfsTableSink::InitOutputPartition()
>   @ 0x1c4988a impala::HdfsTableSink::GetOutputPartition()
>   @ 0x1c46569 impala::HdfsTableSink::Send()
>   @ 0x14ee25f impala::FragmentInstanceState::ExecInternal()
>   @ 0x14efca3 impala::FragmentInstanceState::Exec()
>   @ 0x148dc4c impala::QueryState::ExecFInstance()
>   @ 0x1b3bab9 impala::Thread::SuperviseThread()
>   @ 0x1b3cdb1 boost::detail::thread_data<>::run()
>   @ 0x2474a87 thread_proxy
>   @ 0x7fe5a562dea5 start_thread
>   @ 0x7fe5a25ddb0d __clone{noformat}
> Note that impalad is started with {{--symbolize_stacktrace=true}} so the stacktrace has symbols.
[jira] [Created] (IMPALA-13149) Show JVM info in the WebUI
Quanlong Huang created IMPALA-13149:
---
Summary: Show JVM info in the WebUI
Key: IMPALA-13149
URL: https://issues.apache.org/jira/browse/IMPALA-13149
Project: IMPALA
Issue Type: New Feature
Reporter: Quanlong Huang

It'd be helpful to show the JVM info in the WebUI, e.g. show the output of "java -version":
{code:java}
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (build 25.412-b08, mixed mode){code}
On nodes that only have a JRE deployed, we'd like to deploy the same version of the JDK to perform heap dumps (jmap). Showing the JVM info in the WebUI would make it easy to find the version to match.
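As a rough illustration of what IMPALA-13149 asks for, the sketch below collects JVM details from system properties inside the running process, which is roughly the information "java -version" prints. The class name `JvmInfo` is hypothetical, and how the values would be wired into the WebUI is not shown. The `java.runtime.*` and `java.vm.info` keys are present on OpenJDK/HotSpot JVMs but are not guaranteed everywhere, so the sketch falls back to "unknown".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Impala): gathers JVM identification
// details from system properties of the current process.
class JvmInfo {
  static Map<String, String> collect() {
    Map<String, String> info = new LinkedHashMap<>();
    String[] keys = {
        "java.version", "java.vendor",
        "java.runtime.name", "java.runtime.version",
        "java.vm.name", "java.vm.version", "java.vm.info"};
    for (String key : keys) {
      // Some keys are implementation-specific, hence the default value.
      info.put(key, System.getProperty(key, "unknown"));
    }
    return info;
  }

  public static void main(String[] args) {
    collect().forEach((k, v) -> System.out.println(k + ": " + v));
  }
}
```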
[jira] [Updated] (IMPALA-13148) Show the number of in-progress Catalog operations
[ https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13148:
Attachment: Selection_123.png
Selection_122.png

> Show the number of in-progress Catalog operations
> ---
>
> Key: IMPALA-13148
> URL: https://issues.apache.org/jira/browse/IMPALA-13148
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Quanlong Huang
> Priority: Major
> Labels: newbie, ramp-up
> Attachments: Selection_122.png, Selection_123.png
>
> In the /operations page of the catalogd WebUI, the list of In-progress Catalog Operations is shown. It'd be helpful to also show the number of such operations, similar to the /queries page of the coordinator WebUI, which shows "100 queries in flight".
[jira] [Created] (IMPALA-13148) Show the number of in-progress Catalog operations
Quanlong Huang created IMPALA-13148:
---
Summary: Show the number of in-progress Catalog operations
Key: IMPALA-13148
URL: https://issues.apache.org/jira/browse/IMPALA-13148
Project: IMPALA
Issue Type: Improvement
Reporter: Quanlong Huang
Attachments: Selection_122.png, Selection_123.png

In the /operations page of the catalogd WebUI, the list of In-progress Catalog Operations is shown. It'd be helpful to also show the number of such operations, similar to the /queries page of the coordinator WebUI, which shows "100 queries in flight".
[jira] [Created] (IMPALA-13126) ReloadEvent.isOlderEvent() should hold the table read lock
Quanlong Huang created IMPALA-13126:
---
Summary: ReloadEvent.isOlderEvent() should hold the table read lock
Key: IMPALA-13126
URL: https://issues.apache.org/jira/browse/IMPALA-13126
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Sai Hemanth Gantasala

Saw an exception like this:
{noformat}
E0601 09:11:25.275251 246 MetastoreEventsProcessor.java:990] Unexpected exception received while processing event
Java exception follows:
java.util.ConcurrentModificationException
  at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
  at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
  at org.apache.impala.catalog.FeFsTable$Utils.getPartitionFromThriftPartitionSpec(FeFsTable.java:616)
  at org.apache.impala.catalog.HdfsTable.getPartitionFromThriftPartitionSpec(HdfsTable.java:597)
  at org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:511)
  at org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:489)
  at org.apache.impala.catalog.CatalogServiceCatalog.isPartitionLoadedAfterEvent(CatalogServiceCatalog.java:4024)
  at org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.isOlderEvent(MetastoreEvents.java:2754)
  at org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2729)
  at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1107)
  at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:531)
  at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164)
  at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:750) {noformat}
For a partition-level RELOAD event, ReloadEvent.isOlderEvent() needs to check whether the corresponding partition was reloaded after the event. This check iterates over the table's partitions, so it should be done while holding the table read lock. Otherwise, EventProcessor could hit the error above when concurrent DDLs/DMLs modify the partition list.

CC [~VenuReddy]
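The locking pattern suggested in IMPALA-13126 can be sketched as follows: readers that iterate over a plain HashMap of partitions take the read lock, and writers that modify it take the write lock, so the iteration can no longer observe a concurrent modification. The class `PartitionedTable`, its fields, and the per-partition "last loaded event id" are hypothetical simplifications for illustration; they are not Impala's HdfsTable API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a table with a mutable partition map.
class PartitionedTable {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Partition name -> id of the last event the partition was loaded at.
  private final Map<String, Long> lastLoadedEventIds = new HashMap<>();

  // Writers (DDL/DML in the real system) modify the map under the write lock.
  void setPartition(String name, long lastLoadedEventId) {
    lock.writeLock().lock();
    try {
      lastLoadedEventIds.put(name, lastLoadedEventId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // The reader iterates the map under the read lock, so no writer can
  // change the map mid-iteration.
  boolean isPartitionLoadedAfterEvent(String name, long eventId) {
    lock.readLock().lock();
    try {
      for (Map.Entry<String, Long> e : lastLoadedEventIds.entrySet()) {
        if (e.getKey().equals(name)) return e.getValue() > eventId;
      }
      return false;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Multiple readers can still run concurrently under the read lock; only writers are excluded while a check is in progress.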
[jira] [Created] (IMPALA-13122) Show file stats in table loading logs
Quanlong Huang created IMPALA-13122:
---
Summary: Show file stats in table loading logs
Key: IMPALA-13122
URL: https://issues.apache.org/jira/browse/IMPALA-13122
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang

Here is an example of the table loading logs for a table:
{noformat}
I0603 08:46:05.67 24417 HdfsTable.java:1255] Loading metadata for table definition and all partition(s) of tpcds.store_sales (needed by coordinator)
I0603 08:46:05.642702 24417 HdfsTable.java:1896] Loaded 23 columns from HMS. Actual columns: 23
I0603 08:46:05.767457 24417 HdfsTable.java:3114] Load Valid Write Id List Done. Time taken: 26.699us
I0603 08:46:05.767549 24417 HdfsTable.java:1297] Fetching partition metadata from the Metastore: tpcds.store_sales
I0603 08:46:05.806337 24417 MetaStoreUtil.java:190] Fetching 1824 partitions for: tpcds.store_sales using partition batch size: 1000
I0603 08:46:07.336064 24417 MetaStoreUtil.java:208] Fetched 1000/1824 partitions for table tpcds.store_sales
I0603 08:46:07.915474 24417 MetaStoreUtil.java:208] Fetched 1824/1824 partitions for table tpcds.store_sales
I0603 08:46:07.915519 24417 HdfsTable.java:1304] Fetched partition metadata from the Metastore: tpcds.store_sales
I0603 08:46:08.840034 24417 ParallelFileMetadataLoader.java:224] Loading file and block metadata for 1824 paths for table tpcds.store_sales using a thread pool of size 5
I0603 08:46:09.383904 24417 HdfsTable.java:836] Loaded file and block metadata for tpcds.store_sales partitions: ss_sold_date_sk=2450816, ss_sold_date_sk=2450817, ss_sold_date_sk=2450818, and 1821 others. Time taken: 569.107ms
I0603 08:46:09.420702 24417 Table.java:1117] last refreshed event id for table: tpcds.store_sales set to: -1
I0603 08:46:09.420794 24417 TableLoader.java:177] Loaded metadata for: tpcds.store_sales (4026ms){noformat}
From the logs, we know the table has 23 columns and 1824 partitions. The time spent loading the table schema and file metadata is also shown. However, the logs don't tell whether there is a small-files issue under the partitions. The underlying storage could also be slow (e.g. S3), which results in a long time loading file metadata. It'd be helpful to add these to the logs:
* number of files loaded
* min/avg/max of file sizes
* total file size
* number of files
* number of blocks (HDFS only)
* number of hosts, disks (HDFS/Ozone only)
* Stats of accessTime and lastModifiedTime

These can be aggregated in FileMetadataLoader#loadInternal() and logged in ParallelFileMetadataLoader#load() or HdfsTable#loadFileMetadataForPartitions().
[https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L177]
[https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L172]
[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L836]
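The size-related stats listed in IMPALA-13122 (file count, min/avg/max, total size) could be aggregated along the lines of the sketch below, which uses the JDK's `LongSummaryStatistics`. The class `FileSizeStats`, its method names, and the log message format are hypothetical and only illustrate the aggregation; where it would hook into FileMetadataLoader is not shown.

```java
import java.util.Locale;
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical helper (not part of Impala): summarizes file sizes in bytes.
class FileSizeStats {
  // Computes count, sum, min, max and average in a single pass.
  static LongSummaryStatistics summarize(long[] fileSizes) {
    return LongStream.of(fileSizes).summaryStatistics();
  }

  // Formats the stats as one log-friendly line. Locale.ROOT keeps the
  // decimal separator stable regardless of the JVM's default locale.
  static String format(LongSummaryStatistics s) {
    return String.format(Locale.ROOT,
        "files=%d, total=%d bytes, min=%d, avg=%.1f, max=%d",
        s.getCount(), s.getSum(), s.getMin(), s.getAverage(), s.getMax());
  }

  public static void main(String[] args) {
    System.out.println(format(summarize(new long[] {100, 200, 600})));
  }
}
```

A very low average next to a high file count would be a quick hint of a small-files problem. `LongSummaryStatistics` also has a `combine()` method, so per-partition results computed in parallel can be merged into one table-level summary.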
[jira] [Created] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions
Quanlong Huang created IMPALA-13117:
---
Summary: Improve the heap usage during metadata loading and DDL/DML executions
Key: IMPALA-13117
URL: https://issues.apache.org/jira/browse/IMPALA-13117
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang

The JVM heap of catalogd is not just used by the metadata cache. The in-progress metadata loading threads and DDL/DML executions also create temporary objects, which introduce spikes in the heap usage. We should improve the heap usage in this part, especially when the metadata loading is slow due to external slowness (e.g. listing files on S3).

CC [~mylogi...@gmail.com]
[jira] [Assigned] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
[ https://issues.apache.org/jira/browse/IMPALA-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned IMPALA-13116:
Assignee: Quanlong Huang

> In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
> ---
>
> Key: IMPALA-13116
> URL: https://issues.apache.org/jira/browse/IMPALA-13116
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> A table can be invalidated while DDL/DML/REFRESH statements on it are in flight:
> * A user can explicitly trigger an INVALIDATE METADATA command
> * The table could be invalidated by CatalogdTableInvalidator when invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned on
>
> Note that invalidating a table doesn't require holding the lock of the HdfsTable object, so it can finish even if there are ongoing updates on the table.
> The updated HdfsTable object won't be added to the metadata cache since it has been replaced with an IncompleteTable object. It's only used in the DDL/DML/REFRESH responses. In local catalog mode, the response is the minimal representation, which is mostly the table name and catalog version. We don't need the updates on the HdfsTable object to be finished. Thus, we can consider aborting the reloading for such DDL/DML/REFRESH requests.
[jira] [Created] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
Quanlong Huang created IMPALA-13116:
---
Summary: In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
Key: IMPALA-13116
URL: https://issues.apache.org/jira/browse/IMPALA-13116
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang

A table can be invalidated while DDL/DML/REFRESH statements on it are in flight:
* A user can explicitly trigger an INVALIDATE METADATA command
* The table could be invalidated by CatalogdTableInvalidator when invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned on

Note that invalidating a table doesn't require holding the lock of the HdfsTable object, so it can finish even if there are ongoing updates on the table.
The updated HdfsTable object won't be added to the metadata cache since it has been replaced with an IncompleteTable object. It's only used in the DDL/DML/REFRESH responses. In local catalog mode, the response is the minimal representation, which is mostly the table name and catalog version. We don't need the updates on the HdfsTable object to be finished. Thus, we can consider aborting the reloading for such DDL/DML/REFRESH requests.
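One possible shape for the abort described in IMPALA-13116 is a cooperative check: the reload is split into steps, and an "invalidated" flag is consulted before each step so the remaining work is skipped once the table has been invalidated. The class `AbortableReload` and its methods below are hypothetical and greatly simplified; the real reload logic and how invalidation would be signalled are assumptions, not Impala's implementation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch (not Impala code): a reload made of steps that stops
// early once the table has been invalidated by another thread.
class AbortableReload {
  private final AtomicBoolean invalidated = new AtomicBoolean(false);

  // Called by whoever invalidates the table; needs no table lock.
  void invalidate() {
    invalidated.set(true);
  }

  // Runs the steps in order and returns how many were executed.
  // The flag is checked before each step, so at most one step of
  // wasted work happens after invalidation.
  int reload(List<Runnable> steps) {
    int executed = 0;
    for (Runnable step : steps) {
      if (invalidated.get()) break;
      step.run();
      executed++;
    }
    return executed;
  }
}
```

The finer the steps (e.g. one per partition), the sooner an invalidated reload stops consuming time and heap.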
[jira] [Created] (IMPALA-13115) Always add the query id in the error message to clients
Quanlong Huang created IMPALA-13115:
---
Summary: Always add the query id in the error message to clients
Key: IMPALA-13115
URL: https://issues.apache.org/jira/browse/IMPALA-13115
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Quanlong Huang

We have some errors like "Failed due to unreachable impalad(s)". We should improve them to mention the query id, e.g. "Query ${query_id} failed due to unreachable impalad(s)".

In a busy cluster, queries are flushed out of the /queries page quickly. Coordinator logs are also rotated out quickly, so it's hard to find the query id there.
[jira] [Assigned] (IMPALA-12834) Add query load information to the query profile
[ https://issues.apache.org/jira/browse/IMPALA-12834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-12834: --- Assignee: YifanZhang > Add query load information to the query profile > --- > > Key: IMPALA-12834 > URL: https://issues.apache.org/jira/browse/IMPALA-12834 > Project: IMPALA > Issue Type: Improvement > Components: Perf Investigation >Reporter: YifanZhang >Assignee: YifanZhang >Priority: Minor > Fix For: Impala 4.4.0 > > > Add query load information to the query profile to track if the performance > regression is related to the insufficient resources of the node, and also > recommend if the current pool configurations or host configurations are > optimal. > The load information should include: > * Number of running queries of the executor group on which the query is > scheduled > * Number of running fragment instances of the hosts on which the query is > scheduled > * Used/Reserved memory of the hosts on which the query is scheduled > * Some other useful metrics -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12182) Add CPU utilization time series graph for RuntimeProfile's sampled values
[ https://issues.apache.org/jira/browse/IMPALA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12182: Fix Version/s: Impala 4.3.0 > Add CPU utilization time series graph for RuntimeProfile's sampled values > - > > Key: IMPALA-12182 > URL: https://issues.apache.org/jira/browse/IMPALA-12182 > Project: IMPALA > Issue Type: New Feature >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Fix For: Impala 4.3.0 > > Attachments: 23-07-10_T15_33_44.png, 23-07-10_T15_36_26.png, > 23-07-10_T15_39_01.png, 23-07-10_T15_39_31.png, 23-07-10_T15_40_42.png, > 23-07-10_T15_40_50.png, 23-07-10_T15_40_55.png, cpu_utilization.png, > cpu_utilization_test-1.png, cpu_utilization_test-2.png, query_timeline.mkv, > simplescreenrecorder-2023-07-10_21.10.58.mkv, > simplescreenrecorder-2023-07-10_22.10.18.mkv, three_nodes.png, > three_nodes_zoomed_out.png, timeseries_cpu_utilization_line_plot.mkv, > two_nodes.png > > > The RuntimeProfile contains samples of CPU utilization metrics for user, sys > and iowait clamped to 64 values (retrieved from the ChunkedTimeSeriesCounter, > but sampled similar to SamplingTimeSeriesCounter). > It would be helpful to see the recent aggregate CPU node utilization samples > for each of the different nodes. > These are sampled every `periodic_counter_update_period_ms`. > AggregatedRuntimeProfile used in the Thrift profile contains the complete > series of values from the ChunkedTimeSeriesCounter samples. But, as this > representation is difficult to provide in the JSON, they have been > downsampled to 64 values. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
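The downsampling mentioned above can be sketched as follows. This is an assumption-laden illustration in Python, not the code Impala uses: the function name downsample is hypothetical, and averaging each bucket is only one possible reduction (the actual implementation may pick or combine samples differently).

```python
def downsample(samples, max_points=64):
    """Reduce a series to at most max_points values by averaging equal-width buckets."""
    n = len(samples)
    if n <= max_points:
        return list(samples)
    reduced = []
    for i in range(max_points):
        # Integer bucket boundaries; every bucket is non-empty because n > max_points.
        start = i * n // max_points
        end = (i + 1) * n // max_points
        bucket = samples[start:end]
        reduced.append(sum(bucket) / len(bucket))
    return reduced
```

A series shorter than the limit is returned unchanged, so short queries keep every sample while long ones are capped at 64 points for the JSON profile.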
[jira] [Updated] (IMPALA-12364) Display disk and network metrics in webUI's query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12364: Fix Version/s: Impala 4.4.0 > Display disk and network metrics in webUI's query timeline > -- > > Key: IMPALA-12364 > URL: https://issues.apache.org/jira/browse/IMPALA-12364 > Project: IMPALA > Issue Type: Improvement >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Fix For: Impala 4.4.0 > > Attachments: average_disk_network_metrics.mkv, > averaged_disk_network_metrics.png, both_charts_resize.mkv, > both_charts_resize.png, close_cpu_utilization_button.mkv, > draggable_resize_handle.png, hor_zoom_buttons.png, > horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, > host_utilization_close_button.png, host_utilization_resize_bar.png, > multiple_fragment_metrics.png, resize_drag_handle.mkv > > > It would be helpful to display disk and network usage in human readable form > on the query timeline, aligning it along with the CPU utilization plot, below > the fragment timing diagram. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-11915) Support timeline and graphical plan exports in the webUI
[ https://issues.apache.org/jira/browse/IMPALA-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-11915: Fix Version/s: Impala 4.3.0 > Support timeline and graphical plan exports in the webUI > > > Key: IMPALA-11915 > URL: https://issues.apache.org/jira/browse/IMPALA-11915 > Project: IMPALA > Issue Type: New Feature >Reporter: Quanlong Huang >Assignee: Surya Hebbar >Priority: Major > Labels: supportability > Fix For: Impala 4.3.0 > > Attachments: export_button.png, export_modal.png, > export_plan_example_70b4ecc5f6aec963e_85221a3b_plan.html, > export_timeline_example_0b4ecc5f6aec963e_85221a3b_timeline.svg, > exported_plan.png, exported_timeline.png, plan_download.png, > plan_download_button.png, plan_export.png, plan_export_modal.png, > plan_export_text_selection.png, svg_wrapped_export.html, text_selection.png, > timeline_download-1.png, timeline_download.png, timeline_download_button.png, > timeline_export.png, timeline_export_modal.png, > timeline_export_text_selection-1.png, timeline_export_text_selection.png > > > The graphical plan in the web UI is useful. It'd be nice to provide a button > to download the svg picture. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12178) Refined alignment of timeticks in the webUI timeline
[ https://issues.apache.org/jira/browse/IMPALA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12178: Fix Version/s: Impala 4.3.0 > Refined alignment of timeticks in the webUI timeline > > > Key: IMPALA-12178 > URL: https://issues.apache.org/jira/browse/IMPALA-12178 > Project: IMPALA > Issue Type: Improvement >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Minor > Fix For: Impala 4.3.0 > > Attachments: overflowed_timetick_label.png, timetick_label_fixed.png > > > The timeticks on the query timeline page in the WebUI were partially > hidden because long timestamps overflowed after SVG rendering. > It would be better if the entire timetick label were displayed. > !overflowed_timetick_label.png|width=808,height=259! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-13102. - Fix Version/s: Impala 4.5.0 Resolution: Fixed > Loading tables with illegal stats failed > > > Key: IMPALA-13102 > URL: https://issues.apache.org/jira/browse/IMPALA-13102 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.5.0 > > > When the table has illegal stats, e.g. numDVs=-100, Impala can't load the > table. So DROP STATS or DROP TABLE can't be performed on the table. > {code:sql} > [localhost:21050] default> drop stats alltypes_bak; > Query: drop stats alltypes_bak > ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak' > CAUSED BY: TableLoadingException: Failed to load metadata for table: > default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code} > We should at least allow dropping the stats or dropping the table, so that users > can use Impala to recover the stats. 
> Stacktrace in the logs: > {noformat} > I0520 08:00:56.661746 17543 jni-util.cc:321] > 5343142d1173494f:44dcde8c] > org.apache.impala.common.AnalysisException: Failed to load metadata for > table: 'alltypes_bak' > at > org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974) > at > org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175) > Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load > metadata for table: default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162) > at org.apache.impala.catalog.Table.fromThrift(Table.java:586) > at > org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479) > at > org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) > at > org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) > at > org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114) > at > org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585) > at > org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) > at .: > org.apache.impala.catalog.TableLoadingException: Failed to load metadata for > table: default.alltypes_bak > at 
org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318) > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213) > at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034) > at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676) > at org.apache.impala.catalog.Column.updateStats(Column.java:73) > at > org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183) > at
[jira] [Commented] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users
[ https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848508#comment-17848508 ] Quanlong Huang commented on IMPALA-12190: - Column masking and row filtering policies will also be messed up by RENAME. I think tag based policy will also be messed up if data lineages are not updated accordingly. +1 for a new Ranger API that returns all policies matching a given table (and optionally for a given user). We also need this to improve IMPALA-11501 to avoid loading the table schema from HMS. Currently, to check whether a user has a corresponding column masking policy on a table, we have to load the table to get all the column names and check whether there are policies on each column, which is inefficient. > Renaming table will cause losing privileges for non-admin users > --- > > Key: IMPALA-12190 > URL: https://issues.apache.org/jira/browse/IMPALA-12190 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Gabor Kaszab >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: alter-table, authorization, ranger > > Let's say user 'a' gets some privileges on table 't'. When this table gets > renamed (even by user 'a') then user 'a' loses its privileges on that table. 
> > Repro steps: > # Start impala with Ranger > # start impala-shell as admin (-u admin) > # create table tmp (i int, s string) stored as parquet; > # grant all on table tmp to user ; > # grant all on table tmp to user ; > {code:java} > Query: show grant user on table tmp > +++--+---++-+--+-+-+---+--+-+ > | principal_type | principal_name | database | table | column | uri | > storage_type | storage_uri | udf | privilege | grant_option | create_time | > +++--+---++-+--+-+-+---+--+-+ > | USER | | default | tmp | * | | > | | | all | false | NULL | > +++--+---++-+--+-+-+---+--+-+ > Fetched 1 row(s) in 0.01s {code} > # alter table tmp rename to tmp_1234; > # show grant user on table tmp_1234; > {code:java} > Query: show grant user on table tmp_1234 > Fetched 0 row(s) in 0.17s{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan
[ https://issues.apache.org/jira/browse/IMPALA-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13074: Labels: ramp-up (was: ) > WRITE TO HDFS node is omitted from Web UI graphic plan > -- > > Key: IMPALA-13074 > URL: https://issues.apache.org/jira/browse/IMPALA-13074 > Project: IMPALA > Issue Type: Bug >Reporter: Noemi Pap-Takacs >Priority: Major > Labels: ramp-up > > The query plan shows the nodes that take part in the execution, forming a > tree structure. > It can be displayed in the CLI by issuing the EXPLAIN command. When > the actual query is executed, the plan tree can also be viewed in the Impala > Web UI in a graphic form. > However, the explain string and the graphic plan tree do not match: the top > node is missing from the Web UI. > This is especially confusing in the case of DDL and DML statements, where the > Data Sink is not displayed. This makes a SELECT * FROM table > indistinguishable from a CREATE TABLE, since both display only the SCAN node > and omit the WRITE_TO_HDFS and SELECT nodes. > It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan
[ https://issues.apache.org/jira/browse/IMPALA-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848422#comment-17848422 ] Quanlong Huang commented on IMPALA-13074: - Names like "HDFS WRITER", "KUDU WRITER" will be consistent with the ExecSummary. > WRITE TO HDFS node is omitted from Web UI graphic plan > -- > > Key: IMPALA-13074 > URL: https://issues.apache.org/jira/browse/IMPALA-13074 > Project: IMPALA > Issue Type: Bug >Reporter: Noemi Pap-Takacs >Priority: Major > > The query plan shows the nodes that take part in the execution, forming a > tree structure. > It can be displayed in the CLI by issuing the EXPLAIN command. When > the actual query is executed, the plan tree can also be viewed in the Impala > Web UI in a graphic form. > However, the explain string and the graphic plan tree do not match: the top > node is missing from the Web UI. > This is especially confusing in the case of DDL and DML statements, where the > Data Sink is not displayed. This makes a SELECT * FROM table > indistinguishable from a CREATE TABLE, since both display only the SCAN node > and omit the WRITE_TO_HDFS and SELECT nodes. > It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848395#comment-17848395 ] Quanlong Huang commented on IMPALA-13102: - Uploaded a patch for review: https://gerrit.cloudera.org/c/21445/ > Loading tables with illegal stats failed > > > Key: IMPALA-13102 > URL: https://issues.apache.org/jira/browse/IMPALA-13102 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > When the table has illegal stats, e.g. numDVs=-100, Impala can't load the > table. So DROP STATS or DROP TABLE can't be performed on the table. > {code:sql} > [localhost:21050] default> drop stats alltypes_bak; > Query: drop stats alltypes_bak > ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak' > CAUSED BY: TableLoadingException: Failed to load metadata for table: > default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code} > We should at least allow dropping the stats or dropping the table, so that users > can use Impala to recover the stats. 
> Stacktrace in the logs: > {noformat} > I0520 08:00:56.661746 17543 jni-util.cc:321] > 5343142d1173494f:44dcde8c] > org.apache.impala.common.AnalysisException: Failed to load metadata for > table: 'alltypes_bak' > at > org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974) > at > org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175) > Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load > metadata for table: default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162) > at org.apache.impala.catalog.Table.fromThrift(Table.java:586) > at > org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479) > at > org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) > at > org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) > at > org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114) > at > org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585) > at > org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) > at .: > org.apache.impala.catalog.TableLoadingException: Failed to load metadata for > table: default.alltypes_bak > at 
org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318) > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213) > at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034) > at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676) > at org.apache.impala.catalog.Column.updateStats(Column.java:73) > at > org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183) > at
[jira] [Created] (IMPALA-13103) Corrupt column stats are not reported
Quanlong Huang created IMPALA-13103: --- Summary: Corrupt column stats are not reported Key: IMPALA-13103 URL: https://issues.apache.org/jira/browse/IMPALA-13103 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Quanlong Huang Impala will report corrupt table stats in the query plan. However, corrupt column stats are not reported. For instance, consider the following table: {code:sql} create table t1 (id int, name string); insert into t1 values (1, 'aaa'), (2, 'aaa'), (3, 'aaa'), (4, 'aaa');{code} with the following stats: {code:sql} alter table t1 set tblproperties('numRows'='4'); alter table t1 set column stats name ('numNulls'='0');{code} Note that column "id" has missing stats and column "name" has missing/corrupt stats (ndv=-1, numNulls=0). Grouping by "id" will report the missing stats: {code:sql} explain select id, count(*) from t1 group by id; WARNING: The following tables are missing relevant table and/or column statistics. default.t1{code} However, grouping by "name" doesn't report the missing/corrupt stats: {noformat} explain select name, count(*) from t1 group by name; +---+ | Explain String | +---+ | Max Per-Host Resource Reservation: Memory=38.00MB Threads=2 | | Per-Host Resource Estimates: Memory=144MB | | Codegen disabled by planner | | Analyzed query: SELECT name, count(*) FROM `default`.t1 GROUP BY name | | | | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | | Per-Host Resources: mem-estimate=144.00MB mem-reservation=38.00MB thread-reservation=2 | | PLAN-ROOT SINK | | | output exprs: name, count(*) | | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0| | | | | 01:AGGREGATE [FINALIZE] | | | output: count(*) | | | group by: name | | | mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=1 row-size=20B cardinality=4 | | | in pipelines: 01(GETNEXT), 00(OPEN) | | | | | 00:SCAN HDFS [default.t1] | |HDFS partitions=1/1 files=1 size=24B | |stored 
statistics: | | table: rows=4 size=unavailable | | columns: all | |extrapolated-rows=disabled max-scan-range-rows=4 | |mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1 | |tuple-ids=0 row-size=12B cardinality=4 | |in pipelines: 00(GETNEXT) | +---+ {noformat} CC [~rizaon] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
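To illustrate the kind of check the issue asks for, here is a small Python sketch. It is not Impala's planner logic: the names ColumnStats, classify and columns_to_warn_about are hypothetical, and the sketch assumes that -1 is the sentinel for "stat not set", as the ndv=-1 value in the report suggests. It flags a column whose stats are entirely unset, only partly set, or below the sentinel.

```python
from dataclasses import dataclass

UNKNOWN = -1  # assumed sentinel meaning "this stat has not been set"


@dataclass
class ColumnStats:
    """Hypothetical, reduced view of per-column stats."""
    ndv: int = UNKNOWN
    num_nulls: int = UNKNOWN


def classify(stats: ColumnStats) -> str:
    """Label column stats as 'ok', 'missing', 'partial' or 'invalid'."""
    values = (stats.ndv, stats.num_nulls)
    if any(v < UNKNOWN for v in values):
        return "invalid"   # e.g. ndv=-100
    if all(v == UNKNOWN for v in values):
        return "missing"   # nothing was ever computed or set
    if any(v == UNKNOWN for v in values):
        return "partial"   # e.g. numNulls set but ndv still unknown
    return "ok"


def columns_to_warn_about(columns: dict) -> list:
    """Return the sorted names of columns whose stats are not 'ok'."""
    return sorted(name for name, stats in columns.items()
                  if classify(stats) != "ok")
```

Applied to the table t1 from the report, with "id" left at the defaults and "name" set to ndv=-1, numNulls=0, both columns would be listed, whereas the current plan output only warns when grouping by "id".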
[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847742#comment-17847742 ]

Quanlong Huang commented on IMPALA-13102:
-----------------------------------------

In the Impala dev env, I can set the stats directly in PostgreSQL:
{code:sql}
psql -q -U hiveuser -d ${METASTORE_DB}
HMS_home_quanlong_workspace_Impala_cdp=> select "TBL_ID" from "TBLS" where "TBL_NAME" = 'alltypes_bak';
 TBL_ID
--------
 244931
(1 row)

HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", "TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "TBL_ID" = 244931;
 CS_ID | DB_NAME |  TABLE_NAME  |   COLUMN_NAME   | NUM_DISTINCTS
-------+---------+--------------+-----------------+---------------
 68767 | default | alltypes_bak | double_col      |            10
 68766 | default | alltypes_bak | id              |          7300
 68765 | default | alltypes_bak | tinyint_col     |            10
 68764 | default | alltypes_bak | timestamp_col   |          7300
 68763 | default | alltypes_bak | smallint_col    |            10
 68762 | default | alltypes_bak | date_string_col |           736
 68761 | default | alltypes_bak | string_col      |            10
 68760 | default | alltypes_bak | float_col       |            10
 68759 | default | alltypes_bak | bigint_col      |            10
 68758 | default | alltypes_bak | year            |             2
 68757 | default | alltypes_bak | bool_col        |
 68756 | default | alltypes_bak | int_col         |            10
(12 rows)

HMS_home_quanlong_workspace_Impala_cdp=> UPDATE "TAB_COL_STATS" SET "NUM_DISTINCTS" = -100 where "CS_ID" = 68766;
HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", "TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "CS_ID" = 68766;
 CS_ID | DB_NAME |  TABLE_NAME  | COLUMN_NAME | NUM_DISTINCTS
-------+---------+--------------+-------------+---------------
 68766 | default | alltypes_bak | id          |          -100
(1 row)
{code}

> Loading tables with illegal stats failed
> ----------------------------------------
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table, so DROP STATS or DROP TABLE can't be performed on it.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should at least allow dropping the stats or dropping the table, so that users can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
>     at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
>     at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
>     at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
>     at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
>     at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
>     at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
>     at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
>     at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
>     at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
>     at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
>     at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
>     at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
>     at
[jira] [Updated] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13102:
------------------------------------
Description:
When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table, so DROP STATS or DROP TABLE can't be performed on it.
{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
We should at least allow dropping the stats or dropping the table, so that users can use Impala to recover the stats.

Stacktrace in the logs:
{noformat}
I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
    at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
    at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
    at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
    at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
    at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
    at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
    at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
    at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
    at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
    at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
    at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
    at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
    at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
    at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
    at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
    at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
    at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
    at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
    at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
    at org.apache.impala.catalog.Column.updateStats(Column.java:73)
    at org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
    at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:513)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1269)
    ... 8 more{noformat}

CC [~VenuReddy] [~hemanth619] [~ngangam]

was:
When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table, so DROP STATS or DROP TABLE can't be performed on it.
{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0,
[jira] [Created] (IMPALA-13102) Loading tables with illegal stats failed
Quanlong Huang created IMPALA-13102:
---------------------------------------

Summary: Loading tables with illegal stats failed
Key: IMPALA-13102
URL: https://issues.apache.org/jira/browse/IMPALA-13102
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang

When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table, so DROP STATS or DROP TABLE can't be performed on it.
{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
We should at least allow dropping the stats or dropping the table, so that users can use Impala to recover the stats.

Stacktrace in the logs:
{noformat}
I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
    at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
    at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
    at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
    at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
    at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
    at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
    at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
    at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
    at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
    at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
    at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
    at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
    at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
    at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
    at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
    at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
    at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
    at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
    at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
    at org.apache.impala.catalog.Column.updateStats(Column.java:73)
    at org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
    at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:513)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1269)
    ... 8 more{noformat}

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
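The request in the IMPALA-13102 description above (keep the table loadable even when a stat is illegal) could be approached by downgrading an out-of-range value to "unknown" instead of failing a precondition. The following is only a minimal Python sketch of that idea, not Impala's code: the function name `sanitize_num_distinct` is hypothetical, and using -1 as the "unknown" marker is an assumption based on the `numTrues=-1` style fields in the error message.

```python
import logging

# Assumption: -1 means "stat not available", as fields like numTrues=-1 suggest.
UNKNOWN = -1


def sanitize_num_distinct(num_distinct: int, column: str) -> int:
    """Return a usable NDV for a column.

    Non-negative values and the UNKNOWN marker pass through unchanged.
    Any other negative value is treated as corrupt: it is logged and
    replaced by UNKNOWN so that loading can continue.
    """
    if num_distinct >= 0 or num_distinct == UNKNOWN:
        return num_distinct
    logging.warning(
        "Ignoring illegal numDistinct=%d for column %s", num_distinct, column
    )
    return UNKNOWN


if __name__ == "__main__":
    # The value injected through PostgreSQL in this issue was -100.
    print(sanitize_num_distinct(-100, "id"))
    print(sanitize_num_distinct(7300, "id"))
```

With this kind of handling the table would still load with degraded stats, and statements such as DROP STATS could then be analyzed. Whether to sanitize at load time or only relax the check for DROP statements is a design choice that this sketch does not settle.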
[jira] [Created] (IMPALA-13094) Query links in /admission page of admissiond doesn't work
Quanlong Huang created IMPALA-13094:
---------------------------------------

Summary: Query links in /admission page of admissiond doesn't work
Key: IMPALA-13094
URL: https://issues.apache.org/jira/browse/IMPALA-13094
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Quanlong Huang
Attachments: Selection_115.png, Selection_116.png

The /admission page lists queued queries and running queries. The details links for these queries use the hostname of the admissiond, but they should point to the corresponding coordinators. Clicking a link jumps to the /query_plan endpoint of the admissiond, which doesn't exist, so the request fails with "Error: No URI handler for '/query_plan'".

Screenshots are attached for reference.

CC [~arawat]
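The fix direction described in IMPALA-13094 above amounts to building each details link from the coordinator's address rather than from the host serving the /admission page. A minimal Python sketch of that idea follows; it is not Impala's template code. The function name `query_details_url`, the `query_id` parameter name, and the example host and port are assumptions made for illustration.

```python
from urllib.parse import urlencode


def query_details_url(coordinator_host: str, webserver_port: int, query_id: str) -> str:
    """Build a details link that targets the coordinator's debug web UI.

    The host comes from the query's coordinator, not from the admissiond
    that renders the page, because only the coordinator is assumed to
    serve the /query_plan endpoint.
    """
    params = urlencode({"query_id": query_id})
    return f"http://{coordinator_host}:{webserver_port}/query_plan?{params}"


if __name__ == "__main__":
    # Hypothetical coordinator address and query id.
    print(query_details_url("coordinator-1.example.com", 25000, "5343142d1173494f:44dcde8c"))
```

The point of the sketch is only that the link needs the coordinator's host and web port as inputs, which means the admissiond has to carry that information for every queued or running query it lists.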
[jira] [Updated] (IMPALA-13094) Query links in /admission page of admissiond doesn't work
[ https://issues.apache.org/jira/browse/IMPALA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13094:
------------------------------------
Attachment: Selection_116.png

> Query links in /admission page of admissiond doesn't work
> ---------------------------------------------------------
>
> Key: IMPALA-13094
> URL: https://issues.apache.org/jira/browse/IMPALA-13094
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
> Attachments: Selection_115.png, Selection_116.png
>
> The /admission page lists queued queries and running queries. The details links for these queries use the hostname of the admissiond, but they should point to the corresponding coordinators. Clicking a link jumps to the /query_plan endpoint of the admissiond, which doesn't exist, so the request fails with "Error: No URI handler for '/query_plan'".
> Screenshots are attached for reference.
> CC [~arawat]
[jira] [Updated] (IMPALA-13094) Query links in /admission page of admissiond doesn't work
[ https://issues.apache.org/jira/browse/IMPALA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13094:
------------------------------------
Attachment: Selection_115.png

> Query links in /admission page of admissiond doesn't work
> ---------------------------------------------------------
>
> Key: IMPALA-13094
> URL: https://issues.apache.org/jira/browse/IMPALA-13094
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
> Attachments: Selection_115.png, Selection_116.png
>
> The /admission page lists queued queries and running queries. The details links for these queries use the hostname of the admissiond, but they should point to the corresponding coordinators. Clicking a link jumps to the /query_plan endpoint of the admissiond, which doesn't exist, so the request fails with "Error: No URI handler for '/query_plan'".
> Screenshots are attached for reference.
> CC [~arawat]
[jira] [Created] (IMPALA-13093) Insert into Huawei OBS table failed
Quanlong Huang created IMPALA-13093:
---------------------------------------

Summary: Insert into Huawei OBS table failed
Key: IMPALA-13093
URL: https://issues.apache.org/jira/browse/IMPALA-13093
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Quanlong Huang
Assignee: Quanlong Huang

Inserting into a table that uses Huawei OBS (Object Storage Service) as its storage fails with the following error:
{noformat}
Query: insert into test_obs1 values (1, 'abc')
ERROR: Failed to get info on temporary HDFS file: obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
Error(2): No such file or directory
{noformat}
Looking into the logs:
{noformat}
I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] Failed to get info on temporary HDFS file: obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
Error(2): No such file or directory
    @ 0xfc6d44 impala::Status::Status()
    @ 0x1c42020 impala::HdfsTableSink::CreateNewTmpFile()
    @ 0x1c44357 impala::HdfsTableSink::InitOutputPartition()
    @ 0x1c4988a impala::HdfsTableSink::GetOutputPartition()
    @ 0x1c46569 impala::HdfsTableSink::Send()
    @ 0x14ee25f impala::FragmentInstanceState::ExecInternal()
    @ 0x14efca3 impala::FragmentInstanceState::Exec()
    @ 0x148dc4c impala::QueryState::ExecFInstance()
    @ 0x1b3bab9 impala::Thread::SuperviseThread()
    @ 0x1b3cdb1 boost::detail::thread_data<>::run()
    @ 0x2474a87 thread_proxy
    @ 0x7fe5a562dea5 start_thread
    @ 0x7fe5a25ddb0d __clone{noformat}
Note that impalad is started with {{--symbolize_stacktrace=true}}, so the stacktrace has symbols.
[jira] [Updated] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns
[ https://issues.apache.org/jira/browse/IMPALA-13086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13086:
------------------------------------
Attachment: plan.txt

> Cardinality estimate of AggregationNode should consider predicates on group-by columns
> --------------------------------------------------------------------------------------
>
> Key: IMPALA-13086
> URL: https://issues.apache.org/jira/browse/IMPALA-13086
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Quanlong Huang
> Priority: Critical
> Attachments: plan.txt
>
> Consider the following tables:
> {code:sql}
> CREATE EXTERNAL TABLE t1(
>   t1_id bigint,
>   t5_id bigint,
>   t5_name string,
>   register_date string
> ) stored as textfile;
> CREATE EXTERNAL TABLE t2(
>   t1_id bigint,
>   t3_id bigint,
>   pay_time timestamp,
>   refund_time timestamp,
>   state_code int
> ) stored as textfile;
> CREATE EXTERNAL TABLE t3(
>   t3_id bigint,
>   t3_name string,
>   class_id int
> ) stored as textfile;
> CREATE EXTERNAL TABLE t5(
>   id bigint,
>   t5_id bigint,
>   t5_name string,
>   branch_id bigint,
>   branch_name string
> ) stored as textfile;
> alter table t1 set tblproperties('numRows'='6031170829');
> alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0');
> alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0');
> alter table t1 set column stats t5_name ('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738');
> alter table t1 set column stats register_date ('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8');
> alter table t2 set tblproperties('numRows'='864341085');
> alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0');
> alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503');
> alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0');
> alter table t2 set column stats refund_time ('numDVs'='251658','numNulls'='791645118');
> alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0');
> alter table t3 set tblproperties('numRows'='4452');
> alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0');
> alter table t3 set column stats t3_name ('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234');
> alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0');
> alter table t5 set tblproperties('numRows'='2177245');
> alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0');
> alter table t5 set column stats t5_name ('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934');
> alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0');
> alter table t5 set column stats branch_name ('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172');
> {code}
> Put a data file into each table to make the stats valid:
> {code:bash}
> echo '2024' > data.txt
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5
> {code}
> REFRESH these tables after adding the data files.
> The cardinality of the AggregationNodes is overestimated in the following query:
> {code:sql}
> explain select
>   register_date,
>   t4.t5_id,
>   t5.t5_name,
>   t5.branch_name,
>   count(distinct t1_id),
>   count(distinct case when diff_day=0 then t1_id else null end ),
>   count(distinct case when diff_day<=3 then t1_id else null end ),
>   count(distinct case when diff_day<=7 then t1_id else null end ),
>   count(distinct case when diff_day<=14 then t1_id else null end ),
>   count(distinct case when diff_day<=30 then t1_id else null end ),
>   count(distinct case when diff_day<=60 then t1_id else null end ),
>   count(distinct case when pay_time is not null then t1_id else null end )
> from (
>   select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name,
>     datediff(pay_time,register_date) diff_day
>   from (
>     select t1_id,pay_time,t3_id from t2
>     where state_code = 0 and pay_time>=trunc(NOW(),'Y')
>       and cast(pay_time as date) <> cast(refund_time as date)
>   )t2
>   join t3 on t2.t3_id=t3.t3_id
>   right join t1 on t1.t1_id=t2.t1_id
> )t4
> left join t5 on t4.t5_id=t5.t5_id
> where register_date='20230515'
> group by register_date,t4.t5_id,t5.t5_name,t5.branch_name;{code}
> One of the AggregationNodes:
> {noformat}
> 17:AGGREGATE [FINALIZE]
> |  Class 0
> |    output: count:merge(t1_id)
> |    group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
> |  Class 1
> |    output: count:merge(CASE WHEN diff_day = 0 THEN t1_id ELSE NULL END)
> |    group
[jira] [Created] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns
Quanlong Huang created IMPALA-13086:
---

Summary: Cardinality estimate of AggregationNode should consider predicates on group-by columns
Key: IMPALA-13086
URL: https://issues.apache.org/jira/browse/IMPALA-13086
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Quanlong Huang

Consider the following tables:
{code:sql}
CREATE EXTERNAL TABLE t1(
  t1_id bigint,
  t5_id bigint,
  t5_name string,
  register_date string
) stored as textfile;

CREATE EXTERNAL TABLE t2(
  t1_id bigint,
  t3_id bigint,
  pay_time timestamp,
  refund_time timestamp,
  state_code int
) stored as textfile;

CREATE EXTERNAL TABLE t3(
  t3_id bigint,
  t3_name string,
  class_id int
) stored as textfile;

CREATE EXTERNAL TABLE t5(
  id bigint,
  t5_id bigint,
  t5_name string,
  branch_id bigint,
  branch_name string
) stored as textfile;

alter table t1 set tblproperties('numRows'='6031170829');
alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0');
alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0');
alter table t1 set column stats t5_name ('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738');
alter table t1 set column stats register_date ('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8');

alter table t2 set tblproperties('numRows'='864341085');
alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0');
alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503');
alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0');
alter table t2 set column stats refund_time ('numDVs'='251658','numNulls'='791645118');
alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0');

alter table t3 set tblproperties('numRows'='4452');
alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0');
alter table t3 set column stats t3_name ('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234');
alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0');

alter table t5 set tblproperties('numRows'='2177245');
alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0');
alter table t5 set column stats t5_name ('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934');
alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0');
alter table t5 set column stats branch_name ('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172');
{code}
Put a data file into each table to make the stats valid:
{code:bash}
echo '2024' > data.txt
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5
{code}
REFRESH these tables after adding the data files.

The cardinality of the AggregationNodes is overestimated in the following query:
{code:sql}
explain select
  register_date,
  t4.t5_id,
  t5.t5_name,
  t5.branch_name,
  count(distinct t1_id),
  count(distinct case when diff_day=0 then t1_id else null end ),
  count(distinct case when diff_day<=3 then t1_id else null end ),
  count(distinct case when diff_day<=7 then t1_id else null end ),
  count(distinct case when diff_day<=14 then t1_id else null end ),
  count(distinct case when diff_day<=30 then t1_id else null end ),
  count(distinct case when diff_day<=60 then t1_id else null end ),
  count(distinct case when pay_time is not null then t1_id else null end )
from (
  select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name,
    datediff(pay_time,register_date) diff_day
  from (
    select t1_id,pay_time,t3_id from t2
    where state_code = 0 and pay_time>=trunc(NOW(),'Y')
      and cast(pay_time as date) <> cast(refund_time as date)
  )t2
  join t3 on t2.t3_id=t3.t3_id
  right join t1 on t1.t1_id=t2.t1_id
)t4
left join t5 on t4.t5_id=t5.t5_id
where register_date='20230515'
group by register_date,t4.t5_id,t5.t5_name,t5.branch_name;{code}
One of the AggregationNodes:
{noformat}
17:AGGREGATE [FINALIZE]
|  Class 0
|    output: count:merge(t1_id)
|    group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 1
|    output: count:merge(CASE WHEN diff_day = 0 THEN t1_id ELSE NULL END)
|    group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 2
|    output: count:merge(CASE WHEN diff_day <= 3 THEN t1_id ELSE NULL END)
|    group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 3
|    output: count:merge(CASE WHEN diff_day <= 7 THEN t1_id ELSE NULL END)
|    group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 4
|    output: count:merge(CASE WHEN diff_day <= 14 THEN t1_id ELSE NULL END)
|    group by: register_date, t4.t5_id,
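To make the size of the overestimate concrete, here is a minimal sketch. It assumes the number of groups is estimated as the product of the grouping columns' NDVs, which is a simplification: the actual planner logic, including any cap by the input cardinality, is not shown in this message. The function name `estimate_groups` is hypothetical; the NDVs are the ones set by the `alter table ... set column stats` statements above.

```python
from math import prod

# NDVs of the four grouping columns, copied from the column stats above.
GROUP_BY_NDVS = {
    "register_date": 9283,   # t1.register_date
    "t4.t5_id": 389,         # t1.t5_id
    "t5.t5_name": 523,
    "t5.branch_name": 55,
}


def estimate_groups(ndvs, equality_bound=()):
    """Hypothetical simplified estimate: product of grouping-column NDVs.

    Columns named in equality_bound are pinned to a single value by an
    equality predicate, so each contributes a factor of 1 instead of its NDV.
    """
    return prod(1 if col in equality_bound else ndv for col, ndv in ndvs.items())


# Estimate that ignores the predicate register_date='20230515'.
naive = estimate_groups(GROUP_BY_NDVS)
# Estimate that treats register_date as a single value.
aware = estimate_groups(GROUP_BY_NDVS, equality_bound={"register_date"})

print(f"ignoring predicate: {naive}")
print(f"using predicate:    {aware}")
print(f"reduction factor:   {naive // aware}")
```

Under this simplified model, accounting for the equality predicate shrinks the estimate by exactly the NDV of `register_date`.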
[jira] [Commented] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
[ https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846770#comment-17846770 ]

Quanlong Huang commented on IMPALA-13077:
---

It seems doable:
* catalogd always loads the HMS partition objects, and 'numRows' is extracted from the parameters: [https://github.com/apache/impala/blob/f87c20800de9f7dc74e47aa9a8c0dc878f4f0840/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1415]
* The coordinator always loads all partitions when planning such queries.

Pulling partition-level column stats like NDVs would help more, since they are more accurate than the table-level column stats. But using the partition-level 'numRows' already helps a lot in this case.

> Equality predicate on partition column and uncorrelated subquery doesn't
> reduce the cardinality estimate
>
> Key: IMPALA-13077
> URL: https://issues.apache.org/jira/browse/IMPALA-13077
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'.
> Consider the following query:
> {code:sql}
> select xxx from part_tbl
> where part_key=(select ... from dim_tbl);
> {code}
> Its query plan is a JoinNode with two ScanNodes. When estimating the
> cardinality of the JoinNode, the planner is not aware that 'part_key' is the
> partition column and that the cardinality of the JoinNode should not be larger
> than the max row count across partitions.
> The recent work in IMPALA-12018 (Consider runtime filter for cardinality
> reduction) helps in some cases since there are runtime filters on the
> partition column. But there are still some cases in which we overestimate the
> cardinality. For instance, 'ss_sold_date_sk' is the only partition key of
> tpcds.store_sales.
> The following query
> {code:sql}
> select count(*) from tpcds.store_sales
> where ss_sold_date_sk=(
>   select min(d_date_sk) + 1000 from tpcds.date_dim);{code}
> has query plan:
> {noformat}
> +-------------------------------------------------------------+
> | Explain String                                              |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 |
> | Per-Host Resource Estimates: Memory=243MB                   |
> |                                                             |
> | PLAN-ROOT SINK                                              |
> | |                                                           |
> | 09:AGGREGATE [FINALIZE]                                     |
> | |  output: count:merge(*)                                   |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 08:EXCHANGE [UNPARTITIONED]                                 |
> | |                                                           |
> | 04:AGGREGATE                                                |
> | |  output: count(*)                                         |
> | |  row-size=8B cardinality=1                                |
> | |                                                           |
> | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                    |
> | |  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 |
> | |  runtime filters: RF000 <- min(d_date_sk) + 1000          |
> | |  row-size=4B cardinality=2.88M   < Should be max(numRows) across partitions
> | |                                                           |
> | |--07:EXCHANGE [BROADCAST]                                  |
> | |  |                                                        |
> | |  06:AGGREGATE [FINALIZE]                                  |
> | |  |  output: min:merge(d_date_sk)                          |
> | |  |  row-size=4B cardinality=1                             |
> | |  |                                                        |
> | |  05:EXCHANGE [UNPARTITIONED]                              |
> | |  |                                                        |
> | |  02:AGGREGATE                                             |
> | |  |  output: min(d_date_sk)                                |
> | |  |  row-size=4B cardinality=1                             |
> | |  |                                                        |
> | |  01:SCAN HDFS [tpcds.date_dim]                            |
> | |     HDFS partitions=1/1 files=1 size=9.84MB               |
> | |     row-size=4B cardinality=73.05K                        |
> | |                                                           |
> | 00:SCAN HDFS [tpcds.store_sales]                            |
> |    HDFS
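The bound proposed in this issue (the JoinNode cardinality should not exceed the largest per-partition row count) can be sketched as follows. This is only an illustration under stated assumptions, not the planner's implementation: the helper name `cap_join_cardinality` and the convention that `-1` marks a missing 'numRows' stat are hypothetical, and the per-partition row counts in the usage lines are invented sample values, not real tpcds.store_sales statistics. Only the 2.88M join estimate comes from the plan above.

```python
def cap_join_cardinality(join_estimate, partition_num_rows):
    """Cap a join estimate by the largest known per-partition row count.

    Hypothetical helper. partition_num_rows holds the 'numRows' value of each
    partition, with -1 meaning the stat is missing (an assumed convention).
    If there are no partitions, or any partition lacks stats, the bound is
    unknown and the estimate is returned unchanged.
    """
    if not partition_num_rows or any(n < 0 for n in partition_num_rows):
        return join_estimate
    return min(join_estimate, max(partition_num_rows))


# Invented per-partition row counts, for illustration only.
sample_partitions = [1500, 1800, 1200]
capped = cap_join_cardinality(2_880_000, sample_partitions)
print(capped)
```

Returning the estimate unchanged when any partition lacks stats is a conservative choice: a partition with unknown size could be larger than every partition whose size is known.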